A Multiple Random Feature Extraction Algorithm for Image Object Tracking


1. Introduction

Object tracking is a core application in computer vision, commonly used in surveillance systems and human-computer interaction. A variety of tracking algorithms have been proposed. The IVT method [1] uses incremental principal component analysis to reduce the dimensionality of the image space and learns a target model that is continually updated to adapt to a changing target. In [2], the target model is decomposed into several basic target models constructed by sparse principal component analysis, and the tracker follows the target with a set of additional basic motion models; this design copes with appearance changes and abrupt motion. However, these methods are computationally expensive and difficult to run in real time.

The compressive tracking method [3] is a fast tracking algorithm based on compressed sensing theory. It projects high-dimensional image features into a lower-dimensional space through a very sparse projection matrix and tracks the target with the low-dimensional features generated by this random projection. This greatly reduces the number of image features that must be compared, and with it the computational complexity of the algorithm. However, the features generated by the projection matrix are completely random: even on the same test video, the features change considerably from run to run, so the results fluctuate between good and bad and are difficult to use reliably.
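To make the random-projection step concrete, the following is a minimal sketch of a very sparse projection matrix in the Achlioptas/Li style described above. The sparsity choice `s = sqrt(n_high)` and the matrix sizes are illustrative assumptions, not parameters taken from [3].

```python
import numpy as np

def sparse_projection_matrix(n_low, n_high, s=None, rng=None):
    """Very sparse random projection matrix: each entry is
    +sqrt(s) with probability 1/(2s), -sqrt(s) with probability
    1/(2s), and 0 otherwise (assumed sparsity scheme)."""
    rng = rng or np.random.default_rng()
    s = s or int(np.sqrt(n_high))           # typical sparsity choice (assumption)
    R = np.zeros((n_low, n_high))
    mask = rng.random((n_low, n_high))
    R[mask < 1.0 / (2 * s)] = np.sqrt(s)
    R[mask > 1.0 - 1.0 / (2 * s)] = -np.sqrt(s)
    return R

# Project a high-dimensional feature vector to a low-dimensional one.
R = sparse_projection_matrix(50, 10000, rng=np.random.default_rng(0))
v_high = np.random.default_rng(1).random(10000)
v_low = R @ v_high                          # 50-dimensional compressed features
```

Because most entries of the matrix are zero, the projection can be computed by touching only the non-zero positions, which is what makes this step cheap in practice.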

To address this problem, the present paper proposes an object-tracking algorithm that uses multiple randomly generated feature sets. Tracking with several different feature sets produces a number of different tracking results; by choosing the best of these as the final target position, there are more opportunities to obtain a better result than the original algorithm.

2. Proposed Method

The proposed tracking algorithm is shown in Figure 1. To reduce the drift caused by occlusion, we follow [4] and apply sub-region classifiers, but reduce their number to speed up the algorithm: only nine sub-region classifiers are used. In addition, the sub-region locations are evenly distributed to avoid excessive concentration or dispersion of the classifiers. During tracking, each sub-region classifier independently tracks its assigned part of the target, so if the target is partially occluded, only the occluded sub-regions are affected and the drift problem caused by occlusion is mitigated. In the classifier update phase, each sub-region classifier decides independently whether to update, based on its own classifier score, to prevent the target model from being contaminated by occluders. If a classifier's score is less than zero, the region is more likely to belong to a non-target object; the target model of that sub-region is therefore not updated, preserving the stored object information.
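The selective update rule above can be sketched as follows. The `SubRegionClassifier` class and its `update` method are hypothetical stand-ins for whatever classifier interface the implementation actually uses.

```python
class SubRegionClassifier:
    """Minimal stand-in for a sub-region classifier (assumption: the
    real classifier exposes an `update` method)."""
    def __init__(self):
        self.updates = 0

    def update(self, sample):
        self.updates += 1

def update_subregions(classifiers, scores, sample):
    """Selective update from the text: each of the nine sub-region
    classifiers is updated only when its own score is non-negative,
    so occluded sub-regions do not corrupt the target model."""
    for clf, score in zip(classifiers, scores):
        if score >= 0:          # negative score: region likely occluded, skip
            clf.update(sample)
```

The point of the design is that an occlusion touching, say, three of the nine sub-regions leaves the other six models intact.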

Figure 2 shows the distribution of the sub-regions. Figure 2(a) shows all of the sub-regions; neighbouring sub-regions partially overlap. Figure 2(b) shows the lower-right corner sub-region. The position of each sub-region is given by Equation (1), where x and y are the coordinates of the upper-left corner of the region of interest, w_{s} and h_{s} are the width and height of the sub-regions

Figure 1. Flowchart of the proposed algorithm.

Figure 2. The distribution of sub-regions. (a) All sub-regions; (b) a single sub-region.

and T_{ij} gives the coordinates of the upper-left corner of sub-region (i, j).

${T}_{ij}=\left[x+i\times \frac{{w}_{s}}{2},\text{\hspace{0.17em}}y+j\times \frac{{h}_{s}}{2}\right],\quad i=0,1,2,\text{\hspace{0.17em}}j=0,1,2$ (1)
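Equation (1) can be evaluated directly. The sketch below assumes each sub-region spans half the region of interest (w_s = w/2, h_s = h/2), which makes the nine sub-regions cover the ROI with half-size overlaps as in Figure 2(a); that sizing is an assumption, since the text does not state w_s and h_s explicitly.

```python
def subregion_corners(x, y, w, h):
    """Upper-left corners T_ij of the 3x3 sub-regions, Equation (1).
    Assumes w_s = w/2 and h_s = h/2, so the stride between adjacent
    sub-regions is half a sub-region in each direction."""
    ws, hs = w / 2, h / 2
    return [(x + i * ws / 2, y + j * hs / 2)
            for j in range(3) for i in range(3)]

corners = subregion_corners(0, 0, 100, 60)
# nine corners, from (0.0, 0.0) up to (50.0, 30.0)
```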

During the target-model establishment stage, we use the method proposed in [5] to assign different weights to the positive samples according to their importance. The target and background models are established as in Equations (2)-(4), where
$p\left(y=1|{V}^{+}\right)$ and
$p\left(y=0|{V}^{-}\right)$ are the target and background models,
$p\left(y=1|{v}_{1j}\right)$ is the posterior probability of sample v_{1j}, N is the number of positive samples, L is the number of negative samples, l is the location function, c is a normalization constant, and w is a constant.

$p\left(y=1|{V}^{+}\right)={\displaystyle \underset{j=0}{\overset{N-1}{\sum}}{w}_{j0}p\left({y}_{1}=1|{v}_{1j}\right)}$ (2)

${w}_{j0}=\frac{1}{c}{\text{e}}^{-\left|l\left({v}_{1j}\right)-l\left(v{\text{\hspace{0.05em}}}_{10}\right)\right|}$ (3)

$p\left(y=0|{V}^{-}\right)={\displaystyle \underset{j=N}{\overset{N+L-1}{\sum}}wp\left({y}_{0}=0|{v}_{0j}\right)}$ (4)
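The weighted positive-sample model of Equations (2) and (3) can be sketched as below. For illustration the location function l is reduced to a scalar coordinate per sample, so |l(v_1j) - l(v_10)| becomes a simple absolute difference; that simplification is an assumption.

```python
import numpy as np

def positive_weights(locations):
    """Distance-based weights w_j0 of Equation (3): positive samples
    nearer the tracked location l(v_10) (the first sample) count more.
    Normalisation by 1/c makes the weights sum to one."""
    d = np.abs(np.asarray(locations, dtype=float) - locations[0])
    w = np.exp(-d)
    return w / w.sum()

def target_model(post_probs, locations):
    """Weighted target model of Equation (2): a weighted sum of the
    per-sample posteriors p(y=1|v_1j)."""
    return float(np.dot(positive_weights(locations), post_probs))
```

The background model of Equation (4) is the same sum but with the constant weight w for every negative sample, since negative samples have no preferred location.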

In the proposed tracking algorithm, multiple sets of randomly generated, distinct image features are used to track in parallel. After the highest classifier score over the candidate positions has been computed for each feature set, we select the best tracking result as the final target location. Because several feature sets are used, there are additional opportunities to produce a better result than the original single-feature algorithm; if the best candidate can be selected reliably, better tracking performance is possible. The optimal tracking result is determined by computing the Bhattacharyya coefficient between each candidate image and the reference image. The Bhattacharyya coefficient is defined in Equation (5), where N is the total number of histogram bins. The target-image and candidate-image models follow [6] and are shown in Equations (6)-(9), where δ is the Kronecker delta function and C and C_{h} are normalization constants. A large Bhattacharyya coefficient indicates that the candidate image is highly similar to the target image; therefore, at the end of each tracking step, the candidate position with the largest Bhattacharyya coefficient is taken as the tracking result.

$\rho \left[p,q\right]={\displaystyle {\sum}_{u=1}^{N}\sqrt{{p}^{\left(u\right)}{q}^{\left(u\right)}}}$ (5)

$\stackrel{^}{{q}_{u}}=C{\displaystyle \underset{i=1}{\overset{n}{\sum}}k\left({\Vert {x}_{i}^{\ast}\Vert}^{2}\right)\delta \left(b\left({x}_{i}^{\ast}\right)-u\right)}$ (6)

$C=\frac{1}{{\displaystyle {\sum}_{i=1}^{n}k\left({\Vert {x}_{i}^{\ast}\Vert}^{2}\right)}}$ (7)

$\stackrel{^}{{p}_{u}}\left(y\right)={C}_{h}{\displaystyle \underset{i=1}{\overset{{n}_{h}}{\sum}}k\left({\Vert \frac{y-{x}_{i}}{h}\Vert}^{2}\right)\delta \left(b\left({x}_{i}\right)-u\right)}$ (8)

${C}_{h}=\frac{1}{{\displaystyle {\sum}_{i=1}^{{n}_{h}}k\left({\Vert \frac{y-{x}_{i}}{h}\Vert}^{2}\right)}}$ (9)
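Equations (5)-(9) can be sketched as follows. The Epanechnikov-style kernel profile `k(r2) = max(1 - r2, 0)` is a common choice for this kind of kernel-weighted histogram but is an assumption here, as is the representation of each pixel by its precomputed bin index b(x_i) and its normalised squared distance from the patch centre.

```python
import numpy as np

def bhattacharyya(p, q):
    """Equation (5): similarity between two normalised histograms."""
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

def kernel_histogram(bin_indices, n_bins, dist_sq,
                     k=lambda r2: np.maximum(1.0 - r2, 0.0)):
    """Kernel-weighted colour histogram in the style of Equations
    (6)-(9): pixels nearer the patch centre get larger weights.
    `bin_indices` holds b(x_i) for each pixel, `dist_sq` its
    normalised squared distance ||x_i*||^2 from the centre."""
    hist = np.zeros(n_bins)
    weights = k(np.asarray(dist_sq, dtype=float))
    for b, w in zip(bin_indices, weights):
        hist[b] += w            # the Kronecker delta picks the bin b(x_i)
    return hist / hist.sum()    # normalisation constant C (resp. C_h)
```

Dividing by the sum of kernel weights plays the role of C and C_h, so identical target and candidate patches give a coefficient of exactly 1.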

The low-dimensional image features used in [3] are scale-invariant, and this paper integrates multi-scale tracking into the proposed algorithm by tracking with enlarged, unchanged, and reduced image features. Because frame-to-frame changes in target size can be too small to detect, we use an additional target model for scale detection and tracking. This second target model is updated less frequently, only at the end of every fifth frame. The slower update preserves the target appearance from five frames earlier, so the current image differs more from the second target model and changes in target size are easier to detect.

The proposed multi-scale tracking algorithm is shown in Figure 3. Multi-scale detection is performed at the end of the tracking phase and executes once

Figure 3. The proposed multiscale tracking algorithm.

every five frames. If scale detection is required, tracking is performed again with the differently scaled image features and the second target model. If the highest classifier score comes from a larger- or smaller-scale feature set, the target size has changed, and the target position is taken from the highest-scoring result of this final tracking pass. If the highest score comes from the unchanged-scale features, the tracking result obtained before scale detection is kept as the target position, because that result is tracked with the first target model, which is updated every frame and is therefore more accurate than the second.
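The scale-selection rule above reduces to a small decision function. The scale labels and the dictionary layout below are illustrative assumptions; only the rule itself comes from the text.

```python
def choose_scale_result(scores, single_scale_result, scale_results):
    """Scale selection from the text: if the best classifier score
    comes from the enlarged or shrunken features, adopt that result
    (the target size has changed); otherwise keep the single-scale
    result tracked with the per-frame-updated first model."""
    best = max(scores, key=scores.get)      # 'small' | 'same' | 'large'
    if best == 'same':
        return single_scale_result
    return scale_results[best]
```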

In addition, a dramatic change in object size usually means that the distance between the target and the observer has changed substantially, and the colour of the target then changes under the influence of the medium between them. The target model used in multi-scale detection therefore needs to be updated to reduce the impact of these colour changes. The update rule, proposed in [7], is shown in Equation (10), where ${q}_{t}^{\left(u\right)}$ is the updated target model at frame t, λ is the learning parameter, and ${p}^{\left(u\right)}$ is the model of the final tracking result at frame t.

${q}_{t}^{\left(u\right)}=\left(1-\lambda \right){q}_{t-1}^{\left(u\right)}+\lambda {p}^{\left(u\right)}$ (10)

3. Experimental Results

The experimental parameters are kept consistent throughout. Each sub-region uses 10 weak classifiers, the learning parameter λ is set to 0.85, and the scale-change parameter δ is set to 0.1. Each test video is run 20 times and the measurements are averaged over the 20 runs. Tables 1-3 report the results under three metrics. The experimental data show that the proposed two-feature tracking algorithm yields a significant improvement regardless of which metric is used.

Table 4 shows the results of the multi-scale tracking experiment. The left side gives the results of tracking with a single feature set, the right side with two feature sets. In most of the test videos, the results with two feature sets are better, with significant progress on difficult examples such as the test videos Bus, Car_silver and

Table 1. Center location error (CLE) (in pixels) of the single-scale tracking experiments.

Table 2. Bounding box overlap ratio (BBOR) (%) of the single-scale tracking experiments.

Table 3. Success rate (SR) (%) of the single-scale tracking experiments.

Table 4. The multi-scale tracking experiments.

Car_scooter. In the test video Bus, the target undergoes changes in light and shadow caused by a bridge as well as short-term occlusion by a scooter. In the test video Car_silver, the small target and a large shadowed area greatly increase the tracking difficulty. The difficulty of the test video Car_scooter comes from long-term partial occlusion by the scooter. In the last two cases, the results with two feature sets are worse than with a single set. Because image colour is used as the discriminant, it is easy to select a wrong candidate image when similar colours appear in the background. The test video Freeway is a clear example: the tracker drifts away because of interference from the background colour, the classifier is then updated with background pixels, and the results degrade.

Table 5 shows the speed of the proposed tracking algorithm. The results show that the speed depends on the image size: the larger the image, the slower the tracker. The proposed single-scale algorithm achieves near-real-time speed when the image is not too large. Moreover, although the multi-scale tracker evaluates three scales, its speed does not drop to one third of the single-scale speed.

4. Conclusion

The tracking algorithm proposed in this paper is mainly aimed at improving the tracking results of compressive tracking. We track with several different sets of image features and obtain better performance by selecting the best tracking result. For this selection we experiment with the Bhattacharyya coefficient; according to the experimental results, it is a very effective criterion. Because the colour information of an object usually does not change dramatically, most tracking difficulties such as occlusion, deformation, or similar backgrounds can be overcome by selecting the best tracking result. Significant improvements can be seen in all three performance metrics. The

Table 5. Average frame per second (FPS) of the single-scale tracking experiments.

experimental results show that the proposed tracking algorithm can greatly reduce tracking errors. The best improvements in center location error, bounding box overlap ratio, and success rate are from 63.62 pixels to 15.45 pixels, from 31.75% to 64.48%, and from 38.51% to 82.58%, respectively. Moreover, when the image size is not large, the algorithm achieves real-time computation.

Acknowledgements

The authors would like to thank the Editor and the referee for their comments. This work was supported in part by the National Science Council, Taiwan, under Grant No.98-2221-E-009-138.

References

[1] Ross, D.A., Lim, J., Lin, R.S. and Yang, M.H. (2008) Incremental Learning for Robust Visual Tracking. International Journal of Computer Vision, 77, 125-141.

https://doi.org/10.1007/s11263-007-0075-7

[2] Kwon, J. and Lee, K.M. (2010) Visual Tracking Decomposition. IEEE Conference on Computer Vision and Pattern Recognition, 1269-1276.

https://doi.org/10.1109/CVPR.2010.5539821

[3] Zhang, K., Zhang, L. and Yang, M.H. (2014) Fast Compressive Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 2002-2015.

https://doi.org/10.1109/TPAMI.2014.2315808

[4] Zhu, Q., Yan, J. and Deng, D. (2013) Compressive Tracking via Oversaturated Sub-Region Classifiers. IET Computer Vision, 7, 448-455.

https://doi.org/10.1049/iet-cvi.2012.0248

[5] Wang, W., Xu, Y., Wang, Y., Zhang, B. and Cao, Z. (2013) Effective Weighted Compressive Tracking. The 17th IEEE International Conference on Image and Graphics, 353-357.

https://doi.org/10.1109/ICIG.2013.77

[6] Comaniciu, D., Ramesh, V. and Meer, P. (2000) Real-Time Tracking of Non-Rigid Objects Using Mean Shift. IEEE Conference on Computer Vision and Pattern Recognition, 2, 142-149.

https://doi.org/10.1109/CVPR.2000.854761

[7] Babu, R.V., Perez, P. and Bouthemy, P. (2007) Robust Tracking with Motion Estimation and Local Kernel-Based Color Modeling. Image and Vision Computing, 25, 1205-1216.

https://doi.org/10.1016/j.imavis.2006.07.016