IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 11, NOVEMBER 2013 1957

Robust Object Tracking via Active Feature Selection

Kaihua Zhang, Lei Zhang, Member, IEEE, Ming-Hsuan Yang, Senior Member, IEEE, and Qinghua Hu, Member, IEEE

Abstract—Adaptive tracking by detection has been widely studied with promising results. The key idea of such trackers is to train an online discriminative classifier that can well separate the object from its local background. The classifier is incrementally updated using positive and negative samples extracted from the current frame around the detected object location. However, if the detection is less accurate, the samples are likely to be extracted less accurately, thereby leading to visual drift. Recently, the multiple instance learning (MIL) based tracker was proposed to solve these problems to some degree. It puts samples into positive and negative bags, selects features with an online boosting method by maximizing the bag likelihood function, and finally combines the selected features for classification. However, in the MIL tracker the features are selected by a likelihood criterion, which can be less informative for telling the target apart from a complex background. Motivated by the active learning method, in this paper we propose an active feature selection approach that is able to select more informative features than the MIL tracker by using the Fisher information criterion to measure the uncertainty of the classification model. More specifically, we propose an online boosting feature selection approach that optimizes the Fisher information criterion, which yields more robust and efficient real-time object tracking performance. Experimental evaluations on challenging sequences demonstrate the efficiency, accuracy, and robustness of the proposed tracker in comparison with state-of-the-art trackers.
Index Terms—Active learning, Fisher information, multiple instance learning, visual tracking.

I. Introduction

VISUAL tracking is a very active research topic in the field of computer vision because of its importance in many applications, such as vehicle navigation, traffic monitoring, and human–computer interaction [1]. Although object tracking has been studied for several decades and numerous algorithms have been proposed, it is still a very challenging problem, since the appearance of the target object can change drastically due to factors such as illumination changes, pose variations, full or partial occlusions, and abrupt motion. Thus, designing a robust appearance model that can adaptively handle these factors over time is the key to developing a high-performance tracking system.

Manuscript received December 9, 2012; revised April 19, 2013; accepted April 28, 2013. Date of publication June 18, 2013; date of current version November 1, 2013. This work was supported in part by the HKPU Internal Research Grant, the National Natural Science Foundation of China under Grant 61222210, NSF CAREER Grant 1149783, and NSF IIS Grant 1152576. This paper was recommended by Associate Editor Francesco G. B. De Natale. K. Zhang and L. Zhang are with the Department of Computing, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong (e-mail: cskhzhang@comp.polyu.edu.hk; cslzhang@comp.polyu.edu.hk). M.-H. Yang is with the Department of Electrical Engineering and Computer Science, University of California, Merced, CA 95344 USA (e-mail: mhyang@ucmerced.edu). Q. Hu is with the School of Computer Science and Technology, Tianjin University, Tianjin 300072, China (e-mail: huqinghua@tju.edu.cn). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2013.2269772
Some appearance models are designed to represent only the object, while others consider both the object and its local background. The latter methods often perform better than the former because they treat tracking as a binary classification problem, separating the object from its local background with a discriminative classifier. Because these methods are closely related to the object detection task, they are often referred to as tracking by detection. When training the classifier, the selection of positive and negative samples affects the performance of the tracker. Most trackers choose only one positive sample, i.e., the tracking result in the current frame. If the tracked target location is not accurate, the classifier is updated with a less effective positive sample, thereby leading to visual drift over time. To alleviate the drifting problem, multiple samples near the tracked target location can be used to train the classifier. However, ambiguity arises if a traditional supervised learning method is used to train the classifier [2].

Recently, a multiple instance learning (MIL) approach [2] was proposed to solve this ambiguity problem in tracking. The samples are put into bags, and only the labels of the bags are provided. A bag is positive if one or more instances in it are positive, and negative if all of its instances are negative. The samples near the tracking location are put into the positive bag, while the samples far from the tracking location are put into the negative bag. A classifier is then designed by optimizing the bag likelihood function. To handle appearance variations over time, an online MIL boosting algorithm greedily selects the discriminative features from a feature pool by maximizing the bag likelihood function. Finally, the selected weak classifiers (each corresponding to a feature) are linearly combined into a strong classifier.
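The greedy bag-likelihood feature selection described above can be sketched in a few lines. This is an illustrative sketch rather than the implementation of [2]: the noisy-OR bag model is the common MIL formulation, and the function names (`select_features`, `bag_prob`) and the precomputed weak-classifier score matrix are assumptions made for illustration.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def bag_prob(instance_probs):
    # Noisy-OR model: a bag is positive if at least one instance is positive.
    return 1.0 - np.prod(1.0 - instance_probs)

def select_features(weak_scores, bags, labels, num_select):
    """Greedy MIL-boosting-style selection (illustrative sketch).

    weak_scores: (num_features, num_instances) array; weak_scores[k, i] is
                 weak classifier k's real-valued score on instance i.
    bags:        list of instance-index arrays, one per bag.
    labels:      bag labels in {0, 1}.
    """
    H = np.zeros(weak_scores.shape[1])  # running strong-classifier score
    selected = []
    for _ in range(num_select):
        best_k, best_ll = None, -np.inf
        for k in range(weak_scores.shape[0]):
            if k in selected:
                continue
            # Instance probabilities if weak classifier k were added.
            p = sigmoid(H + weak_scores[k])
            ll = 0.0
            for idx, y in zip(bags, labels):
                pb = np.clip(bag_prob(p[idx]), 1e-12, 1 - 1e-12)
                ll += y * np.log(pb) + (1 - y) * np.log(1 - pb)
            if ll > best_ll:
                best_ll, best_k = ll, k
        selected.append(best_k)       # greedily keep the best feature
        H = H + weak_scores[best_k]   # and fold it into the strong classifier
    return selected, H
```

On a toy pool where one weak classifier separates the positive bag from the negative bag and another is uninformative, the bag likelihood picks the discriminative one first, mirroring the greedy selection step of the MIL tracker.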
The strong classifier is then used to separate the object from the background in the next frame. Empirical studies on challenging sequences have shown that the MIL tracker handles visual drift better than most state-of-the-art trackers [2]. Despite its success, the MIL tracker [2] has the following shortcomings. First, the selected features may be less informative. In order to make the classifier discriminative enough, a relatively large number of features are selected from the

1051-8215 © 2013 IEEE