Robust Visual Tracking via Rank-Constrained Sparse Learning

Behzad Bozorgtabar (1) and Roland Goecke (1,2)
(1) Vision & Sensing, HCC Lab, ESTeM, University of Canberra
(2) IHCC, RSCS, CECS, Australian National University
Email: Behzad.Bozorgtabar@canberra.edu.au, roland.goecke@ieee.org

Abstract—In this paper, we present an improved low-rank sparse learning method for particle filter based visual tracking, which we denote as rank-constrained sparse learning. Since each particle can be sparsely represented by a linear combination of the bases from an adaptive dictionary, we exploit the underlying structure between particles by constraining the rank of the particle sparse representations jointly over the adaptive dictionary. Besides utilising a common structure among particles, the proposed tracker also selects the most discriminative features for particle representations using an additional feature selection module embedded in the proposed objective function. Furthermore, we present an efficient way to solve this learning problem by connecting the low-rank structure extracted from particles to a simpler learning problem in a devised discriminative subspace, which improves the overall computational complexity for high-dimensional particle candidates. Finally, in order to achieve a more robust tracker, we augment the sparse representation of particles with adaptive weights, which indicate the similarity between candidates and the dictionary templates. The proposed approach is extensively evaluated on the VOT 2013 visual tracking evaluation platform, comprising 16 challenging sequences. Experimental results show the robustness and effectiveness of the proposed tracker compared to state-of-the-art methods.

I. INTRODUCTION

Object tracking is a vital problem in computer vision with applications in a wide range of domains.
Meanwhile, it has remained one of the most challenging vision tasks due to several circumstances such as illumination changes, background clutter, heavy occlusion, in-plane and out-of-plane rotations, and abrupt motion. In order to alleviate these challenges, we need a discriminative object representation to distinguish the target from background clutter. In addition, we seek target objects with the most similar appearance between consecutive frames under appearance changes.

Sparse representation has recently been applied to visual tracking [1], [2], in which a tracking candidate can be sparsely represented as a linear combination of the dictionary templates. In [1], the representation of each particle is learnt individually, which requires a computationally expensive l1 minimisation at each frame. Since, in the particle filter-based tracking framework, particles are randomly generated around the current state of the target according to a Gaussian distribution, the particles share a common underlying structure, with each particle exhibiting dependencies on the other particles. In this paper, we propose a computationally efficient low-rank sparse learning approach for visual tracking in a particle filter framework, in which we exploit the intrinsic relationship between particles by constraining the rank of the particle representations over the dictionary bases. In addition, we further extend the designed objective function with an l_{p,q} mixed norm to extract the most discriminative features. Finally, adaptive weights, devised from the similarity between target candidates and the dictionary templates, improve the accuracy and robustness of our tracker in the case of potential instability.

II. RELATED WORK

In general, object tracking methods can be categorised as either generative or discriminative. Here, we mention some important trackers from both groups.

A.
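To make the idea concrete, the following sketch illustrates the two ingredients described above: Gaussian particle sampling around the current state, and coding all particle observations jointly over a template dictionary with both a sparsity penalty and a low-rank penalty (the rank constraint relaxed to the nuclear norm). This is only an illustrative approximation, not the paper's implementation; the objective weights, the alternating proximal-step solver, and all function names are our assumptions.

```python
# Illustrative sketch of joint low-rank sparse coding of particles.
# D: dictionary of templates (d x k); X: stacked particle observations (d x n);
# Z: joint coefficient matrix (k x n), encouraged to be sparse AND low-rank.
import numpy as np

def soft_threshold(A, tau):
    """Element-wise soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svd_threshold(A, tau):
    """Singular-value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def joint_sparse_lowrank_coding(D, X, lam1=0.05, lam2=0.05, n_iters=200):
    """Approximately minimise 0.5*||X - D Z||_F^2 + lam1*||Z||_1 + lam2*||Z||_*
    by alternating proximal (forward-backward) steps. Note: applying the two
    proximal operators in sequence is a heuristic approximation of the exact
    proximal operator of the summed regulariser."""
    k, n = D.shape[1], X.shape[1]
    Z = np.zeros((k, n))
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the smooth term
    for _ in range(n_iters):
        grad = D.T @ (D @ Z - X)                         # gradient of the data-fit term
        Z = soft_threshold(Z - step * grad, step * lam1)  # sparsity prox
        Z = svd_threshold(Z, step * lam2)                 # low-rank prox
    return Z

def sample_particles(state, n_particles=50, sigma=2.0, rng=None):
    """Draw candidate states from a Gaussian centred on the current target
    state, as in a standard particle-filter motion model."""
    rng = np.random.default_rng(0) if rng is None else rng
    return state + sigma * rng.standard_normal((n_particles, state.size))
```

In a full tracker, each particle would additionally be weighted by its reconstruction likelihood and the dictionary updated online; those components, along with the l_{p,q} feature-selection term and the adaptive weights, are omitted here for brevity.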
Generative Trackers

Generative methods use an appearance model to represent the target observations. Examples of generative methods are the fragment-based tracker (Frag) [3], the incremental tracker (IVT) [4], the VTD tracker [5], the mean shift tracker [6] and the eigentracker [7]. Adam et al. [3] utilised multiple fragments to design an appearance model robust to partial occlusions. Ross et al. [4] introduced an adaptive linear subspace for the tracked object representation. The VTD tracker [5] effectively extends the conventional particle filter framework with multiple motion models to account for appearance variation caused by changes in pose and lighting.

B. Discriminative Trackers

Discriminative models describe tracking as a binary classification task to distinguish the foreground object from its surrounding background. Examples of discriminative methods are online multiple instance learning tracking [8], on-line boosting (OAB) [9], ensemble tracking [10], co-training tracking [11] and adaptive metric differential tracking [12]. However, all of the above-mentioned discriminative methods utilise only one positive sample. If the object location detected by the current classifier is not precise, the extracted positive sample will be imprecise and this inaccuracy will accumulate and degrade the classifier.

C. Object Tracking via Sparse Representation

Due to its success in face recognition, sparse representation has attracted considerable interest in object tracking. In [1], a tracking candidate is represented as a