WAMI OBJECT TRACKING USING L1 TRACKER INTEGRATED WITH A DEEP DETECTOR

Erdem Onur Ozyurt, Bilge Gunsel
Multimedia Signal Proc. and Pattern Recognition Lab., Istanbul Technical University, Turkey
{ozyurter,gunselb}@itu.edu.tr

ABSTRACT

We propose an object tracking method for Wide Area Motion Imagery (WAMI) video sequences, which models tracking as a regularization problem through sparse representation of aerial video content. The proposed object tracker, L1Dpct, applies particle filter tracking and, unlike existing methods, integrates a deep-learning-based object detector into the regularization scheme to improve tracking performance. To enhance robustness to occlusion and scale changes, L1Dpct monitors the state propagation, the level of sparsity, and the representation capability of the model, and receives feedback from the detector to update the observation model of the particle filter. L1Dpct incrementally updates the dictionary of the sparse representation, which enables us to efficiently represent appearance changes of the object arising from illumination changes and high motion. Numerical results obtained on the commonly used VIVID and UAV123 datasets show that L1Dpct significantly improves object tracking performance in terms of precision rate and success rate compared to state-of-the-art trackers.

Index Terms— Sparse object tracking, particle filter

1. INTRODUCTION

Moving object tracking aims to continuously predict the state of a target through a video sequence. Object tracking in WAMI video sequences poses challenges due to low-resolution recording, high motion, and occlusion, as well as the aerial platform on which the sensor or camera is mounted [1]. Recently, the integration of generative and discriminative object tracking methods combined with sparse representation has attracted interest because of their robust tracking performance.
The promising performance of sparse representation in face recognition [2] has led to the creation of a dictionary that encodes candidates propagated by a generative sparse representation of templates [3]. An accelerated proximal gradient approach was developed to realize a real-time ℓ1 tracker with the same dictionary learning strategy [4]. A sparsity-based collaborative model was introduced in [5], and a method that jointly uses sparse codes, a discriminative dictionary, and a nonlinear classifier was developed in [6].

We propose an object tracking method for WAMI video which models tracking as a regularization problem through sparse representation of aerial video content. It builds on the method proposed in [4], itself based on the inspiring ℓ1 tracker, which combines regularization through sparse representation and ℓ1 minimization with particle filtering, uses a minimal error bound to speed up the process, and employs the accelerated proximal gradient approach as a fast numerical solver. The proposed tracker, L1Dpct, applies particle filter tracking and, unlike existing methods, integrates a deep-learning-based object detector into the regularization to improve tracking performance. Faster R-CNN is chosen as the object detector because its integration into variable-rate color particle filtering substantially improves tracking performance on CCD videos [7]. The study [8] proposes an occlusion detection scheme, and the study [4] controls the trivial energy in the ℓ1 minimization in order to enhance robustness to occlusion and scale changes; however, these contributions do not lead to successful tracking in WAMI video. L1Dpct improves performance through target template set updates, by monitoring state propagation and trivial energy and by receiving feedback from the deep detector.
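The ℓ1 minimization with an accelerated proximal gradient solver mentioned above can be sketched as follows. This is a minimal FISTA-style implementation (one common form of accelerated proximal gradient) that codes a candidate patch over a dictionary of target templates augmented with trivial templates; it is an illustrative sketch, not the paper's exact solver, and the regularization weight and iteration count are assumed values.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (element-wise shrinkage)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_l1(D, y, lam=0.01, n_iter=300):
    """Minimize 0.5*||y - D c||_2^2 + lam*||c||_1 via FISTA.

    D: dictionary, e.g. [T, I] with target templates T and
       trivial templates I (identity) for occlusion handling.
    y: vectorized candidate patch.
    """
    # Lipschitz constant of the gradient of the quadratic term
    L = np.linalg.norm(D, 2) ** 2
    c = np.zeros(D.shape[1])
    z = c.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)                    # gradient step
        c_next = soft_threshold(z - grad / L, lam / L)  # prox step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = c_next + ((t - 1.0) / t_next) * (c_next - c)  # momentum
        t, c = t_next, c_next
    return c
```

In a tracker, the reconstruction residual `||y - D c||` of each particle's candidate patch would score its likelihood, and the energy assigned to the trivial (identity) atoms indicates occlusion.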
L1Dpct completely updates the dictionary of the sparse representation by regenerating its atoms based on the region of interest received from the detector, which enables us to efficiently represent appearance changes of the object arising from occlusion. Experiments on the VIVID [9] and UAV123 [10] datasets demonstrate the superior performance of the proposed method in comparison with popular trackers.

2. THE BASELINE TRACKER

The proposed object tracker is developed by taking the L1APG tracker introduced in [4] as the baseline model. The common formulation is given in the following. In our tracking scheme, a target object referring to a rectangular region of interest (RoI_t) is selected. The template dictionary T_t ∈ R^(d×n), which consists of d-dimensional target templates, is initialized by collecting n downsampled patches cropped within a one-pixel neighborhood of RoI_t. To handle occlusion, a trivial template set is also inserted into the process [2]. The trivial template set U consists of d trivial templates, also generated in R^d; each trivial template is simply a vector with a single non-zero entry, thus U = I ∈ R^(d×d). The candidate region of interest corresponding to the k-th particle, RoI_t^k, k = 1, 2, ..., N, is obtained by sampling using state