SURF-based Human Tracking Algorithm with On-line Update of Object Model

Meenakshi Gupta, Nishant Kejriwal ∗∗, Laxmidhar Behera, K.S. Venkatesh

Department of Electrical Engineering, Indian Institute of Technology Kanpur, Uttar Pradesh, India - 208016. e-mail: meenu@iitk.ac.in, lbehera@iitk.ac.in, venkats@iitk.ac.in
∗∗ Innovation Lab, Tata Consultancy Services (TCS), Noida. e-mail: nishant.kejriwal@tcs.com

Abstract: The ability to robustly track a human is an essential prerequisite for a growing number of applications that need to interact with a human user. This paper presents a robust vision-based algorithm for tracking a human in a dynamic environment using an interest-point-based method. The tracking algorithm is expected to cope with changes in pose, scale and illumination, as well as camera motion. Interest-point-based (e.g. SURF) tracking methods are limited by the fact that a sufficient number of matching key points for the target is not available in every frame of a running video. One solution to this problem is an object model that contains SURF features for all possible poses and scaling factors. Such a model, holding all possible descriptors, could be created off-line and used to detect the target in every frame. However, such a scheme cannot be used for tracking an object online. To overcome this problem, we propose a new approach that updates the object model online and thereby retains sufficient matching key points for the target when its pose or scale changes. Experimental results are provided to show the efficacy of the algorithm.

1. INTRODUCTION

Visual tracking of objects is one of the several capabilities that human beings have. At present, introducing these capabilities into artificial visual systems is one of the most active research challenges in computer vision and mobile robotics.
The field has witnessed unprecedented advancement owing to the availability of high-quality cameras and inexpensive computing power, commensurate with the development of ingenious techniques for image and video processing. In spite of the advances made in this field, visual tracking is still fraught with difficulties arising from abrupt object motion, changes in appearance pattern including pose, non-rigid object structure, occlusion and camera motion [A. Yilmaz and Shah, 2006] [Yang et al., 2011]. In this paper, we focus on interest-point-based methods [Kloihofer and Kampel, 2010] [Ta et al., 2009] [He et al., 2009], which use local features such as SIFT [Lowe, 2004] or SURF [Bay et al., 2008] as the visual feature for object tracking because of their robustness to photometric and geometric distortions.

We specifically look into the problem of tracking a non-rigid object (a human) from a camera placed on a mobile platform [Motai et al., 2012] [Gupta et al., 2011]. Most human-following robots use multiple sensors to track and follow a human, as in [Hu et al., 2013] [Bellotto and Hu, 2009] [Vadakkepat et al., 2008]. Vision-based human detection and tracking is one of the most important modules for human-following robots, as can be seen in [Nagumo and Ohya, 2001] [Yoshimi et al., 2006] [Hirai and Mizoguchi, 2003].

The most popular vision-based tracking algorithm is Mean-shift. It is a local search algorithm based on colour histogram matching [Comaniciu et al., 2000] and is easy to implement. However, colour-based tracking methods [Zhang et al., 2011] are sensitive to variations in illumination and require a background whose colours do not match those of the target [Gupta et al., 2011]. This has prompted researchers to use histograms of other distinctive features (such as SIFT or SURF) for Mean-shift tracking [Ahmadi et al., 2012]. In [Garg and Kumar, 2013], Sourav et al. proposed an object tracking algorithm that applies Mean-shift directly to SURF features.
They proposed a method called re-projection to overcome the lack of a sufficient number of key points for a given object. However, such an algorithm cannot be used to track a non-rigid object, as it does not account for changes in the pose of the object due to non-rigid motion or out-of-plane rotations. Meenakshi et al. [Gupta et al., 2013] proposed a tracking algorithm that uses a dynamic object model description to detect the target. This dynamic object model derives its points from a template pool, which helps in reinforcing the features that occur more frequently than others. In this way, they resolve the stability-plasticity dilemma in object tracking [Gu et al., 2010] without having to learn the actual motion model of the object [Ta et al., 2009] [He et al., 2009] or create a bag-of-words through clustering [Bing et al., 2010]. The dynamic object model description they propose is able to track a non-rigid object under out-of-plane rotations, but it increases the overall computational cost of the algorithm because of frame-to-frame matching.

Third International Conference on Advances in Control and Optimization of Dynamical Systems, March 13-15, 2014. Kanpur, India
978-3-902823-60-1 © 2014 IFAC 321 10.3182/20140313-3-IN-3024.00247

In this work, we have combined the SURF-based Mean-shift algorithm and the dynamic object model description in such a way that the algorithm can track a non-rigid object in real time. The human to be tracked is selected in the first frame by manually drawing a polygon along the boundary of the human silhouette. The bounding rectangle of the polygon is used as the initial window for the mean-shift tracker. The target is located in the next frame by mean-shift