SURF-based Human Tracking Algorithm with On-line Update of Object Model
Meenakshi Gupta∗, Nishant Kejriwal∗∗, Laxmidhar Behera∗, K.S. Venkatesh∗
∗ Department of Electrical Engineering, Indian Institute of Technology Kanpur, Uttar Pradesh, India - 208016.
e-mail: meenu@iitk.ac.in, lbehera@iitk.ac.in, venkats@iitk.ac.in
∗∗ Innovation Lab, Tata Consultancy Services (TCS), Noida.
e-mail: nishant.kejriwal@tcs.com
Abstract: The ability to robustly track a human is an essential prerequisite for an increasing number of
applications that need to interact with a human user. This paper presents a robust vision-based algorithm
to track a human in a dynamic environment using an interest-point-based method. The tracking algorithm is
expected to cope with changes in pose, scale and illumination, as well as with camera motion. Interest-point-based
tracking methods (e.g., those using SURF) are limited by the lack of a sufficient number of matching key points
for the target in every frame of a video. One solution to this problem is an object model that contains SURF
features for all possible poses and scaling factors. Such an object model could be created off-line and used to
detect the target in every frame. However, such a scheme cannot be used for tracking an object online.
To overcome this problem, we propose a new approach that updates the object model online and thereby retains
a sufficient number of matching key points for the target under changes in pose and scale.
Experimental results are provided to show the efficacy of the algorithm.
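The idea of updating the object model online, rather than building it off-line for every pose and scale, can be illustrated with a small descriptor pool. The sketch below is a minimal illustration under stated assumptions, not the algorithm proposed in this paper. The class `ObjectModel`, the distance threshold `match_thresh` and the pool cap `max_size` are hypothetical. Descriptors are plain lists of floats; real SURF descriptors have 64 or 128 dimensions. When the pool exceeds its cap, the descriptors with the lowest counts are discarded, with ties resolved in favour of older entries.

```python
import math


class ObjectModel:
    """Hypothetical online object model: a pool of descriptors with occurrence counts."""

    def __init__(self, match_thresh=0.3, max_size=500):
        self.match_thresh = match_thresh  # assumed distance below which two descriptors "match"
        self.max_size = max_size          # assumed cap on the number of stored descriptors
        self.descriptors = []             # stored descriptor vectors
        self.counts = []                  # how often each stored descriptor was re-observed

    def _nearest(self, d):
        """Return (index, distance) of the stored descriptor closest to d."""
        best, best_dist = None, math.inf
        for k, m in enumerate(self.descriptors):
            dist = math.dist(d, m)
            if dist < best_dist:
                best, best_dist = k, dist
        return best, best_dist

    def update(self, target_descriptors):
        """Fold descriptors extracted from the current target window into the model."""
        for d in target_descriptors:
            k, dist = self._nearest(d)
            if k is not None and dist < self.match_thresh:
                self.counts[k] += 1               # reinforce a feature seen again
            else:
                self.descriptors.append(list(d))  # new appearance (e.g. new pose or scale)
                self.counts.append(1)
        if len(self.descriptors) > self.max_size:
            # Keep the most frequently observed descriptors; stable sort keeps older ones on ties.
            keep = sorted(range(len(self.counts)), key=lambda k: -self.counts[k])[: self.max_size]
            keep.sort()  # preserve insertion order among the survivors
            self.descriptors = [self.descriptors[k] for k in keep]
            self.counts = [self.counts[k] for k in keep]
```

For example, calling `update` with a descriptor close to a stored one increments that descriptor's count instead of adding a new entry. Frequently re-observed features therefore survive pruning.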
1. INTRODUCTION
Visual tracking of objects is one of the many capabilities that
human beings have. At present, introducing these capabilities
into artificial visual systems is one of the most active
research challenges in computer vision and mobile robotics.
The field has witnessed unprecedented advancement owing
to the availability of high-quality cameras and inexpensive
computing power, together with the development of ingenious
techniques for image and video processing. In spite of these
advances, visual tracking is still fraught with difficulties
arising from abrupt object motion, changes in appearance
(including pose), non-rigid object structure, occlusion and
camera motion [A. Yilmaz and Shah, 2006] [Yang et al., 2011].
In this paper, we focus on interest-point-based methods
[Kloihofer and Kampel, 2010][Ta et al., 2009][He et al., 2009],
which use local features such as SIFT [Lowe, 2004] or SURF
[Bay et al., 2008] as the visual feature for object tracking
because of their robustness to photometric and geometric
distortions.
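Tracking with interest points depends on matching descriptors between the object model and the current frame. A common way to do this is the nearest-neighbour distance-ratio test of [Lowe, 2004], which rejects a match when the distance ratio exceeds 0.8. The sketch below uses that value as its default and assumes descriptors are plain lists of floats. The function name `match_keypoints` is hypothetical. A brute-force search is used for clarity instead of a faster approximate nearest-neighbour search.

```python
import math


def match_keypoints(model_desc, frame_desc, ratio=0.8):
    """Match model descriptors to frame descriptors with the distance-ratio test.

    Returns a list of (model_index, frame_index) pairs. A pair is kept only if
    the nearest frame descriptor is clearly closer than the second-nearest one,
    which discards ambiguous matches.
    """
    matches = []
    if len(frame_desc) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, d in enumerate(model_desc):
        # Distances from this model descriptor to every frame descriptor, closest first.
        dists = sorted((math.dist(d, f), j) for j, f in enumerate(frame_desc))
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches
```

For instance, a model descriptor that lies exactly halfway between two frame descriptors produces no match, because its two nearest distances are equal.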
We specifically look into the problem of tracking a non-rigid
object (a human) from a camera mounted on a mobile platform
[Motai et al., 2012] [Gupta et al., 2011]. Most human-following
robots use multiple sensors to track and follow a human,
as in [Hu et al., 2013] [Bellotto and Hu, 2009] [Vadakkepat
et al., 2008]. Vision-based human detection and tracking is one
of the most important modules of human-following robots, as
can be seen in [Nagumo and Ohya, 2001] [Yoshimi et al., 2006]
[Hirai and Mizoguchi, 2003].
The most popular vision-based tracking algorithm is Mean-shift.
It is a local search algorithm based on colour histogram
matching [Comaniciu et al., 2000] and is easy to implement.
However, colour-based tracking methods [Zhang et al., 2011]
are sensitive to variations in illumination and require a
background whose colours do not match those of the target
[Gupta et al., 2011]. This has prompted researchers to use
histograms of other distinctive features (such as SIFT or SURF)
for Mean-shift tracking [Ahmadi et al., 2012]. In [Garg and
Kumar, 2013], Sourav et al. proposed an object tracking
algorithm that applies Mean-shift directly to SURF features.
They proposed a method called re-projection to overcome the
lack of a sufficient number of key points for a given object.
However, such an algorithm cannot be used to track a non-rigid
object, as it does not account for changes in the pose of the
object due to non-rigid motion or out-of-plane rotations.
Meenakshi et al. [Gupta et al., 2013] proposed a tracking
algorithm that uses a dynamic object model description to
detect the target. This dynamic object model derives its points
from a template pool, which reinforces features that occur
more frequently than others. In this way, they resolve the
stability-plasticity dilemma in object tracking [Gu et al., 2010]
without having to learn the actual motion model of the object
[Ta et al., 2009] [He et al., 2009] or to create a bag-of-words
through clustering [Bing et al., 2010]. The dynamic object
model description proposed by them is able to track a non-rigid
object under out-of-plane rotations, but it increases the overall
computational cost of the algorithm due to frame-to-frame
matching.
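Applying Mean-shift to SURF features, as described above, means iterating the window position over the locations of matched key points rather than over colour-histogram weights. The sketch below shows the basic iteration with a flat (uniform) kernel. It is an illustration under assumptions and not the algorithm of [Garg and Kumar, 2013] or of this paper. The function name `mean_shift_window`, the optional per-point `weights`, the convergence threshold `eps` and the iteration limit `max_iter` are all hypothetical choices.

```python
import math


def mean_shift_window(points, window, weights=None, max_iter=20, eps=0.5):
    """Shift a rectangular window to the weighted mean of the key points inside it.

    points  : list of (x, y) locations of matched key points in the current frame
    window  : (cx, cy, w, h) centre and size of the search window
    weights : optional per-point weights (e.g. match scores); defaults to 1 each
    """
    cx, cy, w, h = window
    if weights is None:
        weights = [1.0] * len(points)
    for _ in range(max_iter):
        sx = sy = sw = 0.0
        for (x, y), wt in zip(points, weights):
            # Flat kernel: only key points inside the current window contribute.
            if abs(x - cx) <= w / 2 and abs(y - cy) <= h / 2:
                sx += wt * x
                sy += wt * y
                sw += wt
        if sw == 0:
            break  # no matched key points inside the window: keep the last position
        nx, ny = sx / sw, sy / sw
        shift = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if shift < eps:
            break  # converged: the window barely moved
    return cx, cy, w, h
```

With key points at (10, 10), (12, 10), (11, 12) and an outlier at (100, 100), a 10 x 10 window starting at (8, 8) moves to about (11, 10.67) and ignores the outlier. The flat kernel keeps each step cheap. A Gaussian kernel would weight central key points more heavily at a small extra cost.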
In this work, we combine the SURF-based Mean-shift
algorithm and the dynamic object model description in such
a way that the algorithm can track a non-rigid object in
real time. The human to be tracked is selected in the first
frame by manually drawing a polygon on the boundary of the
human silhouette. The bounding rectangle of the polygon is
used as the initial window for the mean-shift tracker. The
target is located in the next frame by mean-shift
Third International Conference on
Advances in Control and Optimization of Dynamical Systems
March 13-15, 2014. Kanpur, India
978-3-902823-60-1 © 2014 IFAC 321 10.3182/20140313-3-IN-3024.00247