Spatiograms Versus Histograms for Region-Based Tracking

Stanley T. Birchfield    Sriram Rangarajan
Electrical and Computer Engineering Department
Clemson University, Clemson, SC 29634
{stb, srangar}@clemson.edu

Abstract

We introduce the concept of a spatiogram, which is a generalization of a histogram that includes potentially higher-order moments. A histogram is a zeroth-order spatiogram, while second-order spatiograms contain spatial means and covariances for each histogram bin. This spatial information still allows quite general transformations, as in a histogram, but captures a richer description of the target to increase robustness in tracking. We show how to use spatiograms in kernel-based trackers, deriving a mean shift procedure in which individual pixels vote not only for the amount of shift but also for its direction. Experiments show improved tracking results compared with histograms, using both mean shift and exhaustive local search.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, California, June 2005.
© 2005 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE.

1 Introduction

Histograms have proved to be a powerful representation for the image data in a region. Discarding all spatial information, they are the foundation of classic techniques such as histogram equalization and image indexing [9]. Building upon these concepts, several successful tracking systems based on color histograms have been developed in recent years, taking advantage of the robustness of histograms to changes in object pose and shape [1, 8, 10, 3, 2, 4, 11].

Other tracking systems have traditionally adopted a completely different point of view.
Representing an image region by a template window of pixel intensities, these systems register the window with the previous frame of the sequence to determine the displacement of the object [7, 5]. Such an approach lies at the opposite end of the spectrum from histograms, because the spatial arrangement of the pixels in the window is explicitly expected to conform to a low-order parametric motion model.

Recently, Hager et al. [4] developed a connection between these two seemingly unrelated techniques by proposing the use of multiple spatially weighted histograms. The mathematical mechanism enabling this connection is the mean shift algorithm, a kernel-based method for determining the alignment between two probability distributions. Mean shift has gained significant attention in recent years as an efficient and robust method for visual tracking [3, 2, 11].

In this paper we consider the concept of a single histogram in which each bin is spatially weighted by the mean and covariance of the locations of the pixels that contribute to that bin. We call this concept a spatial histogram, or spatiogram. We show that spatiograms are simply histograms with higher-order moments, and that histograms are zeroth-order spatiograms. Spatiograms are a richer representation, capturing not only the values of the pixels but also their spatial relationships. We derive a mean shift procedure for spatiograms and demonstrate improved tracking results compared with traditional histograms on an image sequence of a person's head.

2 Histograms and spatiograms

Given a discrete function $f : X \to V$, a histogram of $f$ captures the number of occurrences of each element in the range of $f$. More specifically, the histogram is $h_f : V \to \mathbb{Z}^*$, where $\mathbb{Z}^*$ is the set of non-negative integers, and $h_f(v)$ is the number of elements $x \in X$ such that $f(x) = v$.
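To make this definition concrete, the following is a minimal Python sketch, offered as an illustration rather than the authors' implementation. It assumes, hypothetically, that $f$ is a small image stored as a 2-D list whose pixel values have already been quantized into bins, so $X$ is the set of pixel coordinates and $V$ is the set of bin labels; the function name `histogram` and the sample data are likewise hypothetical. For each value $v$ it counts the pixels $x$ with $f(x) = v$. Because only the values are read, rearranging the pixels leaves $h_f$ unchanged, which is the sense in which a histogram discards spatial information.

```python
from collections import Counter
from typing import Dict, Hashable, List


def histogram(image: List[List[Hashable]]) -> Dict[Hashable, int]:
    """Return h_f for an image f stored as a 2-D list (hypothetical layout).

    image[row][col] holds the quantized value f(x) at pixel x = (row, col).
    The result maps each value v to the number of pixels x with f(x) == v;
    values that never occur are omitted, i.e. their count is 0.
    """
    counts: Counter = Counter()
    for row in image:
        for value in row:
            # Only the value f(x) is read; the coordinate x is never used.
            counts[value] += 1
    return dict(counts)


# A 2x3 image whose pixels are already quantized into bins 0, 1 and 2.
image = [
    [0, 0, 2],
    [1, 0, 2],
]
print(histogram(image))  # {0: 3, 2: 2, 1: 1}

# Rearranging the pixels leaves the histogram unchanged.
shuffled = [
    [2, 1, 0],
    [0, 2, 0],
]
print(histogram(shuffled) == histogram(image))  # True
```

`collections.Counter` keeps the sketch to a single pass over the pixels; when $V$ is a known range of integer bins, a fixed-size array indexed by bin would be the natural alternative.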
Another way to look at $h_f$ is as the marginal of a binary function $g_f(x, v)$ over $x$, where $g_f(x, v) = 1$ if $f(x) = v$ and $0$ otherwise. That is, $h_f(v) = \sum_{x \in X} g_f(x, v)$ is the zeroth-order moment of $g_f$ along the $v$ dimension. Histograms are important because they discard all information about the domain, which makes them invariant to any one-to-one transformation of the domain of the original function.

A limited amount of information regarding the domain may be retained by using higher-order moments of the binary function $g_f$, where the $i$th-order moment is given by $h^{(i)}_f(v) = \sum_{x \in X} x^i \, g_f(x, v)$. We use the term spatial histogram, or spatiogram, to refer to this concept, because it captures not only occurrence information about the range of the function, as in a histogram, but also information about the (spatial) domain. We define the $k$th-order spatiogram to be a tuple of all the moments up to order $k$: $\langle h^{(0)}_f(v), \ldots, h^{(k)}_f(v) \rangle$. A histogram, then, is just a zeroth-