MOTION BASED CORRESPONDENCE FOR 3D TRACKING OF MULTIPLE DIM OBJECTS

Ashok Veeraraghavan 1, Mandyam Srinivasan 2, Rama Chellappa 1, Emily Baird 2 and Richard Lamont 2

1 Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
{vashok,rama}@umiacs.umd.edu

2 Research School of Biological Sciences, Australian National University, Canberra ACT 2601, Australia
{M.Srinivasan,emily.baird,richard.lamont}@anu.edu.au

ABSTRACT

Tracking multiple objects in a video is a demanding task that arises frequently in systems such as surveillance and motion analysis. The ability to track objects in 3D requires the use of multiple cameras. When tracking multiple objects with multiple video cameras, establishing correspondence between the objects seen in the various cameras is a non-trivial task. In particular, when the targets are dim or very far away from the camera, appearance cannot be used to establish this correspondence. Here, we propose a technique to establish correspondence across cameras using motion features extracted from the targets, even when the relative position of the cameras is unknown. Experimental results are provided for the problem of tracking multiple bees in natural flight using two cameras. The reconstructed 3D flight paths of the bees reveal some interesting flight patterns.

1. INTRODUCTION

Tracking objects using multiple cameras has the obvious advantages of 3D reconstruction of tracks and a wider field of view. Moreover, when the cameras are sufficiently far apart, objects that are occluded in one camera might still be visible in the others. But the use of multiple cameras requires establishing correspondence between the objects seen in the various views. When there is only one object in view, this correspondence is easily established [1]. When handling multiple targets, however, establishing this correspondence is a non-trivial task.
Moreover, if the cameras are sufficiently separated, the appearance of the same target in the different cameras will be very different and therefore cannot be used as a cue for establishing correspondence. Likewise, when the targets are dim (very low signal-to-noise ratio) or very far away from the camera (and therefore occupy very few pixels in the image), appearance features cannot be used for establishing correspondence. Further, if the targets themselves resemble each other in appearance, as in the case of tracking several bees, then appearance information is ineffective. Therefore, one needs to develop alternative strategies for establishing this correspondence.

This work was partially supported by the NSF-ITR Grant 0325119.

The motion information that is implicit in the individual tracks obtained in the various views is an obvious candidate. But the tracks in the various camera views are perspective projections of the true 3D tracks, and therefore additional constraints are necessary to match tracks. There have been several attempts to use auxiliary information about motion to constrain the matching process. [2] uses the constraint that the motion of the feet of tracked people lies on the ground plane to recover extrinsic camera parameters and then to align and match the tracks obtained in the two views. [3] computes the boundary of one camera's field of view as seen in the other cameras, again by assuming the presence of a ground plane on which the subjects walk, to obtain correspondence across views. In our approach, we use a theorem concerning the projection of the 3-D trajectory of a moving object onto a 2-D image, stated originally in [4] and later in [5], to establish correspondence between motion trajectories in the various cameras.

2. OVERVIEW OF THE APPROACH

Images from the different cameras are initially considered separately.
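As a rough illustration of this kind of motion-based matching (a minimal sketch, not the algorithm of this paper or of [4, 5]): each 2-D track can be treated as a curve (x(t), y(t), t) and summarized by its spatio-temporal curvature, and tracks can then be paired across views by comparing curvature signatures. The nearest-signature matching rule, the assumption that tracks are time-aligned and of equal length, and all function names below are illustrative choices, not taken from the paper.

```python
import numpy as np

def st_curvature(track):
    """Spatio-temporal curvature of a 2-D image track sampled once per frame.

    track: (N, 2) array of (x, y) positions. The track is treated as the
    3-D curve r(t) = (x(t), y(t), t), and the standard curvature
    |r' x r''| / |r'|^3 is returned at each sample (time step = 1 frame,
    so t' = 1 and t'' = 0).
    """
    x, y = track[:, 0], track[:, 1]
    xp, yp = np.gradient(x), np.gradient(y)      # velocity components
    xpp, ypp = np.gradient(xp), np.gradient(yp)  # acceleration components
    num = np.sqrt(xpp**2 + ypp**2 + (xp * ypp - yp * xpp)**2)
    den = (xp**2 + yp**2 + 1.0) ** 1.5
    return num / den

def match_tracks(tracks_a, tracks_b):
    """Pair each track in view A with the track in view B whose curvature
    signature is closest in mean absolute difference. Assumes equal-length,
    time-aligned tracks (an illustrative simplification)."""
    sigs_b = [st_curvature(t) for t in tracks_b]
    pairs = []
    for i, ta in enumerate(tracks_a):
        sa = st_curvature(ta)
        dists = [np.mean(np.abs(sa - sb)) for sb in sigs_b]
        pairs.append((i, int(np.argmin(dists))))
    return pairs
```

For example, a straight constant-velocity track has near-zero curvature in every view, while a sinusoidal track retains a distinctive curvature signature under a change of viewpoint, so the two are paired correctly across views even though their image coordinates differ.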
A dynamic background is estimated for each video sequence at every frame by assuming that the background variations are much slower than the motion of the targets. The background-subtracted frames are thresholded to obtain a binary foreground mask. Connected component analysis is performed on this mask to obtain a set of blobs representing the hypothesised positions of the targets in each frame. A simple blob tracker based on a constant-velocity motion model is used to track the targets through the video. This yields several long target tracks for each camera view. We establish correspondence between the various bee tracks in the different camera views by exploiting the properties of the spatio-temporal curvature of these tracks. We note that establishing this correspondence does not require knowing the exact relative position of the cameras. Once correspondence between tracks is established, we can infer the relative position of the cameras from these correspondences. We then reconstruct the 3-D trajectories of the targets using the standard triangulation algorithm. Therefore, the algorithm is distrib-