IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-9, NO. 1, JANUARY 1987

Finding Trajectories of Feature Points in a Monocular Image Sequence

ISHWAR K. SETHI, MEMBER, IEEE, AND RAMESH JAIN, SENIOR MEMBER, IEEE

Abstract: Identifying the same physical point in more than one image, the correspondence problem, is vital in motion analysis. Most research on establishing correspondence uses only two frames of a sequence to solve this problem. By using a longer sequence of frames, it is possible to exploit the fact that, due to inertia, the motion of an object cannot change instantaneously. By exploiting this smoothness of motion, it is possible to solve the correspondence problem for arbitrary motion of several nonrigid objects in a scene. We formulate the correspondence problem as an optimization problem and propose an iterative algorithm to find trajectories of points in a monocular image sequence. A modified form of this algorithm is also useful in the case of occlusion. We demonstrate the efficacy of this approach on synthetic, laboratory, and real scenes.

Index Terms: Correspondence, motion, object tracking, path coherence, smoothness of motion, structure from motion.

I. INTRODUCTION

The last few years have seen increasing interest in dynamic scene analysis. The input to a dynamic scene analysis system is a sequence of images. As is well known, an image is a 2-D projection of a 3-D scene at a single time instant. A major problem in computer vision is to recover information about the objects in a scene from their images, and this problem cannot be solved without some assumptions about the world. A sequence of frames adds one more dimension, time, that helps recover the information about the 3-D world that is lost in the projection process. Multiple views of a moving object acquired with a stationary camera may allow recovery of the object's structure [4], [36], [32]-[34], [31], [24].
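The abstract's central idea, that inertia keeps a point's motion smooth so that correspondence can be posed as an optimization over trajectories, can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the authors' algorithm: the deviation measure (a weighted sum of a direction-change term and a speed-change term), the weights `w1 = 0.1` and `w2 = 0.9`, and the helper names `deviation` and `best_extension` are all hypothetical, and the exhaustive search over permutations is a brute-force stand-in for the iterative optimization the paper proposes.

```python
import math
from itertools import permutations
from typing import List, Tuple

Point = Tuple[float, float]


def deviation(p0: Point, p1: Point, p2: Point,
              w1: float = 0.1, w2: float = 0.9) -> float:
    """Hypothetical smoothness-of-motion penalty for the path p0 -> p1 -> p2.

    Direction term: 1 - cos(angle between the two displacements), in [0, 2].
    Speed term: 1 - 2*sqrt(d1*d2)/(d1 + d2), in [0, 1] (0 when d1 == d2).
    The weights w1 and w2 are illustrative assumptions.
    """
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    d1, d2 = math.hypot(ax, ay), math.hypot(bx, by)
    if d1 + d2 == 0.0:
        return 0.0  # the point stayed still: perfectly smooth
    if d1 == 0.0 or d2 == 0.0:
        direction = 0.0  # direction undefined; penalize only the speed change
    else:
        cos_theta = (ax * bx + ay * by) / (d1 * d2)
        # Clamp to guard against rounding pushing |cos| slightly above 1.
        direction = 1.0 - max(-1.0, min(1.0, cos_theta))
    speed = 1.0 - 2.0 * math.sqrt(d1 * d2) / (d1 + d2)
    return w1 * direction + w2 * speed


def best_extension(prev: List[Point], curr: List[Point],
                   nxt: List[Point]) -> List[int]:
    """Extend each trajectory prev[i] -> curr[i] to nxt[perm[i]] so that the
    summed deviation is minimal. Exhaustive search, so only practical for a
    handful of points; assumes no occlusion (equal point counts per frame)."""
    n = len(curr)
    best_cost, best_perm = math.inf, list(range(n))
    for perm in permutations(range(n)):
        cost = sum(deviation(prev[i], curr[i], nxt[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, list(perm)
    return best_perm


# Two points moving right in parallel; frame 3 lists them in swapped order.
prev = [(0.0, 0.0), (0.0, 10.0)]
curr = [(1.0, 0.0), (1.0, 10.0)]
nxt = [(2.0, 10.0), (2.0, 0.0)]
print(best_extension(prev, curr, nxt))  # [1, 0]
```

The smoothness cost recovers the correct pairing even though the third frame lists the points in a different order: continuing each path in a straight line at constant speed costs nothing, whereas the crossed assignment incurs both a direction and a speed penalty.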
Manuscript received April 18, 1985; revised February 6, 1986. Recommended for acceptance by W. B. Thompson. This work was supported in part by the NSF under Grant DCR-8500717. I. K. Sethi is with the Department of Computer Science, Wayne State University, Detroit, MI 48202. R. Jain is with the Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI 48109. IEEE Log Number 8609395.

A mobile camera may be used to acquire information about the structure of stationary objects in a scene using optical flow [6], [22], axial motion stereo [20], and other methods [16], [14]. Many researchers in the psychology of vision favor recovering information about a scene from image sequences rather than from a single image [19], [6], [16]. Gibson [16] argued in support of active information pickup by an observer in an environment. Johansson [16] demonstrated the efficacy of motion information alone in recognizing objects from moving light displays. Neisser [19] proposed a model in which perceptual processes continually interact with incoming information to verify anticipations formed on the basis of the information available up to a given time instant. In computer vision systems, the efficacy of even noise-sensitive approaches, such as difference and accumulative difference pictures, was demonstrated by using hypothesize-and-test mechanisms to analyze complex real-world scenes [13]. Although many researchers are addressing the problem of recovering information in dynamic scenes, it appears that, due to the legacy of static scene analysis, most are approaching the recovery problem using just two or three frames of a sequence. This self-imposed restriction results in approaches suited to quasi-dynamic scene analysis rather than dynamic scene analysis.
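The difference and accumulative difference pictures (ADPs) mentioned above can be sketched in a few lines. This is a minimal illustration under assumptions: grayscale frames stored as nested lists, a single fixed threshold, and an ADP that compares every later frame against the first (reference) frame. The function names are hypothetical.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[int]]


def difference_picture(f1: Image, f2: Image, threshold: float) -> Mask:
    """1 where the two frames differ by more than `threshold`, else 0."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(r1, r2)]
            for r1, r2 in zip(f1, f2)]


def accumulative_difference_picture(frames: List[Image],
                                    threshold: float) -> List[List[int]]:
    """Count, per pixel, how many later frames differ from frames[0]."""
    reference = frames[0]
    adp = [[0] * len(row) for row in reference]
    for frame in frames[1:]:
        dp = difference_picture(reference, frame, threshold)
        for y, row in enumerate(dp):
            for x, changed in enumerate(row):
                adp[y][x] += changed
    return adp


# A bright pixel moving one step right per frame in a 1x4 image.
frames = [[[9, 0, 0, 0]], [[0, 9, 0, 0]], [[0, 0, 9, 0]]]
print(accumulative_difference_picture(frames, threshold=5))  # [[2, 1, 1, 0]]
```

In this example the moving pixel leaves a trail of nonzero counts, with the largest count at its original position. Any single pixel whose noise exceeds the threshold would register in the same way, which is why such methods are described as noise-sensitive.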
Since the information recovery process requires constraints about the scene, analysis based on a minimal number of frames rests on assumptions that ignore the most important information in dynamic scenes: the motion of objects. Structure from motion has recently attracted significant research effort in dynamic scene analysis [27], [31], [32], [34], [36], [38]. Ullman popularized the rigidity assumption in computer vision. This assumption states that any set of elements undergoing a 2-D transformation that has a unique interpretation as a rigid body moving in space should be so interpreted. Under certain conditions, the rigidity assumption allows the structure of objects to be recovered from three frames. Another popular approach to recovering structure and motion uses optical flow fields [16], [22], [14], [18], [26], while others try to recover the same information from points in the frames. Optical flow is the field of retinal velocities; in computer vision, it is considered the velocity field of all image points. It has been shown that optical flow contains information about the motion of both the observer and the environment. Approaches for computing optical flow [10] and for recovering structure from it [37] have been proposed. Given the difficulty of computing optical flow of acceptable quality, some efforts aim to recover structure from characteristics of the optical flow without computing it explicitly [11], [20]. Recently, trajectory-based recovery has attracted some attention [31], [28], [38], [24], [15]. It has been
0162-8828/87/0100-0056$01.00 © 1987 IEEE 56