IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-9, NO. 1, JANUARY 1987
Finding Trajectories of Feature Points in a Monocular
Image Sequence
ISHWAR K. SETHI, MEMBER, IEEE, AND RAMESH JAIN, SENIOR MEMBER, IEEE
Abstract: Identifying the same physical point in more than one image, known as the correspondence problem, is vital in motion analysis. Most research on establishing correspondence uses only two frames of a sequence to solve this problem. By using a sequence of frames, it is possible to exploit the fact that, due to inertia, the motion of an object cannot change instantaneously. By using smoothness of motion, it is possible to solve the correspondence problem for arbitrary motion of several nonrigid objects in a scene. We formulate the correspondence problem as an optimization problem and propose an iterative algorithm to find trajectories of points in a monocular image sequence. A modified form of this algorithm is also useful in the case of occlusion. We demonstrate the efficacy of this approach on synthetic, laboratory, and real scenes.
Index Terms: Correspondence, motion, object tracking, path coherence, smoothness of motion, structure from motion.
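The abstract's central idea, that inertia makes motion smooth so that correspondence can be posed as an optimization problem, can be illustrated with a minimal sketch. It assumes a per-point penalty that grows with changes in direction and in speed over three consecutive frames; the function names, the exact form of the penalty, and the weights w1 = 0.1, w2 = 0.9 are illustrative assumptions, not taken from this section. The exhaustive search over assignments is only a stand-in for the iterative algorithm the abstract describes, which is not reproduced here.

```python
import math
from itertools import permutations

Point = tuple[float, float]


def deviation(p0: Point, p1: Point, p2: Point,
              w1: float = 0.1, w2: float = 0.9) -> float:
    """Penalty for non-smooth motion along p0 -> p1 -> p2 (hypothetical weights).

    Direction term: 1 - cos(angle between the two displacement vectors).
    Speed term: 1 - geometric mean / arithmetic mean of the two step lengths,
    which is 0 when both steps have equal length.
    """
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    da, db = math.hypot(ax, ay), math.hypot(bx, by)
    if da == 0.0 and db == 0.0:
        return 0.0  # a stationary point moves perfectly smoothly
    # Direction is undefined for a zero-length step; charge only the speed term.
    direction = 0.0 if da == 0.0 or db == 0.0 else 1.0 - (ax * bx + ay * by) / (da * db)
    speed = 1.0 - 2.0 * math.sqrt(da * db) / (da + db)
    return w1 * direction + w2 * speed


def best_assignment(tracks: list[list[Point]], candidates: list[Point]) -> list[int]:
    """Extend each track (at least 2 points) with one distinct candidate from the
    next frame, choosing the assignment with the smallest total deviation.

    Exhaustive search: a stand-in for an iterative optimizer, feasible only
    for a handful of points.
    """
    best_cost, best_perm = math.inf, ()
    for perm in permutations(range(len(candidates)), len(tracks)):
        cost = sum(deviation(t[-2], t[-1], candidates[k]) for t, k in zip(tracks, perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm)


tracks = [[(0.0, 0.0), (1.0, 0.0)],    # point A moving right
          [(0.0, 5.0), (0.0, 4.0)]]    # point B moving down
next_frame = [(0.0, 3.0), (2.0, 0.0)]  # detections in unknown order
print(best_assignment(tracks, next_frame))  # [1, 0]
```

Point A is matched to (2, 0) and point B to (0, 3), the only assignment under which neither point changes direction or speed. The search is exponential in the number of points, which is why a practical method needs a smarter optimizer.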
I. INTRODUCTION
The last few years have seen increasing interest in dynamic scene analysis. The input to a dynamic scene analysis system is a sequence of images. As is well known, an image represents a 2-D projection of a 3-D scene at a time instant. A major problem in a computer vision system is to recover information about the objects in a scene from its images. This problem cannot be solved without some assumptions about the world. A sequence of frames provides an additional dimension for recovering the information about the 3-D world that is lost in the projection process. Multiple views of a moving object acquired using a stationary camera may allow recovery of the structure of the object [4], [36], [32]-[34], [31], [24]. A mobile camera may be used to acquire information about the structure of the stationary objects in a scene using optical flow [6], [22], axial motion stereo [20], and other methods [16], [14].
Manuscript received April 18, 1985; revised February 6, 1986. Recommended for acceptance by W. B. Thompson. This work was supported in part by the NSF under Grant DCR-8500717.
I. K. Sethi is with the Department of Computer Science, Wayne State University, Detroit, MI 48202.
R. Jain is with the Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI 48109.
IEEE Log Number 8609395.

Many researchers in the psychology of vision support recovering information about a scene from an image sequence rather than from a single image [19], [6], [16]. Gibson [16] argued in support of active information pickup by the observer in an environment. Johansson [16] demonstrated the efficacy of motion information alone in recognizing objects using moving light displays. Neisser [19] proposed a model according to which perceptual processes continually interact with the incoming information to verify anticipations formed on the basis of the information available up to a given time instant. In computer vision systems, the efficacy of even noise-sensitive approaches, such as difference and accumulative difference pictures, was demonstrated by using hypothesize-and-test mechanisms to analyze complex real-world scenes [13]. Although many researchers are addressing the problem of recovering information in dynamic scenes, it appears that, due to the legacy of static scene analysis, most are approaching the recovery problem using just two or three frames of a sequence. This self-imposed restriction results in approaches suited to quasi-dynamic rather than dynamic scene analysis. Since the information recovery process requires constraints about the scene, an analysis based on a minimal number of frames rests on assumptions that ignore the most important information in dynamic scenes: the motion of objects.
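The difference and accumulative difference pictures mentioned above can be sketched under a common formulation, stated here as an assumption: a difference picture marks pixels whose intensity changes by more than a threshold between two frames, and an accumulative difference picture counts, per pixel, how many later frames differ from a fixed reference frame. The function names, the threshold, and the use of the first frame as reference are illustrative choices.

```python
Image = list[list[float]]


def difference_picture(f: Image, g: Image, threshold: float) -> list[list[int]]:
    """1 where |f - g| exceeds the threshold, else 0 (hypothetical helper)."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(row_f, row_g)]
            for row_f, row_g in zip(f, g)]


def accumulative_difference_picture(frames: list[Image], threshold: float) -> list[list[int]]:
    """Count, per pixel, how many frames differ from the first (reference) frame."""
    reference = frames[0]
    adp = [[0] * len(row) for row in reference]
    for frame in frames[1:]:
        dp = difference_picture(reference, frame, threshold)
        for i, row in enumerate(dp):
            for j, value in enumerate(row):
                adp[i][j] += value
    return adp


# A bright pixel moving one step to the right per frame in a 1 x 4 image.
frames = [[[9, 0, 0, 0]], [[0, 9, 0, 0]], [[0, 0, 9, 0]]]
print(accumulative_difference_picture(frames, threshold=4))  # [[2, 1, 1, 0]]
```

In this toy run the pixel the object left accumulates one count per later frame, while pixels it only passed through accumulate fewer. A single noisy pixel that crosses the threshold would be counted just the same, which is the noise sensitivity referred to in the text.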
Structure from motion has recently attracted significant research effort in the field of dynamic scene analysis [27], [31], [32], [34], [36], [38]. Ullman popularized the rigidity assumption in computer vision. This assumption states that any set of elements undergoing a 2-D transformation that has a unique interpretation as a rigid body moving in space should be so interpreted. Under certain conditions, the rigidity assumption allows the structure of objects to be recovered from three frames.
Another popular approach to recovering structure and motion is to use optical flow fields [16], [22], [14], [18], [26], while others try to recover the same information using points in frames. Optical flow is the field of retinal velocities; in computer vision, it is taken to be the velocity field of all image points. It has been shown that optical flow contains information about the motion of the observer and about the environment. Approaches for computing optical flow [10] and for recovering structure from it [37] have been proposed. Because optical flow of acceptable quality is difficult to compute, some efforts are being made to recover structure from the characteristics of optical flow without computing the flow itself [11], [20].
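To make "velocity field of image points" concrete, the following sketch estimates the velocity at a single pixel. It swaps in the Lucas-Kanade style of local least squares, which is not necessarily the method of [10]: it assumes brightness constancy, Ix*u + Iy*v + It = 0, and solves that equation over a small window. The function name, window radius, and test images are illustrative assumptions.

```python
Image = list[list[float]]


def flow_at(f0: Image, f1: Image, x: int, y: int, r: int = 1) -> tuple[float, float]:
    """Estimate the image velocity (u, v) at pixel (x, y) between frames f0 and f1.

    Images are indexed f[row][col] = f[y][x]. Assumes brightness constancy,
    Ix*u + Iy*v + It = 0, and solves it in least squares over the
    (2r+1) x (2r+1) window centred on (x, y). The window plus a one-pixel
    border must lie inside the image.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for j in range(y - r, y + r + 1):
        for i in range(x - r, x + r + 1):
            ix = (f0[j][i + 1] - f0[j][i - 1]) / 2.0  # central difference in x
            iy = (f0[j + 1][i] - f0[j - 1][i]) / 2.0  # central difference in y
            it = f1[j][i] - f0[j][i]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # All gradients in the window are parallel (or zero): only the flow
        # component along the gradient is constrained (the aperture problem).
        raise ValueError("flow is not determined by this window")
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt] by Cramer's rule.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# f0(x, y) = x*y; f1 is f0 shifted right by half a pixel: f1(x, y) = f0(x - 0.5, y).
f0 = [[float(x * y) for x in range(7)] for y in range(7)]
f1 = [[(x - 0.5) * y for x in range(7)] for y in range(7)]
print(flow_at(f0, f1, 3, 3))  # (0.5, 0.0)
```

For this bilinear test image and a purely horizontal shift, the first-order brightness model holds exactly, so the sketch recovers u = 0.5. On real images the estimate degrades with noise, large motions, and regions whose gradients all point the same way, where the `ValueError` branch fires. This illustrates why flow of acceptable quality is hard to compute.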
Recently, trajectory-based recovery has attracted some attention [31], [28], [38], [24], [15]. It has been
© 1987 IEEE