3DSVHT: Extraction of 3D Linear Motion via Multi-view, Temporal Evidence Accumulation

J.A.R. Artolazábal and J. Illingworth

Center for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
{j.artolazabal, j.illingworth}@surrey.ac.uk

Abstract. Shape recognition and motion estimation are two of the most difficult problems in computer vision, especially for arbitrary shapes undergoing severe occlusion. Much work has concentrated on tracking over short temporal scales and the analysis of 2D image-plane motion from a single camera. In contrast, in this paper we consider the global analysis of extended stereo image sequences and the extraction of specified objects undergoing linear motion in full 3D. We present a novel Hough Transform based algorithm that exploits both stereo geometry constraints and the invariance properties of the cross-ratio to accumulate evidence for a specified shape undergoing 3D linear motion (constant velocity or otherwise). The method significantly extends some of the ideas originally developed in the Velocity Hough Transform, VHT, where detection was limited to 2D image motion models. We call our method the 3D Stereo Velocity Hough Transform, 3DSVHT. We demonstrate 3DSVHT on both synthetic and real imagery and show that it is capable of detecting objects undergoing linear motion with large depth variation and in image sequences where there is significant object occlusion.

1 Introduction

Object recognition and motion estimation form two major areas of computer vision. Many methods have been developed to solve each of these problems in isolation but there has been less work on approaches that attempt to address both problems simultaneously. Object recognition via shape detection has been fairly successfully attempted using the Hough Transform[2], HT, and its variants, especially the Generalised Hough Transform[1], GHT.
However, it is only fairly recently, in the Velocity Hough Transform[4], VHT, that the method has been extended to detect objects that simultaneously satisfy both a 2D shape model and a 2D image-motion model. The VHT clearly demonstrated the benefit of using both structural and temporal information simultaneously. A significant limitation of the VHT method is that motion is only modeled in the 2D image plane. However, when an object travels in 3D then its perspective projection onto the image plane is a non-linear function of depth. This means that a uniform velocity linear motion in 3D does not project to a constant velocity 2D motion on the image plane. Hence, the VHT can fail in situations

J. Blanc-Talon et al. (Eds.): ACIVS 2005, LNCS 3708, pp. 563–570, 2005.
© Springer-Verlag Berlin Heidelberg 2005