Three-Dimensional Vehicle Pose Estimation from Two-Dimensional Monocular Camera Images for Vehicle Classification

U.U. SHEIKH, S.A.R. ABU-BAKAR
Computer Vision, Video & Image Processing Lab, Dep. of Microelectronics and Computer Engineering, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, MALAYSIA

Abstract: - In this paper, a new method is proposed to estimate the pose of a moving vehicle in a typical traffic scene. Pose determination is crucial when fitting or matching a 3D model from a database to the captured moving object. In this work, the pose is estimated by determining the 3D position of the moving vehicle in world space and then computing the vehicle’s motion vector. The estimation is first initialized by loosely calibrating the camera viewport and the perspective distortion. The 3D position is then determined by tracing a ray from the camera eye through the vehicle’s centroid in the video image and computing its intersection with the vehicle’s motion plane, i.e. the ground. Once a motion vector is obtained, the 3D model is aligned to match the vehicle’s pose. The computation is performed on a 3D graphics card. Results on real-world traffic scenes as well as synthetic data are presented, and several issues are outlined.

Key-Words: - Vehicle Pose Detection, 3D Pose Estimation, Model Matching

1 Introduction
Vehicle classification is an active area in intelligent transportation systems. It is important for traffic management and for obtaining traffic parameters. By classifying vehicles into the correct groups, such as truck, car and motorcycle, traffic parameters such as flow and congestion can be obtained easily. Several different methods are used to classify vehicles, including techniques based on spatial information and 3D model fitting techniques. One of the earliest studies on vehicle classification was done by A.D. Houghton et al. [1].
The method proposed used simple template matching of vehicle outlines. Over the years, researchers have proposed more advanced methods. Work on vehicle detection and recognition using image processing can be divided into several categories according to the methods used. Research based on stereo vision includes that of M. J. J. Burden et al. [2], T. Aizawa [3] and M. Kimachi [4]. Work that couples monocular vision with statistical analysis of 2D images includes [5], which uses the object’s dimensions to determine the object type. Other parameters used include Fourier descriptors [6], size and linearity [7], compactness and aspect ratio [8], vehicle edge information at selected points [9], and feature training using neural networks [10].

There are several works on model-based or 3D-based vehicle recognition. Among them is [11], which uses parameterized models applied with principal component analysis to video sequences. Wei Wu et al. [12] used models to train a neural network for classification. In [13], a fixed camera position was used (a 35-degree looking angle and a camera height greater than 7 m) in combination with polyhedral models of 8 types, resulting in an accuracy of more than 90%. Other similar model-based approaches include [14, 15, 16] and [17].

The method of pose detection used in [11] is based on minimizing a measure between a reference model and the trained model using a least-squares approximation. The reference model is scaled and translated to fit the image. The traffic scene is limited to one direction only. In [12], the pose parameters are first determined manually by the user to provide an initial approximation. The pose is not calculated automatically, and the pose selected by the user is fixed for the whole scene. This is suitable only for traffic conditions where turning does not occur, such as on highways. The work in [13] does not use any pose detection, although it is model-based.
Instead, the system in [13] uses pre-computed models saved in a database. The moving object is then matched against the models in the database. For a typical junction, the system keeps over 2.2 million models. Although the number of models is large, an organization method is proposed for fast model searching. In addition, intrinsic camera parameters are required. T.N. Tan et al. [14] proposed a generalized Hough transform with explicit probability-based voting models to identify vehicle pose. The method proposed is

6th WSEAS International Conference on CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL & SIGNAL PROCESSING, Cairo, Egypt, Dec 29-31, 2007 356