Self-Calibration of Traffic Surveillance Camera using Motion Tracking

Tuan Hue Thi, Sijun Lu and Jian Zhang

All authors are with National ICT Australia (NICTA), 223 Anzac Parade, Kensington NSW 2052, Australia (Jian Zhang; phone: +61 2 8306 0780; fax: +61 2 8306 0403; email: jian.zhang@nicta.com.au).

Abstract— A statistical and computer vision approach to auto-calibrating traffic surveillance cameras using the shapes of tracked moving vehicles is presented. The vanishing point of the traffic direction is obtained by Linear Regression over all tracked vehicle points. A preliminary straightening model is then built to help collect statistics on the typical vehicle class traveling in each particular scene. Analysis of this class eventually yields the complete set of calibration parameters. Validation against traditional methods at different traffic locations demonstrates comparable accuracy with considerably more flexibility and reliability.

I. INTRODUCTION

A. Motivation

In every computer vision problem, especially in traffic surveillance, knowledge about the camera is essential for the purpose of content retrieval. Many different techniques have been used for obtaining these camera parameters, a process known as calibration. A common classification divides these techniques into manual [10][8] and self-calibration [5][3] methods. The need for flexible and cost-effective self-calibration has brought many interesting techniques to the computer vision community, most of them positioned along the trade-off between flexibility and accuracy. However, even the currently popular methods [6][3] for calibrating a single camera in traffic surveillance still rely, in one way or another, on certain assumptions about the scene location. To break through these flexibility constraints, we propose a highly flexible self-calibration technique that can be used at most traffic locations and has been shown to produce results as accurate as existing state-of-the-art self-calibration techniques.

B. Related Works

A representative approach to manual camera calibration uses square patterns in 3D (either on different orthogonal planes [8] or on one plane rotated about the three coordinate axes [10]). These are the most accurate approaches, since all the degrees of freedom necessary for the transformation between image and world are recovered. However, the high cost of manually setting up the calibration model makes these approaches hardly applicable in practical traffic surveillance.

Self-calibration techniques often use vanishing points available in the scene, together with some image-world analysis, to calculate the camera's intrinsic and extrinsic parameters [2]. These methods normally take different views of an architectural object to generate sufficient vanishing points in all dimensions. Such approaches, however, have very limited application in traffic surveillance, where the camera is fixed and only limited knowledge about the traffic scene is available.

The technique described by Cathey and Dailey [3] addresses these constraints of an uncalibrated camera by using the vanishing point obtained from lane straightness on a highway, together with the lane-stripe frequency and lane width. The interesting idea in their method is the use of a rough value of the panning angle to obtain a straightened model of the road, which in fact proves good enough to generate an accurate estimate of the camera parameters. The main disadvantage of this technique, however, is its dependence on lane markings and existing lane stripes for the calculation. This makes it inapplicable at surveillance locations where the travel lanes are unknown or lane stripes are not present.
Bose and Grimson [1] introduced an interesting idea of using object tracking and prior knowledge about object dimensions to rectify images of the ground plane. Although this work is not a complete calibration process, it points to another way of solving the image-world transformation: object detection and tracking in the image, combined with sufficient knowledge of the objects' physical characteristics.

A complete camera calibration using moving objects in the scene was introduced by Fengjun et al. [4], who detect and track humans walking in different directions to derive three vanishing points of the scene, together with image measurements of humans at the leg-crossing phase, to derive the camera's intrinsic and extrinsic parameters. This method, however, cannot be applied to our problem of traffic surveillance, where vehicles normally travel in only one direction, and knowledge about the vehicles on the road is much vaguer than that about humans, since typically many kinds of vehicles with different sizes run on the same road.

C. Our Approach

The description and experimental results of our proposed method are structured in this paper as follows. We first detect motion blobs in the traffic scene using the motion detection algorithm described by Wang et al. [9]. These blobs are then passed through several filters to eliminate noise and smooth the measurements, before being fed into a Linear Kalman Filter for tracking. The history of all these tracking records then lets us produce one trajectory for each blob using Linear Regression. A

Proceedings of the 11th International IEEE Conference on Intelligent Transportation Systems, Beijing, China, October 12-15, 2008. 1-4244-2112-1/08/$20.00 ©2008 IEEE
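The trajectory step above — fitting one line per tracked blob and intersecting the lines to locate the traffic-direction vanishing point — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: `fit_trajectory` and `vanishing_point` are hypothetical names, tracked centroids are assumed to be already available, and total least squares stands in for the paper's Linear Regression.

```python
# Sketch: vanishing point of the traffic direction from tracked blob
# trajectories. Assumes each blob yields a list of 2D centroid positions.
import numpy as np

def fit_trajectory(points):
    """Fit a 2D line a*x + b*y + c = 0 to one blob's tracked centroids
    via total least squares (principal axis of the centered points)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # First right-singular vector = dominant direction of the point cloud.
    _, _, vt = np.linalg.svd(pts - centroid)
    direction = vt[0]
    a, b = -direction[1], direction[0]          # normal to the direction
    c = -(a * centroid[0] + b * centroid[1])    # line passes through centroid
    return a, b, c

def vanishing_point(lines):
    """Least-squares intersection of lines a*x + b*y + c = 0:
    solve the overdetermined system [a b] @ [x, y] = -c."""
    A = np.array([[a, b] for a, b, _ in lines])
    rhs = -np.array([c for _, _, c in lines])
    vp, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return vp

# Toy check with two synthetic trajectories that converge at (100, 50).
tracks = [
    [(0, 0), (50, 25), (80, 40)],      # on the line y = x/2
    [(0, 100), (50, 75), (80, 60)],    # on the line y = 100 - x/2
]
lines = [fit_trajectory(t) for t in tracks]
print(vanishing_point(lines))  # ≈ [100. 50.]
```

With real tracks the per-trajectory fit absorbs measurement noise, and the least-squares intersection pools evidence from every vehicle rather than relying on painted lane markings.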