Real-time video tracking using PTZ cameras

Sangkyu Kang ∗a, Joonki Paik a,b, Andreas Koschan a, Besma Abidi a, and Mongi A. Abidi a

a IRIS Lab, Department of Electrical & Computer Engineering, University of Tennessee, Knoxville, Tennessee 37996-2100
b Image Processing Laboratory, Department of Image Engineering, Graduate School of Advanced Imaging Science, Multimedia, and Film, Chung-Ang University, 221 Huksuk-Dong, Dongjak-Gu, Seoul, 157-756, Korea

ABSTRACT

Automatic tracking is essential for 24-hour intruder detection and, more generally, for surveillance systems. This paper presents adaptive background generation and the corresponding moving-region detection techniques for a Pan-Tilt-Zoom (PTZ) camera using a geometric transform-based mosaicing method. A complete system including adaptive background generation, moving-region extraction, and tracking is evaluated using realistic experiments. More specifically, experimental results include generated background images, a moving region, and input video with bounding boxes around moving objects. The experiments show that the proposed system can be used to monitor moving targets in wide open areas by automatic panning and tilting in real time.

Keywords: Video tracking, adaptive background generation, object extraction

1. INTRODUCTION

Since September 11, the desire for safety in public locations where many people gather, such as airports and government buildings, has been increasing. For security purposes, these places usually adopt video-based surveillance that records scenes with fixed cameras or with PTZ cameras, which can change their viewing area by using predefined rotation tables and timers. The recorded video can be examined after intrusions or accidents have occurred; however, this means that the system does not provide a real-time warning, which is very important for preventing accidents.
To detect an intruder or track suspects with current systems, human operators must monitor the video surveillance system, but this is not a simple task even for skilled operators, given the tedium of watching video for more than a couple of hours. Monitoring a large area also requires many operators at the same time.

Recently, many tracking algorithms have been developed; they can be categorized into three groups: adaptive background generation and subtraction, 1,2 tracking using shape information, 3,4 and region-based tracking. 5,6 Adaptive background generation is a very efficient way to extract moving objects, and a Gaussian distribution for each pixel is usually used to build a static gray background 1 or color background. 7 This method, however, requires stationary cameras to build the background, and various approaches have been proposed to extend the camera's view to track objects in large areas. L. Lee et al. proposed a method to align the ground plane across multiple views to build a common coordinate frame for multiple cameras. 8 An omni-directional camera has also been used to extend the field of view to 360° 9 with adaptive background generation and subtraction, but detected objects, such as a moving person or intruder, usually appear at very low resolution since one camera captures the entire surroundings.

∗ sangkyu@iristown.engr.utk.edu, phone: +1-865-974-9685, fax: +1-865-974-5459, http://imaging.utk.edu

Tracking using shape information is attractive for tracking known objects or objects in a database using B-spline contours or active shape models, but this method usually requires an initial shape to be placed as near as possible to the target to get accurate tracking