Model assisted object tracking

F. Aldershoff, Th. Gevers and Ph. Prins

University of Amsterdam, Kruislaan 403, Amsterdam, The Netherlands

ABSTRACT

Many current video analysis systems fail to fully acknowledge the process that resulted in the acquisition of the video data, i.e. they do not consider the complete multimedia system, which encompasses the several physical processes that lead to the captured video data. This multimedia system includes the physical process that created the appearance of the captured objects, the capturing of the data by the sensor (camera), and a model of the domain to which the video data belongs. By modelling this complete multimedia system, a much more robust and theoretically sound approach to video analysis can be taken. In this paper we describe such a system for the detection, recognition and tracking of objects in videos. We introduce an extension of the mean shift tracking process, based on a detailed model of the video capturing process. This system is used for two applications in the soccer video domain: billboard recognition and tracking, and player tracking.

Keywords: Video analysis, object tracking, mean shift tracking, colour invariants

1. INTRODUCTION

Object tracking has been a research topic in both computer vision and robotics for a long time. Recently, research on this topic has intensified. This renewed interest in tracking is caused by two factors. The first is the enormous growth in the volume of captured and stored video data, fuelled by cheap digital cameras and cheap digital storage. The second is the increase in processing power: Moore's Law and parallel/distributed computing have given us enough computational power to deal with the complexity of video processing tasks such as object detection and object tracking.

Many current video analysis systems fail to fully acknowledge the process that resulted in the acquisition of the video data, i.e.
they do not consider the complete multimedia system, which encompasses the several physical processes that lead to the captured video data [1, 2, 3]. This leads to systems in which many ad-hoc decisions have to be made.

Kernel-based object tracking, in combination with a distance metric based on the Bhattacharyya coefficient, is used to track objects in sports videos (and other types of video data). Comaniciu et al. use the Mean Shift algorithm to locate a target model [4]. This approach uses colour features without paying attention to the underlying models. As we will show in this paper, special care has to be taken when using transformed colour features, since the transformation also alters the error distribution of the measured features. In this paper we review a method that takes this change in error distribution into account, allowing for robust and sound colour features.

In [5], Comaniciu et al. present a method to deal with noisy data. Since they assume little or no prior knowledge about the data, their approach is based on estimating the uncertainty in the data by an iterative procedure. Our strategy exploits the knowledge that is available about the capturing process of the data, such as camera characteristics and domain knowledge. From this information it is possible to reliably calculate the uncertainty, based on characteristics of the colour space transform [6].

Further author information: (Send correspondence to F. Aldershoff) F. Aldershoff: E-mail: R.F.Aldershoff@uva.nl, Telephone: +31 (0)20 525 7540

In this paper we will first discuss a way to model colour. From this model a set of robust colour features is derived. In section 3, we will model the effects of noise introduced by the camera and provide a way to alleviate the effects of this noise in the transformed colour models. In section 3.5, we will show how this model can easily