Evaluating Multi-Object Tracking

Kevin Smith, Daniel Gatica-Perez, Jean-Marc Odobez, and Sileye Ba*
IDIAP Research Institute, Martigny, Switzerland
{smith, gatica, odobez, sba}@idiap.ch

Abstract

Multiple object tracking (MOT) is an active and challenging research topic. Many different approaches to the MOT problem exist, yet there is little agreement amongst the community on how to evaluate or compare these methods, and the amount of literature addressing this problem is limited. The goal of this paper is to address this issue by providing a comprehensive approach to the empirical evaluation of tracking performance. To that end, we explore the tracking characteristics that are important to measure in a real-life application, focusing on configuration (the number and location of objects in a scene) and identification (the consistent labeling of objects over time), and define a set of measures and a protocol to objectively evaluate these characteristics.

1 Introduction

Although object tracking is considered a mature field of research, there is a disturbing lack of uniformity in how results are presented by the community. Award-winning tracking papers rarely use the same data or metrics. This makes comparisons between different methods difficult and stifles progress. At the root of this problem is the lack of common data sets and performance measures (with few exceptions, such as PETS [5]). In this paper, we outline a framework for evaluating MOT methods which allows for (1) numerical evaluation to measure performance, and (2) visualization to understand tracking phenomena. We also define specific measures and protocols within this framework relevant to academic and real-life evaluation.

In order to define a framework for tracking evaluation, it is important to understand what qualities are essential to good tracking. To do so, it can be helpful to consider what constitutes a "golden" multi-object tracker.
One could argue that a good tracker, in a real-life situation, should:

1. start automatically,
2. track objects well - place the correct number of trackers at the correct locations each frame,
3. identify objects well - track individual objects consistently over a long period of time,
4. track objects in spite of distraction (occlusion, illumination changes, etc.),
5. accurately estimate task-specific object parameters (such as object velocity),
6. be fast.

This list can be reduced to four key properties (items 2, 3, 5, and 6). Item 2 refers to the configuration, or the number and location of objects in the scene. Item 3 refers to identification: the consistent labeling of objects over a long period of time. Items 1 and 4 depend on the model and type of tracking algorithm, and can be indirectly measured by measuring the configuration and identification. Item 5 refers to the ability of the method to correctly predict some task-specific parameter of an object in the scene. Finally, item 6 refers to the speed, or computational cost, of the method. This paper focuses on the more generic tasks of evaluating configuration and identification.

There have been recent attempts to measure configuration and identification with various degrees of success [4, 2, 3, 6]. In [4], measures were proposed to evaluate the configuration of a single-object tracker to a limited degree. In [2], configuration and identification were evaluated based on the distance between the centroids of the objects and the trackers. However, this method does not account for two differently shaped objects that share a centroid.

* This work was supported by the Swiss National Center of Competence in Research on Interactive Multimodal Information Management (IM2), and by the European Union 6th FWP IST Integrated Project AMI (Augmented Multi-party Interaction, FP6-506811, publication AMI-74).
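The centroid limitation noted above is easy to demonstrate. The following toy sketch (not the measures proposed in this paper, and not the exact formulation of [2]) compares a ground-truth bounding box against a tracker estimate using two scores: the Euclidean distance between centroids, and an area-overlap ratio (intersection over union). Two differently shaped boxes can share a centroid exactly, so the centroid distance reports a perfect match while the overlap score exposes the poor fit.

```python
# Toy illustration: why centroid distance alone can misjudge configuration.
# Boxes are (x, y, w, h) with (x, y) the top-left corner.

def centroid(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def centroid_distance(a, b):
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def overlap_ratio(a, b):
    """Intersection area divided by union area of two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# Ground truth: a wide, short object. Estimate: a tall, thin tracker
# placed so that both boxes share the same centroid, (50, 50).
truth = (10.0, 40.0, 80.0, 20.0)
estimate = (40.0, 10.0, 20.0, 80.0)

print(centroid_distance(truth, estimate))  # 0.0 -> looks like a perfect match
print(overlap_ratio(truth, estimate))      # ~0.14 -> reveals the poor fit
```

An overlap-based criterion of this kind is one simple way to make a configuration measure sensitive to object extent as well as position.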
In [6], configuration measures were defined similar to the four configuration errors we propose here, but an overall configuration measure is not provided and the issue of identification is not addressed. In the very recent work of [3], two identification measures and two configuration measures similar to ours were independently proposed, but with important distinctions, detailed in later sections. Evaluating computational cost can be a complex task [7], and is beyond the scope of this paper.

The remainder of this paper is organized as follows. First, we introduce concepts fundamental to tracking evaluation in Section 2. We describe, in detail, how to evaluate configuration in Section 3. Section 4 describes identification evaluation. Section 5 outlines how task-specific measures can be fit into the framework, and Section 6 contains some final remarks.

0-7695-2372-2/05/$20.00 (c) 2005 IEEE