Evaluating Multi-Object Tracking

Kevin Smith, Daniel Gatica-Perez, Jean-Marc Odobez, and Sileye Ba*
IDIAP Research Institute, Martigny, Switzerland
{smith, gatica, odobez, sba}@idiap.ch

Abstract

Multiple object tracking (MOT) is an active and challenging research topic. Many different approaches to the MOT problem exist, yet there is little agreement amongst the community on how to evaluate or compare these methods, and the amount of literature addressing this problem is limited. The goal of this paper is to address this issue by providing a comprehensive approach to the empirical evaluation of tracking performance. To that end, we explore the tracking characteristics that are important to measure in a real-life application, focusing on configuration (the number and location of objects in a scene) and identification (the consistent labeling of objects over time), and define a set of measures and a protocol to objectively evaluate these characteristics.

1 Introduction

Although object tracking is considered a mature field of research, there is a disturbing lack of uniformity in how results are presented by the community. Award-winning tracking papers rarely use the same data or metrics. This makes comparisons between different methods difficult and stifles progress. At the root of this problem is the lack of common data sets and performance measures (with few exceptions, such as PETS [5]). In this paper, we outline a framework for evaluating MOT methods which allows for (1) numerical evaluation to measure performance, and (2) visualization to understand tracking phenomena. We also define specific measures and protocols within this framework relevant to academic and real-life evaluation.

In order to define a framework for tracking evaluation, it is important to understand what qualities are essential to good tracking. To do so, it can be helpful to consider what constitutes a "golden" multi-object tracker.
One could argue that a good tracker, in a real-life situation, should:

1. start automatically,
2. track objects well - place the correct number of trackers at the correct locations each frame,
3. identify objects well - track individual objects consistently over a long period of time,
4. track objects in spite of distraction (occlusion, illumination changes, etc.),
5. accurately estimate task-specific object parameters (such as object velocity),
6. be fast.

This list can be reduced to four key properties (items 2, 3, 5, and 6). Item 2 refers to the configuration, or the number and location of objects in the scene. Item 3 refers to identification: the consistent labeling of objects over a long period of time. Items 1 and 4 depend on the model and type of tracking algorithm, and can be indirectly measured by measuring the configuration and identification. Item 5 refers to the ability of the method to correctly predict some task-specific parameter of an object in the scene. Finally, item 6 refers to the speed, or computational cost, of the method. This paper focuses on the more generic tasks of evaluating configuration and identification.

There have been recent attempts to measure configuration and identification with various degrees of success [4, 2, 3, 6]. In [4], measures were proposed to evaluate the configuration of a single-object tracker to a limited degree. In [2], configuration and identification were evaluated based on the distance between the centroids of the objects and the trackers. However, this method does not account for two differently shaped objects that share a centroid.

* This work was supported by the Swiss National Center of Competence in Research on Interactive Multimodal Information Management (IM2), and by the European Union 6th FWP IST Integrated Project AMI (Augmented Multi-party Interaction, FP6-506811, publication AMI-74).
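The centroid limitation noted above is easy to demonstrate. The following toy sketch (not the measures proposed in this paper, and not the exact formulation of [2]) compares a ground-truth bounding box against a tracker estimate using two scores: the Euclidean distance between centroids, and an area-overlap ratio (intersection over union). Two differently shaped boxes can share a centroid exactly, so the centroid distance reports a perfect match while the overlap score exposes the poor fit.

```python
# Toy illustration: why centroid distance alone can misjudge configuration.
# Boxes are (x, y, w, h) with (x, y) the top-left corner.

def centroid(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def centroid_distance(a, b):
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def overlap_ratio(a, b):
    """Intersection area divided by union area of two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# Ground truth: a wide, short object. Estimate: a tall, thin tracker
# placed so that both boxes share the same centroid, (50, 50).
truth = (10.0, 40.0, 80.0, 20.0)
estimate = (40.0, 10.0, 20.0, 80.0)

print(centroid_distance(truth, estimate))  # 0.0 -> looks like a perfect match
print(overlap_ratio(truth, estimate))      # ~0.14 -> reveals the poor fit
```

An overlap-based criterion of this kind is one simple way to make a configuration measure sensitive to object extent as well as position.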
In [6], configuration measures were defined similar to the four configuration errors we propose here, but an overall configuration measure is not provided and the issue of identification is not addressed. In the very recent work of [3], two identification measures and two configuration measures similar to ours were independently proposed, but with important distinctions, detailed in later sections. Evaluating computational cost can be a complex task [7], and is beyond the scope of this paper.

The remainder of this paper is organized as follows. First, we introduce concepts fundamental to tracking evaluation in Section 2. We describe, in detail, how to evaluate configuration in Section 3. Section 4 describes identification evaluation. Section 5 outlines how task-specific measures can be fit into the framework, and Section 6 contains some final remarks.

0-7695-2372-2/05/$20.00 (c) 2005 IEEE