Performance evaluation of real-time video content analysis systems in the CANDELA project 1

Xavier Desurmont a, Rob Wijnhoven b,c, Egbert Jaspers b, Olivier Caignart d, Mike Barais e, Wouter Favoreel f, Jean-François Delaigle a

a Multitel A.S.B.L., Av Copernic, 1, B-7000 Mons, Belgium.
b Bosch Security Systems, Eindhoven, The Netherlands.
c Eindhoven University of Technology, Eindhoven, The Netherlands.
d IT-OPTICS, Mons, Belgium.
e Vrije Universiteit Brussel, Belgium.
f Traficon, Belgium.

ABSTRACT

The CANDELA project aims at realizing a system for real-time image processing in traffic and surveillance applications. The system performs segmentation, labels the extracted blobs and tracks their movements in the scene. Performance evaluation of such a system is a major challenge, since no standard methods exist and the criteria for evaluation are highly subjective. This paper proposes a performance evaluation approach for video content analysis (VCA) systems and identifies the research areas involved. For these areas we give an overview of the state of the art in performance evaluation and introduce a classification into different semantic levels. The proposed evaluation approach compares the results of the VCA algorithm with a ground-truth (GT) counterpart, which contains the desired results. Both the VCA results and the ground truth comprise description files that are formatted in MPEG-7. The evaluation is required to provide an objective performance measure and a means to choose between competing methods. In addition, it enables algorithm developers to measure the progress of their work at the different levels in the design process. From these requirements and the state-of-the-art overview we conclude that standardization is highly desirable, and that many research topics still need to be addressed to achieve it.

Keywords: real-time processing, computer vision, performance evaluation, MPEG-7, video content analysis.

1.
INTRODUCTION

The number of security and traffic cameras installed in both private and public areas is increasing. Since human guards can effectively monitor only a limited number of camera monitors, automatic analysis of the video content is required. Examples of challenging applications [2] are monitoring metro stations [3], detecting highway traffic jams, detecting unattended objects [1] and detecting loitering persons.

Over the last decade, many algorithms have been proposed that attempt to solve the problem of scene understanding. The level of understanding varies widely, from merely detecting moving objects and outputting their bounding boxes (e.g. the open-source project “Motion” 2), to tracking objects across multiple cameras, thereby learning common paths and appearance points [28], [30], or depth maps and the amount of activity in the scene [29].

Apart from functional testing, there are several other reasons for evaluating video content analysis (VCA) systems: scientific interest, measuring the improvement during development, benchmarking against competitors, commercial purposes and, finally, legal or regulatory requirements. However, most literature describing VCA algorithms does not give objective measures of the quality of the results. For video compression algorithms, for example, the criterion is to minimize the difference between the decoded result and the original, with the PSNR as the standard metric.

1 This work is part of the European ITEA project CANDELA (Content Analysis Networked DElivery Architectures), ip02013. http://www.extra.research.philips.com/euprojects/candela/
2 Open-source project Motion: http://sourceforge.net/projects/motion/