Soccer Players Identification Based on Visual Local Features

Lamberto Ballan, Marco Bertini, Alberto Del Bimbo, and Walter Nunziati
Dipartimento di Sistemi e Informatica - University of Florence
{ballan, bertini, delbimbo, nunziati}@dsi.unifi.it

ABSTRACT
Semantic detection and recognition of the objects and events contained in a video stream is required to provide content-based annotation and retrieval of videos. This annotation makes it possible to reuse the video material at a later stage, e.g. to produce new TV programmes. A typical example is that of sports videos, which are annotated so that the clips showing key highlights and players can be reused to produce short summaries for news and sports programmes. To select the most interesting actions among all the detected highlights, further analysis is required; in particular, the shots that contain a key action are typically followed by close-ups of the players that take part in the action. Therefore, the automatic identification of these players would add considerable value both to the annotation and to the retrieval of the key highlights and key players of a sports event. The problem of detecting and recognizing faces in broadcast videos is a widely studied topic. However, in the case of soccer videos, and sports videos in general, the current techniques are not suitable for the task of face recognition, due to the high variations in pose, illumination, scale and occlusion that occur in an uncontrolled environment. This paper presents a method that copes with these problems by exploiting local features to describe a face, without requiring precise localization of its distinguishing parts, and by using a set of poses to describe a person and perform more robust recognition. A similarity metric based on the number of matched interest points, able to cope with different face sizes, is also presented and experimentally validated.

Categories and Subject Descriptors
H.3.7 [Information Storage and Retrieval]: Digital Libraries; D.2.4 [Systems]: Multimedia databases

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CIVR’07, July 9–11, 2007, Amsterdam, The Netherlands.
Copyright 2007 ACM 978-1-59593-733-9/07/0007 ...$5.00.

General Terms
Algorithms, Experimentation

Keywords
Sport video analysis, automatic annotation, person recognition

1. INTRODUCTION AND PREVIOUS WORK
To provide content-based annotation and retrieval of videos, semantic detection and recognition of the objects and events contained in a video stream has to be performed. At present this task is typically carried out by human annotators who, in the case of broadcasters, follow proprietary annotation standards. This annotation is done so that the video material can be reused at a later stage, e.g. to produce new TV programmes. An example is that of sports videos, where live video streams are annotated in order to reuse the clips that show key highlights and players; these clips are then edited to produce short summaries for news and sports programmes.

Most recent work on the automatic annotation of sports videos has dealt with the detection of sports highlights (e.g. shots on goal for soccer, pitching for baseball, shots for basketball, etc.). A comprehensive review of these works can be found in [8, 16]. To select the most interesting actions among all the detected highlights, further analysis is typically required; in particular,
the shots that contain a key action are typically followed by close-ups of the players that had an important role in the action. For example, in soccer videos, scored goals or near misses are followed by shots that show the player who carried out the action; after a foul, the injured and the offending players are framed, etc. Therefore, the automatic identification of these players would add considerable value both to the annotation and to the retrieval of the key highlights and key players of a sports event.

The problem of detecting and recognizing faces in broadcast videos is a widely studied topic. Surveys of the vast literature on face detection and recognition are presented in [7, 15, 17]. Most face recognition methods are evaluated on videos whose content has been filmed in controlled environments and for relatively limited sets of faces and poses (e.g. serials or movies). However, in the case of soccer videos, and sports videos in general, the current techniques are not suitable for the task of face recognition, due to the high variations in pose, illumination, settings, scale and occlu-