Overview of the MediaEval 2014 Visual Privacy Task

Atta Badii (1), Touradj Ebrahimi (2), Christian Fedorczak (3), Pavel Korshunov (4), Tomas Piatrik (5), Volker Eiselein (6), Ahmed Al-Obaidi (7)
(1, 7) {atta.badii, a.al-obaidi}@reading.ac.uk
(2, 4) {touradj.ebrahimi, pavel.korshunov}@epfl.ch
(3) christian.fedorczak@thalesgroup.com
(5) t.piatrik@qmul.ac.uk
(6) eiselein@nue.tu-berlin.de

ABSTRACT
This paper presents an overview of the Visual Privacy Task (VPT) of MediaEval 2014: its objectives, dataset, and evaluation approach. Participants in this task were required to implement a privacy filter, or a combination of filters, to protect various personal information regions in the provided video sequences. The challenge was to achieve an adequate balance between the degree of privacy protection, intelligibility (how much useful information is retained after privacy filtering), and pleasantness (how little the filtering degrades the appearance of the video frames). The submissions from the eight teams who participated in this task were evaluated subjectively by surveillance experts, practitioners, and data protection experts, and by naïve viewers using a crowdsourcing approach.

1. INTRODUCTION
Advances in artificial intelligence and video analytics have led to increasingly complex surveillance systems of growing scale and capability. The ubiquity and enhanced capability of such surveillance can pose significant threats to citizens' privacy, and new mitigation technologies are therefore needed to ensure an appropriate level of privacy protection. The Visual Privacy Task (VPT) of MediaEval 2014 thus provided an opportunity for experimentation, exploring how video-analytic techniques may lead to enhanced solutions to some visual privacy problems [1]. The task focuses on privacy protection techniques that are responsive to the context-specific privacy needs of the persons depicted.
The evaluation was performed using three distinct user studies, aimed at developing a deeper understanding of users' perceptions of the effects and side-effects of privacy filtering, in order to ensure the validity and user-acceptability of the evaluation results.

2. VPT 2014 DATASET
The PEViD dataset [2] was created specifically for the impact assessment of privacy protection technologies. It consists of two subsets, a training set and a test set, comprising 21 videos captured with both standard- and high-resolution cameras. The video clips are in MPEG format at full HD resolution (1920x1080 pixels), at a rate of 25 frames per second, and are approximately 16 seconds each. The video data covers various scenarios featuring one or several human subjects walking or interacting. The actors may also carry specific items which could potentially reveal their identity and may therefore need to be privacy-filtered appropriately: for example, they are featured carrying backpacks and umbrellas, wearing scarves, and performing various actions such as fighting, pickpocketing, dropping a bag, or simply walking. Actors may be near the camera or at a distance from it, making their faces appear at varying pixel sizes and quality. The ambient lighting conditions of the videos also vary widely, covering a range of indoor, outdoor, and day/night-time scenes.

The ground truth was created manually by the task organisers and consists of annotated bounding boxes containing regions of High (H), Medium (M), or Low (L) sensitivity Personally Identifiable Information elements (PIIs), including persons' faces and accessories. In order to support context-aware privacy protection solutions [3], unusual events occurring within the video dataset, such as fighting, stealing, and dropping a bag, were also annotated. The annotations were provided in XML format, alongside a foreground mask in the form of binary sequences.
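Annotations of this kind are typically consumed by mapping each bounding box to its region type and sensitivity level. The sketch below illustrates this with a minimal, hypothetical XML layout; the element names (`frame`, `box`) and attributes (`type`, `x`, `y`, `width`, `height`) are illustrative assumptions, not the actual PEViD schema, while the H/M/L sensitivity levels follow the task description.

```python
# Hypothetical sketch of parsing PEViD-style bounding-box annotations.
# The XML element/attribute names below are illustrative assumptions,
# NOT the actual PEViD annotation schema.
import xml.etree.ElementTree as ET

# Sensitivity levels per region type, as defined in the task description.
SENSITIVITY = {"face": "H", "skin": "M", "accessories": "M",
               "hair": "L", "body": "L"}

def parse_annotations(xml_text):
    """Return a list of (frame_index, region_type, sensitivity, bbox) tuples,
    where bbox is (x, y, width, height) in pixels."""
    root = ET.fromstring(xml_text)
    regions = []
    for frame in root.iter("frame"):
        idx = int(frame.get("number"))
        for box in frame.iter("box"):
            rtype = box.get("type")
            bbox = tuple(int(box.get(k)) for k in ("x", "y", "width", "height"))
            regions.append((idx, rtype, SENSITIVITY.get(rtype, "L"), bbox))
    return regions

sample = """<annotations>
  <frame number="0">
    <box type="face" x="640" y="200" width="48" height="64"/>
    <box type="skin" x="660" y="270" width="30" height="40"/>
  </frame>
</annotations>"""

print(parse_annotations(sample))
```

A participant's filtering pipeline could then dispatch a stronger or weaker filter per region according to the returned sensitivity level.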
The annotations also distinguish the relative privacy sensitivity of the PIIs, namely Skin (M), Face (H), Hair (L), Accessories (M), and Person's body (L). The dataset was provided in accordance with European data protection and ethical compliance guidelines, including informed consent and access control as required. Figure 1 depicts a sample frame from the dataset with the annotated regions shown as rectangles.

Figure 1: Sample annotated frame from the VPT dataset

3. MOTIVATION AND OBJECTIVES
The MediaEval 2014 Visual Privacy Task was motivated by application domains such as privacy filtering of videos taken in public spaces, by smartphones, web-cams, and surveillance CCTV, and of videos stored on social websites. For this task, participants were encouraged to implement a combination of several privacy filters to protect various personal information regions in videos, optimising the privacy filtering so as to: i) obscure such personal information effectively, whilst ii) keeping as much as possible of the 'useful' information that would enable a human viewer to form some 'useful' interpretation of the obscured video frame at some level of abstraction, without compromising the privacy protection level required by the person(s) featured in the video frame. Personal visual information is subjective, human-perceived information that can expose a person's identity to a human viewer. This can include richly detailed image regions, such as distinctive facial features or personal jewellery, as well as less detailed uniform regions, e.g. skin.

Copyright is held by the author/owner(s).
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain
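The task did not prescribe any particular filter. As a minimal sketch of one common choice, pixelation, the NumPy fragment below coarsens an annotated bounding box by averaging fixed-size tiles, destroying fine identity cues while retaining coarse silhouette and colour for intelligibility; the block size and example box coordinates are illustrative assumptions, not values from the task.

```python
# Sketch of a pixelation privacy filter applied to an annotated bounding box.
# Block size and box coordinates are illustrative, not prescribed by the task.
import numpy as np

def pixelate_region(frame, bbox, block=16):
    """Average block x block tiles inside the (x, y, w, h) region of an
    H x W x 3 frame, removing fine detail (e.g. facial features) while
    keeping coarse shape and colour visible."""
    x, y, w, h = bbox
    out = frame.copy()
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            tile = out[by:min(by + block, y + h), bx:min(bx + block, x + w)]
            tile[...] = tile.mean(axis=(0, 1))  # flatten tile to its mean colour
    return out

# Toy full-HD frame (matching the dataset's 1920x1080 resolution) with a
# hypothetical face box; real pipelines would read boxes from the annotations.
frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
filtered = pixelate_region(frame, (640, 200, 48, 64))
```

Stronger filters (blanking, scrambling) trade intelligibility for protection; weaker ones (light blur) do the reverse, which is exactly the balance the task asks participants to optimise.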