Legal Aspects of CCTV Data De-Identification

Oleksandr Pastukhov
Department of Information Policy and Governance
MaKS, University of Malta
Msida, Malta
oleksandr.pastukhov@um.edu.mt

Abstract—This paper aims to identify and investigate the legal issues surrounding de-identification of data streams generated by CCTV applications. As a basis for further discussion, the exact subject-matter of the legal protection afforded to data (images and sounds) resulting from CCTV operations is pinpointed. Thereafter, the legal consequences of de-identification are established. In that connection, the terms ‘de-identification’ and ‘anonymization’ are delineated and delimited. Moreover, the legal relevance of the irreversibility of de-identification is explored, along with the technological and legal requirements that irreversible de-identification must meet, the associated risks, and examples of failed (supposedly irreversible) de-identification exercises from the past. Correspondingly, lessons for the future are drawn and recommendations for rendering de-identification efforts legally compliant are put forward.

Keywords—privacy, personal data, CCTV, de-identification, anonymization

I. INTRODUCTION

For the past 20 years, research into data de-identification has been driven primarily by sociological and statistical surveys [1], (bio)medical studies [2] and computer security [3], but other fields, such as library [4] and social media [5] user de-identification, are following suit. An increasing number of organizations are taking an interest in de-identification in general and in such processes as microdata anonymization [6] in particular. Over the same 20 years, closed-circuit television (CCTV) has become omnipresent: at least 1.85 million CCTV cameras are believed to be in operation in the UK alone [7]. De-identification has often been cited by the authorities as a safeguard against privacy violations.
Let us see if de-identification can deliver on that promise in the CCTV context.

II. SORTING OUT THE TERMINOLOGY

A. The notions of ‘personal data’, ‘biometric data’ and ‘sensitive data’

The definition of ‘personal data’ contained in the still valid Data Protection Directive [8] is surprisingly broad: Art. 2(a) defines the term as “any information relating to an identified or identifiable natural person (‘data subject’)”. By explaining in the same paragraph that “an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity”, the Directive, in essence, provides examples and categories of personal data. This definition – legally – makes any information relating to a natural person, not just that which can be used to identify him or her (the ‘identifiers’), personal data. There is plenty of information, however, that relates to a data subject but is highly unlikely or simply impossible to be used for identification purposes. Video footage of a public place, such as a city square, with people in it clearly visible and even recognizable by those who know them, bears information relating to all of them, but contains no personal data, until and unless the people in the picture, in the words of the draft Data Protection Regulation intended to replace the Directive, “can be identified, directly or indirectly, by means reasonably likely to be used by the controller or by any other natural or legal person, in particular by reference to an identification number, location data, online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that person” (Article 4(1)) [9].
Hence the key to answering the question whether data are personal in the legal sense lies in whether the link between the identifiers and the physical person can be established by the controller or by some other person who gains access to the data. The means of establishing that link are not whatever means those persons are theoretically or potentially capable of using, but only the means that they are “reasonably likely” to use. An owner of a villa with CCTV cameras installed would normally not have means “reasonably likely to be used” to identify the people in the videos those cameras capture. “As the purpose of video surveillance is, however, to identify the persons to be seen in the video images in all cases where such identification is deemed necessary by the controller, the whole application as such has to be considered as processing data about identifiable persons, even if some persons recorded are not identifiable in practice” [10].

Many of the data processed by CCTV are ‘biometric data’, defined for the first time at the EU level in the draft Regulation as “any data relating to the physical, physiological or behavioural characteristics of an individual which allow their unique identification, such as facial images, or dactyloscopic data” (Article 4(11)). A peculiar feature of biometric data is that they are both part of the “information relating to an identified or identifiable natural person” and the identifiers that act as a link between that information and the person. Moreover, biometric data merit particular attention because they are a type of identifier that, unlike a user name or password, cannot be discarded and replaced.

The research resulting in this paper was supported by the European Union under the European Cooperation in Science and Technology (COST) Action IC1206 “De-Identification for Privacy Protection in Multimedia Content”.