I’m 2.8% Neanderthal
The beginning of genetic exhibitionism?

Lukasz Olejnik
INRIA, Rhone-Alpes
lukasz.olejnik@inria.fr

Agnieszka Kutrowska
Department of Biochemistry, Adam Mickiewicz University, Poznan, Poland
akutr@amu.edu.pl

Claude Castelluccia
INRIA, Rhone-Alpes
claude.castelluccia@inria.fr

ABSTRACT

Direct-to-consumer genetic testing is gaining popularity. However, the sensitive nature of personal genomic sequencing results might not be fully understood by the general public. In this paper we study examples of the disclosure of this sensitive information on social networks. We found that Twitter users often post their results publicly. We observed that information on ethnic background is released much more frequently than other information, for example information relating to disease risk. This data could be of potential value to entities such as insurance companies.

We found that about 24% of the analyzed tweets that mentioned ethnicity results also contained percentage data. In cases of users disclosing more details of their ethnic background, we found that about 96% of these profiles also included identifying information and can consequently be attributed to individuals. As a result, external entities such as insurance companies can gain insight into the genetic test results, and the users could ultimately be subject to genetic discrimination.

1. INTRODUCTION

The dawn of publicly available commercial genetic testing is almost upon us. With the recent progress of genome sequencing and the achievement of the $1000 per genome milestone [36] by Illumina [24], the promise of fast extraction of data from whole genomes will ultimately be fulfilled. Genetic sequencing follows Moore’s Law [7], which suggests that personal genome sequencing will soon be widely available and ubiquitous.

However, new possibilities bring new risks.
People are often unaware of the consequences of private data disclosure, and this is exemplified by their behavior on social networks. Users routinely post sensitive data without proper consideration; a good example is posting credit card photos to Twitter [20, 30], which are even disseminated by a dedicated Twitter feed [18]. People also frequently post pictures of their identity documents [26]. Unsurprisingly, social network users often regret having shared too much and/or inappropriate information [38].

In this paper, we study the potential problems arising from the disclosure of genetic test results on Twitter. The significance of our study is heightened by the fundamental lack of awareness among the millions of social media users with regard to guarding sensitive data. We stress that disseminating personal genetic test results can also have ramifications for the user’s relatives [23]; it could also lead to adverse treatment by health insurers, known as genetic discrimination [28, 17]. Health records are also being used in ad targeting [12].

We study the Twitter users who disclosed their genome sequencing results obtained from 23andMe. Although, according to a recent ethnographic study, people are typically concerned about the ethical and privacy risks relating to the disclosure of genetic test results [14], we found during our study that many Twitter users had no qualms about publishing this information. Companies such as 23andMe should probably devote more effort to familiarizing their users with the actual risks of disclosure. We acknowledge the practicality and ethical nature of disclosing genetic risks to one’s relatives [15], which is often encouraged by medical communities [16].

The paper is organized as follows: in Section 3, we discuss the background behind personal genomic sequencing and how companies such as 23andMe operate.
In Section 4, we highlight the hazards related to the disclosure of genotyping results. Section 5 is devoted to the results and analysis of genotyping disclosures on the Twitter network.

2. RELATED WORK

The phenomenon of oversharing private information has been observed since the beginning of the social network era. A lot of sensitive information about users can be extracted from their public feeds. Acquisti et al. studied this problem using Facebook [2]. Mao et al. analyzed private data leaks over Twitter; the information they studied includes health-related examples [31]. Cheng et al. showed that it is possible to infer the physical residence of the user with data obtained from