Preservation of Structural Properties in Anonymized Social Networks

Traian Marius Truta, Alina Campan
Department of Computer Science, Northern Kentucky University, Highland Heights, KY 41099, USA
{trutat1, campana1}@nku.edu

Anca L. Ralescu
Department of Computer Science, University of Cincinnati, Cincinnati, OH 45221, USA
anca.ralescu@uc.edu

Abstract: Social networks such as Facebook, LinkedIn, or Twitter now have a global reach that has surpassed all previous expectations. Many social networks gather confidential information about their users, and as a result, privacy in social networks has become a topic of general interest. To defend against privacy violations, several social network anonymization models have been introduced. In this paper, we empirically study how well several structural properties of a social network are preserved through an anonymization process. We first anonymize several real and synthetic social networks using the k-anonymous clustered social network model, and then we compare how well structural properties such as diameter, centrality measures, clustering coefficients, and topological indices are preserved between the original networks and their anonymized versions. Our experiments show that there are correlations between the values of the structural properties obtained from the original network and those obtained from the corresponding anonymized networks. Preserving such graph properties through anonymization might be essential for subsequent graph mining of the anonymized networks.

Index Terms: K-Anonymity, Privacy, Social Networks, Structural Properties.

I. INTRODUCTION AND MOTIVATION

Social networks such as Facebook [12], LinkedIn [18], or Twitter [36] now have a global reach that has surpassed all previous expectations. Smaller social networks that focus on specialized domains such as sports, games, and technology have also attracted a large number of users in recent years.
For instance, FanCru offers sports fans a place to connect and share information [13], Playfire [28] and WeeWorld [39] are social networks that attract online gamers, and Toolbox for IT (Information Technology) is a knowledge-sharing community for IT members [34]. Most Internet users today are part of one or more social networks, and they contribute a wealth of information to these networks. Many social networks gather confidential information about their users, information that could potentially be misused. For instance, in the healthcare field, PatientsLikeMe [26], a social network with more than 150,000 users as of July 2012, creates communities of patients for various diseases. Because of the amount of sensitive data gathered by social network sites, privacy in social networks is a concern for many users, and research in this field has flourished in the past several years.

Several research directions in the field of social network privacy are outlined next. Backstrom et al. illustrate the shortcomings of naïve graph anonymization, which replaces the identity of individual nodes with synthetically created identifiers. Two types of attacks, passive and active, are presented in this context [2]. Narayanan and Shmatikov performed a de-anonymization experiment that compromised the privacy of a third of the users who had accounts on both Twitter and Flickr, with a 12% error rate [22].

To defend against privacy attacks, several social network privacy models were introduced. These models can be categorized into graph modification models and clustering-based models. In the graph modification category, Liu and Terzi introduced the k-degree anonymity model, in which the original social network is modified such that every node in the released social network shares its degree with at least k-1 other nodes [19]. Zhou and Pei defined a model called k-neighborhood anonymity, in which each node must have k other nodes with the same 1-neighborhood characteristics [43].
Edge additions and/or deletions are performed in order to satisfy both k-degree anonymity and k-neighborhood anonymity. Zou et al. assume a more powerful adversary, and their model, titled k-automorphism anonymity, requires that each node in the social network be indistinguishable from k-1 other nodes with respect to any subgraph to which the node belongs [45]. Two other models, named k-symmetry [8] and k-isomorphism [40], are similar to k-automorphism. The social networks that satisfy one of these three models are created via a process of both node and edge additions and deletions. It is not well understood how the graph structure is preserved during the anonymization process, and this represents a significant limitation of the graph modification techniques.

In the clustering-based category, Campan and Truta introduced the k-anonymous clustered social network model, in which nodes are grouped together in clusters, and super-nodes and super-edges are created [6]. This clustering-based approach to social network anonymity is briefly presented in Section 2 of this paper. Its full presentation can be found in [6]. Related clustering approaches were presented in [3, 17, 41].
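
As a concrete illustration of the k-degree anonymity condition discussed earlier in this section, the following minimal Python sketch checks whether an undirected graph, given as a list of edges, is k-degree anonymous. It assumes the usual reading of the condition: every degree value that occurs in the graph is shared by at least k nodes. The function names `degree_map` and `is_k_degree_anonymous` are hypothetical and chosen for illustration only; they do not come from [19] or from any library, and the sketch only tests the property rather than performing the edge modifications needed to achieve it.

```python
from collections import Counter


def degree_map(edges, nodes=()):
    """Return a dict mapping each node to its degree.

    `edges` is an iterable of (u, v) pairs of an undirected simple graph.
    `nodes` optionally lists nodes that may be isolated (degree 0).
    """
    degrees = {n: 0 for n in nodes}
    for u, v in edges:
        degrees[u] = degrees.get(u, 0) + 1
        degrees[v] = degrees.get(v, 0) + 1
    return degrees


def is_k_degree_anonymous(edges, k, nodes=()):
    """Check the assumed k-degree anonymity condition.

    The graph passes if every degree value that occurs is shared
    by at least k nodes.
    """
    degree_counts = Counter(degree_map(edges, nodes).values())
    return all(count >= k for count in degree_counts.values())


# Hypothetical toy graphs used only for illustration.
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]  # every node has degree 2
path = [("a", "b"), ("b", "c")]                           # degrees 1, 2, 1

print(is_k_degree_anonymous(cycle, 4))
print(is_k_degree_anonymous(path, 2))
```

In the 4-cycle all four nodes share degree 2, so the check succeeds for k = 4. In the 3-node path the middle node is the only node with degree 2, so the check fails for k = 2, which is the situation that the edge additions and deletions of a graph modification model are meant to repair.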