How Private are WLAN traces?
Udayan Kumar (Student) and Ahmed Helmy (Faculty)
Department of Computer and Information Science and Engineering, University of Florida
Email: {ukumar, helmy}@cise.ufl.edu

I. INTRODUCTION

In this work, we investigate the extent to which users' private information can be extracted from anonymized Wireless Local Area Network (WLAN) traces. Why do we need to talk about the privacy of users in WLAN traces? One answer is that many researchers use WLAN traces for analysis and research purposes, for example to characterize the usage behavior of users [3], [6] or to study user mobility patterns [1] and characteristics for developing network protocols. It is therefore important to understand how the privacy of WLAN users is affected. Even though most trace libraries anonymize or sanitize the traces to protect users' privacy, we present a few methodologies that can be used to reverse the anonymization. We hope that our study sheds light on the question of "How private are WLAN traces?" and on how effective existing anonymization techniques are.

The issues of privacy and anonymization have long been present for network traces. Researchers have faced similar challenges in anonymizing wired traces [7]. Recently, wireless traces have also been collected and archived at on-line public libraries such as CRAWDAD [2] and MobiLib [4], which collectively hold well over 25 traces. As these traces contain pervasively captured user information, several questions have been raised about the legality [9] of the collection process. Techniques are also being researched in which users themselves share their traces [8].

The pertinent question that remains unanswered, however, is how traces, once collected, can be prepared for distribution so that they retain good usability without compromising the privacy of the user. Our effort is targeted at this question, which has become more challenging, as we shall see, with WLAN traces.
In this work, we present our analysis of the currently used anonymization methods and their shortcomings.

II. INFORMATION IN WLAN TRACES

WLAN traces are the logs of users associating with wireless Access Points (AP). A generic information tuple that they provide consists of MAC ID, Start Time, Duration and Access Point/Location. A snapshot from an un-anonymized trace, after some processing, is shown in Tab. I. Some traces may provide more information, such as the username. For the sake of simplicity, we consider only the basic tuple shown in Tab. I. Restricting ourselves to a tuple with less information does not make breaking anonymity any easier; compromising anonymity with less information is more difficult.

TABLE I
SAMPLE UN-ANONYMIZED TRACE

MAC                 Start Time                 Duration (sec)   AP/Location
00:11:22:33:44:55   01 Jun 2008 21:00:51 GMT   3000             CS buildingAP1
11:22:33:44:55:66   01 Jun 2008 21:01:30 GMT   10               ECE buildingAP2
01:02:03:04:05:06   01 Jun 2008 22:11:00 GMT   200              MSL buildingAP1
10:20:30:40:50:60   01 Jun 2008 22:15:30 GMT   600              MACA buildingAP1
11:22:33:44:55:66   01 Jun 2008 22:23:10 GMT   180              ECE buildingAP3

TABLE II
SAMPLE ANONYMIZED TRACE

MAC                 Start Time                 Duration (sec)   AP/Location
00:11:22:0353       01 Jun 2008 21:00:51 GMT   3000             AcadBldg10AP1
11:22:33:0521       01 Jun 2008 21:01:30 GMT   10               AcadBldg2AP2
01:02:03:9877       01 Jun 2008 22:11:00 GMT   200              Library5AP1
10:20:30:3260       01 Jun 2008 22:15:30 GMT   600              AcadBldg22AP1
11:22:33:0521       01 Jun 2008 22:23:10 GMT   180              AcadBldg2AP3

III. PREVALENT METHODS OF ANONYMIZATION

Anonymization in WLAN traces is done on a field-by-field basis [5], [4]. Either a field is fully anonymized (mapped to a random number) or only a portion of the field is anonymized. In traces that contain multiple sessions per MAC address, providers can either randomize the MAC address to a unique value for each session, or use the same anonymized value of the MAC address for all of its sessions (consistent mapping).
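
To make the two mapping choices above concrete, the following is a minimal Python sketch; it is an illustration only, not the procedure used by any trace provider. The class name MacAnonymizer, the decision to keep the first three octets, and the four-digit random suffix (modeled on the sample anonymized MACs in Tab. II) are all assumptions made for this example.

```python
import secrets


def _random_suffix(used):
    """Draw a 4-digit code that has not been handed out yet.

    The 4-digit format is an assumption modeled on the sample in Tab. II.
    """
    if len(used) >= 10_000:
        raise RuntimeError("4-digit suffix space exhausted")
    while True:
        code = f"{secrets.randbelow(10_000):04d}"
        if code not in used:
            used.add(code)
            return code


class MacAnonymizer:
    """Hypothetical partial MAC anonymizer (illustrative only).

    consistent=True  -> one pseudonym per MAC, reused for every session
    consistent=False -> a fresh pseudonym for every session
    """

    def __init__(self, consistent=True):
        self.consistent = consistent
        self._mapping = {}   # real MAC -> pseudonym (consistent mode only)
        self._used = set()   # suffixes already assigned

    def anonymize(self, mac):
        prefix = mac[:8]  # keep the first three octets, e.g. "00:11:22"
        if self.consistent and mac in self._mapping:
            return self._mapping[mac]
        pseudonym = f"{prefix}:{_random_suffix(self._used)}"
        if self.consistent:
            self._mapping[mac] = pseudonym
        return pseudonym


if __name__ == "__main__":
    # Two sessions of the same device, as in the sample trace of Tab. I.
    sessions = ["11:22:33:44:55:66", "11:22:33:44:55:66"]
    for mode in (True, False):
        anonymizer = MacAnonymizer(consistent=mode)
        print(mode, [anonymizer.anonymize(m) for m in sessions])
```

With consistent=True both sessions receive the same pseudonym, so the device can be followed across sessions; with consistent=False each session receives a different one and that linkage is lost.
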
This choice also determines the information content and utility of the traces. Consistent mapping for each MAC address throughout the traces makes it possible to track a user across multiple sessions. The majority of the traces available at MobiLib [4] and CRAWDAD [2] provide consistent mappings. Some traces, such as the Dartmouth traces [5] at CRAWDAD [2], also anonymize the location field, either by reporting the AP's location only at building-level granularity or by replacing the building name with a code name such as AcadBldg10AP3 [5], which denotes an AP (numbered 3) located in a building used for academic purposes. In this case, all buildings are grouped into building classes such as acadbldg, librarybldg, etc. Tab. II shows how Tab. I would look after consistent, partial MAC anonymization with reduced location information. We will attempt to extract private information from traces that have been anonymized using this technique, as it is used by many trace providers [5].

IV. ANALYSIS OF PREVALENT METHODS

In this section, we present some techniques by which user privacy can, in theory, be compromised. We consider two possible attack scenarios: one in which the attacker can inject data into the traces by accessing the WLAN network (Sec. IV-A, IV-B and IV-C), and a second in which the attacker has physical access to the campus but cannot access the WLAN network (Sec. IV-D). How do we decide whether the anonymization is compromised? If we can identify someone's anonymized MAC address in the traces, we can be sure that anonymity has been compromised. Using this definition of compromise, we show how to identify one's own anonymized MAC address and then how to identify any other user's MAC address.

A. Identify Your Own MAC In Trace

Breaking the whole anonymization scheme can start with finding the mapping of one's own machine's MAC address.
To obtain the mapping of one's own MAC address, one can use the following scheme:
1) Go to a WLAN-covered area on campus at a time when it is rarely visited and WLAN usage is at a minimum (this pattern can be found from previous traces).