Anonymizing Healthcare Data: A Case Study on the Blood Transfusion Service

Noman Mohammed, Benjamin C. M. Fung
CIISE, Concordia University, Montreal, QC, Canada
{no_moham, fung}@ciise.concordia.ca

Patrick C. K. Hung
University of Ontario Institute of Technology, Oshawa, ON, Canada
patrick.hung@uoit.ca

Cheuk-kwong Lee
Hong Kong Red Cross Blood Transfusion Service, Hong Kong
ckleea@ha.org.hk

ABSTRACT
Sharing healthcare data has become a vital requirement in healthcare system management; however, inappropriate sharing and usage of healthcare data could threaten patients’ privacy. In this paper, we study the privacy concerns of the blood transfusion information-sharing system between the Hong Kong Red Cross Blood Transfusion Service (BTS) and public hospitals, and identify the major challenges that make traditional data anonymization methods inapplicable. Furthermore, we propose a new privacy model called LKC-privacy, together with an anonymization algorithm, to meet the privacy and information requirements in this BTS case. Experiments on real-life data demonstrate that our anonymization algorithm can effectively retain the essential information in anonymous data for data analysis and is scalable for anonymizing large datasets.

Categories and Subject Descriptors
H.2.7 [Database Administration]: [Security, integrity, and protection]; H.2.8 [Database Applications]: [Data mining]

General Terms
Algorithms, Performance, Security

Keywords
Privacy, anonymity, classification, healthcare

1. INTRODUCTION
Gaining access to high-quality health data is a vital requirement for informed decision making by medical practitioners and pharmaceutical researchers. Driven by mutual benefits and regulations, there is a demand for healthcare institutes to share patient data with various parties for research purposes.
However, health data in its raw form often contains sensitive information about individuals, and publishing such data will violate their privacy. The current practice in data sharing primarily relies on policies and guidelines on the types of data that can be shared and agreements on the use of shared data. This approach alone may lead to excessive data distortion or insufficient protection. In this paper, we study the challenges in a real-life information-sharing scenario in the Hong Kong Red Cross Blood Transfusion Service (BTS) and propose a new privacy model, together with a data anonymization algorithm, to effectively preserve individuals’ privacy and meet the information requirements specified by the BTS.

[Figure 1: Data flow in Hong Kong Red Cross Blood Transfusion Service (BTS). Diagram labels: Donors; Red Cross (Collect Blood, Distribute Blood to Hospitals); Hospitals (store; Collect Patient Data); Patients; Blood Transfusion Data; Our Contribution: Data Anonymizer (share).]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD’09, June 28–July 1, 2009, Paris, France.
Copyright 2009 ACM 978-1-60558-495-9/09/06 ...$5.00.

Figure 1 illustrates the data flow in the BTS. After collecting and examining the blood collected from donors, the BTS distributes the blood to different public hospitals. The hospitals collect and maintain the health records of their patients and transfuse the blood to the patients if necessary.
The blood transfusion information, such as the patient data, type of surgery, names of medical practitioners in charge, and reason for transfusion, is clearly documented and is stored in the database owned by each individual hospital. Periodically, the public hospitals are required to submit the blood usage data, together with the patient-specific surgery data, to the BTS for the purpose of data analysis.

This BTS case illustrates a typical dilemma in information sharing and privacy protection faced by many health institutes. For example, licensed hospitals in California are also required to submit specific demographic data on every discharged patient [5]. Our proposed solution, designed for the BTS case, will also benefit other health institutes that face similar challenges in information sharing. We summarize the concerns and challenges of the BTS case as follows.

Privacy concern: Giving the BTS access to blood transfusion data for data analysis is clearly legitimate. However, it raises some concerns about patients’ privacy. The patients