A Scalable K-Anonymization Solution for Preserving Privacy in an Aging-in-Place Welfare Intercloud

Antorweep Chakravorty, Tomasz Wiktor Wlodarczyk, Chunming Rong
Department of Computer and Electrical Engineering
University of Stavanger
Stavanger, Norway
{antorweep.chakravorty, tomasz.w.wlodarczyk, chunming.rong}@uis.no

Abstract—Aging-in-Place solutions are becoming increasingly prevalent in our society. New-age big data technologies can harness the enormous amount of data generated by sensors in smart homes to provide enabling services. Added care and preventive services can be furnished through interoperability and bidirectional dataflow across the value chain. However, the nature of the problem domain, although it allows better care to be established through the sharing of information, also risks disclosing the complete living behavior of individuals. In this paper, we introduce and evaluate a novel scalable k-anonymization solution based on the distributed MapReduce paradigm for preserving the privacy of shared data in a welfare intercloud. Our evaluation benchmarks both information loss and data quality metrics and demonstrates better scalability/performance than any other available solution.

Keywords—privacy; k-anonymization; Hadoop; intercloud; aging in place

I. INTRODUCTION

The elderly population is expected to double in the coming years. Aging-in-Place (AIP) technologies [1]–[5] will play a crucial role in maintaining and improving the standard of healthcare services and the quality of help. Traditional healthcare services to residential homes could be extended through smart homes that use sensor networks, supported by data analytics, to deliver assistive services. One such initiative is the Safer@Home intercloud [6] at the University of Stavanger. In this project, large amounts of sensor data from various homes are collected centrally using solutions such as Hadoop [7], so that knowledge discovery algorithms can be run effectively and preventive care established.
Mynatt et al. [8] point to the privacy and autonomy challenges created by an AIP platform, owing to the extremely sensitive and personal nature of the data. At the same time, it is infeasible to perform analytics on transformed data when it is important to record granular events and to identify the individuals to whom care needs to be furnished. AIP services are complex and involve multi-disciplinary stakeholders at different operational and financial levels. Analysis results often need to be furnished to different actors (doctors, specialists, nurses, researchers, the commune and third parties) using different cloud services. For some of these actors, the presented information should be identifiable so that the right care reaches the right individuals, whereas for other actors the data should be transformed without losing its truthfulness. In an earlier work we introduced a privacy-preserving data analysis framework [9] to maintain data utility, ensure security and preserve privacy at different stages of the data lifecycle (collection, storage, processing and sharing). We proposed using k-anonymization [10] to protect the privacy of shared microdata.

Our Contribution: Heuristic-based k-anonymization algorithms do not scale to data spread across the nodes of a cluster. Traditional implementations assume data held in centralized storage that is anonymized and then released. The data collected from smart homes has a volume, velocity and frequency unsuitable for traditional relational storage systems. New-age NoSQL-based solutions are suited to handling such data, but require a completely different approach to anonymizing it.
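As a point of reference for the k-anonymity property [10] referred to above, the following minimal Python sketch checks whether a small table is k-anonymous, i.e., whether every combination of quasi-identifier values is shared by at least k records. It is an illustration only and not the solution proposed in this paper; the function name `is_k_anonymous`, the field names (`age`, `zip`, `event`) and the sample records are hypothetical and chosen purely for the example.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (hypothetical helper)."""
    # Count how many records fall into each quasi-identifier group.
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized sensor events (illustrative data only).
records = [
    {"age": "70-79", "zip": "40**", "event": "fall"},
    {"age": "70-79", "zip": "40**", "event": "door_open"},
    {"age": "80-89", "zip": "41**", "event": "fall"},
    {"age": "80-89", "zip": "41**", "event": "stove_on"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))
print(is_k_anonymous(records, ["age", "zip"], 3))
```

In this sample, each (age, zip) group contains exactly two records, so the check passes for k = 2 and fails for k = 3. A centralized check of this kind assumes that all records fit on one machine, which is the limitation the contribution described here is meant to address.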
We present and evaluate a novel distributed, iterative and scalable k-anonymization solution based on MapReduce [11], built upon an existing and well-accepted multi-dimensional partitioning algorithm called Mondrian [12], for sharing welfare data while preserving the privacy of individuals and maintaining the utility of the data.

Organization: The rest of the paper is structured as follows. Section II gives an overview of the Safer@Home intercloud. Section III provides background on the methods and technologies used in developing our solution. Section IV gives an overview of our Distributed Multidimensional Anonymization solution, and Section V presents its detailed design. Section VI evaluates the solution, Section VII discusses related work, and Section VIII concludes.

2014 IEEE International Conference on Cloud Engineering 978-1-4799-3766-0/14 $31.00 © 2014 IEEE DOI 10.1109/IC2E.2014.43

II. SAFER@HOME INTERCLOUD

The Safer@Home welfare intercloud [6] is a smart system that supports integrated and assured AIP services for the elderly in a smart home environment, based on recent advances in data-intensive analysis, wireless communications, machine-to-machine (M2M) service architecture, security and reliability, and available broadband in a Fiber-To-The-Home (FTTH) setting. The system extends and strengthens the social networks of healthcare services by integrating the Internet of Things (IoT) in a smart home with off-site professional service providers. Supported by a big-data analytics engine (a key enabler of the recent revolution in big-data processing that made large-scale online social networking possible), the platform supports intelligent and scalable ICT-assisted decision-making and integrates and assures different AIP services, such as: social interaction (e.g., via video or forum) to prevent social isolation and loneliness, monitoring services enabling prevention, safety services reducing anxiety and fear, overall disease management and