IJIRST International Journal for Innovative Research in Science & Technology | Volume 2 | Issue 1 | June 2015
ISSN (online): 2349-6010
All rights reserved by www.ijirst.org

Privacy Preserving Health Data Publishing using Secure Two Party Algorithm

Sameera K M, Assistant Professor, Department of Computer Science & Engineering, Adi Shankara Institute of Engg. & Technology, Vidya Bharati Nagar, Kalady - 683574
Aparna Vinayan, Student, Department of Computer Science & Engineering, Adi Shankara Institute of Engg. & Technology, Vidya Bharati Nagar, Kalady - 683574
Meera R Nair, Student, Department of Computer Science & Engineering, Adi Shankara Institute of Engg. & Technology, Vidya Bharati Nagar, Kalady - 683574
Shalini Balakrishnan, Student, Department of Computer Science & Engineering, Adi Shankara Institute of Engg. & Technology, Vidya Bharati Nagar, Kalady - 683574

Abstract

In this paper, we address the problem of private data publishing, where different attributes for the same set of individuals are held by two parties. Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. To this end, we consider two systems, a hospital and an insurance provider, to which the two-party algorithm is applied to produce the resulting shared dataset. In our experiments, the results are compared with those of a k-anonymity algorithm and found to be better and more secure.

Keywords: Differential Privacy, secure data integration, secure data publishing

I. INTRODUCTION

The research topic of privacy-preserving data publishing has received a lot of attention in different research communities, from economic implications to anonymization algorithms. Huge databases exist today due to rapid advances in communication and storage systems. Each database is owned by a particular autonomous entity.
Moreover, the emergence of new paradigms such as cloud computing increases the amount of data distributed between multiple entities. These distributed data can be integrated to enable better data analysis, which in turn supports better decisions and higher-quality services. For example, data can be integrated to improve medical research, customer service, or homeland security. However, data integration between autonomous entities should be conducted in such a way that no more information than necessary is revealed between the participating entities.

The field of medical research and health data publishing requires health information custodians, who are liable to share electronic health records for health data mining and clinical research, to comply with health regulatory bodies and rules. Health records are by nature very sensitive, and sharing even de-identified records may raise concerns about breaches of patient privacy. Data privacy breaches not only create a negative impression of these health service providers among the general public but may also result in civil lawsuits from patients claiming compensation.

In the United States of America, the Health Insurance Portability and Accountability Act (HIPAA) requires patient consent before the disclosure of health information between health service providers. The Health Information Technology for Economic and Clinical Health (HITECH) Act builds on the HIPAA Act of 1996 to strengthen the privacy and security rules. The HITECH Act augments an individual's privacy protections, grants individuals new rights to their health information, and revises the penalties applied to each HIPAA violation category for healthcare data breaches.

In this paper, we propose an algorithm to securely integrate person-specific sensitive data from two data providers, whereby the integrated data still retain the essential information for supporting data mining tasks.
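The setting in which two parties hold different attributes for the same individuals can be pictured with a small sketch. This is only an illustration of the data layout, not the proposed algorithm: the record IDs, attribute names, values, and the helper `naive_integrate` are all hypothetical. The naive join below reveals every attribute to whoever performs it, which is exactly what a secure integration protocol is meant to avoid.

```python
# Toy illustration only: hypothetical records, not data from the paper.

# Party 1 (for example, a hospital) holds some attributes,
# keyed by a record ID shared between the two parties.
hospital = {
    1: {"age": 34, "disease": "flu"},
    2: {"age": 51, "disease": "diabetes"},
    3: {"age": 47, "disease": "asthma"},
}

# Party 2 (for example, an insurer) holds different attributes
# for the same set of individuals.
insurance = {
    1: {"region": "north", "plan": "basic"},
    2: {"region": "south", "plan": "premium"},
    3: {"region": "north", "plan": "basic"},
}


def naive_integrate(p1, p2):
    """Join two vertical partitions on the shared record ID.

    This is an insecure baseline: the party running the join
    sees every raw attribute value held by the other party.
    """
    shared_ids = sorted(p1.keys() & p2.keys())
    return [{"id": i, **p1[i], **p2[i]} for i in shared_ids]


integrated = naive_integrate(hospital, insurance)
for row in integrated:
    print(row)
```

Each integrated row combines the attributes of both parties for one individual. A privacy-preserving approach would aim to publish a table that is still useful for data mining without either party learning the other's raw values in this way.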
A health information custodian (HIC) wants to share a person-specific data table with a health data miner, such as a medical practitioner or a health insurance company, for research purposes. A person-specific dataset for classification analysis typically contains four types of attributes: the explicit identifiers, the quasi-identifier (QID), the sensitive attribute, and the class attribute. Explicit identifiers (such as name, social security number, and telephone number) are attributes that uniquely identify a person. The QID (such as birth date, sex, race, and postal code) is a set of attributes whose individual values may not be unique but whose combination may reveal the identity of an individual. Sensitive attributes (such as disease, salary, and marital status) are attributes that contain sensitive information about an individual. Class attributes are the attributes on which the health data miner wants to perform classification analysis. Let D(A1, ..., An, Sens, Class) be a data table with explicit identifiers removed, where {A1, ..., An} are quasi-identifiers that can be either categorical or numerical attributes, Sens is a sensitive attribute, and Class is a class attribute. A