Privacy Preserving Data Portals
Benjamin C. M. Fung
Simon Fraser University, Canada
Copyright © 2007, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
INTRODUCTION
Information in a Web portal is often an integration of data
collected from multiple sources. A typical example is the
concept of one-stop service: a single health portal provides
a patient with all of his or her health history, doctor’s
information, test results, appointment bookings, insurance,
and health reports. This concept involves information
sharing among multiple parties, for example, a hospital,
a drug store, and an insurance company. The general public,
however, has growing concerns about the use of personal
information. Samarati (2001) shows that linking two data
sources may unexpectedly reveal sensitive information about
individuals. In response, new
privacy acts have come into force in many countries. For example,
Canada launched the Personal Information Protection and
Electronic Documents Act in 2001 to protect a wide spectrum
of information (The House of Commons in Canada, 2000).
Consequently, companies cannot indiscriminately share their
private information with other parties.
A data portal provides a single access point for Web clients
to retrieve data. It also serves as a logical point at which to
determine the trade-off between information sharing and privacy
protection. Can the two goals be achieved simultaneously? This
chapter formalizes this question as a problem called secure
portals integration for classification and presents a solution
to it. Consider the model in Figure 1. A hospital A and an
insurance company B own different sets of attributes about
the same set of individuals, identified by a common key. They
want to share their data via their data portals and present
an integrated version in a Web portal to support decision
making, such as credit limit or insurance policy approval,
while satisfying two privacy requirements:
1. The final integrated table has to satisfy the k-anonymity
requirement, that is, given a specified set of attributes
called a quasi-identifier (QID), each value of the QID
that appears in the integrated table must be shared by
at least k records in that table (Dalenius, 1986).
2. No party can learn more detailed information from
another party than what appears in the final integrated
table during the process of generalization.
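The first requirement can be checked mechanically. The following is a minimal sketch in Python, not part of the original chapter: the function name is_k_anonymous, the attribute names, and the sample records are all hypothetical and serve only to illustrate the definition. It does not address the second requirement, which concerns what each party learns during the process.

```python
from collections import Counter

def is_k_anonymous(records, qid, k):
    """Return True if every combination of QID values that occurs in
    `records` is shared by at least k records."""
    # Project each record onto the QID attributes and count the groups.
    counts = Counter(tuple(r[a] for a in qid) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical integrated table (illustrative values only).
table = [
    {"birth_year": 1970, "province": "BC", "plan": "approved"},
    {"birth_year": 1970, "province": "BC", "plan": "rejected"},
    {"birth_year": 1982, "province": "ON", "plan": "approved"},
    {"birth_year": 1982, "province": "ON", "plan": "approved"},
    {"birth_year": 1982, "province": "ON", "plan": "rejected"},
]
qid = ["birth_year", "province"]

print(is_k_anonymous(table, qid, 2))  # True: group sizes are 2 and 3
print(is_k_anonymous(table, qid, 3))  # False: the (1970, BC) group has 2 records
```

In this sketch the smallest group of records sharing one QID value determines the largest k the table can satisfy.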
Simply joining their data at the raw level (e.g., birthday and
city) may violate the k-anonymity requirement. Therefore,
the data portals have to cooperate to determine a generalized
version of the integrated data (e.g., birth year and province) such
that the generalized table remains useful for classification
analysis, such as insurance plan approval. Let us first review
some building blocks in the literature. Then we elaborate on an
algorithm, called top-down specialization for 2-party (Wang,
Fung, & Dong, 2005), that addresses the problem.
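To make the effect of generalization concrete, here is a small illustrative sketch in Python. It is not the top-down specialization algorithm and involves no secure protocol between parties; it only shows, on made-up records, how replacing birthday with birth year and city with province enlarges the groups of records that share a QID value. The mapping CITY_TO_PROVINCE, the helper names, and the data are assumptions introduced for this example.

```python
from collections import Counter

# Hypothetical taxonomy mapping a city to its province.
CITY_TO_PROVINCE = {
    "Vancouver": "BC", "Burnaby": "BC",
    "Toronto": "ON", "Ottawa": "ON",
}

def generalize(record):
    """Return a copy with birthday -> birth year and city -> province."""
    out = dict(record)
    out["birth"] = record["birth"][:4]            # 'YYYY-MM-DD' -> 'YYYY'
    out["region"] = CITY_TO_PROVINCE[record["region"]]
    return out

def min_group_size(records, qid):
    """Size of the smallest group of records sharing one QID value."""
    counts = Counter(tuple(r[a] for a in qid) for r in records)
    return min(counts.values())

# Hypothetical joined table at the raw level.
raw = [
    {"birth": "1970-03-14", "region": "Vancouver", "plan": "approved"},
    {"birth": "1970-11-02", "region": "Burnaby",   "plan": "rejected"},
    {"birth": "1982-06-30", "region": "Toronto",   "plan": "approved"},
    {"birth": "1982-01-19", "region": "Ottawa",    "plan": "approved"},
]
qid = ["birth", "region"]
generalized = [generalize(r) for r in raw]

print(min_group_size(raw, qid))          # 1: every raw record is unique on the QID
print(min_group_size(generalized, qid))  # 2: the generalized table is 2-anonymous
```

The class attribute (plan) is left untouched, which is why a generalized table of this kind can still support classification analysis.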
BACKGROUND
Privacy-preserving data mining is the study of performing a
data-mining task, such as classification, association, or
clustering, without violating a given privacy requirement.
Recently, this topic has gained enormous attention
Figure 1. Secure portals integration for classification
[Diagram labels: Party A and Party B, each with private DBs; generalized data; Integrated Web Portal (for classification analysis)]