Secure Two-party Rank Correlation
Computations for Recommender Systems
Kok-Seng Wong
School of Computer Science and
Engineering
Soongsil University
Seoul, South Korea
kswong@ssu.ac.kr
Minjie Seo
School of Computer Science and
Engineering
Soongsil University
Seoul, South Korea
porito@ssu.ac.kr
Myung Ho Kim
School of Software
Soongsil University
Seoul, South Korea
kmh@ssu.ac.kr
Abstract—Recommendation systems are active information
filtering systems that consist of a processor that can provide
recommendations to requesting users (based on the personal
ratings that were submitted by all users). In order to produce
accurate and personalized recommendations, databases from
different agencies can be merged into a central database.
However, due to competition and the possibility of disclosing
business strategies, some agencies might not want to disclose the
rating information of their customers. In this paper, we propose
three secure protocols to compute rank correlation coefficients
(Spearman’s Rho and Kendall’s Tau) for recommender systems.
We utilize a semantically secure homomorphic cryptosystem and
a ciphertext comparison approach in our protocol design.
Keywords—rank correlation coefficient; recommender systems;
data privacy; homomorphic cryptosystem; ciphertext comparison
I. INTRODUCTION
Recommender systems play an important role in many
industries, especially in e-commerce. In general, the
goal of a recommender system is to generate useful
recommendations to a group of users for products that might
interest them. Often, a recommendation is made by comparing the
user’s preferences with those of other users. Information
about the user’s preferences can be gathered from the user’s
profile or from observation of the user’s behavior (e.g.,
actions recorded in click logs).
In most recommender systems, a correlation
coefficient is used to measure the association between the ranking
datasets or profiles of different users. For example, a company
can give a pair of newly designed products to several
customers to assess how good the products are. Each customer
is asked to rate the products by answering a questionnaire.
The rating is based on the customer’s satisfaction level (e.g.,
good, average, bad). Based on the ratings, the
company can identify the relationship between the products and
also determine the profiles of users who gave similar
ratings.
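As a plaintext illustration of the rank correlation described above, the following Python sketch computes Spearman's rho and Kendall's tau for two rating lists. It assumes the textbook definitions (Spearman's rho as the Pearson correlation of average ranks, Kendall's tau in its tau-a form without tie correction), which may differ from the exact forms used in the secure protocols. The function names and the sample ratings are hypothetical, and no privacy protection is applied here.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical ratings of five products by two customers (higher is better).
alice = [1, 2, 3, 4, 5]
bob = [2, 1, 4, 3, 5]
print(spearman_rho(alice, bob))
print(kendall_tau_a(alice, bob))
```

For these sample ratings the two customers disagree on only two of the ten product pairs, so the sketch gives a rho of 0.8 and a tau of 0.6, both indicating similar preferences.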
A. Recommender Systems
Several approaches have been used to design
recommender systems. For instance, collaborative filtering is
one of the most widely used approaches for predicting items that a user
may be interested in [1]. Collaborative filtering systems are
based on the ratings of the target user and of other users in the
system. In other words, the rating of a user for a new item
is likely to be similar to that of another user if both users
have rated other items in a similar fashion [2, 3].
The collaborative filtering approach is able to recommend items
with different contents to its target user if other users have
shown interest in those items.
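The collaborative filtering idea can be sketched in a few lines of Python. This is a minimal illustration and not a method taken from the cited works: it assumes that similarity is measured with Kendall's tau-a on co-rated items and that the prediction is a similarity-weighted average over positively correlated users. The function names, the `ratings` dictionary, and its values are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)


def predict_rating(ratings, target, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Only users whose ratings correlate positively with the target's on
    co-rated items contribute. Returns None if no such user rated `item`.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == target or item not in other_ratings:
            continue
        # Items rated by both users, excluding the item being predicted.
        shared = sorted(set(ratings[target]) & (set(other_ratings) - {item}))
        if len(shared) < 2:
            continue  # at least one pair of items is needed for tau
        sim = kendall_tau_a([ratings[target][k] for k in shared],
                            [other_ratings[k] for k in shared])
        if sim > 0:
            weighted_sum += sim * other_ratings[item]
            weight_total += sim
    return weighted_sum / weight_total if weight_total else None


# Hypothetical ratings: user "u" has not yet rated item "D".
ratings = {
    "u": {"A": 5, "B": 3, "C": 1},
    "v": {"A": 4, "B": 2, "C": 1, "D": 5},
    "w": {"A": 1, "B": 3, "C": 5, "D": 1},
}
print(predict_rating(ratings, "u", "D"))
```

In this example user "v" orders the shared items exactly as "u" does, while "w" orders them in reverse, so only the rating given by "v" contributes to the prediction for item "D".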
Another category of recommender systems uses the content-based
filtering approach to recommend new items with similar
characteristics [4, 5]. Content-based filtering systems first
identify the common characteristics of items that have already
received a rating from the target user. Next, they
recommend to that user similar items that share these common
characteristics. Often, this approach is built on the
assumption that items of interest to the target user can be predicted from
the user’s past interests. Unlike the collaborative filtering approach,
content-based filtering requires rich information (usually text
documents) that describes an item. As a consequence,
items with insufficient information cannot be recommended to
the target user.
Other approaches, such as knowledge-based and
demographic filtering, are also used in recommender
systems. Recently, some hybrid approaches (e.g., combining
collaborative filtering and content-based filtering) have been
proposed in the literature [6-8]. The key idea of hybrid systems
is to combine the advantages of different recommendation
systems in order to overcome their limitations. These hybrid
systems improve the effectiveness and accuracy of the
recommendations that recommender systems
provide [9]. A comprehensive review of different
recommendation systems can be found in [10].
B. Motivation and Problem Formulation
In 2006, a well-known online movie rental service provider
(Netflix) started a $1 million contest for the best technique to
improve its movie recommendation system. Netflix publicly
released 100 million records showing the ratings given by
500,000 users to the movies they had rented. The released records
were anonymized by replacing the usernames with unique
identification numbers. According to the study in [11], more
than 90% of the subscribers could be uniquely identified from
This research was supported by Basic Science Research Program through the
National Research Foundation of Korea (NRF) funded by the Ministry of
Education (NRF-2014R1A1A2058695)
2015 IEEE Trustcom/BigDataSE/ISPA
978-1-4673-7952-6/15 $31.00 © 2015 IEEE
DOI 10.1109/Trustcom-BigDataSe-ISPA.2015.478