Secure Two-party Rank Correlation Computations for Recommender Systems

Kok-Seng Wong, School of Computer Science and Engineering, Soongsil University, Seoul, South Korea, kswong@ssu.ac.kr
Minjie Seo, School of Computer Science and Engineering, Soongsil University, Seoul, South Korea, porito@ssu.ac.kr
Myung Ho Kim, School of Software, Soongsil University, Seoul, South Korea, kmh@ssu.ac.kr

Abstract: Recommendation systems are active information filtering systems in which a processor provides recommendations to requesting users based on the personal ratings submitted by all users. To produce accurate and personalized recommendations, databases from different agencies can be merged into a central database. However, because of competition and the risk of disclosing business strategies, some agencies might not want to reveal the rating information of their customers. In this paper, we propose three secure protocols to compute rank correlation coefficients (Spearman's Rho and Kendall's Tau) for recommender systems. We utilize a semantically secure homomorphic cryptosystem and a ciphertext comparison approach in our protocol design.

Keywords: rank correlation coefficient; recommender systems; data privacy; homomorphic cryptosystem; ciphertext comparison

I. INTRODUCTION

Recommender systems play an important role in many industries, especially in e-commerce. In general, the goal of any recommender system is to generate useful recommendations to a group of users for products that might interest them. Often, a recommendation is made by comparing the user's preferences with those of other users. Information about the user's preferences can be gathered from the user's profile or from observation of the user's behavior (e.g., actions recorded in click logs). In most recommender systems, a correlation coefficient is used to measure the association between the ranking datasets or profiles of different users.
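To make the two rank correlation coefficients concrete, the following is a minimal plaintext sketch in Python. It uses the textbook definitions (Spearman's Rho as the Pearson correlation of ranks, and Kendall's tau-a over all item pairs) and is not the secure protocol proposed in this paper; the function names and the sample ratings are illustrative assumptions.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's Rho computed as the Pearson correlation of the rank vectors.

    Assumes len(x) == len(y) and that neither input is constant
    (a constant input has zero rank variance and would divide by zero).
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs.

    Tied pairs contribute zero; assumes at least two items.
    """
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical ratings of the same five items by two users.
alice = [5, 3, 4, 1, 2]
bob = [4, 2, 5, 1, 3]
rho = spearman_rho(alice, bob)
tau = kendall_tau(alice, bob)
```

For these sample ratings, the squared rank differences sum to 4, so Rho is 1 - 6*4/(5*24) = 0.8, and 8 of the 10 item pairs are concordant, so Tau is (8 - 2)/10 = 0.6. Both values indicate that the two hypothetical users rank the items similarly.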
For example, a company can give a pair of newly designed products to several customers to assess how good the products are. Each customer is asked to rate the products by answering a questionnaire. The rating is based on the customer's satisfaction level (e.g., good, average, bad). Based on the rated information, the company can identify the relationship between the products and also determine the profiles of the users who gave similar ratings.

A. Recommender Systems

Several approaches have been used to design recommender systems. For instance, collaborative filtering is one of the most widely used approaches to predict items that the user may have an interest in [1]. Collaborative filtering systems are based on the ratings of the target user and of other users in the system. In other words, the rating of a user for a new item is likely to be similar to that of another user if both users have rated other items in a similar fashion [2, 3]. The collaborative filtering approach is able to recommend items with different contents to its target user if other users have shown interest in those items.

Another category of recommender systems uses a content-based filtering approach to recommend new items with similar characteristics [4, 5]. Content-based filtering systems first identify the common characteristics of the items that have already received a rating from the target user. Next, they recommend to the target user similar items that share those characteristics. Often, this approach is built on the assumption that the items of interest to a user can be predicted from that user's past interests. Unlike the collaborative filtering approach, content-based filtering requires rich information (usually text documents) that describes each item. As a result, items with insufficient descriptive information cannot be recommended to the target user.

Other approaches such as knowledge-based and demographic filtering are also used in recommender systems.
Recently, some hybrid approaches (e.g., combining collaborative filtering and content-based filtering) have been proposed in the literature [6-8]. The key idea of hybrid systems is to combine the advantages of different recommendation systems in order to overcome their individual limitations. These hybrid systems improve the effectiveness and accuracy of the recommendations given [9]. A comprehensive review of different recommendation systems can be found in [10].

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2014R1A1A2058695).

2015 IEEE Trustcom/BigDataSE/ISPA. 978-1-4673-7952-6/15 $31.00 © 2015 IEEE. DOI 10.1109/Trustcom-BigDataSe-ISPA.2015.478

B. Motivation and Problem Formulation

In 2006, a well-known online movie rental service provider (Netflix) started a $1 million contest for the best technique to improve its movie recommendation system. Netflix publicly released 100 million records showing the ratings given by 500,000 users to the movies they had rented. The released records were anonymized by replacing the usernames with unique identification numbers. According to the study in [11], more than 90% of the subscribers could be uniquely identified from