Modeling Collaborative Similarity with the Signed Resistance Distance Kernel

Jérôme Kunegis, Stephan Schmidt, Şahin Albayrak¹, Christian Bauckhage² and Martin Mehlitz
{kunegis, stephan.schmidt, sahin}@dai-lab.de, christian.bauckhage@telekom.de, martin.mehlitz@googlemail.com

Abstract. We extend the resistance distance kernel to the domain of signed dissimilarity values, and show how it can be applied to collaborative rating prediction. The resistance distance is a graph kernel inspired by electrical network models where edges of a graph are interpreted as electrical resistances. We model the similarity between users of a large collaborative rating database using this signed resistance distance, generalizing the previously known regular resistance distance kernel which is limited to nonnegative values. We show that the signed resistance distance kernel can be computed effectively using the Moore-Penrose pseudoinverse of the Laplacian matrix of the bipartite rating graph, leading to fast computation based on the eigenvalue decomposition of the Laplacian matrix. We apply this technique to collaborative rating prediction on the Netflix Prize corpus, and show how our new kernel can replace the traditional Pearson correlation for rating prediction.

1 Introduction

In the field of information retrieval, the filtering and recommendation of items to users is usually done in a content-based manner, meaning that the content of items is analyzed in order to provide recommendations. Collaborative filtering, by contrast, bases its item rankings on ratings collected from users of the recommendation system. A collaborative filtering system usually consists of a database of users, items such as text documents or movies, and a collection of ratings users give to items. The collected database of ratings is usually sparse, as each single user has generally only rated a small part of all available items.
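
To make the notion of a sparse rating database concrete, the following minimal Python sketch stores only the ratings that exist and measures what fraction of all user-item pairs they cover. The user names, item names, rating values and the helper `density` are hypothetical and chosen only for illustration; they are not taken from the Netflix Prize corpus or from the system evaluated in this paper.

```python
# Toy rating database (hypothetical data): ratings[user][item] = rating.
# Only ratings that were actually given are stored; missing pairs are absent.
ratings = {
    "alice": {"item_a": 5, "item_c": 2},
    "bob": {"item_b": 4},
    "carol": {"item_a": 1, "item_d": 3},
}


def density(ratings):
    """Return the fraction of all (user, item) pairs that carry a rating."""
    n_users = len(ratings)
    # Collect every item that was rated by at least one user.
    items = {item for row in ratings.values() for item in row}
    n_rated = sum(len(row) for row in ratings.values())
    return n_rated / (n_users * len(items))


print(f"density = {density(ratings):.2f}")
```

In this toy example, 5 of the 12 possible user-item pairs are rated. In a real collection the fraction is typically far smaller, which is why similarities between users have to be estimated from few shared ratings.
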

To make recommendations, a collaborative filtering system has to rank items. To rank items, a score has to be calculated for each item, based on the profile of the user receiving the recommendation. These scores can be interpreted as rating predictions, meaning that the recommendation system will recommend items the user will probably like.

Different algorithms exist for predicting ratings, most based on the calculation of similarities between users, and sometimes between items. In this paper, we describe a rating prediction algorithm using a new graph kernel based on the signed resistance distance.

¹ DAI-Labor, Technische Universität Berlin, Germany
² Telekom Laboratories, Berlin, Germany

The signed resistance distance we define differs from the regular resistance distance in the literature in that it can also be applied to similarity measures that take on negative values, such as the Pearson correlation. The regular (unsigned) resistance distance can only be used with nonnegative values, and thus cannot be applied to rating prediction, as ratings take on negative values.

Based on a known result about the regular resistance distance, we show how the signed resistance distance can be computed effectively using the Moore-Penrose pseudoinverse of the Laplacian of the correlation matrix.

We evaluate our approach by comparing it to the Pearson correlation based prediction algorithm. The evaluation is performed on the Netflix Prize corpus.

The remainder of this paper is organized as follows. Section 2 introduces the basic collaborative filtering techniques. In Section 3, we define the notation used in the paper. Section 4 describes the basic collaborative filtering algorithm in detail. In Section 5, we define the regular (unsigned) resistance distance. Section 6 motivates and defines the signed resistance distance, and presents a closed-form expression for it.
Section 7 describes how the signed resistance distance can be applied to rating prediction, giving a signed resistance distance-based algorithm. Section 8 presents the evaluation results, and Section 9 concludes the paper and gives future research directions.

2 Related Work

In this section, we present the basic collaborative filtering methods. Methods based on the resistance distance are introduced later.

Collaborative filtering systems appeared in the 1990s, with systems such as GroupLens [14], Ringo [15], MovieLens, etc. These systems all used a simple neighborhood-finding and similarity-based rating prediction approach that consists of calculating a weighted average of known item ratings, taking the user-user correlation values as weights. Variants of these methods can be found in [1,6]. An overview of state-of-the-art collaborative prediction algorithms can be found in [7].

Collaborative filtering algorithms are classified in various ways:

• User-based approaches calculate similarities between users, and use these to weight known item ratings. Item-based