Random Walks in Recommender Systems: Exact Computation and Simulations ∗

Colin Cooper, Sang Hyuk Lee, Tomasz Radzik, Yiannis Siantos
Department of Informatics, King’s College London, U.K.
name.surname@kcl.ac.uk

ABSTRACT

A recommender system uses information about known associations between users and items to compute for a given user an ordered recommendation list of items which this user might be interested in acquiring. We consider ordering rules based on various parameters of random walks on the graph representing associations between users and items. We experimentally compare the quality of recommendations and the required computational resources of two approaches: (i) calculate the exact values of the relevant random walk parameters using matrix algebra; (ii) estimate these values by simulating random walks. In our experiments we include methods proposed by Fouss et al. [8, 9] and Gori and Pucci [11], method $P^3$, which is based on the distribution of the random walk after three steps, and method $P^3_\alpha$, which generalises $P^3$. We show that the simple method $P^3$ can outperform previous methods and that method $P^3_\alpha$ can offer further improvements. We show that the time- and memory-efficiency of direct simulation of random walks allows application of these methods to large datasets. Our experiments use the three MovieLens datasets.

1. INTRODUCTION

We view a recommender system as an algorithm which takes a dataset of relationships between a set of users and a set of items and attempts to calculate how a given user might rank all items. For example, the users may be the customers who have bought books from some (online) bookstore and the items the books offered.
The core information in the dataset in this case would show who bought which books, but it may also include further details of transactions (the date of transaction, the books bought together, etc.), information about the books (authors, category, etc.), and possibly some details about customers (age, address, etc.). For a given customer, a recommender system would compute a list of books $\langle m_1, m_2, \ldots, m_k \rangle$ which this customer might be interested in buying, giving the highest recommendations first. Recommender systems viewed as algorithms for computing such personalised rankings of items (rather than overall “systems,” which would also include methods for gathering data) are often referred to as scoring, or ranking, algorithms.

∗ This research is part of the project “Fast Low Cost Methods to Learn Structure of Large Networks,” supported by the 2012 SAMSUNG Global Research Outreach (GRO) program.

Recommender systems are part of everyday online life. Whenever we buy a movie, or a new app for our mobile phone, a recommender system suggests other items of potential interest to us. A good recommender system improves the user’s experience and increases commercial activity, while consistently unhelpful recommendations may make users look for other sites. This significant commercial value, coupled with the challenging theoretical and practical aspects of modelling, designing and implementing appropriate algorithms, has made recommender systems a fast-growing research topic.

In this paper, we focus on a simple scenario with two main entity sets, Users ($U$) and Items ($I$), and a single relationship $R$ of pairs $\langle u, m \rangle$, where $u \in U$ and $m \in I$. The fact that $\langle u, m \rangle \in R$ means that $u$ has some preference for $m$. The pairs $\langle u, m \rangle \in R$ may have additional attributes which indicate the degree of preference.
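As a minimal sketch of this scenario, and of the two approaches compared in the abstract (exact computation versus simulation), the following Python code stores a relationship R as a set of pairs and computes the distribution of a random walk after three steps from a given user, first exactly and then by simulation. The toy data, the function names and the choice of an unweighted walk are assumptions made for illustration; this is not the authors' implementation, and it does not reproduce how method $P^3$ turns the distribution into a ranking.

```python
import random
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy relationship R: (user, item) means the user has some
# preference for the item. Degrees of preference are ignored here.
R = {("u1", "m1"), ("u1", "m2"), ("u2", "m2"), ("u2", "m3"), ("u3", "m3")}


def neighbours(relation):
    """Adjacency lists of the user-item graph whose edges are the pairs in R."""
    adj = defaultdict(list)
    for u, m in sorted(relation):
        adj[u].append(m)
        adj[m].append(u)
    return adj


def exact_three_step(adj, start):
    """Exact distribution of an unweighted random walk after three steps."""
    dist = {start: Fraction(1)}
    for _ in range(3):
        nxt = defaultdict(Fraction)
        for v, p in dist.items():
            for w in adj[v]:
                # Each neighbour of v is chosen with probability 1/deg(v).
                nxt[w] += p / len(adj[v])
        dist = dict(nxt)
    return dist


def simulated_three_step(adj, start, walks=20000, seed=0):
    """Estimate the same distribution from independent simulated walks."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    for _ in range(walks):
        v = start
        for _ in range(3):
            v = rng.choice(adj[v])
        counts[v] += 1
    return {v: c / walks for v, c in counts.items()}


adj = neighbours(R)
exact = exact_three_step(adj, "u1")
approx = simulated_three_step(adj, "u1")
```

Because every pair in R joins a user to an item, a three-step walk started at a user always ends at an item. In this toy example the walk from `u1` reaches `m3`, an item `u1` has no recorded preference for, with positive probability, and the simulated frequencies approximate the exact values as the number of walks grows.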
The relationship $R$ can be modelled as a bipartite graph $G = (U \cup I, R)$, possibly with edge weights, which would be calculated on the basis of the attributes of pairs $\langle u, m \rangle \in R$. A scoring algorithm for a user $u \in U$ orders the items in $I$ according to some similarities between vertex $u$ and the vertices in $I$, which are defined by the structure of graph $G$. More precisely, a scoring algorithm is defined by a formula or a procedure for calculating a $p \times q$ matrix $M = M(G)$ which expresses those similarities, where $p = |U|$ and $q = |I|$. For a user $u \in U$, the items $m \in I$ are ranked according to their $M(u, m)$ values (in increasing or decreasing order, depending on whether a lower or a higher value of $M(u, m)$ indicates higher similarity between vertices $u$ and $m$). Matrix $M$ is called the scoring matrix or the ranking matrix of the algorithm. We also note that the methodology of designing ranking algorithms of this type, which use the whole relationship between users and items rather than user and item profiles, is often referred to as