The Effect of PageRank on the Collaborative Filtering Recommendation of Journal Articles

André Vellino
CISTI, National Research Council
Ottawa, Ontario K1A 0R6
andre.vellino@nrc.ca

Abstract

The cold-start problem for collaborative filtering recommendation in a scholarly digital library can be partially addressed by seeding the matrix of user-preferences with article references obtained from the citation graph. This approach has several limitations, not least of which is that an article's references are boolean ratings rather than ratings on a numerical scale. In this paper we examine the hypothesis that the PageRank score of a reference can be used as a proxy for the user-rating of the citing article and describe experimental tests that demonstrate this hypothesis to be false.

1 Introduction

Recommending journal articles in a digital library is more difficult than recommending other kinds of items (songs, movies, merchandise, etc.) in part because of the data sparsity problem. In scholarly digital libraries, the ratio of users to items is typically one or two orders of magnitude smaller than for recommenders in commercial settings [11]. In addition, the average number of ratings per user in a digital library is likely to be much smaller than for content on a commercial web site, thus exacerbating the problem.

One remedy for this problem is to use bibliographic citations as a substitute for user ratings [9]. However, this solution is partial at best. One reason is that bibliographic references, while an indicator of relevance, are not necessarily an indication of favourable relevance in the mind of the author.¹ Another reason is that references only provide boolean ratings rather than ratings on a numerical scale, as is usually done with movies and music.
¹ One way to distinguish between favourable and unfavourable references would be to assess the semantic orientation of words that neighbour the reference in the text using latent semantic analysis techniques [10].

One possible method for assigning a numerical rating to a bibliographic citation in an article is to assign to it the PageRank value of the cited article, obtained from the citation network of all the articles.

There are several reasons for expecting that ratings defined by PageRank values would improve collaborative filtering recommendations. First, the use of PageRank for evaluating the impact of scholarly journals as a whole has successfully been applied by Eigenfactor [2], so it seemed likely that applying this technique to individual articles would also be effective. Second, previous successes in the use of PageRank as a measure of the "impact factor" of an article [5, 6] suggest that PageRank weights could be a proxy for the numeric rating that an article might give to another article. Finally, other studies have shown [7] that collaborative filtering data can be used by a PageRank algorithm to improve rankings in search results. One might expect that the converse, applying PageRank data to a collaborative filtering algorithm, could yield improved recommendations.

This paper describes the effect of applying a simplified Weighted PageRank algorithm on a citation graph and using the resulting rankings as preference-scores from which to generate item-based recommendations. Experimental results (section 4.1) show that PageRank significantly decreases the quality of recommendations based on Top-N measures. In section 5 we discuss some of the possible reasons for this counter-intuitive result.

2 PageRank on Citations

Consider the graph of references to articles in a collection as the raw data to which the PageRank algorithm is applied.
For the purposes of this study we ignored co-authorship information (although it has recently been shown that including information about co-authorship enhances the PageRank values [4]) and used only the citation network to establish a PageRank value for each article with a weighted PageRank algorithm [12].
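
As an illustration of the idea described so far, the following minimal sketch builds both the boolean rating matrix and a PageRank-valued rating matrix from a toy citation graph. It is not the procedure used in this study: it substitutes plain (unweighted) PageRank computed by power iteration for the weighted algorithm of [12], and the function names, the damping factor of 0.85, the iteration count and the four-article graph are all assumptions made for the example.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # citing article -> list of cited articles


def pagerank(citations: Graph, damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Plain (unweighted) PageRank by power iteration; a stand-in for [12]."""
    # Every article that cites or is cited is a node of the graph.
    nodes = sorted(set(citations) | {c for refs in citations.values() for c in refs})
    n = len(nodes)
    rank = {a: 1.0 / n for a in nodes}
    for _ in range(iterations):
        # Rank held by articles without references is spread uniformly.
        dangling = sum(rank[a] for a in nodes if not citations.get(a))
        new_rank = {a: (1.0 - damping) / n + damping * dangling / n for a in nodes}
        for citing, refs in citations.items():
            if refs:
                share = damping * rank[citing] / len(refs)
                for cited in refs:
                    new_rank[cited] += share
        rank = new_rank
    return rank


def rating_matrix(citations: Graph,
                  scores: Optional[Dict[str, float]] = None) -> Dict[str, Dict[str, float]]:
    """Rows are citing articles ("users"); columns are cited articles ("items").

    Without scores, every reference is a boolean rating of 1.0; with scores,
    each reference is rated with the score of the cited article.
    """
    return {
        citing: {cited: (1.0 if scores is None else scores[cited]) for cited in refs}
        for citing, refs in citations.items()
    }


# Hypothetical toy graph: A cites B and C; B and D cite C; C cites nothing.
toy_graph: Graph = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C"]}
toy_scores = pagerank(toy_graph)
boolean_ratings = rating_matrix(toy_graph)
pagerank_ratings = rating_matrix(toy_graph, toy_scores)
```

In the boolean matrix, article A rates B and C identically; in the PageRank-valued matrix, A's rating of C is higher than its rating of B because C is cited more heavily in the toy graph. Either matrix could then be passed to an item-based collaborative filtering algorithm.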