UPR: Usage-based Page Ranking for Web Personalization¹

Magdalini Eirinaki
Athens University of Economics and Business
Dept. of Informatics
Athens, Greece
eirinaki@aueb.gr

Michalis Vazirgiannis
Athens University of Economics and Business
Dept. of Informatics
Athens, Greece
mvazirg@aueb.gr

¹ A modified version of this paper has appeared in the Proceedings of the 5th IEEE International Conference on Data Mining (ICDM ’05).

ABSTRACT
Recommendation algorithms aim at proposing “next” pages to a user based on her navigational behavior. In the vast majority of related algorithms, only usage data are used to produce recommendations. We claim that also taking the web structure into account and applying link analysis algorithms improves the quality of recommendations. In this paper we present UPR, a personalization algorithm which combines usage data and link analysis techniques for ranking and recommending web pages to the end user. Using the web site’s structure and previously recorded user sessions, we produce personalized navigational subgraphs (prNGs) to which UPR is applied. Experimental results show that the accuracy of the generated recommendations is superior to that of pure usage-based approaches.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications – Data Mining; H.3.5 [Information Storage and Retrieval]: Online Information Services – Web-based services

General Terms
Algorithms, Experimentation.

Keywords
Web Personalization, Link Analysis, PageRank, Usage-based PageRank

1. INTRODUCTION
The evolution of the World Wide Web into the main information source for millions of people has created the need for new methods and algorithms that can efficiently process the vast amounts of data residing on it. Users are increasingly demanding about the quality of the information provided to them when searching the web or browsing a web site.
The area of web mining, which includes any method that utilizes data residing on the web, namely usage, content, and structure data, addresses this need. Its most common applications are the ranking of the results returned by a web search engine and the provision of recommendations to users of (usually commercial) web sites, known as web personalization.

PageRank is the most popular link analysis algorithm and is used to rank the results a search engine returns for a user query. It evaluates the importance of a page in terms of its connectivity to and from other important pages. Many variations of this algorithm have been proposed, aiming to refine the results it produces. Some of these approaches use the so-called “personalization vector” of PageRank to bias the results towards the individual needs of each user searching the web.

In this work, we introduce PageRank in an entirely different context, that of web personalization. Web personalization is defined as any action that adapts the information or services provided by a Web site to the needs of a user or a set of users, taking advantage of the knowledge gained from the users’ navigational behavior and individual interests, in combination with the content and the structure of the Web site [10]. Many approaches have been proposed in the past, based on pure web usage mining algorithms, Markov models, or a combination of usage and content mining techniques. Motivated by the fact that, when navigating a web site, a page or path is important if many users have visited it before, we propose a novel approach based on a personalized version of PageRank, applied to the navigational tree created by previous users’ navigations. We personalize PageRank by biasing it to “favor” pages and paths previously preferred by many users.
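To make the role of the personalization vector concrete, the following is a minimal Python sketch of standard PageRank computed by power iteration, with the uniform random-jump vector replaced by a usage-derived bias (normalized visit counts). It is not the UPR formulation itself, which this section does not define; the function name `personalized_pagerank`, the damping factor 0.85, the tolerance, the dangling-page convention, and the toy site graph and visit counts are all illustrative assumptions.

```python
def personalized_pagerank(graph, bias, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank biased by a personalization vector (sketch).

    graph: dict mapping each page to the list of pages it links to.
    bias:  dict mapping pages to non-negative weights (here: visit counts).
    """
    # Collect every page that appears as a source or as a link target.
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    total = sum(bias.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("bias must give positive weight to at least one page")
    # Normalize the bias into a probability distribution (the personalization vector).
    p = {n: bias.get(n, 0.0) / total for n in nodes}
    rank = dict(p)
    for _ in range(max_iter):
        # Random jump: with probability 1 - damping, restart according to p.
        new = {n: (1 - damping) * p[n] for n in nodes}
        dangling = 0.0
        for u in nodes:
            outs = graph.get(u, [])
            if outs:
                # Follow a link: the rank of u is split evenly over its out-links.
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Assumed convention: pages without out-links hand their rank back via p.
                dangling += damping * rank[u]
        for n in nodes:
            new[n] += dangling * p[n]
        # Stop once the L1 change between iterations falls below the tolerance.
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical site structure and visit counts from previous sessions.
site = {
    "home": ["products", "about"],
    "products": ["item1", "item2"],
    "about": ["home"],
    "item1": ["products"],
    "item2": [],
}
visits = {"home": 50, "products": 40, "item1": 30, "item2": 5, "about": 5}

ranks = personalized_pagerank(site, visits)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Pages with large visit counts receive more of the random-jump mass, so heavily visited pages, and the pages they link to, move up in the ranking; with uniform weights the function reduces to ordinary PageRank. The scores always sum to 1, since the jump, link, and dangling terms together redistribute exactly the previous iteration's total rank.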
We prove that this hybrid algorithm can be applied to any web site’s navigational graph, as long as the graph satisfies certain properties. It is therefore orthogonal to the graph synopsis we choose to model the user sessions with, such as a Markov chain, higher-order Markov models, or tree-like synopses. The approach is thus generic and is proved to converge after a few iterations, providing fast results, while we can trade off simplicity against accuracy by applying it to simpler or more complex web navigational graph models. More specifically, our key contributions are:
• UPR, a novel usage-based, personalized PageRank-style algorithm for ranking the web pages of a site based on previous users’ navigational behavior.
• A method for creating personalized navigational graph synopses (prNGs) to which UPR is applied.
• A personalization method which combines usage and structure data for ranking and recommending web pages to the end user.
• A set of experimental results demonstrating that incorporating link analysis into the web personalization process improves the accuracy of recommendations.
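As an illustration of the simplest synopsis mentioned above, a first-order Markov chain can be estimated from recorded sessions by counting page-to-page transitions and normalizing them per source page. The sketch below is a generic maximum-likelihood estimate under that assumption, not the paper's prNG construction; the function name `build_markov_synopsis` and the toy sessions are hypothetical.

```python
from collections import defaultdict


def build_markov_synopsis(sessions):
    """Estimate first-order transition probabilities P(next | current).

    sessions: list of sessions, each an ordered list of visited pages.
    Returns a dict: page -> {next_page: probability}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Count each consecutive pair of page visits as one transition.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    transitions = {}
    for current, nexts in counts.items():
        # Normalize outgoing counts so each page's probabilities sum to 1.
        total = sum(nexts.values())
        transitions[current] = {page: c / total for page, c in nexts.items()}
    return transitions


# Hypothetical recorded sessions.
sessions = [
    ["home", "products", "item1"],
    ["home", "products", "item2"],
    ["home", "about"],
    ["products", "item1"],
]
model = build_markov_synopsis(sessions)
print(model["home"])
print(model["products"])
```

Pages that only ever end a session (here `item1`, `item2`, and `about`) get no outgoing row. A higher-order model would condition on the last several pages instead of one, at the cost of a larger and sparser transition table, which mirrors the simplicity/accuracy trade-off described above.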