Personalizing PageRank Based on Domain Profiles

Mehmet S. Aktas, Mehmet A. Nacar, and Filippo Menczer
Computer Science Department
Indiana University
Bloomington, IN 47405 USA
{maktas,mnacar,fil}@indiana.edu

Abstract

Personalized versions of PageRank have been proposed to rank the results of a search engine based on a user’s topic or query of interest. This paper introduces a methodology for personalizing PageRank vectors based on URL features such as Internet domains. Users specify interest profiles as binary feature vectors where a feature corresponds to a DNS tree node. Given a profile vector, a weighted PageRank can be computed assigning a weight to each URL based on the match between the URL and the profile features. We present promising preliminary results from a small experiment in which users were allowed to select among nine URL features combining the top two levels of the DNS tree, leading to 2^9 pre-computed PageRank vectors from a Yahoo crawl. Personalized PageRank performed favorably compared to pure similarity based ranking and traditional PageRank.

Key Words: personalized search, link analysis, PageRank, internet domain profiles, web search, personalized PageRank vectors

1. Introduction

The Web is a highly distributed and heterogeneous information environment. The immense number of Web documents presents various challenges for search engines. Storage space, crawling speed, and computational speed are some of these challenges. This paper deals with the retrieval of the most relevant documents.

Recent search engines rank pages by combining traditional information retrieval techniques based on page content, such as the word vector space [1, 2], with link analysis techniques based on the hypertext structure of the Web, such as PageRank [3] and HITS [4]. The PageRank algorithm provides a global ranking of Web pages based on their importance estimated from hyperlinks [5, 3, 6].
For instance, a link from page “A” to page “B” is treated as a vote by page “A” for the importance of page “B”. So, as the number of links to page “B” increases, its importance increases as well. In PageRank, the importance of a page depends not only on the number of its inlinks but also on their sources. In this scenario the global ranking of pages is based on the Web graph structure. Search engines such as Google (http://www.google.com) utilize the link structure of the Web to calculate the PageRank values of the pages. These values are then used to rank search results to improve precision. Comprehensive reviews of the issues related to PageRank can be found in [7, 8, 9].

The PageRank algorithm [5, 3] attempts to provide an objective global estimate of Web page importance. However, the importance of a Web page is subjective: it depends on the interests and knowledge of each user, and can therefore be better determined if the PageRank algorithm takes user preferences into consideration. A global ranking of a Web page might not necessarily capture the importance of that page for a given individual user.

Here we explore how to personalize PageRank based on features readily available from page URLs. For instance, a user might favor pages from a specific geographic region, as may be revealed by Internet (DNS) domains. Likewise, topical features of Internet domains might also reflect user preferences. A user might prefer pages that are more likely to be monitored