An Implicit-Feedback Based Ranking Methodology For Web Search Engines

Shahram Rahimi, Raheel Ahmad, Bidyut Gupta, Kaushik Adya
Department of Computer Science, Southern Illinois University
Carbondale, IL 62901
[rahimi, rahmad, bidyut]@cs.siu.edu
Tel: 618-453-6033

Abstract

The World Wide Web (WWW) is a fast-growing network of information covering nearly every possible topic. Given a few keywords, a search engine can return a list of relevant web pages by querying its index. However, irrelevant results are still commonly presented to the user. One way to improve the ordering of the search results is to incorporate user feedback when ranking the documents for relevancy. We present a model for search engine enhancement that uses implicit feedback in the form of ClickThrough data from the users. The order of the links returned as the query result is rearranged for future queries based on the choices made by the majority of the users. An algorithm, with its implementation, is presented and then evaluated to demonstrate its capability as an add-on component for enhancing current ranking algorithms.

Keywords: Search Engine, Ranking Algorithms, ClickThrough Data, Implicit Feedback.

1 Introduction

Ranking documents according to some predefined criteria is the responsibility of the ranking algorithm. A ranking algorithm is one of the most crucial components of any search engine and usually requires much attention during the engine's development. Different search engines use different classes of ranking algorithms, with varying degrees of effectiveness and efficiency. Intuitively, a good information retrieval system should present relevant documents higher in the ranking, with less relevant documents following them. Although ranking algorithms strive to achieve this goal, it is common to find many irrelevant results among the relevant ones.
This suboptimal result has motivated considerable research in the area of search engine ranking algorithms. This work presents a system for automatically rearranging the query results of an arbitrary search engine using the implicit feedback obtained from users in the form of what is known as "ClickThrough" data. Such ClickThrough data is easily available and can be recorded at a very low cost. In the following sections, we discuss the inadequacies of the current ranking approaches, the proposed approach along with some implementation details, and the results. This is followed by the evaluation of the implementation and the conclusions.

2 Document Ranking

The ranking algorithm is one of the crucial components of any search engine and plays an important role in its effectiveness. The satisfaction of the user depends on the links to the documents returned by this ranking algorithm. Different search engines use different ranking algorithms, most of which are proprietary and kept secret. These algorithms rely on certain assumptions in order to rank the documents. Google, one of the most popular search engines, uses its own PageRank [3] algorithm. The most crucial aspect of the PageRank algorithm is that it interprets a link from page A to page B as a vote, by page A, for page B [4]. Many other criteria are combined with this one in the ranking procedure. TF x IDF is another ranking methodology, used by search engines such as WebCrawler and Lycos [5]. It weights a term by its frequency in a document (TF) and, inversely, by how often the term appears across the collection of documents (IDF). Other ranking methodologies include Boolean Spread Activation, Most-Cited, and Vector Spreading Activation, discussed in [6].

2.1 Drawbacks of Current Approaches

As seen in the above algorithms, the importance of a page is based on metrics such as interest, popularity, location, etc.
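To make the TF x IDF weighting mentioned in Section 2 concrete, the following is a minimal Python sketch of one common textbook variant. It is not the formula used by WebCrawler, Lycos, or any other engine named above. The function names `tf_idf_scores` and `rank`, the whitespace tokenization, the length-normalized term frequency, and the logarithmic IDF are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, documents):
    """Score each document against the query with a basic TF x IDF sum.

    Hypothetical helper: the exact weighting is an assumption, not taken
    from any specific search engine.
    """
    n_docs = len(documents)
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            term = term.lower()
            if not tokens or doc_freq[term] == 0:
                continue  # term absent from the collection, or empty document
            tf = counts[term] / len(tokens)            # term frequency in this document
            idf = math.log(n_docs / doc_freq[term])    # rarer terms weigh more
            score += tf * idf
        scores.append(score)
    return scores


def rank(query_terms, documents):
    """Return document indices ordered from highest to lowest score."""
    scores = tf_idf_scores(query_terms, documents)
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
```

Note that a score of this kind depends only on the text of the documents and the query. It carries no information about which results users actually chose.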
In almost all ranking algorithms, the factors considered are linkage, keywords, the format of the words, the position of the words, the depth of the page in the domain, etc. In general, the search engine user should be able to find the needed information in the top links returned. However, the user may not always find the most relevant links among the top few results. This situation occurs