Improving Web Site Search Using Web Server Logs

Jin Zhou 1, Chen Ding 2, Dimitrios Androutsos 3

1,3 Department of Electrical and Computer Engineering, Ryerson University
2 Department of Computer Science, Ryerson University
350 Victoria St., Toronto, ON, Canada M5B 2K3
{j3zhou, cding, dimitri}@ryerson.ca

Abstract

Despite the success of global search engines, web site search engines still suffer from poor performance. Because a web site differs from the whole web in link structure, access pattern, and data scale, methods that improve web search do not always succeed when applied to web site search. In this paper, we propose a novel algorithm that improves retrieval performance by using web server logs. Web server logs are grouped into sessions, and the relationships among the web pages in each session are analyzed based on their similarities. From this analysis, a new web page representation is generated. Anchor text is used to create another representation. Both are combined with the original text-based representation in web site search. Two kinds of combination methods are investigated and tested: combination of document representations and combination of ranking scores. Our experimental results show that our algorithm improves retrieval accuracy for all four retrieval models we tested: Inference Network Model, Okapi Model, Cosine Similarity Model and TFIDF Model. The largest performance increase from web log analysis is obtained with the TFIDF model, and overall, the inference network model with web log information achieves the best result.

Copyright 2006 Jin Zhou, Chen Ding, Dimitrios Androutsos and Ryerson University. Permission to copy is hereby granted provided the original copyright notice is reproduced in copies made.

1 Introduction

The World Wide Web has permeated our daily life and has changed the way we think, work and live. In recent years, search engines such as Google [2], Yahoo!
[6] and MSN [5] have greatly helped users retrieve the information they want from the growing web. After people view the results returned by a search engine, they usually click some URLs to visit the corresponding web sites. Once at a site, they might find the information they want immediately, or they might browse through hyperlinks or conduct a web site search to continue their information seeking. For example, a student might search for course-related information on a departmental web site; a potential customer of Amazon might search for a specific book or DVD on the Amazon website.

Although web site search is frequently used, users often suffer from its poor performance. When a query is submitted, many irrelevant results are returned, or the most relevant pages are listed outside the top 10 results. In the Forrester survey [20], the search facilities of 50 websites were tested, but none of them produced satisfactory results for the queries. For example, the most relevant pages were rarely put in the first page of results, the best matched web pages