A Website Mining Model Centered on User Queries

Ricardo Baeza-Yates 1,2,3 and Barbara Poblete 1,2

1 Web Research Group, Technology Department, University Pompeu Fabra, Barcelona, Spain
2 Center for Web Research, CS Department, University of Chile, Santiago, Chile
3 Yahoo! Research, Barcelona, Spain
{ricardo.baeza, barbara.poblete}@upf.edu

Abstract. We present a model for mining user queries found within the access logs of a website and for relating this information to the website's overall usage, structure and content. The aim of this model is to discover, in a simple way, valuable information to improve the quality of the website, allowing the website to become more intuitive and adequate for the needs of its users. This model presents a methodology for analyzing and classifying the different types of queries registered in the usage logs of a website, such as queries submitted by users to the site's internal search engine and queries on global search engines that lead to documents in the website. These queries provide useful information about topics that interest users visiting the website, and the navigation patterns associated with these queries indicate whether or not the documents in the site satisfied the user's needs at that moment.

1 Introduction

The Web has been characterized by its rapid growth, massive usage and its ability to facilitate business transactions. This has created increasing interest in improving and optimizing websites to better fit the needs of their visitors. It is more important than ever for a website to be found easily on the Web and for visitors to effortlessly reach the content they are looking for. Failing to meet these goals can result in the loss of many potential clients. Web servers register important data about the usage of a website.
This information generally includes visitors' navigational behavior, the queries made to the website's internal search engine (if one is available) and also the queries on external search engines that resulted in requests for documents from the website, queries that account for a large portion of the visits to most sites on the Web. All of this information is provided by visitors implicitly and can hold the key to significantly optimizing and enhancing a website, thus improving the "quality" of that site, understood as "the conformance of the website's structure to the intuition of each group of visitors accessing the site" [1].

M. Ackermann et al. (Eds.): EWMF/KDO 2005, LNAI 4289, pp. 1–17, 2006. © Springer-Verlag Berlin Heidelberg 2006

Most of the queries related to a website represent actual information needs of the users that visit the site. However, user queries in Web mining have been