A Personalizable Agent for Semantic Taxonomy-Based Web Search

Larry Kerschberg, Wooju Kim, and Anthony Scime

E-Center for E-Business, George Mason University
4400 University Drive, Fairfax, VA 22030, USA
kersch@gmu.edu
http://eceb.gmu.edu/

Department of Industrial Engineering, Chonbuk National University, Korea
wjkim@chonbuk.ac.kr

Department of Computer Science, SUNY-Brockport
ascime@brockport.edu

Abstract. This paper addresses the problem of specifying Web searches and retrieving, filtering, and rating Web pages so as to improve the relevance and quality of hits, based on the user’s search intent and preferences. We present a methodology and architecture for an agent-based system, called WebSifter II, that captures the semantics of a user’s decision-oriented search intent, transforms the semantic query into target queries for existing search engines, and then ranks the resulting page hits according to a user-specified weighted-rating scheme. Users create personalized search taxonomies via our Weighted Semantic-Taxonomy Tree. Consulting a Web taxonomy agent such as WordNet helps refine the terms in the tree. The concepts represented in the tree are then transformed into a collection of queries processed by existing search engines. Each returned page is rated according to user-specified preferences such as semantic relevance, syntactic relevance, categorical match, page popularity, and authority/hub rating.

1 Introduction

With the advent of the Internet and the WWW, the amount of information available on the Web grows daily. However, having too much information at one’s fingertips does not always mean good-quality information; in fact, it may often prevent a decision maker from making sound decisions by degrading the quality of the decision. Helping decision makers locate relevant information efficiently is very important both to the person and to the organization in terms of time, cost, data quality, and risk management.
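To make the weighted-rating idea from the abstract concrete, the following is a minimal sketch of how per-component page scores might be combined under user-specified weights. It is an illustration only: the component names are taken from the abstract's list of preferences, while the normalized weighted average, the [0, 1] score range, and the function names `rate_page` and `rank_pages` are assumptions, not the rating scheme WebSifter II actually uses.

```python
# Hypothetical component keys, mirroring the preferences listed in the abstract.
COMPONENTS = ("semantic", "syntactic", "categorical", "popularity", "authority_hub")


def rate_page(scores, weights):
    """Combine component scores (assumed to lie in [0, 1]) into one rating.

    Assumption: a weighted average normalized by the sum of the weights.
    """
    total_weight = sum(weights[c] for c in COMPONENTS)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[c] * scores[c] for c in COMPONENTS) / total_weight


def rank_pages(pages, weights):
    """Return page identifiers ordered from highest to lowest rating."""
    return sorted(pages, key=lambda url: rate_page(pages[url], weights), reverse=True)
```

Under this sketch, a user who cares mostly about semantic relevance would assign that component a larger weight, and pages scoring well on it would move toward the top of the ranking.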
Although search engines assist users in finding information, many of the results are irrelevant to the decision problem. This is due in part to the keyword-search approach, which does not capture the user’s intent, what we call meta-knowledge. Another reason for irrelevant results from search engines is a “semantic gap” between the meanings of terms used by the user and those recognized by the search engines. In