I.-Y. Song et al. (Eds.): ER 2003, LNCS 2813, pp. 476–489, 2003. © Springer-Verlag Berlin Heidelberg 2003

A Heuristic-Based Methodology for Semantic Augmentation of User Queries on the Web *

Andrew Burton-Jones 1, Veda C. Storey 1, Vijayan Sugumaran 2, and Sandeep Purao 3

1 J. Mack Robinson College of Business, Georgia State University, Atlanta, GA 30302, {vstorey,abjones}@gsu.edu
2 School of Business Administration, Oakland University, Rochester, MI 48309, sugumara@oakland.edu
3 School of Information Sciences & Technology, The Pennsylvania State University, University Park, PA 16801-3857, spurao@ist.psu.edu

Abstract. As the World Wide Web continues to grow, so does the need for effective approaches to processing users’ queries that retrieve the most relevant information. Most search engines provide the user with many web pages, but at varying levels of relevancy. The Semantic Web has been proposed to retrieve and use more semantic information from the web. However, the capture and processing of semantic information is a difficult task because of the well-known problems that machines have with processing semantics. This research proposes a heuristic-based methodology for building context-aware web queries. The methodology expands a user’s query to identify possible word senses and then makes the query more relevant by restricting it using relevant information from the WordNet lexicon and the DARPA DAML library of domain ontologies. The methodology is implemented in a prototype. Initial testing of the prototype and comparison to results obtained from Google show that this heuristic-based approach to processing queries can provide more relevant results to users, especially when query terms are ambiguous and/or when the methodology’s heuristics are invoked.

1 Introduction

It is increasingly difficult to retrieve relevant web pages for queries from the World Wide Web due to its rapid growth and lack of structure [31, 32].
In response, the Semantic Web has been proposed to extend the WWW by giving information well-defined meaning [3, 10]. The Semantic Web relies heavily on ontologies to provide taxonomies of domain specific terms and inference rules that serve as surrogates for semantics [3]. Berners-Lee et al. describe the Semantic Web as “not a new web but an extension of the current one, in which information is given well-defined meaning” [3].

* This research was partially supported by J. Mack Robinson College of Business, Georgia State University and Office of Research & Graduate Study, Oakland University.

Unfortunately, it is difficult to capture and represent meaning in machine-