Automatic Ontology Identification for Reuse
Mirco Speretta and Susan Gauch
Electrical Engineering and Computer Science
University of Kansas
{mirco, sgauch}@ittc.ku.edu

Abstract
The increasing interest in the Semantic Web is producing a growing number of publicly available domain ontologies. These ontologies are a rich source of information that could be very helpful when engineering other domain ontologies. We present an automatic technique that, given a set of Web documents, selects appropriate domain ontologies from a collection of pre-existing ontologies. We empirically compare an ontology match score based on statistical techniques against simple keyword-matching algorithms. The algorithms were tested on a set of 183 publicly available ontologies and on documents representing ten different domains. Our algorithm selected the correct domain ontology as the top-ranked ontology 8 out of 10 times.

1. Introduction
The increasing popularity of the Semantic Web has produced a proliferation of ontologies, prompting many researchers to develop ontology libraries. Ding and Fensel [3] describe the benefits of organizing available ontologies into libraries and reusing them. A library of ontologies should allow users to reuse, maintain, adapt, and version ontologies.

Despite this steady growth, ontologies are still most commonly built by hand. Ontology engineering employs a variety of approaches to ontology construction, usually based on best-practice guidelines. The importance of providing tools that support users while they construct ontologies is widely recognized, as shown by the development of portals and systems for searching, reusing, and distributing ontologies ([4] and [6]). Rather than starting from scratch for each domain, some projects are investigating the reuse of existing ontologies to improve efficiency further.
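The comparison summarized in the abstract, a statistically weighted ontology match score versus simple keyword matching, can be illustrated with a small sketch. It is not the authors' implementation, and the weighting scheme is our assumption: concept labels are split into lowercase tokens (at CamelCase boundaries and non-letters), each ontology's tokens are weighted by TF-IDF computed across the ontology collection with a smoothed IDF of 1 + log(N/df), and an ontology's match score is the cosine similarity between that profile and the raw term frequencies of the document set. The keyword baseline is likewise assumed: it counts the distinct concept tokens that occur in the documents. All function names (`concept_tokens`, `tfidf_profiles`, `doc_vector`, `cosine`, `rank_ontologies`, `keyword_score`) and the toy data are hypothetical.

```python
import math
import re
from collections import Counter


def concept_tokens(label):
    """Split a concept label such as 'RedWine' or 'blood_vessel' into lowercase tokens."""
    # Put a space at each lowercase-to-uppercase boundary, then split on non-letters.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return [t.lower() for t in re.split(r"[^A-Za-z]+", spaced) if t]


def tfidf_profiles(ontologies):
    """Weight each ontology's concept tokens by TF-IDF across the collection (assumed scheme)."""
    counts = {name: Counter(t for c in concepts for t in concept_tokens(c))
              for name, concepts in ontologies.items()}
    n = len(counts)
    # Document frequency here = number of ontologies whose concepts contain the token.
    df = Counter(t for c in counts.values() for t in c)
    # Smoothed IDF (1 + log(N/df)) so tokens shared by every ontology keep a small weight.
    return {name: {t: tf * (1.0 + math.log(n / df[t])) for t, tf in c.items()}
            for name, c in counts.items()}


def doc_vector(documents):
    """Raw term frequencies over all words in the document set."""
    return Counter(w.lower() for d in documents for w in re.findall(r"[A-Za-z]+", d))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_ontologies(ontologies, documents):
    """Return ontology names ordered by descending (assumed) match score."""
    profiles = tfidf_profiles(ontologies)
    dv = doc_vector(documents)
    scores = {name: cosine(p, dv) for name, p in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


def keyword_score(concepts, documents):
    """Assumed baseline: count distinct concept tokens that occur anywhere in the documents."""
    onto_tokens = {t for c in concepts for t in concept_tokens(c)}
    return len(onto_tokens & set(doc_vector(documents)))


# Toy data, invented for illustration only.
ontologies = {
    "wine": ["RedWine", "WhiteWine", "Winery", "GrapeVariety"],
    "biology": ["ProteinKinase", "CellMembrane", "Gene"],
}
docs = ["The winery produces red wine and white wine from each grape variety."]
print(rank_ontologies(ontologies, docs))  # ['wine', 'biology']
print({name: keyword_score(c, docs) for name, c in ontologies.items()})  # {'wine': 6, 'biology': 0}
```

On this toy input both scores agree. The intuition for a weighted score is that inverse ontology frequency down-weights tokens shared by many ontologies, so it may separate domains better than raw keyword overlap when generic tokens are common.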
Although some modifications may be required, as more and more ontologies become available it is increasingly likely that third-party ontologies exist that could be used unchanged or, at the least, used to bootstrap the ontology creation process. Maedche et al. [6] describe in detail the challenges of building systems that reuse ontologies.

The goal of our study is to introduce an automatic technique that helps identify existing ontologies that are good candidates for reuse. Because we automatically exploit content extracted from sets of Web pages, our techniques resemble those employed in ontology learning. However, rather than building ontologies from scratch by defining taxonomies and structures, we select existing ontologies whose domain is related to the topic of a given document collection, in order to bootstrap the ontology learning process.

Our dataset is based on ontologies that we downloaded from publicly available online libraries. For each ontology, we considered only the list of concepts it includes; properties and other relationships within the ontologies were outside the scope of this study. We assumed that the tokens in the concept names represent the domain described by the ontology, and we applied statistical weighting techniques to identify the most representative tokens.

2. Background
Velardi et al. [9] gave a comprehensive overview of state-of-the-art approaches for constructing taxonomies. They also introduced a new semi-automatic technique for creating domain taxonomies. Recently, many approaches for searching and reusing ontologies have been proposed. Alani et al. [1] developed AKTiveRank, a prototype system for searching ontologies. For Sabou et al.
[8], ontology 2007 IEEE/WIC/ACM International Conference on Web Intelligence 0-7695-3026-5/07 $25.00 © 2007 IEEE DOI 10.1109/WI.2007.79 419