Automatic Ontology Identification for Reuse
Mirco Speretta and Susan Gauch
Electrical Engineering and Computer Science
University of Kansas
{mirco, sgauch}@ittc.ku.edu
Abstract
The increasing interest in the Semantic Web is
producing a growing number of publicly available
domain ontologies. These ontologies are a rich source
of information that could be very helpful during the
process of engineering other domain ontologies. We
present an automatic technique that, given a set of
Web documents, selects appropriate domain ontologies
from a collection of pre-existing ontologies. We
empirically compare an ontology match score that is
based on statistical techniques with simple keyword
matching algorithms. The algorithms were tested on a
set of 183 publicly available ontologies and documents
representing ten different domains. Our algorithm was
able to select the correct domain ontology as the top
ranked ontology 8 out of 10 times.
1. Introduction
The increasing popularity of the Semantic Web has
produced a proliferation of ontologies, prompting many
researchers to develop ontology libraries. Ding and
Fensel [3] describe the benefits of organizing available
ontologies into libraries for reuse. A library of
ontologies should allow users to reuse, maintain, adapt,
and version ontologies.
Despite this steady growth, the most common
method for building ontologies is still based on manual
effort. Ontology engineering employs a variety of
approaches to ontology construction, usually based on
best-practice guidelines. The
importance of providing tools to users during the
process of constructing ontologies is widely
recognized, as shown by the development of projects
such as portals and systems for searching, reusing, and
distributing ontologies ([4] and [6]).
Rather than starting from scratch for each domain,
some projects are investigating the reuse of existing
ontologies to improve efficiency further. Although some
modifications may be required, as more and more
ontologies become available, it is increasingly likely
that third-party ontologies exist that could be used
unchanged, or with minimal changes, to bootstrap the
ontology creation process.
Maedche et al. [6] describe in detail the challenges of
building systems that reuse ontologies.
The goal of our study is to introduce an automatic
technique that can help to identify existing ontologies
that would be good candidates for reuse. We exploit
content automatically extracted from sets of Web pages,
using techniques similar to those employed in ontology
learning. However, rather than building ontologies from
scratch by defining taxonomies and building structures,
we focus on selecting existing ontologies whose domain is
related to the topic of a given collection of documents
in order to bootstrap the ontology learning process.
Our dataset is based on ontologies that we
downloaded from publicly available online libraries.
For each ontology, we considered only the list of
included concepts. No properties or other relationships
within the ontologies were taken into consideration in
this study. We assumed that the tokens
included in the concepts are a representation of the
domain described by the ontology. Statistical weighting
techniques were applied to identify the most
representative tokens.
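As an illustration of the preprocessing described above, the following minimal sketch splits concept names into tokens and weights each token per ontology. This section does not specify the weighting scheme, so the TF-IDF formula used here is an assumption, as are the function names `tokenize_concept` and `token_weights`, the label-splitting rules, and the toy ontologies.

```python
import math
import re
from collections import Counter


def tokenize_concept(name):
    """Split a concept label (e.g. 'GraduateStudent', 'research_paper')
    into lowercase tokens. The splitting rules are an assumption."""
    # Treat underscores, hyphens and whitespace as separators.
    text = re.sub(r"[_\-\s]+", " ", name)
    # Insert a space at lowercase/digit -> uppercase boundaries (camelCase).
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return [part.lower() for part in text.split()]


def token_weights(ontologies):
    """Map each ontology id to {token: weight}.

    `ontologies` maps an ontology id to its list of concept names.
    Weight = (token frequency within the ontology) * log(N / df),
    i.e. plain TF-IDF, used here only as an assumed stand-in for the
    statistical weighting mentioned in the text.
    """
    counts = {
        oid: Counter(tok for concept in concepts
                     for tok in tokenize_concept(concept))
        for oid, concepts in ontologies.items()
    }
    n_ontologies = len(counts)
    # Document frequency: number of ontologies containing each token.
    df = Counter()
    for counter in counts.values():
        df.update(counter.keys())

    weights = {}
    for oid, counter in counts.items():
        total = sum(counter.values())
        weights[oid] = {
            tok: (freq / total) * math.log(n_ontologies / df[tok])
            for tok, freq in counter.items()
        }
    return weights


# Hypothetical toy collection, not taken from the paper's dataset.
toy = {
    "university": ["GraduateStudent", "Professor", "Student"],
    "wine": ["RedWine", "WhiteWine", "Student"],
}
weights = token_weights(toy)
```

In this toy collection, a token that appears in every ontology (here `student`) receives weight zero, while tokens specific to one ontology (such as `wine`) receive positive weight, which is the intuition behind treating highly weighted tokens as representative of a domain.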
2. Background
Velardi et al. [9] gave a comprehensive overview of the
state-of-the-art approaches for constructing
taxonomies. They also introduced a new semi-
automatic technique for creating domain taxonomies.
Recently, many approaches for searching and reusing
ontologies have been proposed. Alani et al. [1]
developed AKTiveRank, a prototype system for
searching ontologies. For Sabou et al. [8], ontology
2007 IEEE/WIC/ACM International Conference on Web Intelligence
0-7695-3026-5/07 $25.00 © 2007 IEEE
DOI 10.1109/WI.2007.79
419