The Semantic Web: Prospects and Challenges
Michael Wilson and Brian Matthews
CCLRC Rutherford Appleton Laboratory
Chilton, Oxon UK
Abstract—The Web grew in five years from a development
project to a global business. In contrast, the semantic Web has
spent ten years developing from a plan to introduce metadata to
the Web to a suite of technologies that are used in niche
markets, but are far from the global commodity business of the
Web. There remain fundamental problems in implementing the
vision of a semantic Web, which require both original technical
research and considerable consensus building to reach agreed
solutions. Many of the successes of the semantic Web are in
small technologies such as RSS, Dublin Core and FOAF, while
the main thrust of research is in big technologies such as
ontological modeling and inference engines. The links between
the small and large, as well as an understanding of the resulting
benefits, are required to move the semantic Web into the
mainstream Web.
I. INTRODUCTION
The Web grew from a brief proposal to a global business in
five years, while the semantic Web has spent ten years
developing to its current apparently backwater state – why is
this? The paper presents the background and current state of
the semantic Web, leading to some of the outstanding research
issues that remain before the full vision can be realized.
A. A Brief History of the Semantic Web
In 1989 Tim Berners-Lee proposed the World Wide Web to
CERN as a development project [1]. By 1991 there was a
portable browser available and being distributed. By 1994
Netscape was flourishing providing a commercial browser,
Yahoo! had been created to provide an index, the WebCrawler
was running as a search engine and there were 2,500 servers
worldwide. By the end of 1995 there were approximately
73,500 Web servers worldwide, Microsoft had released
Internet Explorer and W3C had been established as a
standards body for the Web.
In contrast, the semantic Web was initiated in 1996 when it
was acknowledged that the Web was built for human
consumption, and although everything on it is machine-
readable, this data is not machine-understandable. It is very
hard to automate anything on the Web, and because of the
volume of information the Web contains, it is not possible to
manage it manually. The solution proposed was to use
metadata to describe the data contained on the Web. Metadata
is “data about data” (for example, a library catalog is
metadata, since it describes publications) or specifically “data
describing Web resources”. The first working draft of the
RDF language to define metadata was available in August
1997, but it was not until February 1999 that it appeared as an
agreed W3C recommendation. In 1998 Tim Berners-Lee
published a roadmap to the Semantic Web [2] that introduced
notions beyond metadata including query languages,
inference rules and proof validation. The 2001 vision of the
Semantic Web [3] broadened this further to include trust,
where: “The Semantic Web will bring structure
to the meaningful content of Web pages, creating an
environment where software agents roaming from page to
page can readily carry out sophisticated tasks for users”. The
Web had one clear vision for which a solution was generated.
These changes in vision for the semantic Web, from a simple
metadata proposal to an agent environment, are partly to blame
for the delays in creating a solution: the requirements on the
technologies have both changed and grown increasingly
complex. But there are fundamental technical causes too.
B. The Semantic Web Architecture and Technologies
The architecture of the semantic Web was proposed by
Berners-Lee in 2001 [3] as a layered pyramid. Unicode and
URI as foundations ensure that the technologies are applicable
to all languages and that all objects can be referred to by
unique identifiers. This allows the semantic Web to be
maintained as a single discourse context in which one speaker
can comment on the information presented by themselves or
others (e.g. Dan said that Tim said “Each person should have
a URI”). The information is presented as XML, while the
metadata and comments are made in RDF. RDF has been
constructed as a fully reified language – again to allow
comments to be treated as objects, so that they can be referred
to and commented on by others. These properties of unique
universal identifiers and reification [8] have been shown in
natural language systems to be essential components of an
architecture to support reference within a discourse (anaphora)
and to objects in the world (deixis) [4]. The security provided
by digital signatures is integral to the semantic Web
technologies in order to reliably identify who is making a
statement.
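The discourse pattern described above, where one statement can itself become the subject or object of another, can be sketched in plain Python. This is a minimal illustrative data structure, not RDF's actual reification vocabulary (rdf:subject, rdf:predicate, rdf:object) or any real RDF library; all URIs and property names here are hypothetical examples.

```python
# Illustrative sketch: a triple that carries its own identifier,
# so it can be referred to and commented on by further triples
# (the "fully reified" property described above).
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)

@dataclass(frozen=True)
class Statement:
    """A (subject, predicate, object) triple with its own URI,
    making the statement itself an addressable resource."""
    subject: str
    predicate: str
    obj: str
    uri: str = field(default_factory=lambda: f"urn:stmt:{next(_ids)}")

graph = []

# Tim's original statement becomes an addressable resource ...
tim_said = Statement("ex:Person", "ex:shouldHave", "ex:URI")
graph.append(tim_said)

# ... so Dan can make a statement *about* that statement,
# and Dan's comment is itself addressable in turn.
dan_said = Statement("ex:Dan", "ex:assertsThat", tim_said.uri)
graph.append(dan_said)

third_party = Statement("ex:W3C", "ex:endorses", dan_said.uri)
graph.append(third_party)

for s in graph:
    print(s.uri, ":", s.subject, s.predicate, s.obj)
```

Because every statement has a unique identifier, the chain of comments-on-comments can extend indefinitely, which is what lets a single global discourse context accumulate claims from many speakers.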
The layer on top of RDF is that of the ontology language
OWL, which adds more vocabulary than RDF for describing
properties and classes: among others, relations between
classes (e.g. disjointness), cardinality (e.g. "exactly one"),
equality, richer typing of properties, characteristics of
properties (e.g. symmetry), and enumerated classes. In
1-4244-0345-6/06/$20.00 ©2006 IEEE