The Semantic Web: Prospects and Challenges

Michael Wilson and Brian Matthews
CCLRC Rutherford Appleton Laboratory, Chilton, Oxon, UK

Abstract—The Web grew in five years from a development project to a global business. In contrast, the semantic Web has spent ten years developing from a plan to introduce metadata to the Web into a suite of technologies that are used in niche markets, but are far from the global commodity business of the Web. Fundamental problems remain in implementing the vision of a semantic Web, which require both original technical research and considerable consensus building to reach agreed solutions. Many of the successes of the semantic Web are in small technologies such as RSS, Dublin Core and FOAF, while the main thrust of research is in big technologies such as ontological modeling and inference engines. Links between the small and the large, as well as an understanding of the resulting benefits, are required to move the semantic Web into the mainstream Web.

I. INTRODUCTION

The Web grew from a brief proposal to a global business in five years, while the semantic Web has spent ten years developing to its current, apparently backwater state – why is this? This paper presents the background and current state of the semantic Web, leading to some of the outstanding research issues that remain before the full vision can be realized.

A. A Brief History of the Semantic Web

In 1989 Tim Berners-Lee proposed the World Wide Web to CERN as a development project [1]. By 1991 a portable browser was available and being distributed. By 1994 Netscape was flourishing, providing a commercial browser; Yahoo! had been created to provide an index; the WebCrawler was running as a search engine; and there were 2,500 servers worldwide. By the end of 1995 there were approximately 73,500 Web servers worldwide, Microsoft had released Internet Explorer, and the W3C had been established as a standards body for the Web.
In contrast, the semantic Web was initiated in 1996 when it was acknowledged that the Web was built for human consumption, and although everything on it is machine-readable, this data is not machine-understandable. It is very hard to automate anything on the Web, and because of the volume of information the Web contains, it is not possible to manage it manually. The solution proposed was to use metadata to describe the data contained on the Web. Metadata is “data about data” (for example, a library catalog is metadata, since it describes publications) or, specifically, “data describing Web resources”. The first working draft of the RDF language to define metadata was available in August 1997, but it was not until February 1999 that it appeared as an agreed W3C recommendation.

In 1998 Tim Berners-Lee published a roadmap to the Semantic Web [2] that introduced notions beyond metadata, including query languages, inference rules and proof validation. A vision of the Semantic Web from 2001 [3] broadened the vision further to include trust, where: “The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users”. The Web had one clear vision to which a solution was generated, while these changes in vision for the semantic Web, from a simple metadata proposal to an agent environment, are partly to blame for the delays in creating a solution, since the requirements on the technologies have both changed and grown increasingly complex. But there are fundamental technical causes too.

B. The Semantic Web Architecture and Technologies

The architecture of the semantic Web was proposed by Berners-Lee in 2001 [3] as a layered pyramid. Unicode and URIs form the foundations, ensuring that the technologies are applicable to all languages and that all objects can be referred to by unique identifiers.
This allows the semantic Web to be maintained as a single discourse context in which one speaker can comment on the information presented by themselves or others (e.g. Dan said that Tim said “Each person should have a URI”). The information is presented as XML, while the metadata and comments are made in RDF. RDF has been constructed as a fully reified language – again to allow comments to be treated as objects, so that they can be referred to and commented on by others. These properties of unique universal identifiers and reification [8] have been shown in natural language systems to be essential components of an architecture to support reference within a discourse (anaphora) and to objects in the world (deixis) [4]. The security provided by digital signatures is integral to the semantic Web technologies in order to reliably identify who is making a statement.

The layer on top of RDF is that of the ontology language OWL, which adds more vocabulary than RDF for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. “exactly one”), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes. In

1-4244-0345-6/06/$20.00 ©2006 IEEE
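The reification mechanism described above can be illustrated with a minimal sketch in plain Python, modelling a triple store as a set of (subject, predicate, object) tuples rather than using any RDF library. The rdf: names follow the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object); the ex: identifiers (ex:stmt1, ex:shouldHave, ex:said, ex:reports) are invented here purely for illustration.

```python
# A triple store modelled as a set of (subject, predicate, object) tuples.
graph = set()

def add(s, p, o):
    graph.add((s, p, o))

# Tim's claim ("Each person should have a URI") reified as the resource
# ex:stmt1, described by four triples using the RDF reification vocabulary.
add("ex:stmt1", "rdf:type", "rdf:Statement")
add("ex:stmt1", "rdf:subject", "ex:person")
add("ex:stmt1", "rdf:predicate", "ex:shouldHave")
add("ex:stmt1", "rdf:object", "ex:URI")

# Because the statement is now a resource with its own identifier,
# others can refer to it and comment on it:
add("ex:Tim", "ex:said", "ex:stmt1")
add("ex:Dan", "ex:reports", "ex:stmt1")

# Who has made a statement about Tim's statement?
commenters = sorted({s for (s, p, o) in graph if o == "ex:stmt1"})
print(commenters)  # ['ex:Dan', 'ex:Tim']
```

Because the statement itself has an identifier, further triples can take it as their object, which is precisely what lets one speaker comment on another's claims within a single discourse context.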