[to appear] Chapter Manuscript for Creating the Semantic Web, D. Fensel, J. Hendler, H. Liebermann, and W. Wahlster (eds.), MIT Press, 2001.

Complex Relationships for the Semantic Web

Sanjeev Thacker, Amit Sheth, and Shuchi Patel
Large Scale Distributed Information Systems (LSDIS) Lab
Department of Computer Science, University of Georgia
Athens, GA 30602 USA
http://lsdis.cs.uga.edu
Email: {sanjeev, amit, shuchi}@cs.uga.edu

1 Introduction

Relationships are fundamental to supporting semantics [Wie97, She96], and hence to the Semantic Web [LHL, FM01]. To date, the focus has been on simple relationships such as is-a and is-part-of, as in DAML/OIL [Ont]. In this work, we adapt our earlier work on MREF [SS98] to develop a framework for supporting complex relationships. A framework for managing complex relationships, as discussed here, becomes the basis for knowledge discovery from the information interlinked by the Semantic Web.

Our work builds primarily upon earlier research in integrating information systems, which has also been applied to exploiting web-accessible data distributed across heterogeneous information sources. The primary focus of information integration systems has been to model these diverse data sources and to integrate their data by resolving the heterogeneity involved, thereby providing global views of domains (one-point access) for querying. We shift the focus from modeling information sources for the purpose of querying to extracting useful knowledge from them. This, we believe, can be achieved by modeling the complex relationships among the domains in order to study and explore the interactions that exist between them. In addition to information sources and relationships, operations are also modeled as part of the knowledge, so that the semantics involved in performing complex information requests across multiple domains can be exploited. The system’s framework thus provides support for knowledge discovery.
Knowledge representation and support for relationships, which are fundamental to the concept of the Semantic Web, are described in this chapter.

Consider the capability provided by current research prototypes: they integrate information from diverse data sources over a domain to give the user a unified, structured (homogeneous) view of that domain for querying. On an integrated view of “earthquakes” one can ask queries such as “find information of all the quakes that occurred in California since 1990”. However, the types of queries that can be answered using such integrated domain views are still limited. Assuming views on earthquakes and nuclear tests existed, could one answer the question “Do nuclear tests cause earthquakes?” How can one study such relationships between the two domains based on the data available from diverse web-accessible sources? Consider also a known relationship, that between air pollution and vegetation. Assuming the necessary views existed, could a question like “How does air-pollution affect vegetation” be answered using only the integrated views? Such queries are beyond the realm of existing systems. There is a