Bridging Semantic Gap

Pronab Ganguly*, Fethi A. Rabhi and Pradeep K. Ray
School of Information Systems, Technology and Management
The University of New South Wales, Kensington
*Tel: 61 2 9691 5762; Fax: 61 2 9691 9574
E-mail: pganguly@qantas.com.au

1. Intent

Semantic software interoperation ensures that the requestor and the provider have a common understanding of the requested information. One source of semantic heterogeneity is the use of synonyms, such as “employees” and “staff”, to refer to the same concept in different information systems. This type of interoperation therefore covers the semantics of both the user queries and the information sources. The Bridging Semantic Gap pattern aims to bridge the semantic gap between the requestor and the provider.

2. Context

Semantic incompatibility often occurs when old data or procedures are used for new purposes that their original developers did not anticipate, or among new systems that are the product of independent development efforts. In both cases, the cause is that the semantics of the procedures and data are not explicit, so requestors cannot determine whether providers match their assumptions. The consequences of such a mismatch can be catastrophic: wrong results, sometimes with only hidden or delayed indication that they are wrong. For example, in the late 1980s the Reagan administration in the USA began including military personnel in the base figure for calculating “unemployment”. When those figures were combined with earlier figures that did not include the military, the “unemployment” rate appeared to drop by 0.3%. Another example is Ariane 5, where software reused from the previous launcher (Ariane 4) raised an exception while converting a 64-bit floating-point value into a 16-bit signed integer.

3. Problem

XML (Extensible Markup Language) provides a common syntax for exchanging heterogeneous information. A Document Type Definition (DTD) or an XML Schema is usually used as the standard mechanism to exchange information.
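A minimal Python sketch makes the gap concrete. It assumes two hypothetical providers that describe the same concept, one with the tag `employees` and the other with `staff` (the synonym pair from the Intent section); the documents and the helper names `names_of_employees` and `rename_tags` are illustrative only, and parsing uses Python's standard `xml.etree.ElementTree` module. Both documents are well-formed XML and parse without error, yet a requestor's query written against one vocabulary silently returns an empty result against the other.

```python
import xml.etree.ElementTree as ET

# Hypothetical provider A: uses the term "employees".
PROVIDER_A = """<company>
  <employees>
    <person><name>Alice</name></person>
    <person><name>Bob</name></person>
  </employees>
</company>"""

# Hypothetical provider B: same concept, but uses the synonym "staff".
PROVIDER_B = """<company>
  <staff>
    <person><name>Carol</name></person>
  </staff>
</company>"""


def names_of_employees(xml_text: str) -> list[str]:
    """Requestor's query; it assumes the provider's tag is 'employees'."""
    root = ET.fromstring(xml_text)
    return [n.text for n in root.findall("./employees/person/name")]


def rename_tags(xml_text: str, synonyms: dict[str, str]) -> str:
    """Hypothetical hand-written, pairwise translation of provider tags
    into the requestor's terms (tags not in the table are left unchanged)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = synonyms.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Both documents parse, so the syntax level reports no problem.
    print(names_of_employees(PROVIDER_A))  # ['Alice', 'Bob']
    # No error is raised; the mismatch only shows up as an empty result.
    print(names_of_employees(PROVIDER_B))  # []
    # A pairwise synonym table fixes this one pair of systems.
    print(names_of_employees(rename_tags(PROVIDER_B, {"staff": "employees"})))  # ['Carol']
```

Each translation table encodes the vocabularies of exactly one requestor and one provider, so n systems may need up to n(n-1) such tables, one for each direction between every pair.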
But these schema-level specifications cannot resolve the issues related to semantic heterogeneity, for the following reasons:

- There are many such schema-level specifications, and they do not use a consistent set of terminology.
- Data captured in different files under the same set of labels do not necessarily share a consistent terminology.
- For a small number of systems, programs can be developed to translate terminologies between them, but this approach does not scale.

Copyright (c) 2002, Australian Computer Society, Inc. This paper appeared at the Third Asian Pacific Conference on Pattern Languages of Programs (KoalaPLoP 2002), Melbourne, Australia. Conferences in Research and Practice in Information Technology, Vol. 13. James Noble and Paul Taylor, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

4. Forces

The major forces involved are:

- Meta-level data capture the richness of the meanings conveyed by the data, and human intelligence normally resolves any differences in meaning. Developing computer-readable meta-data is hard because the semantics of a term vary from context to context: one information source may use the term “apple” for a type of computer, while another uses the same term for a type of fruit.
- Meta-data contains the highest-level user/business information requirements, and ER diagrams and database schemas are developed from it. Schemas are specific to the implementation platform, and so are the constraints expressed at that level. Implementation-specific schemas may not explicitly represent the constraints present at the meta-data level, and the same meta-data may give rise to several different implementation-specific schemas.
- A DTD or an XML Schema is usually used as the standard mechanism to exchange information, but it does not provide dynamic mapping and transformation between terms in a given context.
- Scalability: for a small system, programs can be developed to translate the terminologies, but this is not feasible for a large system.

5. Solution

To address the above forces, draw on a formal Ontology (a shared model of the domain) for the vocabulary and formalism of the computational specifications. The Ontology helps to structure our concepts for effective computing: it abstracts reality so that it can be understood and processed. This model is computer