Knowledge-based integration of heterogeneous databases

P. Fankhauser and E.J. Neuhold

Integrated Publication and Information Systems Institute (GMD–IPSI), P.O.Box 104326, 6100 Darmstadt

Abstract

We present an approach to integrating heterogeneous database schemas that utilizes fuzzy real-world knowledge. We model world knowledge by means of a fuzzy terminological network. On this basis we enrich classes semantically by determining the best tree spanning the class name, its attributes, and its relationships in the network. The ambiguous edges of these trees are accumulated by means of fuzzy set intersection, and the trees can be used as a skeleton for further disambiguation. By unifying the spanning trees of two heterogeneous class definitions we can measure their degree of semantic resemblance. For resembling classes, the unified tree(s) can then be used as a skeleton for proposing the most likely way(s) of integrating them. Our approach specifically takes into account ambiguity arising from generic or polysemic names and from the multiple possible roles of attributes and relationships. Furthermore, it allows for the identification and integration of classes that maintain complementary rather than overlapping aspects of real-world objects.

Keyword Codes: H.2.5; H.2.1; I.2.4

Keywords: Heterogeneous Databases; Logical Design; Knowledge Representation Formalisms and Methods

1. INTRODUCTION

The comparison and integration of heterogeneous schemas is one of the main steps towards automatically achieving semantic interoperability among heterogeneous databases [1]. Whereas this problem has been investigated in depth in view integration [2,3], integrated access to heterogeneous databases poses substantially new problems. Firstly, multiple heterogeneous databases are mostly consulted not because they maintain overlapping information but because they provide complementary information, i.e. their classes do not have many attributes in common.
This is not the case with views, which are typically designed for rather narrow domains and thus exhibit a great deal of redundancy across the different application needs. In fact, view integration mainly aims at reducing redundancy, and approaches to view comparison rely heavily on the detection of redundancy. Thus, these approaches can be applied only partially to the comparison and integration of heterogeneous databases. Secondly, ambiguity caused by generic