Algebraic Graph-Based Approach to Management of Multibase Systems, I: Schema integration via sketches and equations

In Next Generation of Information Technologies and Systems, NGITS’95, Proc. 2nd Int. Workshop, Naharia, Israel, June 1995. A. Motro and M. Tennenholtz (Eds.), 1995, pp. 69-79.

Boris Cadish and Zinovy Diskin∗
Frame Inform Systems, Elizabetes Str. 23, Riga, LV-1234, Latvia
diskin@frame.riga.lv

1 Introduction and motivating discussion

It is now evident that a chief property of next-generation information systems is their organization on cooperative principles. For cooperative information systems (CIS), in turn, it is normal that an application program needs data stored in several separate databases (DBs). In fact, as was noted in [Mot87], such a multibase situation is very similar to the multifile situation before the invention of DBs, and the natural solution is analogous: to provide applications with a single integrated view of data, local DBs should be integrated into a distributed DB. The latter is a database system that has a schema like an ordinary DB, but whose extension is virtual: it is not stored, but can be computed on request. In addition, a peculiarity of CIS is the heterogeneity of local DB systems, caused by their orientation toward applications of different levels and/or by the autonomy of their origin and initial development. Thus, heterogeneous multibase integration is a fundamental issue in the functioning of CIS. In particular, in the context of federated database systems (FDBS), integration is a function performed regularly, at different levels and by different services, depending on the organization of the FDBS environment.

The significance of the problem is well known, and various approaches, techniques, and sometimes tools have been proposed (see, e.g., [DH84, Mot87, WHW90, YAD+92, SPD92b, SPD92a, SST92] and the surveys [BLN86, SL90]). Despite the diversity of approaches, several common points can be clearly identified.
Integration consists of schema integration (composing a global schema from the set of local ones) and data integration (computing the virtual extension of the global schema). In practice, this means setting up a collection of procedures that convert any query against the global schema into a set of queries against the local schemas, and then compute the answer to the global query by combining the local answers. Schema integration is the key issue: its correctness determines the correctness and effectiveness of the second phase.

The typical general architecture was already described in [DH84] and remains essentially the same (cf. [SL90, YAD+92]); it is presented in Fig. 1. Local host schemas (LHSs) can be specified in different DDLs. An auxiliary DB with the schema AS may be needed to record information required for integration. The global schema (GS) is defined as a superview of the LSs and AS. Its definition presupposes resolving various conflicts between the local databases, so that GS provides users with the illusion of a homogeneous and integrated DB. In addition, different (federated) views against GS may be defined for different applications.

[Figure 1: Schema integration architecture. Local host schemas LHS1, ..., LHSn are translated into a common data model, giving local schemas LS1, ..., LSn; these, together with an auxiliary schema AS, are integrated into the global schema GS, over which federated user views LFS1, ..., LFSk are defined.]

In some approaches (e.g., [DH84, Mot87] and others), schema integration is performed by consecutively applying structural operations from a certain predefined collection to the component schemas. Syntactically, these operations are usually specified in a special non-procedural view definition language.

∗ Supported by Grant 94.315 from the Latvian Council of Science.
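The two phases of integration and the operation-based style of defining GS can be illustrated with a minimal Python sketch. It assumes local schemas already translated into a simple relational common data model. The classes `Source`, `Renamed`, and `UnionView`, the attribute names, and the data are hypothetical and are not the paper's own formalism (which, per the title, relies on sketches and equations); they only stand in for "structural operations from a predefined collection". The global schema is a superview whose extension is never stored: a query against it is rewritten into queries against each local source, and the local answers are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class View(Protocol):
    """Anything that has a schema (attribute list) and can answer a query."""

    @property
    def attributes(self) -> Tuple[str, ...]: ...

    def query(self, pred: Predicate) -> List[Row]: ...


@dataclass
class Source:
    """A local schema LS_i after translation into the common data model.
    `data` stands in for the component DB; `query` is a local query."""
    name: str
    attributes: Tuple[str, ...]
    data: List[Row]

    def query(self, pred: Predicate) -> List[Row]:
        return [row for row in self.data if pred(row)]


@dataclass
class Renamed:
    """Hypothetical structural operation: rename attributes to resolve naming conflicts."""
    base: View
    mapping: Dict[str, str]  # local attribute name -> global attribute name

    @property
    def attributes(self) -> Tuple[str, ...]:
        return tuple(self.mapping.get(a, a) for a in self.base.attributes)

    def _to_global(self, row: Row) -> Row:
        return {self.mapping.get(k, k): v for k, v in row.items()}

    def query(self, pred: Predicate) -> List[Row]:
        # Rewrite the global predicate so the base view can evaluate it on its
        # own rows, then translate the local answer back into global names.
        local_answer = self.base.query(lambda row: pred(self._to_global(row)))
        return [self._to_global(row) for row in local_answer]


@dataclass
class UnionView:
    """Hypothetical structural operation: merge union-compatible views into one class."""
    parts: List[View]

    @property
    def attributes(self) -> Tuple[str, ...]:
        first = self.parts[0].attributes
        for part in self.parts[1:]:
            if set(part.attributes) != set(first):
                raise ValueError("views are not union-compatible")
        return first

    def query(self, pred: Predicate) -> List[Row]:
        # Decompose: send the same query to every part, then combine the
        # local answers, dropping rows that occur in more than one source.
        answer: List[Row] = []
        for part in self.parts:
            for row in part.query(pred):
                if row not in answer:
                    answer.append(row)
        return answer


# Hypothetical local schemas with a naming conflict (invented data).
ls1 = Source("LS1", ("emp_name", "salary"), [{"emp_name": "Ann", "salary": 900}])
ls2 = Source("LS2", ("name", "pay"),
             [{"name": "Ben", "pay": 1200}, {"name": "Eva", "pay": 700}])

# GS is defined by consecutively applying structural operations.
gs = UnionView([Renamed(ls1, {"emp_name": "name"}),
                Renamed(ls2, {"pay": "salary"})])

print(gs.attributes)                          # ('name', 'salary')
print(gs.query(lambda r: r["salary"] > 800))
# [{'name': 'Ann', 'salary': 900}, {'name': 'Ben', 'salary': 1200}]
```

Here the union-compatibility check compares attribute names only. Resolving the other conflicts that a GS definition presupposes (types, units, semantics) would need further operations of the same kind.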