Demand-Driven Database Integration for Biomolecular Applications

Karl Aberer
GMD-IPSI, Dolivostr. 15, D-64293 Darmstadt, GERMANY
email: aberer@darmstadt.gmd.de

Introduction

As a member of the consortium for the "Computation and Prediction of Receptor-Ligand Interaction", the Integrated Publication and Information Systems Institute (GMD-IPSI), Darmstadt, participates in the national joint project RELIWE. Docking-D is the part of RELIWE that addresses heterogeneous database support, and GMD-IPSI has the leading role in it.

At present, the receptor and ligand data used within the project, whether raw data or data derived during analysis, is extremely heterogeneous. Many of these databases are maintained by autonomous systems that employ different data management facilities with heterogeneous data models, in particular dedicated file systems with specialized retrieval and presentation functionality (e.g. PDB [1]) or the relational model (e.g. Whatif [20]). In addition, the information is represented at different levels of detail (e.g. sequence vs. structural data), with mutual inconsistencies in structure, naming, scaling, and behavior, and much of this behavior is hidden in the implementation of the autonomous systems. The database system must therefore not only enable integrated access to the underlying autonomous, heterogeneous information bases, but also allow the integration of new datatypes (e.g. sequence and spatial data) and support associative retrieval of the data. Finally, the various tools developed or provided by the other project partners, such as receptor-ligand docking algorithms, model-building tools for receptors, and visualization tools (e.g. Whatif, LUDI [2]), must be connected to the DBMS.

Database Integration

Several approaches and projects address the interoperability or integration of information bases.
For an extensive discussion of related work see [3][11][12][20], which give good overviews and present fundamental concepts, including the terminology of the different approaches, e.g. multidatabase systems, multidatabase languages, and federated database systems. GMD-IPSI follows the federated database approach. The tools and techniques developed for semantic integration support incremental integration, driven by the actual information requests of end users, and the dynamic maintenance of integrated schemas, driven by the evolution of external schemas. This approach aims to meet the requirements of realistic settings with a large number of external information bases; for example, at least 100 databases providing biomolecular information are currently known. Because these databases are autonomous, their schemas change in ways that cannot be controlled globally, so completely integrated views valid for all users can hardly be achieved with reasonable effort. The complexity of the macromolecular CIF dictionary definition [18], which aims to establish a universal schema for molecular biology, vividly illustrates this.
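The idea of integration driven by actual requests, combined with maintenance driven by external schema change, can be illustrated with a small sketch. The Python code below is not GMD-IPSI's implementation. It assumes hypothetical `Source` wrappers that expose their records and a schema version number, and a hypothetical `resolve` function that stands in for the integrator supplying attribute correspondences. A mapping from a global attribute to a source's local attribute is created only when a query first needs it, and it is re-resolved when the source reports a new schema version.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]


@dataclass
class Source:
    """Hypothetical wrapper around one autonomous information base."""
    name: str
    schema_version: int  # assumed to be bumped whenever the source schema changes
    records: List[Record]


# Hypothetical resolver: proposes the local attribute matching a global one,
# or None if the source has no counterpart. Stands in for a human integrator.
Resolver = Callable[[str, Source], Optional[str]]


@dataclass
class Mediator:
    sources: Dict[str, Source]
    resolve: Resolver
    # (global attribute, source name) -> (local attribute, schema version at mapping time)
    mappings: Dict[Tuple[str, str], Tuple[str, int]] = field(default_factory=dict)
    resolver_calls: int = 0  # counts integration work actually performed

    def _mapping(self, attr: str, src: Source) -> Optional[str]:
        key = (attr, src.name)
        cached = self.mappings.get(key)
        # Reuse a mapping only if the source schema is unchanged since it was built.
        if cached is not None and cached[1] == src.schema_version:
            return cached[0]
        self.resolver_calls += 1
        local = self.resolve(attr, src)
        if local is None:
            self.mappings.pop(key, None)
        else:
            self.mappings[key] = (local, src.schema_version)
        return local

    def query(self, attr: str) -> List[Tuple[str, object]]:
        """Return (source, value) pairs for one global attribute, integrating lazily."""
        results: List[Tuple[str, object]] = []
        for src in self.sources.values():
            local = self._mapping(attr, src)
            if local is None:
                continue
            for rec in src.records:
                if local in rec:
                    results.append((src.name, rec[local]))
        return results


if __name__ == "__main__":
    # Hypothetical correspondences, keyed by schema version.
    correspondences = {
        ("protein_id", "files", 1): "idCode",
        ("protein_id", "rel", 1): "pdb_code",
        ("protein_id", "rel", 2): "entry_id",
    }

    def lookup(attr: str, src: Source) -> Optional[str]:
        return correspondences.get((attr, src.name, src.schema_version))

    files = Source("files", 1, [{"idCode": "e1"}])
    rel = Source("rel", 1, [{"pdb_code": "e2"}])
    m = Mediator({"files": files, "rel": rel}, lookup)
    print(m.query("protein_id"))  # [('files', 'e1'), ('rel', 'e2')]

    # The relational source evolves its schema autonomously.
    rel.schema_version = 2
    rel.records = [{"entry_id": "e3"}]
    print(m.query("protein_id"))  # [('files', 'e1'), ('rel', 'e3')]
    print(m.resolver_calls)  # 3: only the changed source was re-resolved
```

Caching mappings per schema version is one simple way to keep integration effort proportional to the requests that actually occur; a realistic federated system would also have to reconcile types, scaling, and behavior, which this sketch omits.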