Component-Driven Engineering of Database Applications

Klaus-Dieter Schewe¹ Bernhard Thalheim²

¹ Massey University, Department of Information Systems & Information Science Research Centre, Private Bag 11 222, Palmerston North, New Zealand, email: k.d.schewe@massey.ac.nz
² Christian Albrechts University Kiel, Department of Computer Science and Applied Mathematics, Olshausenstr. 40, 24098 Kiel, Germany, email: thalheim@is.informatik.uni-kiel.de

Abstract

Though it is commonly agreed that the design of large database schemata requires group effort, database design from component subschemata has not been investigated thoroughly. In this paper we investigate snowflake-like subschemata of database schemata expressed in the Higher-order Entity-Relationship Model (HERM). These subschemata are almost hierarchical in the sense that they may contain cycles in the schema, but not in the instances. We show that each HERM schema can be decomposed into such subschemata using a small set of composition constructors. We then describe how the composition of components can be seen as a database design primitive leading to component-driven database design and re-design pragmatics.

1 Introduction

While design and manufacturing from components is standard in civil, electrical and mechanical engineering, it is still in an embryonic state in software engineering (Arsanjani 2002). In general, component-based engineering means the decomposition of a task, the isolated realisation of the resulting subtasks, each yielding a component of the complete system, and the composition or “assembly” of the components based on standardised principles. In program design the design from components has made some progress (Barroca, Hall & Hall 2000, Crnkovic, Hnich, Jonson & Kiziltan 2002) based on clear input/output interfaces. A similar approach has been followed in the emerging area of web services (Hamadi & Benatallah 2003).
For database applications, however, design from component subschemata has not been investigated thoroughly. The few existing approaches such as (Akoka & Comyn-Wattiau 1994, Bancilhon & Spyratos 1981, Hay 1995, Jaeschke, Oberweis & Stucky 1994, Rauh & Stickel 1992, Teorey, Wei, Bolton & Koenig 1989) concentrate mainly on the integration of schemata, whereas according to (Thalheim 2000a) the design of database applications also has to consider interfaces and dynamic behaviour. Thus, the problem we face in component-based engineering of database applications is deeper, as we have to take care of complex structures, constraints, views and operations.

Copyright © 2006, Australian Computer Society, Inc. This paper appeared at the Third Asia-Pacific Conference on Conceptual Modelling (APCCM2006), University of Tasmania, Hobart, Australia. Conferences in Research and Practice in Information Technology, Vol. 53. Markus Stumptner, Sven Hartmann, and Yasushi Kiyoki, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

In this paper we develop an approach to this problem extending and formalising previous work in (Thalheim 2002, Thalheim 2005). We start with a rather informal discussion in Section 2 on the rationale behind the desire to develop database applications from components. In particular, we discuss problems arising with large schemata, schema patterns observed in practical applications, and the spectrum of different understandings of the term “component”. Then we investigate components in the Higher-order Entity-Relationship Model (HERM) (Thalheim 2000a), because the ER-approach is widely used in practice and easy to use, while HERM does not share the deficiencies of some ER-variants such as the lack of formal foundations, of a constraint theory, or of retrieval and update languages. In Section 3 we present a brief overview of HERM, as far as this is necessary for our purposes.
We are confident that our approach can be generalised to data models with cyclic references, e.g. sophisticated object models (Schewe & Thalheim 1993) or XML (Abiteboul, Buneman & Suciu 2000).

From various application projects we observe that HERM schemata tend to have larger parts that have the form of star and snowflake schemata, i.e. rather simple schemata centered around a central database type. Such schemata are well known from the area of data warehousing and on-line analytical processing (OLAP) systems. In particular, these subschemata are (almost) hierarchical and correspond to certain tasks within the application. Therefore, we take such subschemata in a generalised form as the basis for components. We do not require that cycles be completely absent: cycles may occur in the schema, but not in the instance, which can be expressed by simple path constraints. This is similar to γ-acyclicity in databases (Hegner 1988). Furthermore, we extend these subschemata with the necessary “plugs” that are used to amalgamate them in such a way that behaviour defined for a component carries over to behaviour on the amalgam. We develop the formal basics of this theory of components in Section 4. In particular, the “plugs” will be formalised by (updatable) views and operations on these views.

On this basis we develop a composition theory in Section 5, which basically consists of a collection of composition operations. These generalise input/output behaviour for program modules. We then show that each HERM schema is in fact the composition of its maximal snowflake components. This decomposition theorem is central, as it shows that design from snowflake components can always be achieved.
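To illustrate the distinction between cycles in the schema and cycles in the instance, consider the following minimal Python sketch. It is not part of the formal development: the type names Employee and Department, the object identifiers, and the helper has_cycle are hypothetical, and plain reference graphs stand in for the path constraints that are formalised in Section 4.

```python
from typing import Dict, Hashable, List

# A directed graph given as adjacency lists: node -> nodes it refers to.
Graph = Dict[Hashable, List[Hashable]]


def has_cycle(graph: Graph) -> bool:
    """Return True if the directed graph contains a cycle (three-colour DFS)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[Hashable, int] = {}

    def visit(node: Hashable) -> bool:
        colour[node] = GREY  # node is on the current DFS path
        for succ in graph.get(node, []):
            state = colour.get(succ, WHITE)
            if state == GREY:  # back edge: succ is an ancestor of node
                return True
            if state == WHITE and visit(succ):
                return True
        colour[node] = BLACK  # node and all its successors are finished
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


# Hypothetical schema level: the two types refer to each other, so the
# schema graph is cyclic.
schema_refs: Graph = {
    "Employee": ["Department"],   # an employee works in a department
    "Department": ["Employee"],   # a department is managed by an employee
}

# Hypothetical instance level: following the references between concrete
# objects never leads back to the starting object.
instance_refs: Graph = {
    "e1": ["d1"],
    "d1": ["e2"],
    "e2": ["d2"],
    "d2": [],
}

print(has_cycle(schema_refs))    # True
print(has_cycle(instance_refs))  # False
```

In this sketch the schema graph is cyclic while the instance graph is not, which is the situation described above as "almost hierarchical". An instance in which, say, d2 referred back to e1 would be rejected by the same check.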
However, we need not only a theoretical statement such as the decomposition theorem, but also pragmatic guidelines for component-driven design. These will consist of pragmatics for setting up (not necessarily maximal) snowflake components as in (Feyer & Thalheim 2002), the process of amalgamation, and the assessment of the resulting interface (Vestenicky, Lewerenz & Feyer 2000). In particular, we prefer components with