IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 14, NO. 9, SEPTEMBER 1988 1293

An Automatic Physical Designer for Network Model Databases

PASQUALE RULLO AND DOMENICO SACCÀ

Abstract: System EROS is a physical design tool for CODASYL database systems which covers a large spectrum of decision variables, notably, location mode, set implementation, set order, and search keys. System EROS is based on a model where the CODASYL physical database design problem is formulated as an extension of the index selection problem in the relational database environment. In particular, optimization algorithms for index selection are extended to solve the more complex problem of selecting a good physical access path configuration for CODASYL databases. Therefore, the proposed approach represents a unified solution to the physical database design problem for both CODASYL and relational systems.

Index Terms: Access path selection, CODASYL and relational database systems, cost evaluation, heuristic optimization, physical database design.

I. INTRODUCTION

THE database design process is concerned with the definition of an efficient organization of data representing the relevant facts of the system that is to be automated. The achievement of such a goal is a very complex task, since a great number of database structures can be obtained starting from the same set of requirements. For this reason, a number of manual design methodologies and automated tools have been developed to support the design process [3], [4]. This process is generally partitioned into four steps: 1) requirements analysis, 2) conceptual design, 3) implementation (or logical) design, and 4) physical design. While the first three steps mainly deal with semantic aspects of database design, the last one has the major impact on database performance.
In fact, the major theme of physical database design is the selection of a physical access path configuration that can efficiently support navigations on the logical schema. Most difficulties of this design problem arise from the fact that logical objects may be accessed in many different ways by different transactions. Selection of a good combination of access paths, from the variety of access mechanisms provided by the database system, is known to be a complex and time-consuming problem. For this reason, physical design techniques are, in general, based on heuristic approaches, and they depend heavily on the structure of the underlying database system.

Manuscript received November 19, 1986; revised March 6, 1987.
P. Rullo is with CRAI, 87036 Rende-Santo Stefano, Italy.
D. Saccà is with the Dipartimento di Sistemi, Università della Calabria, Arcavacata, 87036 Rende, Italy.
IEEE Log Number 8822452.

The physical design problem for relational database systems essentially coincides with the problem of selecting an optimal set of indices, since they are the most widely used access paths. The index selection problem is solved by optimization models that make use of heuristic techniques [1], [2], [16], [17]. (An example of a physical design tool that utilizes heuristics for index selection is DBDSGN [8], which was developed for System R.) On the other hand, the physical design problem for CODASYL database systems is more complex, since such systems support a large variety of access paths. For this reason, most design techniques are based on evaluation models, which compare a reduced number of different design alternatives supplied by the database designer [7], [9], [10], [12], [13], [18], [20]. Therefore, no optimization algorithm is needed.
Nevertheless, such models are generally complex enough to precisely characterize the performance of every design alternative and to select the best design on the basis of a large spectrum of decision variables. Optimization models that do not require design alternatives to be predefined have been proposed as well. Unfortunately, they cover too narrow a set of decision variables [11], [22]. Recently, Reuter and Kinzinger [14] have presented a software tool for physical design in the CODASYL environment, called DAIS, that is based on an optimization model and covers many decision variables.

In this paper, we present a CODASYL physical database design tool, called System EROS,¹ which is based on an optimization model covering a large spectrum of physical decision variables, namely, location mode, set implementation, set order, and search keys. We point out that the above decision variables are all those taken into account in DAIS. However, both the cost evaluation and the selection process are different in the two systems. In Section VI, we shall compare the two systems in more detail.

An interesting aspect of EROS is that the physical database design problem for CODASYL systems is formulated as an extension of the index selection problem in the relational environment. This formulation represents a unified approach to the physical database design problem and can also be used for novel relational database systems that support access paths besides indexes. (A

¹We decided not to call "PLATO" a physical design tool.

0098-5589/88/0900-1293$01.00 © 1988 IEEE