T. Niemi, J. Nummenmaa, P. Thanisch 7-1 Logical Multidimensional Database Design for Ragged and Unbalanced Aggregation Hierarchies Abstract Research on logical design of OLAP cubes has tended to assume that the rollup hierarchy in a cube dimension takes the form of a balanced tree. However, experience from industry indicates that a much broader class of rollup hierarchies is of interest in practice. For example, the hierarchy tree might be unbalanced or ragged (or both), or indeed the hierarchy might not be a tree at all, but a more general acyclic directed graph structure such as a lattice. We demonstrate how dependency information can assist in the design of aggregation hierarchies. In addition to dependencies familiar from relational database theory, we use new classes of dependencies to extend logical design principles so that they include this more general notion of cube dimension hierarchies. 1 Introduction Hierarchical dimensions are an essential part of On-Line Analytical Processing (OLAP). The hierarchical structure allows the user to study facts in different levels of details. Good hierarchical structures ensure correct and efficient calculation of aggregation functions: consistent and coherent structure of hierarchies decreases logical errors while efficiency of calculation is increased, for example, by decreasing redundancy in rollup paths. Logical modelling methods of these hierarchies differ from the methods used in database design. Thus, dependencies used in database design are not enough for modelling OLAP hierarchies. In this work our aim is to study which kinds of dependencies would be needed to get aggrega- tion hierarchies more desirable for OLAP. Much of the research on logical design for OLAP cubes has concentrated on the class of aggregation hierarchies that arise from star and snowflake schemata. In such aggregation hierarchies: (1) the attribute labelling a column in a dimension table will correspond to a level in the aggregation hierarchy of the corresponding cube dimension, and (2) data values occurring in that column of the dimension table will become the members of the corresponding level. OLAP practitioners need a much more general notion of aggregation hierarchies than this. For example, an aggregation hierarchy in one of the cube’s dimensions might be built from a hierarchical relationship that is modelled in one of the database tables. Example 1 illustrates this. Example 1 Consider a management hierarchy in an Employee table in which each row contains the information pertaining to a particular employee and one column in the table contains the employee number (attribute EmpID) and another contains the number of that employee’s manager (attribute MgrID). In the OLAP cube, we have Salary as the measure and an Employee dimension. The aggregation hierarchy in the Employee dimension is the management hierarchy modelled by the EmpID and MgrID columns. This aggregation hierarchy allows us to query the total salary bill for all employees below any given manager. We note that in Example 1, we do not have levels corresponding to attributes. Rather, levels correspond to data values occurring in the table. Thus two quite Tapio Niemi Department of Computer and Information Sciences, University of Tampere FIN-33014 University of Tampere, Finland tapio@cs.uta.fi Jyrki Nummenmaa Department of Computer and Information Sciences, University of Tampere FIN-33014 University of Tampere, Finland jyrki@cs.uta.fi Peter Thanisch Department of Computer Science, University of Edinburgh Edinburgh, EH9 3JZ, Scotland pt@dcs.ed.ac.uk The copyright of this paper belongs to the paper’s authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage. Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW’2001) Interlaken, Switzerland, June 4, 2001 (D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.) http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/