Constructing OLAP Cubes Based on Queries Tapio Niemi Jyrki Nummenmaa Peter Than&h Department of Computer and Department of Computer and Department of Computer Science, information Sciences Information Sciences University of Edinburgh FIN-33014 University of Tampere FIN-33014 University of Tampere Edinburgh, EH9 3JZ Finland Finland Scotland zyxwvutsrqponm +358 32156595 +358405277999 +44 7968401525 tapio@cs.uta.fi jyrki@cs.uta.fi pt@dcs.ed.ac.uk ABSTRACT An On-Line Analytical Processing (OLAP) user often follows a train of thought, posing a sequence of related queries against the data warehouse. Although their details are not known in advance, the general form of those queries is apparent beforehand. Thus, the user can outline the relevant portion of the data posing generalised queries against a cube representing the data warehouse. Since existing OLAP design methods are not suitable for non- professionals, we present a technique that automates cube design given the data warehouse, functional dependency information, and sample OLAP queries expressed in the general form. The method constructs complete but minimal cubes with low risks related to sparsity and incorrect aggregations. After the user has given queries, the system will suggest a cube design. The user can accept it or improve it by giving more queries. The method is also suitable for improving existing cubes using respective real MDX queries. Categories and Subject Descriptors H.2.1 [Database Management]: Logical Design - data models, normal forms, schema and subschema General Terms Management, Design Keywords Logical OLAP design, MDX queries, data warehousing. 1. INTRODUCTION On-Line Analytical Processing (OLAP) is a method to support decision making in situations where raw data on measures such as sales or profit needs to be analysed at different levels of statistical aggregation. In OLAP, queries are made against multidimensional cubes, called OLAP cubes. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies arc not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise. or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. /IO/.,4P ‘01. November 9,200 I, Atlanta, Georgia, USA. Copyright 2001 ACM l-581 l3-437-1/01/0011...$5.00. The design of a cube is based on knowledge of the application area and the types of queries the users are expected to pose. Since constructing an OLAP cube can be a difficult task for the end user, it is often seen as the duty of the data warehouse administrator. This has led to researchers and vendors regarding the OLAP cube as a static storage structure for data warehouse data. This is problematic since the user often wants to make new kinds of queries, which may also need new OLAP cubes. The same cube is not always practical for different analysis tasks, since the structure of the cube has a notable effect on efficiency and ease of posing queries. Moreover, even professionals may have difficulties in cube design since generally accepted logical design methods do not exist so far. In general, the structure of the resulting OLAP schema should be constructed by taking into account the information about the dependencies among the concepts and other meta knowledge on the concepts and the application area, and the query requirements. We believe that the user should be able to design customised cubes which are the best suitable for his/her current analysing tasks. This is important since the analysis tasks or the data in the data warehouse can change almost daily and the cube design, of course, should follow these changes. Moreover, some users may also utilise planning methods in which a cube is designed for a hypothetical situation, for example to analyse a new organisation structure beforehand. It is also worth mentioning that business intelligence people work in multi-user client/server environments where there can be major advantages to having the cube on the desktop machine rather than on an occasionally slow server. In our approach, we assume that a new OLAP cube is constructed for new analysis purposes. For example, one user might have detected some interesting facts in a subcube of an existing cube. The user might then pass on this information to a specialist who needs to examine the data located in the subcube in greater detail. We also assume that actual data warehouse data is stored independently and that it is available for populating OLAP cubes. Our aim is to reduce the level of technical knowledge that a user needs in order to construct an OLAP cube. We combine the cube design and query construction. There is a natural connection between OLAP cubes and queries, since it is meaningful to think that an OLAP query returns an OLAP cube. An ideal cube contains all information and only the information that is relevant to the user’s queries. In addition, the structure of the cube should be such that, for example, aggregations are not