Hierarchy integration in the design of data warehouses Yasser Hachaichi Mir@cl Loboratory ISAAS, B.P.1013 3018 Sfax, Tunisia +216 97214875 yasser.hachaichi@fsegs.rnu.tn Jamel Feki Mir@cl Loboratory FSEGS, B.P.1088 3018 Sfax, Tunisia +216 22708783 jamel.feki@fsegs.rnu.tn ABSTRACT Face to their merger and/or collaboration with partners, today’s enterprises often need to integrate several databases. As a result, their decision making process ends up analyzing data coming from various databases. While database integration has been thoroughly examined, it is only recently that the integration of multidimensional models has drawn attention. A multidimensional model is a data model that facilitates the analysis operations during the decision making process. It organizes data into facts that can be analyzed according to dimensions represented through hierarchies. This paper presents an integration process for the hierarchy concept. In particular, it proposes a set of basic integration operations and constraints that produce a multidimensional model that is loadable from different data sources. It illustrates the integration process through examples. Keywords Data Warehouse, Hierarchies Integration, Multidimensional Schemas. 1. INTRODUCTION Information systems of organizations are generally divided into two classes: Operational systems and decision support systems. Operational systems serve the need of running the daily activities of the business, whereas decision support systems provide historical information to trace and analyze the business in order to make judicious business decisions [1]. In fact, many companies have realized the importance of the information treasure hidden within their operational systems, which can significantly improve their decision quality [2]. Data warehousing is a technology that: collects all relevant data into one central system; organizes the data efficiently so it is consistent and easy to retrieve; and keeps “old” data for historical analyses [3]. William Inmon, who coined the term “data warehouse”, defined a data warehouse as “a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management decision” [4]. Nowadays, many data warehouses are developed using the dimensional modeling approach; this model can be composed of a set of coherent data marts (DM) [5] each of which provides a dimensional view of a single business process. Since the DW and/or the DM are loaded with the data generated by the operation support system, several researchers proposed dimensional modeling approaches for the most commonly used data source models [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. In our previous works, we have defined three automatic bottom-up/data-driven methods for DM schemas design. These methods start respectively from relational database [9], XML documents data-centric [14] and object oriented databases [15]. Our methods exploit recent versions of a data source in the operation support systems and automatically apply a set of rules that extract all multidimensional concepts (facts with their measures, dimensions and hierarchies) and then produce DM schemas dedicated to business decision. Overall, similar to our approach, current propositions treat each data model separately. However, due to economic constraints, an enterprise is often forced to open its operational system to collaborate with partners and/or to ensure the collaboration among its affiliates whose data models might be different. Hence, one needs a way to integrate the DMs produced by the approach appropriate to the data source model. Instead of integrating data sources with different structures, we believe that the integration of DMs is easier since DMs are structured in a uniform and simple way, along the widely accepted notions of dimension and fact [16]. The main objective of this paper is to propose a process for DM integration in order to design DM schemas loadable from different sources. As a second objective, the proposed process can also be used to build multidimensional patterns [17] that are generic DM schemas used to assist a decision maker to specify his/her analytical needs. In addition, the integration of multidimensional data (i.e., DMs) is a reasonable solution for the problem of building a complete DM gathering data from several existing DMs. The problem of integration has been studied in the literature extensively for heterogeneous databases [18], [19], [20], [21], [22]. This problem presents, in general many facets and, a survey of the solutions proposed are studied in [23], [22]. In this paper, we take apart the general aspects of the problem, and we do not address the general issues of the integration problem, such as the automatic matching of terms [24]; rather, we focus our attention on the structural integration of hierarchies. Hierarchies represent a main component in a DM schema as they are crucial for DrillDown and RollUp OLAP (On-Line Analytical Processing) operations. Indeed, the hierarchies structure distinguish multidimensional schemas from databases and their integration raises specific issues not yet addressed by the data warehousing community. This paper is organized as follows. Section 2 overviews the main concepts of multidimensional models. Section 3 presents the metrics, operations and constraints we use to define an integration process in Section 4. Finally, the paper is summarized and its future work is outlined in Section 5. Annual International Academic Conference on Business Intelligence and Data Warehousing (BIDW 2010). Copyright © GSTF 2010. ISBN: 978-981-08-6308-1. doi:10.5176/978-981-08-6308-1 66 72