Hierarchy integration in the design of data warehouses
Yasser Hachaichi
Mir@cl Loboratory
ISAAS, B.P.1013
3018 Sfax, Tunisia
+216 97214875
yasser.hachaichi@fsegs.rnu.tn
Jamel Feki
Mir@cl Loboratory
FSEGS, B.P.1088
3018 Sfax, Tunisia
+216 22708783
jamel.feki@fsegs.rnu.tn
ABSTRACT
Face to their merger and/or collaboration with partners, today’s
enterprises often need to integrate several databases. As a result,
their decision making process ends up analyzing data coming
from various databases. While database integration has been
thoroughly examined, it is only recently that the integration of
multidimensional models has drawn attention. A
multidimensional model is a data model that facilitates the
analysis operations during the decision making process. It
organizes data into facts that can be analyzed according to
dimensions represented through hierarchies. This paper presents
an integration process for the hierarchy concept. In particular, it
proposes a set of basic integration operations and constraints that
produce a multidimensional model that is loadable from different
data sources. It illustrates the integration process through
examples.
Keywords
Data Warehouse, Hierarchies Integration, Multidimensional
Schemas.
1. INTRODUCTION
Information systems of organizations are generally divided into
two classes: Operational systems and decision support systems.
Operational systems serve the need of running the daily activities
of the business, whereas decision support systems provide
historical information to trace and analyze the business in order to
make judicious business decisions [1]. In fact, many companies
have realized the importance of the information treasure hidden
within their operational systems, which can significantly improve
their decision quality [2].
Data warehousing is a technology that: collects all relevant data
into one central system; organizes the data efficiently so it is
consistent and easy to retrieve; and keeps “old” data for historical
analyses [3]. William Inmon, who coined the term “data
warehouse”, defined a data warehouse as “a subject oriented,
integrated, nonvolatile, and time variant collection of data in
support of management decision” [4]. Nowadays, many data
warehouses are developed using the dimensional modeling
approach; this model can be composed of a set of coherent data
marts (DM) [5] each of which provides a dimensional view of a
single business process.
Since the DW and/or the DM are loaded with the data generated
by the operation support system, several researchers proposed
dimensional modeling approaches for the most commonly used
data source models [6], [7], [8], [9], [10], [11], [12], [13], [14],
[15]. In our previous works, we have defined three automatic
bottom-up/data-driven methods for DM schemas design. These
methods start respectively from relational database [9], XML
documents data-centric [14] and object oriented databases [15].
Our methods exploit recent versions of a data source in the
operation support systems and automatically apply a set of rules
that extract all multidimensional concepts (facts with their
measures, dimensions and hierarchies) and then produce DM
schemas dedicated to business decision.
Overall, similar to our approach, current propositions treat each
data model separately. However, due to economic constraints, an
enterprise is often forced to open its operational system to
collaborate with partners and/or to ensure the collaboration among
its affiliates whose data models might be different. Hence, one
needs a way to integrate the DMs produced by the approach
appropriate to the data source model. Instead of integrating data
sources with different structures, we believe that the integration of
DMs is easier since DMs are structured in a uniform and simple
way, along the widely accepted notions of dimension and fact
[16].
The main objective of this paper is to propose a process for DM
integration in order to design DM schemas loadable from different
sources. As a second objective, the proposed process can also be
used to build multidimensional patterns [17] that are generic DM
schemas used to assist a decision maker to specify his/her
analytical needs. In addition, the integration of multidimensional
data (i.e., DMs) is a reasonable solution for the problem of
building a complete DM gathering data from several existing
DMs.
The problem of integration has been studied in the literature
extensively for heterogeneous databases [18], [19], [20], [21],
[22]. This problem presents, in general many facets and, a survey
of the solutions proposed are studied in [23], [22]. In this paper,
we take apart the general aspects of the problem, and we do not
address the general issues of the integration problem, such as the
automatic matching of terms [24]; rather, we focus our attention
on the structural integration of hierarchies. Hierarchies represent
a main component in a DM schema as they are crucial for
DrillDown and RollUp OLAP (On-Line Analytical Processing)
operations. Indeed, the hierarchies structure distinguish
multidimensional schemas from databases and their integration
raises specific issues not yet addressed by the data warehousing
community.
This paper is organized as follows. Section 2 overviews the main
concepts of multidimensional models. Section 3 presents the
metrics, operations and constraints we use to define an integration
process in Section 4. Finally, the paper is summarized and its
future work is outlined in Section 5.
Annual International Academic Conference on Business Intelligence and Data Warehousing (BIDW 2010).
Copyright © GSTF 2010.
ISBN: 978-981-08-6308-1.
doi:10.5176/978-981-08-6308-1 66
72