Automatic Detection of Structural Changes in
Data Warehouses
Johann Eder
*
and Christian Koncilia
*
and Dieter Mitsche
§
University of Klagenfurt
Dep. of Informatics-Systems
*
{eder,koncilia}@isys.uni-klu.ac.at
§
dmitsche@edu.uni-klu.ac.at
Abstract. Data Warehouses provide sophisticated tools for analyzing
complex data online, in particular by aggregating data along dimensions
spanned by master data. Changes to these master data are a frequent
threat to the correctness of OLAP results, in particular for multi-period
data analysis, trend calculations, etc. As dimension data might change
in underlying data sources without notifying the data warehouse we are
exploring the application of data mining techniques for detecting such
changes and contribute to avoiding incorrect results of OLAP queries.
1 Introduction and Motivation
A data warehouse is a collection of data stemming from different frequently
heterogeneous data sources and is optimized for complex data analysis opera-
tion rather than for transaction processing. The most popular architectures are
multidimensional data warehouses (data cubes) where facts (transaction data)
are ”indexed” by several orthogonal dimensions representing a hierarchical or-
ganization of master data. OLAP (on-line analytical processing) tools allow the
analysis of this data, in particular by aggregating data along the dimensions
with different consolidation functions.
Although data warehouses are typically deployed to analyse data from a
longer time period than transactional databases, they are not well prepared
for changes in the structure of the dimension data. This surprising observation
originates in the (implicit) assumption that the dimensions of data warehouses
ought to be orthogonal, which, in the case of the dimension time means that all
other dimensions ought to be time-invariant.
When analysts place their queries they have to know which dimension data
changed. Consider the following example: Diagnoses for patients were repre-
sented in a data warehouse using the “International Statistical Classification of
Diseases and Related Health Problems” (ICD) code. However, codes for diag-
noses changed from ICD Version 9 to ICD Version 10. For instance the code for
“malignant neoplasm of stomach” has changed from 151 in ICD-9 to C16 in ICD-
10. Other diagnoses were regrouped, e.g. “transient cerebral ischaemic attacks”
has moved from “Diseases of the circulatory system” to “Diseases of the nervous
© Springer Verlag 2003, http://www.springer.de/comp/lncs/index.html
Eder J., Koncilia C. & Mitsche D. (2003). Automatic Detection of Structural Changes in Data Warehouses. Proceedings of
the 5th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2003), September 3-5, Prague,
Czech Republic, LNCS 2737, pp. 119-128.