Automatic Detection of Structural Changes in Data Warehouses Johann Eder * and Christian Koncilia * and Dieter Mitsche § University of Klagenfurt Dep. of Informatics-Systems * {eder,koncilia}@isys.uni-klu.ac.at § dmitsche@edu.uni-klu.ac.at Abstract. Data Warehouses provide sophisticated tools for analyzing complex data online, in particular by aggregating data along dimensions spanned by master data. Changes to these master data are a frequent threat to the correctness of OLAP results, in particular for multi-period data analysis, trend calculations, etc. As dimension data might change in underlying data sources without notifying the data warehouse we are exploring the application of data mining techniques for detecting such changes and contribute to avoiding incorrect results of OLAP queries. 1 Introduction and Motivation A data warehouse is a collection of data stemming from different frequently heterogeneous data sources and is optimized for complex data analysis opera- tion rather than for transaction processing. The most popular architectures are multidimensional data warehouses (data cubes) where facts (transaction data) are ”indexed” by several orthogonal dimensions representing a hierarchical or- ganization of master data. OLAP (on-line analytical processing) tools allow the analysis of this data, in particular by aggregating data along the dimensions with different consolidation functions. Although data warehouses are typically deployed to analyse data from a longer time period than transactional databases, they are not well prepared for changes in the structure of the dimension data. This surprising observation originates in the (implicit) assumption that the dimensions of data warehouses ought to be orthogonal, which, in the case of the dimension time means that all other dimensions ought to be time-invariant. When analysts place their queries they have to know which dimension data changed. Consider the following example: Diagnoses for patients were repre- sented in a data warehouse using the “International Statistical Classification of Diseases and Related Health Problems” (ICD) code. However, codes for diag- noses changed from ICD Version 9 to ICD Version 10. For instance the code for “malignant neoplasm of stomach” has changed from 151 in ICD-9 to C16 in ICD- 10. Other diagnoses were regrouped, e.g. “transient cerebral ischaemic attacks” has moved from “Diseases of the circulatory system” to “Diseases of the nervous © Springer Verlag 2003, http://www.springer.de/comp/lncs/index.html Eder J., Koncilia C. & Mitsche D. (2003). Automatic Detection of Structural Changes in Data Warehouses. Proceedings of the 5th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2003), September 3-5, Prague, Czech Republic, LNCS 2737, pp. 119-128.