Bogdan CZEJDO, Kenneth MESSA, Tadeusz MORZY, Mikolaj MORZY, Janusz CZEJDO

DATA WAREHOUSES WITH DYNAMICALLY CHANGING SCHEMAS AND DATA SOURCES

Summary: Research in the data warehousing area focuses on design issues, data maintenance and query optimization. Recently, new research areas have appeared that are related to the dynamicity of data sources. Dynamicity of data sources can be categorized into data updates, schema and instance changes, and constraint modifications. Existing data warehouse systems manage data updates. However, they are unable to follow schema and instance changes and constraint modifications. In this paper we analyze schema and instance changes caused by dynamically changing external data sources. We advocate the need to apply the external data source schema changes to a data warehouse, and we present modeling issues involving star schema evolution and data warehouse versioning. Finally, we show query processing in the presence of different data warehouse versions.

Key words: temporal data warehouses, advanced OLAP queries, schema change in data warehouse

1. INTRODUCTION

Integration of different, autonomous and heterogeneous external data sources (EDS) is crucial for today’s businesses. Two basic approaches to consolidating distributed EDSs and providing integrated information to users [1,2,3] are the query-driven approach and the data warehousing approach.

In the query-driven approach, EDSs are integrated only at the logical level by merging all local schemas into a single global logical schema (no integration of EDS contents takes place; all data is stored only locally inside the EDSs). User queries executed against the global schema are translated by mediators into one or more queries executed against local EDSs. The mediators join the answers from the EDSs and return the final answer to the user. This approach has several advantages. No central database is required to physically integrate data from external data sources.
There are no extract-transform-load processes to move data from EDSs to a centralized data repository. There is no latency in data; all data is up to date.

The data warehousing approach is based on a centralized data repository. Data is extracted from EDSs, transformed (i.e. filtered, cleansed, enriched), and loaded into a centralized data repository called a data warehouse. As opposed to the query-driven approach, the data warehouse integrates both schemas and data at the global level. The integrated global schema consists of a collection of tables/views defined over the export schemas of EDSs. Queries submitted to the data warehouse are executed locally, without accessing the original EDSs, which considerably increases query performance. Local execution also improves the availability of data and protects the data warehouse from network delays or even the inaccessibility of external data. Local processing at EDSs is not affected by global applications running in the data warehouse. The data warehouse provides users with additional information such as aggregates, summaries or historical data. These are the main reasons why the data warehousing approach became such a popular technology for numerous enterprises requiring high query performance and high data availability [4,5,6].

Until now, research in data warehousing has concentrated mainly on design issues, query performance and optimization, data maintenance, data refresh strategies and implementation issues. New file organizations have been proposed along with new access methods and new index structures (e.g. bitmap indexes). Most data warehouse models assumed that data sources and the data warehouse schema are static and that only the data changes. However, this