Capturing Semantics from Bitmap Indices for Data Analysis

Carlo DELL’AQUILA, Ezio LEFONS, and Filippo TANGORRA
Dipartimento di Informatica
Università di Bari
via Orabona 4, 70125 Bari
ITALY
{dellaquila, lefons, tangorra}@di.uniba.it

Abstract: - Bitmap indexing is a widespread approach for efficiently processing complex queries in decision support activities. Besides this common use of bitmap structures, this paper presents the use of bitmap indices to represent analytical views of the user’s data. In this approach, bitmaps can be created and used not only to index different domain attribute values, but also to pre-compute legal relational algebra query expressions that are useful for analytic purposes. With this approach, problems of data integration and conceptual correlation of analytical data can be solved efficiently.

Key Words: - Bitmap index, Data semantics, Data semantic integration, Analytical data view, Complex query pre-processing, Decisional user

1 Introduction
The need to process and analyze huge volumes of information in OLAP or data-mining applications has led to the extraction, integration, and organization of enterprise data. The commonly adopted solution is to build data warehouses, that is, very large repositories that integrate data from the operational databases of several enterprise sectors for decisional analysis. Because data warehouses contain data consolidated from several operational databases, they must solve two crucial problems, namely, data integration and efficient access to stored data. The integration of data coming from different sources is necessary because such data are stored in heterogeneous, relational, and legacy databases that are components of information systems owned by the decisional organization or by other external organizations.
The ETL procedures are the tools of the data warehouse system that allow the decisional data administrator to extract, transform, load, refresh, and integrate into the data warehouse data from several sources described by different data schemas. Technological solutions from distributed services for information systems are adopted, for example, to manage data inconsistency and structure incompatibility. Defining the ETL procedures is a very difficult step, because the system that supports the building of the data warehouse generally provides inadequate tools or no tools at all. Therefore, users themselves must often implement manual procedures for this purpose, analyzing source data and schemas in order to discover inconsistent data, incompatible data structures, and differences in data granularity.

Accessing the data in warehouses in a way that is both efficient and flexible is a difficult objective to achieve, because decision support data are collected in very large historical repositories. On the one hand, access efficiency benefits from repetitive and predefined queries, because they make it possible to prepare adequate internal structures in advance, such as indices [2, 8, 10], materialized views [1, 5, 14], summary tables [12], and statistical data profiles [6, 7, 11]. On the other hand, data analysis is by nature an iterative and exploratory process, characterized by a sequence of unpredictable and occasional queries in which each successive query against the data warehouse may be influenced by the results of the previous ones. This process can render the predefined structures ineffective and requires a flexible data organization.

Bitmap indexing is a very popular technique proposed for processing complex queries in the data warehouse environment supporting decisional and OLAP applications.
In fact, this methodology is particularly well suited to providing fast query processing in a decisional context characterized by non-volatile and read-mostly data. Researchers have so far considered the use of bitmap indices pre-eminently from the physical point of view. In this paper, we present the bitmap structure at a high abstraction level and show its capability to capture semantics in order to solve integration problems, to access large amounts of data efficiently, and to answer decisional needs flexibly.

Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization, Lisbon, Portugal, September 22-24, 2006 438
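As a concrete illustration of the bitmap indexing technique mentioned in the introduction, the following minimal Python sketch builds one bitmap per distinct attribute value and answers a conjunctive selection with a single bitwise AND. It is only an illustrative sketch under stated assumptions, not the authors' implementation: the toy table `sales`, its attributes, and the helper names `build_bitmap_index` and `matching_rows` are all hypothetical, and Python integers stand in for the bit vectors.

```python
from collections import defaultdict


def build_bitmap_index(rows, attribute):
    """Map each distinct value of `attribute` to an int used as a bit vector.

    Bit i of the bitmap is set when rows[i][attribute] equals the value.
    """
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[attribute]] |= 1 << i
    return dict(index)


def matching_rows(bitmap):
    """Return the positions of the rows whose bits are set in `bitmap`."""
    positions = []
    i = 0
    while bitmap:
        if bitmap & 1:
            positions.append(i)
        bitmap >>= 1
        i += 1
    return positions


# Hypothetical toy fact table (assumption: not taken from the paper).
sales = [
    {"region": "north", "product": "tv"},
    {"region": "south", "product": "tv"},
    {"region": "north", "product": "radio"},
    {"region": "north", "product": "tv"},
]

region_idx = build_bitmap_index(sales, "region")
product_idx = build_bitmap_index(sales, "product")

# Selection region = 'north' AND product = 'tv' as one bitwise AND.
north_tv = region_idx["north"] & product_idx["tv"]
print(matching_rows(north_tv))
```

Because the result of the AND is itself a bitmap, it could be stored and reused like any other index entry, which is one simple way to picture a pre-computed query expression.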