ENHANCING DATA PREPARATION PROCESSES USING TRIGGERS FOR ACTIVE DATAWAREHOUSING

Farhi Marir
Department of Computing, Communication Technology and Mathematics, London Metropolitan University, London, UK
f.marir@londonmet.ac.uk

Kanana Ezekiel
Department of Computing, Communication Technology and Mathematics, London Metropolitan University, London, UK
k.ezekiel@londonmet.ac.uk

Abstract: Data preparation is a significant pre-processing task that prepares data for mining. The data mining process cannot succeed without a serious effort to prepare data, because mistakes are very often found in the data, making the analysis process more difficult. Without a data preparation phase, we have no way of knowing whether the data quality can support analysis queries. Several techniques exist for data preparation in data warehousing. However, one problem with existing approaches is their limited support for data preparation in active and changing environments such as Active Data Warehouses; they focus on static data preparation. This paper addresses this limitation by using a trigger mechanism designed to manage changes in a dynamic environment. The specification language of a trigger supports active and dynamic capabilities that enable users to automatically filter (or select) and cleanse data at runtime. In addition, this work focuses not only on syntactic but also on semantic data preparation.

Index Terms: Triggers, Data preparation, Data cleaning, Data Mining, Active Data Warehouse, Active Database System, Event Condition Action (ECA) Rules

I. INTRODUCTION

Over the years, people viewed their data warehouses as static because the data did not change very often. This view then evolved, as static data sets could not capture the most recent changes. The data warehouse environment was set up to give static snapshots of data at some point in time, perhaps from as recently as the last week.
However, last week's data, or even last night's data, is often not sufficient to react to current situations. Things change rapidly in today's e-business economy, and the company with the best set of integrated, current data is the one that will survive. Organizations today spend a lot of money acquiring a comprehensive, integrated set of accurate data from their data warehouses to enhance performance and outstrip the competition. Accommodating a customer within minutes of an event reflects the behavior of an active database system. An Active Data Warehouse (ADW) applies the idea of Event-Condition-Action (ECA) rules from active database systems to implement active behavior [1]. In most active database systems, triggers, also known as ECA rules, are used to monitor changes and enforce integrity constraints. In active data warehouse environments, ECA rules are required for checking data and automating routine analysis decisions. As data warehouse environments adopt these active capabilities, there is a significant need for new techniques that can automatically help users prepare data for mining in active and evolving environments.

Data preparation is a significant pre-processing task carried out as part of data mining, before the analysis phase. The objective of the data preparation phase is to cleanse and transform data into a format suitable for the application and analysis phase. The analysis process cannot succeed without a serious effort to prepare data, as mistakes are very often found in the collected data, which is sometimes presented in an unstructured form. Furthermore, the data needed may not be readily available because of rapid changes, making the analysis process more difficult. Without the data preparation phase, we have no way of knowing whether the data quality can support analysis queries. Several techniques exist for data preparation in data warehousing (Section II).
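To make the Event-Condition-Action idea concrete for runtime cleansing, the following is a minimal sketch only, not the authors' mechanism. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the warehouse DBMS. The tables `staging`, `clean` and `rejected`, and the "plausible age" rule, are hypothetical illustrations. The event is an insert into `staging`, the condition is the trigger's `WHEN` clause, and the action writes either a cleansed copy or a rejection record.

```python
import sqlite3

# Hypothetical schema: raw rows land in `staging`; rows that satisfy the
# condition are cleansed into `clean`, the others are logged in `rejected`.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging  (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE clean    (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE rejected (id INTEGER, reason TEXT);

-- Event: a row is inserted into staging.
-- Condition: the age is present and plausible (assumed rule).
-- Action: store a cleansed copy (trimmed, upper-cased name).
CREATE TRIGGER cleanse_valid AFTER INSERT ON staging
WHEN NEW.age IS NOT NULL AND NEW.age BETWEEN 0 AND 120
BEGIN
    INSERT INTO clean (id, name, age)
    VALUES (NEW.id, UPPER(TRIM(NEW.name)), NEW.age);
END;

-- Same event, complementary condition: record the row as rejected.
CREATE TRIGGER reject_invalid AFTER INSERT ON staging
WHEN NEW.age IS NULL OR NEW.age NOT BETWEEN 0 AND 120
BEGIN
    INSERT INTO rejected (id, reason)
    VALUES (NEW.id, 'age missing or out of range');
END;
""")

# Each inserted row fires the triggers, so cleansing happens at load time.
rows = [(1, "  alice ", 34), (2, "bob", None), (3, "carol", 250), (4, "dave", 41)]
conn.executemany("INSERT INTO staging (id, name, age) VALUES (?, ?, ?)", rows)
conn.commit()

clean = conn.execute("SELECT id, name, age FROM clean ORDER BY id").fetchall()
rejected = conn.execute("SELECT id FROM rejected ORDER BY id").fetchall()
print(clean)     # [(1, 'ALICE', 34), (4, 'DAVE', 41)]
print(rejected)  # [(2,), (3,)]
```

The action writes a cleansed copy to a separate table because SQLite does not allow a trigger to modify `NEW` values in place; other DBMSs (for example, with `BEFORE` row triggers) may permit cleansing the incoming row directly.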
However, one problem with existing approaches is their limited support for data preparation in active and evolving environments; they focus on static snapshots of data. Data preparation tends to be a one-off task performed before data analysis. The difficulty is that the way data is cleansed depends on the intended use, expressed as a business purpose for the analysis. If the business purpose changes over time, the data preparation process also needs to adapt. For example, hospital records may contain sufficient O