Towards Data Quality into the Data Warehouse Development

Munawar 1, Naomie Salim 2, Roliana Ibrahim 3
Dept of Information System, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
an_moenawar@yahoo.com 1, naomie@utm.my 2, roliana@utm.my 3

Abstract- Data warehouse (DW) development methodologies commonly pay little attention to the problem of data quality and completeness. One of the common mistakes made during the planning of a data warehousing project is to assume that data quality will be addressed during testing. In addition to the existing data warehouse development methodologies, this paper introduces a new approach to data warehouse development. The proposal is based on integrating data quality into every phase of data warehouse development and is denoted integrated requirement analysis for designing data warehouse (IRADAH). This paper shows that data quality is not only an integral part of a data warehouse project but also remains a sustained and ongoing activity.

Keywords- Data warehouse, data quality, data quality dimension, data quality integration

I. INTRODUCTION

Nowadays, organizations are becoming more dependent on data. The massive growth in data volume has created a new problem for organizations: data quality. Data quality (DQ) issues become more critical when data is transferred from a source system into another system. DQ is no longer an optional characteristic of information systems; it is a requirement for effective business performance. DQ issues are a major impediment to success in a variety of information system projects, including data warehouse (DW) projects. A DW is a special database used for storing enormous amounts of data gathered from heterogeneous data sources in order to satisfy decision-making requests [23]. DWs are increasingly being used by organizations in many sectors to improve their operations and to better achieve their objectives.
A DW enables executives to access the information they need to make informed business decisions [37]. Therefore, a DW has to deliver highly aggregated, high-quality data from heterogeneous sources to decision makers [14]. Issues of DQ in a DW are of great importance [7]. Many firms have difficulty ensuring DQ [36]. Ensuring a high level of DQ is one of the most expensive and time-consuming tasks in data warehousing projects [28]. Many DW projects have failed halfway through due to poor DQ [9]. This is often because DQ problems do not become apparent until the project is underway. One of the common mistakes made during the planning of a data warehousing project is to assume that DQ will be addressed during testing. Such an assumption leads either to a delay in delivery or, at the extreme, to total non-delivery of the project. Leaving the checking of DQ until testing leaves no time to address the issues raised, put corrective measures in place, and still ensure that the DW project is delivered on time [10]. Consequently, DW projects have a high failure rate [22].

Inappropriate, misunderstood, or ignored DQ has a negative effect on business decisions, performance, and the value of DW systems. Poor DQ increases operational cost, adversely affects customer satisfaction, and has serious ramifications for society at large [28]. English (2001) argues that managing the quality of information is as important as managing the business [8]. This indicates that DQ issues are critical and need to be addressed in every phase of DW development so that DQ is not ignored. For that reason, it is essential to start investigating DQ when the DW project is initiated. However, the crucial question of defining DQ is often ignored until late in the process. This could be due to the lack of a solid methodology for dealing with DQ. Given the strategic importance of DWs, it is crucial to guarantee their DQ from the early stages of a project [31].
The earlier problems are identified, the sooner specific recommendations can be made to ensure that data is correctly cleansed [10]. Therefore, DW success depends on integrating DQ assurance into all warehousing phases [1]. Otherwise, poor DQ will manifest itself throughout the process and in the system itself [4]. The authors of [1] address an important issue in warehousing: determining which DQ improvement projects will add the most value to the organization. For an effective DW system, quality aspects should be incorporated properly at the various levels of development of the system [11] and should cover all DW development layers [27]. In a DW where data is processed in stages, and where the quality of data at one stage depends on the DQ measurements in preceding stages, a framework for managing DQ is very much needed. Within such a framework, DQ can be treated as a continuous improvement activity across all phases of DW development. Furthermore, DQ can be assessed and monitored continuously in order to
2011 Ninth IEEE International Conference on Dependable, Autonomic and Secure Computing 978-0-7695-4612-4/11 $26.00 © 2011 IEEE DOI 10.1109/DASC.2011.194 1200