Towards Data Quality into the Data Warehouse Development
Munawar¹, Naomie Salim², Roliana Ibrahim³
Dept of Information System
Universiti Teknologi Malaysia
Johor Bahru, Malaysia
an_moenawar@yahoo.com¹, naomie@utm.my², roliana@utm.my³
Abstract—DW development methodologies commonly pay little
attention to the problem of data quality and completeness.
One of the common mistakes made during the planning of a
data warehousing project is to assume that data quality will be
addressed during testing. In addition to existing data warehouse
development methodologies, this paper introduces a new
approach to data warehouse development. The proposal
is based on integrating data quality into every phase of data
warehouse development and is denoted integrated
requirement analysis for designing data warehouse (IRADAH).
This paper shows that data quality is not only an integral
part of a data warehouse project but also remains a sustained
and ongoing activity.
Keywords- Data warehouse, data quality, data quality
dimension, data quality integration
I. INTRODUCTION
Nowadays, organizations are becoming more dependent
on data. The massive growth in data volume has created a
new problem for organizations: data quality. Data
quality (DQ) issues become more critical when data is
transferred from a source system into another
system. DQ is no longer an optional characteristic of
information systems. DQ is a requirement for effective
business performance. DQ issues are a major impediment
to success in a variety of information system projects,
including data warehouse (DW) projects.
A DW is a special database used for storing enormous
amounts of data, gathered from heterogeneous data sources
in order to satisfy decision-making requests [23]. DWs are
increasingly being used by many organizations in many
sectors to improve their operations and to better achieve
their objectives. A DW enables executives to access the
information they need to make informed business decisions
[37]. Therefore, a DW has to deliver highly aggregated,
high-quality data from heterogeneous sources to decision makers
[14].
Issues of DQ in a DW are of great importance [7]. Many
firms have problems ensuring DQ [36]. Ensuring a high level
of DQ is one of the most expensive and time-consuming tasks
to perform in data warehousing projects [28]. Many DW
projects have failed halfway through due to poor DQ [9].
This is often because DQ problems do not become apparent
until the project is underway.
One of the common mistakes made during the planning
of a data warehousing project is to assume that DQ will be
addressed during testing. Such an assumption leads
either to a delay in delivery or, in the extreme, to total
non-delivery of the project. Leaving the checking of DQ until
testing leaves no time to address the issues raised, put
corrective measures in place, and still ensure that the DW
project is delivered on time [10]. Consequently, DW
projects have a high failure rate [22].
Inappropriate, misunderstood, or ignored DQ has a
negative effect on business decisions, performance, and
value of DW systems. Poor DQ increases operational cost,
adversely affects customer satisfaction and has serious
ramifications for society at large [28]. English (2001)
argues that managing the quality of information is as
important as managing the business [8]. This indicates that DQ
issues are critical and need to be addressed in every phase of
DW development to ensure that DQ is not ignored. For that
reason, it is essential to start investigating DQ when
initiating the DW project. However, the crucial question of
defining DQ is often ignored until late in the process. This
could be due to the lack of a solid methodology to deal with
DQ.
Due to the strategic importance of DWs, it is absolutely
crucial to guarantee their DQ from the early stages of a
project [31]. The earlier problems are identified, the sooner
specific recommendations can be made to ensure that data is
correctly cleansed [10]. Therefore, DW success depends on
integrating DQ assurance into all warehousing phases [1].
Otherwise, poor DQ will manifest itself throughout the
process and in the system itself [4]. The authors of [1] address
an important issue in warehousing: determining which
DQ improvement projects will add the most value to the
organization. For an effective DW system, the quality
aspects should be incorporated properly at the various levels
of development of the system [11] and should cover all
DW development layers [27].
In a DW where data is processed in stages, and where
the quality of data at one stage is dependent on the DQ
measurements in preceding stages, the use of a framework
for managing DQ is very much needed. Within a
framework, DQ can be treated as a continuous improvement
activity across all phases of DW development. Furthermore, DQ
can be assessed and monitored continuously in order to
2011 Ninth IEEE International Conference on Dependable, Autonomic and Secure Computing
978-0-7695-4612-4/11 $26.00 © 2011 IEEE
DOI 10.1109/DASC.2011.194