A Multi-Dimensional Information Quality
Framework to Enhance the Accuracy of Business
Intelligence Applications
Mona Nasr, Essam Shaaban, Menna Gabr
Abstract – Data preprocessing is a crucial step in which data are cleansed of quality defects, including duplicate records, missing values, irrelevant features, and outliers. This paper presents a multi-dimensional information quality framework that enhances the accuracy of business intelligence applications by eliminating quality issues in the input data. The results show that the proposed framework improves data quality and works effectively.
Index Terms - data quality, quality dimensions, data cleansing, missing values, feature selection, duplication, business intelligence,
quality framework.
1. INTRODUCTION
The use of poor-quality data, containing missing and incorrect values, can lead to inaccurate and nonsensical conclusions, making the whole process of data collection and analysis useless to its users. Therefore, to deal with inaccurate and missing values, it is extremely important to have an effective data preprocessing framework [1].
It has long been recognized that data quality problems, such as incomplete, redundant, inconsistent, and noisy data, pose a major challenge to data mining and data analysis. In fact, data preparation, the process of ensuring data quality by transforming the original data into a format suitable for analysis, is considered one of the most important steps in data mining [2].
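As an illustration of typical preparation steps, the following minimal sketch applies deduplication, median imputation, and removal of uninformative columns with pandas. This is only an example of common practice, not the framework proposed in this paper, and the column names and sample data are hypothetical.

```python
# Illustrative data-preparation sketch (not the paper's framework).
# Column names and sample values are hypothetical.
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Remove exact duplicate records.
    df = df.drop_duplicates()
    # 2. Fill missing numeric values with each column's median.
    numeric = df.select_dtypes(include="number").columns
    df[numeric] = df[numeric].fillna(df[numeric].median())
    # 3. Drop constant columns, which carry no information for analysis.
    df = df.loc[:, df.nunique(dropna=False) > 1]
    return df

raw = pd.DataFrame({
    "age":    [25.0, 25.0, None, 40.0],      # row 1 duplicates row 0
    "income": [50_000.0, 50_000.0, 62_000.0, None],
    "source": ["web", "web", "web", "web"],  # constant, irrelevant
})
clean = prepare(raw)  # 3 rows, 2 columns, no missing values
```

Real preparation pipelines also handle outliers and inconsistent encodings; the order of the steps matters, since imputation statistics computed before deduplication would be biased by repeated records.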
Because information is a vital asset for any business, it must be tested against data quality defects to ensure its effectiveness for use; this assessment takes place in the data cleansing step. Data cleansing is a critical step in which data quality is assessed and quality issues are removed, ensuring the high quality of the data in use.
Data quality refers to how relevant, precise, useful, contextual, understandable, and timely data is. Data is considered to be of high quality if it satisfies the requirements stated in a particular specification and the specification reflects the implied needs of the user [3]. Put another way, data quality is often defined as 'fitness for use', i.e. an evaluation of the extent to which the data serve the purposes of the user [4].
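Such an evaluation can be made measurable. As a simple hypothetical sketch, the share of non-missing cells in a dataset can be scored and compared against a user-defined acceptance threshold; both the dataset and the threshold below are illustrative assumptions, not part of the cited definitions.

```python
# Sketch: scoring one aspect of "fitness for use" numerically.
# Dataset and threshold are hypothetical illustrations.
import pandas as pd

def completeness(df: pd.DataFrame) -> float:
    """Fraction of cells that are not missing (1.0 = fully complete)."""
    return float(1.0 - df.isna().sum().sum() / df.size)

data = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "city": ["Cairo", None, "Giza", "Giza"],
})
score = completeness(data)   # 6 of 8 cells present -> 0.75
fit_for_use = score >= 0.9   # user-defined acceptance threshold
```

The point of such a score is that "fitness for use" is relative to the user: the same 0.75 completeness may be acceptable for exploratory analysis yet unacceptable for regulatory reporting.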
The term data quality is defined and tested through data quality dimensions. Many data quality dimensions have been proposed in the literature [5], [6]. For the purposes of this paper we focus only on the Completeness, Relevance, and Duplication dimensions. A brief definition of each dimension, within our scope, is given next.
1. Completeness means the extent to which
data is not missing and is of sufficient
breadth and depth for the task at hand.
————————————————
• Mona Mohamed Nasr is Associate Professor, Information Systems Department, Faculty of Computers and Information, Helwan University. E-mail: m.nasr@helwan.edu.eg
• Essam Mohamed Shaaban is Assistant Professor, Information Systems Department, Faculty of Computers and Information, Beni-Suef University. E-mail: essam.shaban@fcis.bsu.edu.eg
• Menna Ibrahim Gabr is Teaching Assistant, Information Systems Department, Faculty of Business Information Systems, Helwan University. E-mail: Menna.ibrahim@commerce.helwan.edu.eg
International Journal of Scientific & Engineering Research, Volume 8, Issue 11, November-2017, ISSN 2229-5518