A Multi-Dimensional Information Quality
Framework to Enhance the Accuracy of Business
Intelligence Applications
Mona Nasr, Essam Shaaban, Menna Gabr
Abstract – Data preprocessing is a crucial step in which data are cleansed of quality defects, including duplicate records, missing values, irrelevant features, and outliers. This paper presents a multi-dimensional information quality framework that enhances the accuracy of business intelligence applications by eliminating quality issues in the input data. The results show that the proposed framework improves data quality and works effectively.
Index Terms - data quality, quality dimensions, data cleansing, missing values, feature selection, duplication, business intelligence,
quality framework.
1. INTRODUCTION
The use of poor-quality data, containing missing and incorrect values, can lead to inaccurate and nonsensical conclusions, making the whole process of data collection and analysis useless to its users. Therefore, to deal with inaccurate and missing values, it is extremely important to have an effective data preprocessing framework [1].
It has long been recognized that data quality problems, such as incomplete, redundant, inconsistent, and noisy data, pose a major challenge to data mining and data analysis. In fact, data preparation, the process of ensuring data quality by transforming the original data into a format suitable for analysis, is considered one of the most important steps in data mining [2].
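As an illustration of typical preparation steps, the following minimal sketch applies deduplication, median imputation, and removal of uninformative columns with pandas. This is only an example of common practice, not the framework proposed in this paper, and the column names and sample data are hypothetical.

```python
# Illustrative data-preparation sketch (not the paper's framework).
# Column names and sample values are hypothetical.
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Remove exact duplicate records.
    df = df.drop_duplicates()
    # 2. Fill missing numeric values with each column's median.
    numeric = df.select_dtypes(include="number").columns
    df[numeric] = df[numeric].fillna(df[numeric].median())
    # 3. Drop constant columns, which carry no information for analysis.
    df = df.loc[:, df.nunique(dropna=False) > 1]
    return df

raw = pd.DataFrame({
    "age":    [25.0, 25.0, None, 40.0],      # row 1 duplicates row 0
    "income": [50_000.0, 50_000.0, 62_000.0, None],
    "source": ["web", "web", "web", "web"],  # constant, irrelevant
})
clean = prepare(raw)  # 3 rows, 2 columns, no missing values
```

Real preparation pipelines also handle outliers and inconsistent encodings; the order of the steps matters, since imputation statistics computed before deduplication would be biased by repeated records.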
Because information is a vital asset for any business, it must be tested against data quality defects to ensure its effectiveness for use; this assessment takes place in the data cleansing step. Data cleansing is a critical step in which data quality is assessed and quality issues are removed, ensuring the high quality of the data in use.
Data quality refers to how relevant, precise, useful, contextual, understandable, and timely data is. Data is considered to be of high quality if it satisfies the requirements stated in a particular specification and the specification reflects the implied needs of the user [3]. Put another way, data quality is often defined as 'fitness for use', i.e. an evaluation of the extent to which the data serve the purposes of the user [4].
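Such an evaluation can be made measurable. As a simple hypothetical sketch, the share of non-missing cells in a dataset can be scored and compared against a user-defined acceptance threshold; both the dataset and the threshold below are illustrative assumptions, not part of the cited definitions.

```python
# Sketch: scoring one aspect of "fitness for use" numerically.
# Dataset and threshold are hypothetical illustrations.
import pandas as pd

def completeness(df: pd.DataFrame) -> float:
    """Fraction of cells that are not missing (1.0 = fully complete)."""
    return float(1.0 - df.isna().sum().sum() / df.size)

data = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "city": ["Cairo", None, "Giza", "Giza"],
})
score = completeness(data)   # 6 of 8 cells present -> 0.75
fit_for_use = score >= 0.9   # user-defined acceptance threshold
```

The point of such a score is that "fitness for use" is relative to the user: the same 0.75 completeness may be acceptable for exploratory analysis yet unacceptable for regulatory reporting.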
The term data quality is defined and tested through data quality dimensions. Many data quality dimensions have been proposed in the literature [5], [6]. For the purposes of this paper we focus only on the Completeness, Relevance, and Duplication dimensions. A brief definition of each dimension, within our scope, is given next.
1. Completeness means the extent to which
data is not missing and is of sufficient
breadth and depth for the task at hand.
————————————————
• Mona Mohamed Nasr is Associate Professor, Information Systems Department, Faculty of Computers and Information, Helwan University. E-mail: m.nasr@helwan.edu.eg
• Essam Mohamed Shaaban is Assistant Professor, Information Systems Department, Faculty of Computers and Information, Beni-Suef University. E-mail: essam.shaban@fcis.bsu.edu.eg
• Menna Ibrahim Gabr is Teaching Assistant, Information Systems Department, Faculty of Business Information Systems, Helwan University. E-mail: Menna.ibrahim@commerce.helwan.edu.eg
International Journal of Scientific & Engineering Research, Volume 8, Issue 11, November-2017, ISSN 2229-5518