Vol 8. No. 1 Issue 2 – May, 2015
African Journal of Computing & ICT
© 2015 Afr J Comp & ICT – All Rights Reserved - ISSN 2006-1781
www.ajocict.net
An Improved Technique for the Removal and Replacement of the
Inconsistencies in Numeric Dataset
J. Abdul-Hadi
Department of Mathematics,
Bauchi State University, Gadau, Nigeria.
jamcy98@gmail.com
A.R. Ajiboye
Department of Computer Science,
University of Ilorin, Ilorin, Nigeria.
ajibabdulraheem@gmail.com
A. Abba
Department of Statistics,
Abubakar Tafawa Balewa University, Bauchi, Nigeria.
abdulhafeezabba@gmail.com
ABSTRACT
The removal of anomalies from an unclean numeric dataset, with a view to putting the data in a suitable format for exploration, is a major phase in the data mining process. When an unclean numeric dataset is explored to unveil its useful patterns or structure, a thorough pre-processing task is inevitable in order to achieve a noise-free dataset. Poor quality data can be misleading when analysed or used to build models; hence, there is a need to remove discrepancies that may be present in the data prior to exploring it. In this paper, a cleaning algorithm is proposed and implemented in order to remove the inconsistencies in a numeric dataset. The proposed algorithm is implemented in the Java language, and the resulting outputs reveal the efficiency of the proposed approach. In order to evaluate its effectiveness, the proposed algorithm is compared with an existing method using a number of metrics. The comparisons show that the proposed technique is efficient and can be used as an alternative technique for the removal of outliers in numeric data. The approach is also found to be reliable, as it consistently gives an accurate output that is free of outliers.
Keywords: Data cleansing, Data mining, Outlier detection, Clustering.
African Journal of Computing & ICT Reference Format:
J. Abdul-Hadi., A.R. Ajiboye & A. Abba (2015): An Improved Technique for the Removal and Replacement of the Inconsistencies in Numeric
Dataset. Afr J. of Comp & ICTs. Vol 8, No. 1, Issue 1. Pp 39-44
1. INTRODUCTION
Pre-processing is the task performed on the dataset in order
to make it suitable for exploration. Data cleansing, data
cleaning and data scrubbing are sometimes used
interchangeably to describe the pre-processing task of
putting the data in a clean state [1]. Real-world data are often incomplete or noisy, and it is rare to obtain a perfect dataset. Exploration or analysis of an unclean dataset tends to produce results that deviate from the actual results, because anomalies present in the data can distort the resulting outputs. As reported in [2], the use of quality data is crucial to obtaining high-quality patterns.
Putting several files together can ease exploration processes,
as efforts to reveal the patterns and structure of the data
would be more focused on a single database. However,
integration of files from different sources is prone to
duplication of records, and human errors in the course of
entering data may sometimes violate the declared integrity
constraints [3]. The basic tasks performed in preparing data generally involve correcting errors that typically emanate from human and/or machine input, and filling in nulls and incomplete data. Manually filling in missing values, however, quickly becomes monotonous, which may in turn introduce new errors.
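The automatic filling of missing values described above can be illustrated with a short sketch. The code below is not the paper's algorithm; it is a minimal, hypothetical example of one common imputation strategy, in which missing entries (encoded here as NaN) in a numeric column are replaced by the mean of the observed values. The class and method names are illustrative only.

```java
import java.util.Arrays;

// Hypothetical illustration of automatic missing-value imputation:
// each missing entry (NaN) in a numeric column is replaced by the
// mean of the non-missing values in that column.
public class MissingValueFiller {

    // Returns a copy of the column with every NaN entry replaced
    // by the mean of the non-missing entries.
    public static double[] fillWithMean(double[] column) {
        double sum = 0.0;
        int count = 0;
        for (double v : column) {
            if (!Double.isNaN(v)) {   // skip missing entries
                sum += v;
                count++;
            }
        }
        // If every entry is missing, fall back to 0.0 as the fill value.
        double mean = count > 0 ? sum / count : 0.0;

        double[] filled = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            filled[i] = Double.isNaN(column[i]) ? mean : column[i];
        }
        return filled;
    }

    public static void main(String[] args) {
        // The observed values 4.0, 8.0 and 6.0 have mean 6.0,
        // so the missing entry is filled with 6.0.
        double[] column = {4.0, Double.NaN, 8.0, 6.0};
        System.out.println(Arrays.toString(fillWithMean(column)));
        // prints [4.0, 6.0, 8.0, 6.0]
    }
}
```

Such mechanical rules avoid the monotony of manual correction, but they only approximate the true values; the choice of fill strategy (mean, median, or a model-based estimate) depends on the data at hand.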