Research Article
Convolutional Neural Network-Based Discriminator for
Outlier Detection
Fahad Alharbi, Khalil El Hindi, Saad Al Ahmadi, and Hussien Alsalamn
Department of Computer Science, College of Computer and Information Sciences, King Saud University,
Riyadh 11543, Saudi Arabia
Correspondence should be addressed to Khalil El Hindi; khindi@ksu.edu.sa
Received 17 March 2020; Revised 26 January 2021; Accepted 20 February 2021; Published 3 March 2021
Academic Editor: Qiangqiang Yuan
Copyright © 2021 Fahad Alharbi et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Noise in training data increases the tendency of many machine learning methods to overfit the training data, which undermines
their performance. Outliers occur in big data as a result of various factors, including human errors. In this work, we present a novel
discriminator model for the identification of outliers in the training data. We propose a systematic approach for creating training
datasets to train the discriminator based on a small number of genuine instances (trusted data). The noise discriminator is a
convolutional neural network (CNN). We evaluate the discriminator’s performance using several benchmark datasets and with
different noise ratios. We inserted random noise in each dataset and trained discriminators to clean them. Different discriminators
were trained using different numbers of genuine instances with and without data augmentation. We compare the performance of
the proposed noise-discriminator method with seven other methods proposed in the literature using several benchmark datasets.
Our empirical results indicate that the proposed method is competitive with the other methods and outperforms them for
pair noise.
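As an illustration of the noise-insertion step mentioned above, the following minimal sketch corrupts a chosen fraction of labels. It assumes the definitions commonly used in the label-noise literature, which this section does not spell out: under symmetric (random) noise a corrupted label is replaced by a different class chosen uniformly, and under pair noise a label c is replaced by the next class, (c + 1) mod K. The function name `inject_label_noise` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def inject_label_noise(labels, num_classes, noise_ratio, kind="symmetric", seed=0):
    """Return (noisy_labels, flipped_indices) for integer class labels.

    Assumed noise models (not specified in this section of the paper):
      kind="symmetric": a corrupted label becomes a different class,
                        chosen uniformly at random.
      kind="pair":      a corrupted label c becomes (c + 1) % num_classes.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    if kind not in ("symmetric", "pair"):
        raise ValueError("kind must be 'symmetric' or 'pair'")

    rng = random.Random(seed)
    n_flip = round(noise_ratio * len(labels))
    # Choose which instances to corrupt, without replacement.
    flipped = sorted(rng.sample(range(len(labels)), n_flip))

    noisy = list(labels)
    for i in flipped:
        c = noisy[i]
        if kind == "pair":
            noisy[i] = (c + 1) % num_classes
        else:
            # Exclude the true class so every selected label really changes.
            noisy[i] = rng.choice([k for k in range(num_classes) if k != c])
    return noisy, flipped
```

Returning the flipped indices alongside the noisy labels makes it possible to check afterwards which inserted outliers a discriminator actually recovered.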
1. Introduction
While the effectiveness of supervised machine learning algorithms
relies on the existence of large, high-quality
labeled datasets, creating clean datasets that are free from
noise (i.e., incorrectly labeled instances) is time-consuming
and challenging [1, 2]. Outliers (noise and
outlier are used interchangeably in this paper to refer to the
mislabeled instances) occur in real-world datasets for many
reasons that are related to data collection, human errors, and
the widespread use of suboptimal automated processes to
compile large datasets. The aim of this research is to propose
a machine learning method for identifying and eliminating
noise from datasets. We propose a method to train a noise
discriminator (ND). The ND is trained using automatically
generated datasets based on a small number of genuine
instances. The NDs that we propose are CNN classifiers.
Deep learning (DL) models, including CNNs, have been
applied with great success in diverse areas, with performance
that often exceeds human capabilities
[3, 4]. DL models are particularly valuable in domains where
large amounts of training data are available. However, as
a training dataset's size increases, so does the likelihood that
it contains outliers, which lead machine learning (ML) models
to overfit the training data and thereby undermine their
performance [5]. Given the negative effects of outliers on DL
methods, a range of solutions has been proposed to mitigate these effects [6].
This research focuses on developing a generalized CNN-based
discriminator for outlier identification. The proposed
method trains the discriminator on a specially built dataset
generated from a small number of genuine instances. The
discriminator can be used in a preprocessing step to identify
and eliminate outliers from a dataset before it is used to train classifiers.
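The preprocessing role described above can be sketched as a simple filter. This is only an illustration under stated assumptions: `is_outlier` is a hypothetical callable that stands in for a trained discriminator, and the assumption that it receives an instance together with its label is ours, since the discriminator's input format is not described in this section.

```python
def remove_outliers(instances, labels, is_outlier):
    """Drop every (instance, label) pair flagged by the discriminator.

    `is_outlier(x, y)` is a hypothetical stand-in for a trained noise
    discriminator; it must return True when the pair looks mislabeled.
    """
    kept = [(x, y) for x, y in zip(instances, labels) if not is_outlier(x, y)]
    clean_instances = [x for x, _ in kept]
    clean_labels = [y for _, y in kept]
    return clean_instances, clean_labels


def toy_discriminator(x, y):
    """Toy rule for demonstration only: flag a pair when the label
    disagrees with thresholding the scalar feature at 0.5."""
    return int(x >= 0.5) != y
```

With this toy rule, calling `remove_outliers([0.1, 0.9, 0.2, 0.8], [0, 1, 1, 1], toy_discriminator)` drops the third pair and keeps the other three; the cleaned data would then be passed to the downstream classifier.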
Our proposed method was inspired by the generative
adversarial network (GAN) model [7], which contains a
discriminator that is trained to separate genuine images
from fake images produced by a generator. Similarly, we
build a noise discriminator that can identify outliers based
on previously prepared, noise-free genuine data. However, unlike
GAN, we do not have a generator model; rather, we
Hindawi
Computational Intelligence and Neuroscience
Volume 2021, Article ID 8811147, 13 pages
https://doi.org/10.1155/2021/8811147