Research Article
Convolutional Neural Network-Based Discriminator for Outlier Detection

Fahad Alharbi, Khalil El Hindi, Saad Al Ahmadi, and Hussien Alsalamn

Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia

Correspondence should be addressed to Khalil El Hindi; khindi@ksu.edu.sa

Received 17 March 2020; Revised 26 January 2021; Accepted 20 February 2021; Published 3 March 2021

Academic Editor: Qiangqiang Yuan

Copyright © 2021 Fahad Alharbi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Noise in training data increases the tendency of many machine learning methods to overfit the training data, which undermines their performance. Outliers occur in big data as a result of various factors, including human errors. In this work, we present a novel discriminator model for identifying outliers in training data. We propose a systematic approach for creating datasets to train the discriminator from a small number of genuine instances (trusted data). The noise discriminator is a convolutional neural network (CNN). We evaluate the discriminator's performance on several benchmark datasets with different noise ratios: we inserted random noise into each dataset and trained discriminators to clean them. Different discriminators were trained using different numbers of genuine instances, with and without data augmentation. We compare the performance of the proposed noise-discriminator method with seven other methods proposed in the literature on several benchmark datasets. Our empirical results indicate that the proposed method is highly competitive with the other methods; in fact, it outperforms them for pair noise.

1. Introduction

While the effectiveness of supervised machine learning algorithms relies on the existence of large, high-quality labeled datasets, creating clean datasets that are free from noise (i.e., incorrectly labeled instances) is time-consuming and challenging [1, 2]. Outliers (in this paper, "noise" and "outlier" are used interchangeably to refer to mislabeled instances) occur in real-world datasets for many reasons related to data collection, human error, and the widespread use of suboptimal automated processes to compile large datasets. The aim of this research is to propose a machine learning method for identifying and eliminating noise from datasets. We propose a method to train a noise discriminator (ND). The ND is trained using automatically generated datasets based on a small number of genuine instances. The NDs that we propose are CNN classifiers.

Deep learning (DL) models, including CNNs, have been applied with great success in diverse areas, with performance that often exceeds human capabilities [3, 4]. DL models are particularly valuable in domains where large amounts of training data are available. However, as the size of a training dataset increases, so does the likelihood that it contains outliers, which lead machine learning (ML) models to overfit the training data and thereby undermine their performance [5]. Given the negative effects of outliers on DL methods, a range of solutions has been identified to mitigate these effects [6].

This research focuses on developing a generalized CNN-based discriminator for outlier identification. The proposed method trains the discriminator on a specially built dataset generated from a small number of genuine instances. The discriminator can be used as a preprocessing step to identify and eliminate outliers before the data are used to train classifiers.
Hindawi Computational Intelligence and Neuroscience Volume 2021, Article ID 8811147, 13 pages https://doi.org/10.1155/2021/8811147

Our proposed method was inspired by the generative adversarial network (GAN) model [7], which contains a discriminator model trained to separate genuine images from fake images produced by a generator. Similarly, we build a noise discriminator that can identify outliers based on pre-prepared genuine (noise-free) data. However, unlike a GAN, we do not have a generator model; rather, we