Article
Adversarial Approaches to Tackle Imbalanced Data in Machine Learning

Shahnawaz Ayoub 1, Yonis Gulzar 2,*, Jaloliddin Rustamov 3, Abdoh Jabbari 4, Faheem Ahmad Reegu 4 and Sherzod Turaev 5,*

1 Department of Computer Science and Engineering, Shri Venkateshwara University, NH-24, Venkateshwara Nagar, Gajraula 244236, Uttar Pradesh, India; shahnawazayoub@outlook.com
2 Department of Management Information Systems, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
3 Health Data Science Lab, Department of Genetics and Genomics, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain 15551, United Arab Emirates
4 Department of Computer Science and Information Technology, Jazan University, Jazan 45142, Saudi Arabia
5 Department of Computer Science & Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain 15551, United Arab Emirates
* Correspondence: ygulzar@kfu.edu.sa (Y.G.); sherzod@uaeu.ac.ae (S.T.); Tel.: +966-545-719-118 (Y.G.)

Citation: Ayoub, S.; Gulzar, Y.; Rustamov, J.; Jabbari, A.; Reegu, F.A.; Turaev, S. Adversarial Approaches to Tackle Imbalanced Data in Machine Learning. Sustainability 2023, 15, 7097. https://doi.org/10.3390/su15097097
Academic Editor: Andreas Kanavos
Received: 20 February 2023
Revised: 1 April 2023
Accepted: 13 April 2023
Published: 24 April 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Real-world applications often involve imbalanced datasets, in which examples are unevenly distributed across classes. When building a system that requires high accuracy, classifier performance is crucial.
However, imbalanced datasets can lead to poor classification performance, and conventional techniques such as the synthetic minority oversampling technique (SMOTE) may not adequately address this. As a result, this study proposed balancing the datasets using adversarial learning methods such as generative adversarial networks (GANs). The study evaluated the effect of data augmentation on both the balanced and imbalanced datasets, measuring classification performance on three different datasets and applying data augmentation techniques to generate synthetic data for the minority class. Before augmentation, a decision tree was applied to determine the classification accuracy on all three datasets; the obtained accuracies were 79.9%, 94.1%, and 72.6%. A decision tree was then used to evaluate performance after data augmentation, and the results showed that the proposed model achieved accuracies of 82.7%, 95.7%, and 76%, respectively, on the highly imbalanced datasets. This study demonstrates the potential of data augmentation to improve classification performance on imbalanced datasets.

Keywords: computer vision; machine learning; deep learning; imbalanced dataset

1. Introduction

Any artificial intelligence (AI) application depends mainly on data [1]. Because of its numerous uses, AI has been incorporated into many areas, such as healthcare [2–5], agriculture [6,7], multi-class image classification [8], image caption prediction [9], fake image identification [10], and other purposes [11–13]. In the majority of real-world classification applications, the training data follow a long-tailed distribution: a few classes are abundant, whereas the remaining classes have only limited examples [14,15]. Over the last several years, the research community has taken an interest in learning from imbalanced data, and many researchers have attempted to solve binary-class imbalance problems [16].
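The abstract contrasts GAN-based augmentation with SMOTE, the conventional oversampling baseline for binary-class imbalance. As a rough illustration of that baseline only, the sketch below counts class frequencies (showing the abundant "head" and the scarce "tail") and then generates synthetic minority samples by interpolating between a minority point and one of its nearest minority-class neighbors, in the style of SMOTE. It is not the authors' GAN-based method. The function names (`class_counts`, `smote_like`, `balance_binary`), the toy data, and the default of k = 3 neighbors are hypothetical choices for illustration, and only the Python standard library is used.

```python
import math
import random
from collections import Counter


def class_counts(labels):
    """Return (label, count) pairs from most to least frequent (head first, tail last)."""
    return Counter(labels).most_common()


def k_nearest(idx, points, k):
    """Return the k points closest to points[idx] (Euclidean), excluding that point itself."""
    base = points[idx]
    others = [p for j, p in enumerate(points) if j != idx]
    others.sort(key=lambda p: math.dist(base, p))
    return others[:k]


def smote_like(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling (illustrative): each synthetic point lies on the
    segment between a random minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        neighbor = rng.choice(k_nearest(idx, minority, k))
        gap = rng.random()  # uniform in [0.0, 1.0)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def balance_binary(X, y, minority_label, k=3, seed=0):
    """Top up the minority class with synthetic samples until it matches the majority count."""
    counts = Counter(y)
    n_new = max(counts.values()) - counts[minority_label]
    minority = [x for x, label in zip(X, y) if label == minority_label]
    synthetic = smote_like(minority, n_new, k, seed)
    return X + synthetic, y + [minority_label] * n_new


# Hypothetical toy data: six majority (0) samples and two minority (1) samples.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2),
     (5.0, 5.0), (5.2, 5.1)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

print(class_counts(y))  # [(0, 6), (1, 2)]
X_bal, y_bal = balance_binary(X, y, minority_label=1)
print(class_counts(y_bal))  # [(0, 6), (1, 6)]
```

Because the synthetic points are linear interpolations, they can only fill in the region between existing minority samples. A GAN-based approach, such as the one this study proposes, could instead draw minority samples from a trained generator, while the balancing step (topping the minority class up to the majority count) could stay the same.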
When multiple class labels are present, the solutions proposed for binary-class problems may not be directly applicable or may perform poorly, yet most real-world problems are multi-class problems. Machine learning is a well-known research field in computer science that employs a range of algorithms to extract useful information from data. However, imbalanced data can lead to biased models [17], which may have negative impacts on marginalized communities and the environment. For example, a biased