Engineering Applications of Artificial Intelligence 94 (2020) 103813

Differentially private 1R classification algorithm using artificial bee colony and differential evolution

Ezgi Zorarpacı a,*, Selma Ayşe Özel b
a Department of Computer Engineering, Iskenderun Technical University, Hatay, Turkey
b Department of Computer Engineering, Cukurova University, Adana, Turkey

Keywords: Artificial bee colony; Differential evolution; Privacy preserving data mining; One Rule; Differential privacy; Feature selection

ABSTRACT

Classification is an important topic in the data mining field. Privacy-preserving classification is a substantial subtopic that aims to classify private data with satisfactory accuracy while keeping sensitive information leakage at a minimal level. Differential privacy is a strong privacy guarantee that bounds the privacy leakage ratio by the parameter ε and protects the privacy of individuals whose sensitive data are stored in a database. Differentially private implementations exist for well-known classification algorithms such as ID3, random tree, random forests, Naïve Bayes, SVM, logistic regression, and k-NN. Although One Rule (1R) is a simple but powerful classification algorithm, to the best of our knowledge no differentially private 1R classification algorithm has been proposed in the literature. Motivated by this gap, in this study we first propose a differentially private 1R classification algorithm (DP1R) and then improve its performance by using two metaheuristics, differential evolution (DE) and artificial bee colony (ABC). Additionally, we apply DE and ABC to improve the performance of the differentially private Naïve Bayes classifier and compare it with DP1R.
Moreover, DP1R is compared with state-of-the-art differentially private algorithms, namely differentially private SVM, differentially private logistic regression, differentially private ID3, and differentially private random tree, on nine publicly available UCI datasets. The experimental results demonstrate that DP1R is an efficient classifier whose accuracy is very similar to that of differentially private SVM, which achieves the best accuracy results; with respect to running time, however, DP1R performs best among all the compared methods.

* Corresponding author.
E-mail addresses: ezgi.zorarpaci@iste.edu.tr (E. Zorarpacı), saozel@cu.edu.tr (S.A. Özel).

1. Introduction

Classification, one of the most important topics in the data mining field, can be defined as the task of assigning predefined class labels to previously unseen data objects according to the values of their features. Classification is a supervised learning method in which a set of data objects with their associated class labels is used to train a classifier; the classifier is then applied to new objects to assign class labels. To train a classification model, several methods have been proposed, including statistical methods, machine learning based methods, decision tree based methods, and rule based methods. Classification has many application domains, such as classifying patients' records to diagnose a specific illness, image classification to identify certain objects, text classification to determine an author, and sentiment classification to analyze the reputation of a company.

Privacy-preserving data mining is another active research field within data mining. Its goal is to ensure the privacy of individuals' data while still enabling data mining techniques to be performed. A number of privacy preservation methods, which include
perturbation of output or data (Adam and Worthmann, 1989), secure multiparty computation (Lindell and Pinkas, 2002), and anonymization techniques (Machanavajjhala et al., 2007; Samarati, 2001), have been studied for years to analyze sensitive data. Recently, a strong privacy guarantee, namely differential privacy (DP), which bounds the privacy leakage ratio by the parameter ε (Dwork et al., 2006; Dwork, 2008) and enables individuals' data to be held safely in a database, has been proposed. With the recent developments in differential privacy, differentially private classifiers based on k-NN, decision trees (ID3), random decision trees and forests, Naïve Bayes (NB), and SVM have been proposed in the literature (Rubinstein et al., 2012; Vaidya et al., 2013; Friedman and Schuster, 2010; Bojarski et al., 2015; Fletcher and Islam, 2015, 2019; Gursoy et al., 2017; Zhang et al., 2019). The aim of differentially private classifiers is to maximize the utility of the data (i.e., achieve good classification accuracy) while minimizing the leakage of sensitive data during the classification task. To achieve this goal, differential privacy adds random noise drawn from a distribution such as the Laplace distribution to the output of functions running on the sensitive data (Mivule et al., 2012; Ji et al., 2014; Sánchez

https://doi.org/10.1016/j.engappai.2020.103813
Received 5 August 2019; Received in revised form 21 April 2020; Accepted 9 July 2020
Available online xxxx
0952-1976/© 2020 Elsevier Ltd. All rights reserved.
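The noise-addition step described above is the standard Laplace mechanism: a query's true answer is perturbed with Laplace noise whose scale is the query's sensitivity divided by the privacy budget ε. A minimal sketch of this mechanism follows; the function names are illustrative and not taken from the paper, and the paper's own DP1R construction is more involved than this.

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5          # uniform in [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return an ε-differentially private answer for a numeric query.

    The noise scale is sensitivity / epsilon, so a smaller ε
    (stronger privacy) yields noisier answers.
    """
    return true_value + laplace_noise(sensitivity / epsilon)

# Example: privatize a count query (sensitivity 1, since adding or
# removing one individual changes a count by at most 1).
private_count = laplace_mechanism(true_value=42.0, sensitivity=1.0, epsilon=0.5)
```

Because the noise has zero mean, repeated noisy answers average out to the true value; this illustrates the utility/privacy trade-off the paragraph mentions, since each released answer spends privacy budget.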