Engineering Applications of Artificial Intelligence 94 (2020) 103813
Differentially private 1R classification algorithm using artificial bee colony
and differential evolution
Ezgi Zorarpacı a,∗, Selma Ayşe Özel b

a Department of Computer Engineering, Iskenderun Technical University, Hatay, Turkey
b Department of Computer Engineering, Cukurova University, Adana, Turkey
ARTICLE INFO
Keywords:
Artificial bee colony
Differential evolution
Privacy preserving data mining
One Rule
Differential privacy
Feature selection
ABSTRACT
Classification is an important topic in the data mining field. Privacy preserving classification is a substantial
subtopic that aims to classify private data with satisfactory accuracy while keeping the leakage of sensitive
information at a minimal level. Differential privacy is a strong privacy guarantee that bounds the privacy
leakage ratio by the parameter 𝜖, and protects the privacy of individuals whose sensitive data are stored in a
database. There exist differentially private implementations of well-known classification algorithms such
as ID3, random tree, random forests, Naïve Bayes, SVM, logistic regression, and k-NN. Although One Rule (1R)
is a simple but powerful classification algorithm, to the best of our knowledge no differentially private 1R
classification algorithm has been proposed in the literature. Motivated by this gap, in this study we first
propose a differentially private 1R classification algorithm (DP1R), and then improve its performance by using
two metaheuristics, namely differential evolution (DE) and artificial bee colony (ABC). Additionally,
we apply DE and ABC to improve the performance of the differentially private Naïve Bayes classifier and compare
it with DP1R. Moreover, DP1R is compared with state-of-the-art differentially private algorithms, namely
differentially private SVM, differentially private logistic regression, differentially private ID3, and differentially
private random tree, on nine publicly available UCI datasets. The experimental results demonstrate that DP1R
is an efficient classifier whose accuracy is very close to that of differentially private SVM, which achieves the best
accuracy results; with respect to running time, however, DP1R performs best among all the methods.
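As background for readers unfamiliar with 1R, the sketch below is our own minimal illustration of Holte's One Rule classifier (not the DP1R algorithm proposed in this paper, which additionally injects differentially private noise): it selects the single feature whose value-to-majority-class rules minimize training error.

```python
from collections import Counter, defaultdict

def one_rule(rows, labels):
    """Return (best_feature_index, {feature_value: predicted_label}) for
    the single feature whose one-level rules minimize training error."""
    n_features = len(rows[0])
    best = None  # (errors, feature_index, rule_table)
    for f in range(n_features):
        # For each distinct value of feature f, count class occurrences.
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[f]][label] += 1
        # Predict the majority class for each feature value.
        table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(label != table[row[f]]
                     for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, table)
    return best[1], best[2]
```

For example, on a toy dataset where the first feature perfectly separates the classes, `one_rule` returns that feature together with its value-to-class rule table.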
1. Introduction
Classification, one of the most important topics in the data
mining field, can be defined as the task of assigning predefined class
labels to previously unseen data objects according to the values of their
features. Classification is a supervised learning method in which a set
of data objects together with their associated class labels is used to train a
classifier; this classifier is then applied to new objects to assign class
labels. To train a classification model, several approaches have been
proposed, including statistical methods, machine learning based methods,
decision tree based methods, and rule based methods. Classification
has many application domains, such as classifying patients' records
to diagnose a specific illness, classifying images to identify certain objects,
classifying texts to determine their authors, and classifying sentiments
to analyze the reputation of a company.
∗ Corresponding author. E-mail addresses: ezgi.zorarpaci@iste.edu.tr (E. Zorarpacı), saozel@cu.edu.tr (S.A. Özel).

Privacy preserving data mining is another active research field within
data mining. Its goal is to ensure the privacy of individuals' data while
still allowing data mining techniques to be performed. A number of
privacy preservation methods, which include perturbation of output or
data (Adam and Worthmann, 1989), secure multiparty computation
(Lindell and Pinkas, 2002), anonymization techniques (Machanavajjhala
et al., 2007; Samarati, 2001), etc., have
been studied for years to analyze sensitive data. Recently, a strong
privacy guarantee, namely differential privacy (DP), which bounds the
privacy leakage ratio by the parameter 𝜖 (Dwork et al., 2006; Dwork,
2008) and enables individuals' data to be stored safely in a database,
has been proposed. With the recent developments in differential privacy,
differentially private classifiers that are based on k-NN, decision
tree (ID3), random decision trees and forests, Naïve Bayes (NB), and
SVM have been proposed in the literature (Rubinstein et al., 2012;
Vaidya et al., 2013; Friedman and Schuster, 2010; Bojarski et al., 2015;
Fletcher and Islam, 2015, 2019; Gursoy et al., 2017; Zhang et al., 2019).
The aim of differentially private classifiers is to maximize the utility
of the data (i.e., good classification accuracy) while minimizing the leakage
of sensitive data during the classification task. To achieve this goal,
differential privacy adds random noise drawn from a distribution, such
as the Laplace distribution, to the outputs of functions running on sensitive
data while performing the classification task (Mivule et al., 2012; Ji et al., 2014; Sánchez
https://doi.org/10.1016/j.engappai.2020.103813
Received 5 August 2019; Received in revised form 21 April 2020; Accepted 9 July 2020
Available online xxxx
0952-1976/© 2020 Elsevier Ltd. All rights reserved.
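The Laplace mechanism described in the introduction can be sketched as follows. This is a generic illustration under standard assumptions (a counting query with sensitivity 1), not the paper's DP1R implementation; the function names are ours.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from the Laplace(0, scale) distribution
    via inverse-CDF sampling."""
    u = random.random() - 0.5          # uniform in (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy.
    A counting query has sensitivity 1, so noise drawn from
    Laplace(0, 1/epsilon) suffices for the privacy guarantee."""
    return true_count + laplace_noise(1.0 / epsilon)
```

A smaller 𝜖 yields a stronger privacy guarantee but larger noise, hence the utility/privacy trade-off discussed above: with a large 𝜖 the released count stays very close to the true count.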