LogicGAN-based Data Augmentation Approach to Improve Adversarial Attack DNN Classifiers

Christophe Feltus
Luxembourg Institute of Science and Technology (LIST)
Maison De L'Innovation, Avenue des Hauts-Fourneaux 5, L-4362 Esch/Alzette, Luxembourg
christophe.feltus@list.lu

Abstract—This paper presents an innovative algorithmic approach to improve adversarial attack classifiers, based on data augmented by minor modifications generated by a logicGAN. The paper addresses a particular type of mitigation against adversarial attacks, which consists of training the "attacked" classifier with initial and adversarial data already known by the defender. Accordingly, we propose an algorithm that improves the training of the classifier (1) by generating complementary adversarial data which, instead of coming from the known adversarial attack, comes directly from minor modifications of the already known adversarial data, and (2) by generating these minor modifications using a specific kind of generative adversarial network named logicGAN. By using an xAI system, this GAN derivative has the particularity of yielding more substantial corrective feedback from the discriminator to the generator and, thereby, making the mitigation of adversarial attacks faster.

Index Terms—Adversarial attack, LogicGAN, Data augmentation, Generative adversarial network, Security, Attack classifier, Adversarial sample, DNN.

I. INTRODUCTION

The contribution of artificial intelligence (AI) and machine learning (ML) to the security of the information system (IS) constitutes an important area of concern for companies. A security infrastructure can no longer be deployed and updated without using artificial intelligence or machine-learning models to continuously update the security tools combating new threats that appear. Ten years ago, many existing security applications were based on multi-layer perceptron (MLP) networks.
At that time, these networks were not suitable for processing the raw situation data generated in real time by the environment. Nowadays, MLP topologies are more sophisticated [1] and have been developed to support the analysis of visual imagery (convolutional neural networks), the recurrent analysis of time series (recurrent neural networks), learning in an unknown environment (reinforcement learning [2], [3]), or implicit generative models (generative adversarial networks, GANs).

On the other side of the coin, the development of AI and ML also largely benefits attackers, who redouble their ingenuity to take advantage of the strong potential of AI to implement new attacks. This is especially the case for adversarial attacks, which consist of designing the input of a DNN (deep neural network) classifier in a specific way so that it outputs a wrong result [4]. This specific input, able to deceive the classifier, is known as an adversarial example. According to Frosst¹, an adversarial example is an image intentionally crafted to mislead a network after it has been trained. Figure 1 illustrates a classical case of an adversarial example.

In [5], a mitigation of adversarial attacks is proposed that consists of training a Defense-GAN to model the distribution of unperturbed data. At inference time, the Defense-GAN finds an output close to a given piece of data which does not contain the adversarial changes. This output is then used to improve the training of the classifier to be protected. The problem with GANs (including the Defense-GAN) is that they are very resource-intensive [6], and that only a single abstract value of corrective feedback is provided by the discriminator to the generator. Recent research has proposed logicGAN [6], which advances the state of the art in corrective feedback by modifying the gradient descent using a dedicated xAI system.
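To make the contrast concrete, the following minimal sketch shows the difference between the single scalar score a standard GAN discriminator returns to the generator and the per-feature feedback an xAI attribution can provide. The linear discriminator and the gradient-times-input attribution are illustrative stand-ins chosen for this sketch; they are not the specific xAI system used by logicGAN.

```python
import numpy as np

# Hypothetical linear discriminator D(x) = sigmoid(w . x),
# standing in for a trained GAN discriminator.
w = np.array([2.0, -1.0, 0.5])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator(x):
    return sigmoid(w @ x)

# Standard GAN: the generator only receives the scalar score D(x).
# logicGAN-style idea (sketched here): an xAI attribution explains
# WHICH input features drove the discriminator's decision, giving
# the generator richer, per-feature corrective feedback. A simple
# gradient-times-input attribution is used as an illustrative proxy.
def attribution(x):
    p = discriminator(x)
    grad = p * (1.0 - p) * w    # dD(x)/dx for the logistic model
    return grad * x             # gradient-times-input attribution

x_fake = np.array([1.0, 0.5, -0.2])
print(discriminator(x_fake))    # one abstract scalar of feedback
print(attribution(x_fake))      # one feedback value per input feature
```

The attribution vector tells the generator which components of its sample pushed the discriminator's decision and in which direction, rather than only whether the sample was rejected.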
This system aims to explain the motivation of the classification achieved by the discriminator and thereby, thanks to a richer interpretation, supports the generator in tricking the discriminator more effectively.

Acknowledging (1) the need to enhance the training of adversarial attack DNN classifiers with complementary data, and (2) the difficulty of training a GAN for this purpose, in this paper we present an innovative algorithmic approach to improve adversarial attack classifiers based on data augmented by minor modifications generated by a logicGAN. The paper thus addresses a particular type of mitigation against adversarial attacks, which consists of training the "attacked" classifier with initial and adversarial data already known by the defender. Accordingly, we propose an algorithm that improves the training of the classifier (1) by generating complementary adversarial data which, instead of coming from the known adversarial attack, comes directly from minor modifications of the adversarial data already known, and (2) by generating these minor modifications using a specific kind of generative adversarial network named logicGAN [6]. By using an xAI system, this GAN derivative has the particularity of being able to yield more substantial corrective feedback from the discriminator to the generator and, thereby, enabling a faster mitigation of adversarial attacks.

This paper is structured as follows: in Section II, we present

¹ Nicholas Frosst: Google Brain research engineer working on the adversarial examples problem with Turing Award winner Geoffrey Hinton's Toronto team.
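The core augmentation idea described above, deriving complementary training data from minor modifications of adversarial samples already known to the defender, can be sketched as follows. Small bounded random perturbations stand in for the logicGAN generator that produces the "minor modifications" in the proposed algorithm, and the sample values, `n_copies`, and `eps` are illustrative parameters, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Adversarial samples already known to the defender (rows),
# with their true labels.
X_adv = np.array([[0.15, 0.25],
                  [0.40, 0.55]])
y_adv = np.array([1, 1])

# Augmentation step: produce several minor modifications of each
# known adversarial sample. Bounded uniform noise is a stand-in
# for the logicGAN generator used in the paper's approach.
def augment(X, y, n_copies=3, eps=0.05):
    Xs, ys = [X], [y]
    for _ in range(n_copies):
        Xs.append(X + rng.uniform(-eps, eps, size=X.shape))
        ys.append(y)                  # modifications keep the labels
    return np.vstack(Xs), np.concatenate(ys)

X_aug, y_aug = augment(X_adv, y_adv)
print(X_aug.shape)  # (8, 2): the 2 known samples plus 3 perturbed copies each
```

The augmented set `(X_aug, y_aug)` would then be added to the original training data when retraining the attacked classifier, so that the classifier also learns the neighborhood of each known adversarial sample rather than only the exact points the attacker has already revealed.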