De-Identification of Textual Data using Immune System for Privacy Preserving in Big Data

Amine Rahmani¹, GeCoDe laboratory, department of informatics sciences, Dr. Tahar Moulay University of Saida, aminerahmani2091@gmail.com
Abdelmalek Amine², GeCoDe laboratory, department of informatics sciences, Dr. Tahar Moulay University of Saida, amine_abd1@yahoo.fr
Mohamed Reda Hamou³, GeCoDe laboratory, department of informatics sciences, Dr. Tahar Moulay University of Saida, hamoureda@yahoo.fr

Abstract—With the growing success of big data, many challenges have appeared. Timeliness, scalability and privacy are the main problems that researchers are trying to solve. Privacy preservation is now a highly active research domain, and many works and concepts have emerged within this theme. One of these concepts is de-identification. De-identification is a specific area that consists of finding sensitive information and removing it, either by replacing it, encrypting it or adding noise to it, using techniques from fields such as cryptography and data mining. In this paper, we present a new model for the de-identification of textual data using a specific immune system algorithm known as CLONALG.

Keywords—de-identification, privacy preserving, big data, immune systems, CLONALG

I. INTRODUCTION

One of the advantages of big data services is the ability to share and publish data over the network. These data can be sorted into two major categories: ordinary data, such as books and other textual documents, and sensitive information, such as names, medical records and social information in general. The latter require a high level of protection because of their importance and sensitivity: if they are linked together, they form a total or partial representation of their owner, which can lead to identifying that person even when the data contain no explicit identifier. The aggregation of this information can reveal a unique identity of the person, much like a fingerprint.
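The fingerprint effect can be made concrete by counting how many records are singled out by a combination of attributes that are not explicit identifiers. The following minimal Python sketch is only an illustration of this linkage risk, not part of the proposed model; the table, its attribute names (`zip`, `birth_year`, `sex`) and the helper `unique_fraction` are hypothetical assumptions.

```python
from collections import Counter

# Hypothetical records: no names, only quasi-identifiers (all values are made up).
records = [
    {"zip": "20000", "birth_year": 1985, "sex": "F"},
    {"zip": "20000", "birth_year": 1985, "sex": "F"},
    {"zip": "20000", "birth_year": 1990, "sex": "M"},
    {"zip": "31000", "birth_year": 1985, "sex": "M"},
    {"zip": "31000", "birth_year": 1972, "sex": "F"},
]

def unique_fraction(rows, attributes):
    """Return the fraction of rows whose value combination on `attributes` occurs exactly once."""
    keys = [tuple(row[a] for a in attributes) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)

# Alone, zip or sex singles out nobody here; combined with birth year,
# three of the five records become unique.
for attrs in (["zip"], ["sex"], ["zip", "birth_year", "sex"]):
    print(attrs, unique_fraction(records, attrs))
# prints 0.0, 0.0 and 0.6 for the three attribute sets
```

The point of the design is that uniqueness is a property of the attribute combination, not of any single field, which is why removing explicit identifiers alone does not guarantee anonymity.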
In addition, once data are stored on the web, they become accessible to and processable by third parties and, therefore, by other people who share the same resources, which makes privacy an essential goal. This gave birth to a new domain known as Privacy Preserving Data Publishing (PPDP), which offers a set of methods and techniques for protecting users' privacy. Much work has been carried out in this area, and many approaches have been published and used. These approaches can be grouped into three essential categories:
• Heuristic-based approaches, in which data mining algorithms are used to adaptively modify selected data. Because optimal selective data modification is an NP-hard problem, this group of methods relies on heuristics to address such complex problems.
• Cryptography-based approaches, represented by secure multiparty computation, in which privacy is guaranteed by means of a probabilistic function so that, at the end of the computation, no party learns anything except its own input and the final result.
• Perturbation and reconstruction approaches, which protect data by randomly perturbing them and then reconstructing the distribution of the data at an aggregate level.
One of the techniques of PPDP is de-identification, in which a system detects and removes any information that could reveal the identity of a user through his or her own data. In this work, we propose a new approach based on the immune system that ensures privacy by detecting and modifying the information that leads to the identity of users. The rest of the paper is organized as follows. We first present basic concepts such as PPDP and its techniques, focusing on de-identification and modification techniques. We then present our idea and its results. Finally, we discuss the results and conclude.

II. BASIC CONCEPTS

A. Privacy preserving data publishing

A data publisher is typically a data collector that collects data from different sources and then passes them to a data miner or publishes them to the public, which may include an attacker. Fig. 1 shows the view of a data publisher given by (Fung, Wang, Chen & Yu, 10).

2015 IEEE International Conference on Computational Intelligence & Communication Technology 978-1-4799-6023-1/15 $31.00 © 2015 IEEE DOI 10.1109/CICT.2015.146 112
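To make the de-identification step concrete, the following Python sketch shows a data publisher replacing detected sensitive items in free text with category placeholders before release. It is a simple rule-based illustration, not the CLONALG-based model proposed in this paper; the regular expressions, the placeholder tags, the `deidentify` helper and the list of known names are all hypothetical assumptions, and practical detectors must cover far more cases.

```python
import re

# Hypothetical detection rules: category tag -> regular expression.
# These patterns are illustrative assumptions, not exhaustive detectors.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{2}(?:[ .-]?\d{2}){4}\b"),
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def deidentify(text, known_names=()):
    """Replace every detected sensitive item with a [CATEGORY] placeholder."""
    # Names are matched against a supplied list (a hypothetical dictionary lookup).
    for name in known_names:
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[NAME]", text)
    # Pattern-based categories are then replaced in dictionary order.
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text

note = ("Patient John Smith, born 12/03/1985, can be reached at "
        "john.smith@example.com or 05 55 12 34 56.")
print(deidentify(note, known_names=["John Smith"]))
# prints: Patient [NAME], born [DATE], can be reached at [EMAIL] or [PHONE].
```

Replacing items with category tags, rather than deleting them, keeps the released text readable and preserves the kind of information that was present, which is one of the replacement options mentioned in the abstract.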