International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 84
IMPLEMENTATION OF DE-DUPLICATION ALGORITHM
Nirmala Bhadrappa¹, Dr. G S Mamatha²
¹Department of ISE, R.V College of Engineering, Bengaluru, India
²Associate Professor, Dept. of ISE, R.V College of Engineering, Bengaluru, India
---------------------------------------------------------------------***---------------------------------------------------------------------
ABSTRACT: Data de-duplication is a technique for eliminating duplicate copies of data, thereby reducing storage space and network bandwidth, and it is widely applied in cloud storage. Keeping de-duplicated data secure, however, remains a challenge. To prevent mishandling of data in the cloud, the convergent encryption technique is used. Two issues must be addressed. First, the system must efficiently manage a huge number of convergent keys. Second, outsourcing data raises security and privacy concerns. A third-party cloud service is proposed for data confidentiality, with reliability checking performed through both internal and external access control mechanisms. Although de-duplication improves storage space, bandwidth, and efficiency, it conflicts with conventional encryption, since identical data encrypted under different keys yields different ciphertexts. Convergent encryption resolves this: the key used to encrypt and decrypt is derived from the data itself, guaranteeing that identical data produces the same key and the same ciphertext, so copies of the same data can be checked for de-duplication. The client generates the key, encrypts the data, holds the key, and sends the ciphertext to the cloud service provider. The provider can thus detect identical copies, the data is stored securely, and only authorized users can access the information from the cloud service provider.
KEYWORDS: data de-duplication, convergent
encryption, de-duplication efficiency, SHA algorithm
1. INTRODUCTION
As enterprise data grows, the tasks of protecting it and removing duplicates become more challenging. Personal computing devices such as desktops, laptops, tablets, and smartphones have become significant platforms for many users, increasing the importance of the data stored on them. Data may be lost through system failure, accidental deletion, or the loss or theft of a device, so people have increased their use of data protection and recovery tools on their personal computing devices.
Storage services such as Amazon S3 and Google Cloud Storage make it economical for users to keep data in the cloud. Figure 1 illustrates data backup for personal storage: backup is outsourced so that clients can manage their information effectively without maintaining the backup infrastructure themselves. Cloud storage is centralized, which makes it easy to manage efficiently and resilient to disasters, and it offers offsite storage for data backup. Cloud backup of personal storage also implies a geographic separation between the client and the service provider. In de-duplication, data redundancy is removed by generating a hash key for each file; the file is then divided into smaller parts, called blocks, based on the number of lines in the file, and a hash key is also generated for each block for the de-duplication check.
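The file-level and block-level hashing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: SHA-256 is assumed as the hash function (the keywords mention the SHA family but not the variant), and the lines-per-block value is an arbitrary choice.

```python
import hashlib

def hash_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

def split_into_blocks(text: str, lines_per_block: int = 2) -> list[str]:
    """Divide a file's contents into blocks of a fixed number of lines."""
    lines = text.splitlines(keepends=True)
    return ["".join(lines[i:i + lines_per_block])
            for i in range(0, len(lines), lines_per_block)]

content = "line1\nline2\nline3\nline4\nline5\n"
file_hash = hash_bytes(content.encode())      # hash key for the whole file
block_hashes = [hash_bytes(b.encode())        # hash key for each block
                for b in split_into_blocks(content)]
```

Identical files always produce the same file hash, and identical blocks the same block hash, which is what makes the de-duplication check possible.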
Fig-1: Cloud backup platform
The following subsections explain these cloud concepts in more detail:
1.1 DE-DUPLICATION
Data de-duplication is a technique for finding duplicated data in storage. It improves bandwidth and storage utilization, and it can also be applied to data transfers over the network to decrease the number of bytes sent and the stored file size. The technique identifies and removes data that is not unique: whenever a match with already-stored data occurs, the duplicate is replaced with a small reference to the stored copy. De-duplication can be performed based on file content or on file name.
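The replace-duplicates-with-a-reference idea can be sketched as a simple in-memory index of previously seen block hashes (a simplified sketch under the assumption of SHA-256 hashing; a real system would persist the index and block store):

```python
import hashlib

def deduplicate(blocks: list[bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Store each unique block once; repeats become references.

    Returns (store, layout): `store` maps hash -> block data,
    `layout` is the sequence of hash references that rebuilds the file.
    """
    store: dict[str, bytes] = {}
    layout: list[str] = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block   # unique block: keep the data once
        layout.append(digest)       # duplicates keep only the small reference
    return store, layout

blocks = [b"alpha", b"beta", b"alpha", b"alpha"]
store, layout = deduplicate(blocks)
# The original file is rebuilt by following the references in order.
restored = b"".join(store[h] for h in layout)
```

Here four blocks shrink to two stored copies, while the layout list preserves enough information to reconstruct the original byte sequence exactly.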
Data de-duplication consists of the following steps:
Step 1: Divide the input file into blocks based on the number of lines in the file.
Step 2: Generate a hash key value for each block of the file.