International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 84
IMPLEMENTATION OF DE-DUPLICATION ALGORITHM
Nirmala Bhadrappa¹, Dr. G S Mamatha²
¹Department of ISE, R.V College of Engineering, Bengaluru, India
²Associate Professor, Dept. of ISE, R.V College of Engineering, Bengaluru, India
---------------------------------------------------------------------***---------------------------------------------------------------------
ABSTRACT: Data de-duplication is a technique for eliminating duplicate copies of data, thereby reducing storage space and network bandwidth, and it is widely applied in cloud storage. Keeping de-duplicated data secure, however, remains a challenge. To prevent mishandling of data in the cloud, the convergent encryption technique is used. Two issues must be addressed. First, the system must efficiently manage a huge number of convergent keys. Second, outsourcing data raises security and privacy concerns. A third-party cloud service is proposed for data confidentiality, with reliability checking performed through both internal and external access control mechanisms. Although de-duplication improves storage space, bandwidth, and efficiency, it conflicts with conventional encryption, since identical data encrypted under different keys yields different ciphertexts. Convergent encryption resolves this: the key used to encrypt and decrypt is derived from the data itself, guaranteeing that identical data produces the same key and the same ciphertext, so copies of the same data can be checked for de-duplication. The client generates the key, encrypts the data, holds the key, and sends the ciphertext to the cloud service provider. The provider can thus detect identical copies, the data is stored securely, and only authorized users can access the information from the cloud service provider.
KEYWORDS: data de-duplication, convergent
encryption, de-duplication efficiency, SHA algorithm
1. INTRODUCTION
As enterprise data grows, the tasks of protecting it and removing duplicates become more challenging. Personal computing devices such as desktops, laptops, tablets, and smartphones have become significant platforms for many users, increasing the importance of the data stored on them. Data may be lost through system failure, accidental deletion, or the loss or theft of a device, so people have increased their use of data protection and recovery tools on their personal computing devices.
Storage services such as Amazon S3 and Google Cloud Storage make it economical for users to keep data in the cloud. Figure 1 illustrates data backup for personal storage: backup is outsourced so that clients can manage their information effectively without maintaining the backup infrastructure themselves. Cloud storage is centralized, which makes it easy to manage efficiently and resilient to disasters, and it offers offsite storage for data backup. Cloud backup of personal storage also implies a geographic separation between the client and the service provider. In de-duplication, data redundancy is removed by generating a hash key for each file; the file is then divided into smaller parts, called blocks, based on the number of lines in the file, and a hash key is also generated for each block for the de-duplication check.
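The file-level and block-level hashing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: SHA-256 is assumed as the hash function (the keywords mention the SHA family but not the variant), and the lines-per-block value is an arbitrary choice.

```python
import hashlib

def hash_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

def split_into_blocks(text: str, lines_per_block: int = 2) -> list[str]:
    """Divide a file's contents into blocks of a fixed number of lines."""
    lines = text.splitlines(keepends=True)
    return ["".join(lines[i:i + lines_per_block])
            for i in range(0, len(lines), lines_per_block)]

content = "line1\nline2\nline3\nline4\nline5\n"
file_hash = hash_bytes(content.encode())      # hash key for the whole file
block_hashes = [hash_bytes(b.encode())        # hash key for each block
                for b in split_into_blocks(content)]
```

Identical files always produce the same file hash, and identical blocks the same block hash, which is what makes the de-duplication check possible.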
Fig-1: Cloud backup platform
The following subsections explain these cloud concepts in more detail:
1.1 DE-DUPLICATION
Data de-duplication is a technique for finding duplicated data in storage. It improves bandwidth and storage utilization, and it can also be applied to data transfers over the network to decrease the number of bytes sent and the stored file size. The technique identifies and removes data that is not unique: whenever a match with already-stored data occurs, the duplicate is replaced with a small reference to the stored copy. De-duplication can be performed based on file content or on file name.
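The replace-duplicates-with-a-reference idea can be sketched as a simple in-memory index of previously seen block hashes (a simplified sketch under the assumption of SHA-256 hashing; a real system would persist the index and block store):

```python
import hashlib

def deduplicate(blocks: list[bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Store each unique block once; repeats become references.

    Returns (store, layout): `store` maps hash -> block data,
    `layout` is the sequence of hash references that rebuilds the file.
    """
    store: dict[str, bytes] = {}
    layout: list[str] = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block   # unique block: keep the data once
        layout.append(digest)       # duplicates keep only the small reference
    return store, layout

blocks = [b"alpha", b"beta", b"alpha", b"alpha"]
store, layout = deduplicate(blocks)
# The original file is rebuilt by following the references in order.
restored = b"".join(store[h] for h in layout)
```

Here four blocks shrink to two stored copies, while the layout list preserves enough information to reconstruct the original byte sequence exactly.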
Data de-duplication consists of the following steps:
Step 1: Divide the input file into blocks based on the number of lines in the file.
Step 2: Generate a hash key value for each block of the file.