International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 03 Issue: 08 | Aug-2016 www.irjet.net p-ISSN: 2395-0072
© 2016, IRJET ISO 9001:2008 Certified Journal Page 565
A Survey on Distributed Deduplication System for Improved Reliability
and Security
B. Sateesh Kumar¹, V. Uma Rani², T. Akshay Raj³
¹ Asst. Prof. of CSE, JNTUH College of Engineering, Jagitial, Nachupally, Kodimial, Karimnagar District, Telangana, India
² Asst. Prof. of CSE, School of Information Technology-JNTUH, KPHB Village, Kukatpally Mandal, Ranga Reddy District, Telangana, India
³ M.Tech Student, Department of CSE, School of Information Technology-JNTUH, KPHB Village, Kukatpally Mandal, Ranga Reddy District, Telangana, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Data deduplication is a specialized technique
for eliminating redundant copies of data, and it has been
extensively used in distributed storage to reduce storage
space and save bandwidth. To achieve confidentiality and
tag consistency, a deterministic secret sharing scheme has
been proposed as a replacement for convergent encryption
in distributed storage systems. Data deduplication checks
for repeated sequences of bytes over a comparison window
of fixed size. Sequences of data (about 8 KB) are compared
against the history of previously stored sequences, which
makes the technique well suited to repetitive operations
such as backup, where the same data set is copied and
stored several times for data recovery purposes. To handle
attacks in which a client claims a file it does not
actually hold, the notion of proofs of ownership (PoWs)
has been introduced, which lets a user efficiently prove
to a server that the user holds a file.
Data deduplication can also be applied to network data to
reduce the number of bytes sent over the network. Although
data deduplication brings considerable benefits, privacy
and security concerns arise because clients' sensitive
information is vulnerable to both insider and outsider
attacks.
Key Words: Data Deduplication, Distributed
Deduplication System, Reliability, Secret Sharing
Scheme, Security
1. INTRODUCTION
With the exponential growth in the volume of data over
the past few years, many organizations are struggling to
manage their data, which has become an expensive
problem. Data deduplication is therefore considered the
next evolutionary step in backup technology because of
its ability to reduce the cost of storing data. Data
deduplication (DeDupe) is considered "integral" for every
organization that wants to remain competitive by
operating efficiently.
The basic idea behind deduplication is to store
repeated copies of data (either blocks or files) only once,
which in turn helps searches over the stored data run
more efficiently and quickly. For instance, consider a
typical email system that may contain 100 instances of
the same 1 megabyte (MB) file attachment. When the email
platform is archived or backed up, all 100 instances are
saved, which requires 100 MB of storage. With data
deduplication, only a single instance of the attachment is
actually stored, and each subsequent instance refers back
to the saved copy. In this case, the demand for 100 MB is
reduced to only 1 MB [10].
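The email example can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from the surveyed systems: the class name `SingleInstanceStore` and its methods are hypothetical, and SHA-256 is assumed as the fingerprint function.

```python
import hashlib

MB = 1024 * 1024


class SingleInstanceStore:
    """Hypothetical file-level store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}  # fingerprint -> content, stored a single time
        self.refs = []   # one fingerprint per logical copy that was saved

    def put(self, data: bytes) -> str:
        # Assumption: SHA-256 of the whole file serves as its fingerprint.
        fp = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(fp, data)  # store only the first instance
        self.refs.append(fp)             # later instances are just references
        return fp

    def logical_bytes(self) -> int:
        # Size the data would occupy without deduplication.
        return sum(len(self.blobs[fp]) for fp in self.refs)

    def physical_bytes(self) -> int:
        # Size actually occupied after deduplication.
        return sum(len(blob) for blob in self.blobs.values())


attachment = bytes(MB)  # a 1 MB attachment (all zero bytes, for illustration)
store = SingleInstanceStore()
for _ in range(100):    # the same attachment arrives in 100 emails
    store.put(attachment)

print(store.logical_bytes() // MB, "MB requested")
print(store.physical_bytes() // MB, "MB stored")
```

In this sketch the 100 logical copies occupy the space of one physical copy, which matches the 100 MB to 1 MB reduction described above.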
Deduplication can reduce storage utilization by up to 90
to 95% for backup applications [8] and by up to 68% in
standard file systems [9]. Data deduplication plays a
strategic role in saving on storage costs. It also plays
a crucial role in disaster recovery, since there is
considerably less data to transfer. Backup and archive
data include a lot of redundant data: similar data is
saved multiple times and consumes unnecessary storage
space, power and bandwidth. This creates a chain of cost
and inefficient use of resources within an organization.
Various deduplication frameworks have been proposed
based on different deduplication methodologies, for
example server-side or client-side deduplication.
Deduplication can be applied at either the file level or
the block level. File-level deduplication eliminates
repeated copies of identical files. Block-level
deduplication removes duplicate data blocks that occur
in non-identical files.
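The difference between the two granularities can be illustrated with a short Python sketch. It rests on assumptions that are not from the paper: fixed-size blocks of 8 KB (borrowing the sequence size mentioned in the abstract), SHA-256 fingerprints, and hypothetical helper names such as `dedup_file_level` and `dedup_block_level`.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # assumed fixed block size (8 KB)


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-256 identifies duplicate content.
    return hashlib.sha256(chunk).hexdigest()


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    # Cut the file into consecutive fixed-size blocks.
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_file_level(files: dict) -> int:
    """Bytes stored when only whole identical files are deduplicated."""
    store = {}
    for data in files.values():
        store.setdefault(fingerprint(data), data)
    return sum(len(data) for data in store.values())


def dedup_block_level(files: dict):
    """Return (block store, per-file recipes of block fingerprints)."""
    store, recipes = {}, {}
    for name, data in files.items():
        recipe = []
        for block in split_blocks(data):
            fp = fingerprint(block)
            store.setdefault(fp, block)  # each distinct block is kept once
            recipe.append(fp)
        recipes[name] = recipe
    return store, recipes


def restore(recipe: list, store: dict) -> bytes:
    # Rebuild a file by concatenating its blocks in recipe order.
    return b"".join(store[fp] for fp in recipe)


# Two non-identical files that share their first two blocks.
shared = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
files = {
    "report_v1": shared + b"C" * BLOCK_SIZE,
    "report_v2": shared + b"D" * BLOCK_SIZE,
}

file_level_bytes = dedup_file_level(files)
store, recipes = dedup_block_level(files)
block_level_bytes = sum(len(block) for block in store.values())

print("file level: ", file_level_bytes // 1024, "KB stored")
print("block level:", block_level_bytes // 1024, "KB stored")
```

Because the two files differ in one block, file-level deduplication keeps both files in full (six blocks), while block-level deduplication keeps only the four distinct blocks and can still rebuild either file from its recipe. Real systems often use variable-size (content-defined) chunking instead of the fixed-size blocks assumed here.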