International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 08 | Aug-2016 www.irjet.net p-ISSN: 2395-0072
© 2016, IRJET ISO 9001:2008 Certified Journal Page 565

A Survey on Distributed Deduplication System for Improved Reliability and Security

B. Sateesh Kumar¹, V. Uma Rani², T. Akshay Raj³

¹ Asst. Prof of CSE, JNTUH College of Engineering, Jagitial, Nachupally, Kodimial, Karimnagar District, Telangana, India
² Asst. Prof of CSE, School Of Information Technology-JNTUH, KPHB Village, Kukatpally Mandal, Ranga Reddy District, Telangana, India
³ M.Tech Student, Department of CSE, School Of Information Technology-JNTUH, KPHB Village, Kukatpally Mandal, Ranga Reddy District, Telangana, India

Abstract - Data deduplication is a specialized technique for eliminating duplicate copies of data, and it has been widely used in distributed storage to reduce storage space and save bandwidth. To achieve confidentiality and tag consistency, a deterministic secret sharing scheme has been proposed as a replacement for convergent encryption in distributed storage systems. Data deduplication looks for repeated sequences of bytes across a comparison window of fixed size. Sequences of data (about 8 KB) are compared against the history of previously stored sequences, which makes the technique well suited to redundant operations such as backup, where the same data set is copied and stored repeatedly for data recovery purposes. To handle attacks on deduplicated storage, the notion of proofs of ownership (PoWs) has been introduced, which lets a user efficiently prove to a server that the user holds a file. To reduce the number of bytes sent over the network, data deduplication can also be applied to network data.
Although data deduplication offers considerable benefits, privacy and security concerns arise because clients' sensitive information is vulnerable to both insider and outsider attacks.

Key Words: Data Deduplication, Distributed Deduplication System, Reliability, Secret Sharing Scheme, Security

1. INTRODUCTION

With the exponential growth in the volume of data over the past few years, many organizations are struggling to manage their data, which has become an expensive problem. For this reason, data deduplication is considered the next evolutionary step in backup technology, owing to its ability to reduce the cost of storing data. Data deduplication (DeDupe) is considered "integral" for every organization that wants to remain competitive by operating efficiently. The basic idea behind deduplication is to store repeated copies of data (either blocks or files) only once, which also helps searches run more efficiently and quickly. For instance, consider a typical email system that contains 100 instances of the same 1 megabyte (MB) file attachment. When the email platform is archived or backed up, all 100 instances are saved, which requires 100 MB of storage. With data deduplication, only a single instance of the attachment is actually stored, and each subsequent instance is a reference back to the one saved copy. In this case, the demand for 100 MB is reduced to only 1 MB [10]. Deduplication can reduce storage utilization by up to 90-95% for backup applications [8] and by up to 68% in standard file systems [9]. Data deduplication plays a strategic role in saving storage costs. It also plays a crucial role in disaster recovery, since there is considerably less data to transfer. Backup and archive data include a lot of redundant data. The same data is saved multiple times, consuming unnecessary storage space, power and bandwidth.
Such a process creates a chain of cost and resource inefficiency within an organization. Various deduplication frameworks have been proposed based on different deduplication methodologies, for example server-side or client-side deduplication, and block-level or file-level deduplication. Deduplication can be applied at either the block level or the file level. File-level deduplication eliminates repeated copies of identical files. Block-level deduplication removes duplicate data blocks that occur in non-identical files.
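
The difference between the two granularities can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed systems: the class names FileLevelStore and BlockLevelStore, the use of SHA-256 as the fingerprint, and the fixed 8 KB block size are all assumptions made for illustration. Real systems may use different fingerprints, variable-size chunking, and persistent indexes.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # assumed fixed block size (8 KB), for illustration only


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest used as the deduplication key (an assumption)."""
    return hashlib.sha256(data).hexdigest()


class FileLevelStore:
    """Hypothetical store that keeps one copy of each distinct file."""

    def __init__(self):
        self.blobs = {}   # fingerprint -> file content
        self.names = {}   # file name -> fingerprint (the reference)

    def put(self, name, data):
        key = fingerprint(data)
        self.blobs.setdefault(key, data)  # content is stored only if unseen
        self.names[name] = key

    def get(self, name):
        return self.blobs[self.names[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


class BlockLevelStore:
    """Hypothetical store that keeps one copy of each distinct fixed-size block."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}    # fingerprint -> block content
        self.recipes = {}   # file name -> ordered list of block fingerprints

    def put(self, name, data):
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            key = fingerprint(block)
            self.blocks.setdefault(key, block)  # shared blocks are stored once
            recipe.append(key)
        self.recipes[name] = recipe

    def get(self, name):
        return b"".join(self.blocks[key] for key in self.recipes[name])

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


if __name__ == "__main__":
    # The email example: 100 copies of the same 1 MB attachment.
    attachment = b"x" * 1_000_000
    mail_store = FileLevelStore()
    for i in range(100):
        mail_store.put(f"mail-{i}", attachment)
    print("file level, bytes stored:", mail_store.stored_bytes())

    # Two non-identical files that share their first block.
    block_store = BlockLevelStore()
    block_store.put("a", b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
    block_store.put("b", b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE)
    print("block level, bytes stored:", block_store.stored_bytes())
```

In this sketch the file-level store saves nothing for the two non-identical files in the second example, because their fingerprints differ, whereas the block-level store keeps their shared first block only once. The price is a larger index, since every block rather than every file needs a fingerprint entry.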