Computer Communications 154 (2020) 148–159
Contents lists available at ScienceDirect
Computer Communications
journal homepage: www.elsevier.com/locate/comcom
SPARK: Secure Pseudorandom Key-based Encryption for Deduplicated Storage
Jay Dave (a,∗), Parvez Faruki (b), Vijay Laxmi (a), Akka Zemmari (c), Manoj Gaur (d), Mauro Conti (e,f)
(a) Malaviya National Institute of Technology Jaipur, India
(b) Government MCA College, Ahmedabad, India
(c) University of Bordeaux, Talence, France
(d) Indian Institute of Technology Jammu, India
(e) University of Padua, Italy
(f) Delft University of Technology, Netherlands
ARTICLE INFO
Keywords: Deduplication; Encryption; Dictionary attacks; Tag inconsistency anomaly
ABSTRACT
Deduplication is a widely used technology for reducing the storage and communication costs of cloud storage services. For any cloud infrastructure, data confidentiality is one of the primary concerns, and it can be achieved via user-side encryption. However, conventional encryption mechanisms are at odds with deduplication: when identical data is encrypted under different users' keys, the resulting ciphertexts differ, so the server can no longer detect the duplicates. Developing a user-side encryption mechanism that is compatible with deduplication is therefore a vital research topic. Existing state-of-the-art solutions for secure deduplication are vulnerable to dictionary attacks and to the tag inconsistency anomaly.
In this paper, we present SPARK, a novel approach for secure pseudorandom key-based encryption for deduplicated storage. SPARK achieves semantic security while still supporting deduplication. Security analysis proves that SPARK is secure against dictionary attacks and the tag inconsistency anomaly. As a proof of concept, we implement SPARK in a realistic environment and demonstrate its efficiency and effectiveness.
1. Introduction
Cloud storage has become an essential part of many network applications because it offers location-independent, low-cost, and scalable online storage services. The Cisco Global Cloud Index forecasts that the volume of digital data held in cloud storage will reach 19.5 zettabytes by 2021 [1]. Such explosive growth drives demand for approaches that reduce storage cost. Deduplication is one such approach: it optimizes the utilization of storage resources. In the following, we describe how deduplication works.
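Deduplication: In this approach, when a user requests a data upload, the storage server checks whether the data already exists on its storage, and it stores the data only if it is not present. In this way, deduplication avoids storing multiple copies of the same data. The advantage of deduplication is measured by the space reduction ratio [2], defined as the size of the input data divided by the size of the data actually stored:

$$\text{space reduction ratio} = \frac{\text{size of input data}}{\text{size of data to be stored}}$$

Hence, the deduplication advantage, expressed as a percentage, is

$$\left(1 - \frac{1}{\text{space reduction ratio}}\right) \times 100.$$

For example, a space reduction ratio of 4 means that only a quarter of the input is stored, which corresponds to a deduplication advantage of 75%.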
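The upload-and-check process and the two formulas above can be illustrated with a minimal sketch. It assumes file-level deduplication with SHA-256 digests as the index used to detect duplicates; the `DedupStore` class and its methods are hypothetical names chosen for illustration, not part of SPARK or of any real storage API, and no encryption is involved.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplicating store (illustrative only, not SPARK).

    Uploaded data is indexed by its SHA-256 digest; a copy is kept only
    if that digest has not been seen before.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # digest -> the single stored copy
        self.input_bytes = 0                # total bytes users asked to upload
        self.stored_bytes = 0               # bytes actually kept on storage

    def upload(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.input_bytes += len(data)
        if digest not in self._blobs:       # store only if not already present
            self._blobs[digest] = data
            self.stored_bytes += len(data)
        return digest

    def space_reduction_ratio(self) -> float:
        # size of input data / size of data actually stored
        if self.stored_bytes == 0:
            raise ValueError("nothing has been stored yet")
        return self.input_bytes / self.stored_bytes

    def advantage_percent(self) -> float:
        # (1 - 1 / space reduction ratio) * 100
        return (1 - 1 / self.space_reduction_ratio()) * 100


store = DedupStore()
for payload in [b"A" * 100, b"A" * 100, b"A" * 100, b"B" * 100]:
    store.upload(payload)

print(store.space_reduction_ratio())  # 400 / 200 = 2.0
print(store.advantage_percent())      # 50.0
```

Uploading three copies of one 100-byte payload and one distinct payload keeps 200 of the 400 input bytes, so the ratio is 2 and the advantage is 50%.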
Deduplication can be categorized on the basis of (1) granularity, (2) intra- versus inter-user scope, (3) locality, and (4) architecture, as discussed in [3]. Based on granularity, deduplication is categorized into file-level and block-level deduplication. (i) File level: the storage is scanned file by file to detect the existence of an identical file. (ii) Block level: each file is divided into blocks, and the storage is scanned block by block. In block-level deduplication, blocks can be either of fixed size or of variable size.

∗ Corresponding author. E-mail address: jaydaveadms@gmail.com (J. Dave).
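To make the difference between file-level and block-level granularity concrete, the sketch below counts how many bytes each would keep for two files that differ only in their last block. It assumes fixed-size blocks and SHA-256 as the block identifier; the function names are hypothetical and the code is not taken from SPARK.

```python
import hashlib


def fixed_size_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def stored_size_file_level(files: list[bytes]) -> int:
    """Bytes kept when whole files are compared (file-level deduplication)."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def stored_size_block_level(files: list[bytes], block_size: int) -> int:
    """Bytes kept when fixed-size blocks are compared (block-level deduplication)."""
    unique: dict[bytes, int] = {}
    for f in files:
        for block in fixed_size_blocks(f, block_size):
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# Two 4 KiB files that share their first three 1 KiB blocks.
common = b"".join(bytes([i]) * 1024 for i in range(3))
file_a = common + b"a" * 1024
file_b = common + b"b" * 1024

print(stored_size_file_level([file_a, file_b]))         # 8192: no whole-file match
print(stored_size_block_level([file_a, file_b], 1024))  # 5120: 3 shared + 2 distinct blocks
```

Fixed-size blocking is simple but sensitive to insertions: a single inserted byte shifts every later block boundary and destroys the matches, which is why variable-size (content-defined) blocking is used when edits are expected.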
In terms of user scope, deduplication is categorized into intra-user and inter-user deduplication. (iii) Intra-user: deduplication is applied only within the context of an individual user's data. (iv) Inter-user: deduplication is applied across the data of all users of the storage server.

Based on locality, deduplication is further classified into client-side and server-side deduplication. (v) Client side: the user first sends a tag of the data (i.e., the data's hash value) to the storage server for detecting redundancy, and sends the data itself only if it is not already present on the storage. (vi) Server side: the user is not aware of whether deduplication will be applied to her data; she simply outsources the data, and the storage server then executes deduplication on the received data. Communication overhead is higher in server-side deduplication than in client-side deduplication, because in server-side deduplication the user must send the data even if it is already present on the storage.

Finally, based on architecture, deduplication can be categorized into (vii) single-cloud deduplication architecture, in which a single cloud service provider is considered, as in many commercial cloud services, and (viii) multi-cloud deduplication architecture, in which multiple cloud service providers are considered, and users' data is split and dispersed across these service
https://doi.org/10.1016/j.comcom.2020.02.037
Received 3 June 2019; Received in revised form 29 December 2019; Accepted 11 February 2020
Available online 13 February 2020
0140-3664/© 2020 Elsevier B.V. All rights reserved.