Computer Communications 154 (2020) 148–159
journal homepage: www.elsevier.com/locate/comcom

SPARK: Secure Pseudorandom Key-based Encryption for Deduplicated Storage

Jay Dave a,∗, Parvez Faruki b, Vijay Laxmi a, Akka Zemmari c, Manoj Gaur d, Mauro Conti e,f

a Malaviya National Institute of Technology Jaipur, India
b Government MCA College, Ahmedabad, India
c University of Bordeaux, Talence, France
d Indian Institute of Technology Jammu, India
e University of Padua, Italy
f Delft University of Technology, Netherlands

ARTICLE INFO
Keywords: Deduplication, Encryption, Dictionary attacks, Tag inconsistency anomaly

ABSTRACT
Deduplication is a widely used technology for reducing the storage and communication costs of cloud storage services. For any cloud infrastructure, data confidentiality is one of the primary concerns. Data confidentiality can be achieved via user-side encryption. However, conventional encryption mechanisms are at odds with deduplication, so developing a user-side encryption mechanism that supports deduplication is a vital research topic. Existing state-of-the-art solutions for secure deduplication are vulnerable to dictionary attacks and to the tag inconsistency anomaly. In this paper, we present SPARK, a novel approach for secure pseudorandom key-based encryption for deduplicated storage. SPARK achieves semantic security along with deduplication. Our security analysis proves that SPARK is secure against dictionary attacks and the tag inconsistency anomaly. As a proof of concept, we implement SPARK in a realistic environment and demonstrate its efficiency and effectiveness.

1. Introduction

Cloud storage has become an essential part of various network applications due to its location-independent, low-cost, and scalable online storage services. The Cisco Global Cloud Index forecasts that the volume of digital data in cloud storage will grow to 19.5 zettabytes by 2021 [1].
Such explosive growth of data in cloud storage drives the demand for approaches that reduce storage cost. Deduplication is one such approach: it optimizes the utilization of storage resources. In the following, we describe the deduplication process.

Deduplication: When a user requests a data upload, the storage server checks whether the data already exists in its storage, and stores the data only if it is not present. In this way, deduplication avoids storing multiple copies of the same data. The advantage of deduplication is measured by the space reduction ratio [2], which is the size of the input data divided by the size of the data to be stored:

$$\text{space reduction ratio} = \frac{\text{size of input data}}{\text{size of data to be stored}}$$

Hence, the deduplication advantage percentage is

$$\left\{1 - \frac{1}{\text{space reduction ratio}}\right\} \times 100.$$

Deduplication can be categorized on the basis of (1) granularity, (2) intra-user versus inter-user scope, (3) locality, and (4) architecture, as discussed in [3].

Based on granularity, deduplication is categorized into file level and block level deduplication. (i) File level: Storage is scanned file by file to detect the existence of an identical file. (ii) Block level: The file is divided into blocks, and storage is scanned block by block. In block level deduplication, blocks can be either fixed size or variable size.

∗ Corresponding author. E-mail address: jaydaveadms@gmail.com (J. Dave).

In terms of user scope, deduplication is categorized into intra-user and inter-user deduplication. (iii) Intra user: Deduplication is applied only within an individual user’s data. (iv) Inter user: Deduplication is applied across the data of all users of the storage server.

Based on locality, deduplication is classified into client side and server side deduplication. (v) Client side: The user first sends a tag of the data (e.g., the data’s hash value) to the storage server to detect redundancy. The user sends the data only if it is not already present in storage.
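The client side check in (v) and the two measures defined above can be illustrated with a minimal sketch. It is not the SPARK scheme and involves no encryption: it only shows the tag-first upload flow on plaintext data. The names `StorageServer`, `needs_upload`, `put`, and `upload` are hypothetical, the in-memory dictionary stands in for real storage, and SHA-256 is an assumed choice for the tag (the text only says "hash value").

```python
import hashlib


class StorageServer:
    """Toy in-memory storage server (hypothetical, for illustration only)."""

    def __init__(self):
        self._store = {}       # tag -> data, one copy per distinct tag
        self.bytes_in = 0      # size of input data requested for upload
        self.bytes_stored = 0  # size of data actually stored

    def needs_upload(self, tag, size):
        """Record the upload request and report whether the data is missing."""
        self.bytes_in += size
        return tag not in self._store

    def put(self, tag, data):
        """Store data under its tag and account for the stored bytes."""
        self._store[tag] = data
        self.bytes_stored += len(data)


def upload(server, data):
    """Client side deduplication: send the tag first, the data only if absent.

    Returns True if the data was transmitted, False if it was a duplicate.
    """
    tag = hashlib.sha256(data).hexdigest()  # assumed tag: SHA-256 of the data
    if server.needs_upload(tag, len(data)):
        server.put(tag, data)
        return True
    return False


def space_reduction_ratio(server):
    """Size of input data divided by size of data to be stored."""
    return server.bytes_in / server.bytes_stored


def advantage_percentage(server):
    """{1 - (1 / space reduction ratio)} x 100."""
    return (1 - 1 / space_reduction_ratio(server)) * 100


if __name__ == "__main__":
    server = StorageServer()
    for _ in range(4):
        upload(server, b"x" * 1000)  # the same 1000-byte file, four times
    print(space_reduction_ratio(server), advantage_percentage(server))
```

In this toy run, four upload requests for the same 1000-byte file give 4000 bytes of input against 1000 bytes stored, so the space reduction ratio is 4 and the deduplication advantage is 75%. Only the first request transmits the data; the other three send just the tag.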
(vi) Server side: The user is not aware of whether deduplication will be applied to her data. She simply outsources the data, and the storage server then performs deduplication on the received data. Communication overhead is higher in server side deduplication than in client side deduplication, because in server side deduplication the user must send the data even if it is already present in storage.

https://doi.org/10.1016/j.comcom.2020.02.037
Received 3 June 2019; Received in revised form 29 December 2019; Accepted 11 February 2020; Available online 13 February 2020
0140-3664/© 2020 Elsevier B.V. All rights reserved.

Finally, based on architecture, deduplication can be categorized into (vii) single cloud deduplication architecture, in which a single cloud service provider is considered, as is the case for many commercial cloud service providers, and (viii) multi-cloud deduplication architecture, in which multiple cloud service providers are considered, and users’ data is split and dispersed across these service