A Privacy-Preserving Multi-keyword Ranked Search Scheme over Encrypted Cloud Data using MIR-tree

Sonu Pratap Singh Gurjar
Computer Science and Engineering
Indian Institute of Technology (BHU)
Varanasi, Uttar Pradesh, India
Email: spsingh.gurjar.cse13@itbhu.ac.in

Syam Kumar Pasupuleti
IEEE Member, IDRBT
Hyderabad, Telangana, India
Email: psyamkumar@idrbt.ac.in

Abstract—With the increasing popularity of cloud computing, data owners are motivated to outsource their sensitive data to cloud servers for flexibility and reduced cost in data management. However, privacy is a major concern when outsourcing data to the cloud, so data owners typically encrypt documents before outsourcing them. As the volume of data grows at a dramatic rate, it is essential to develop efficient and reliable ciphertext search techniques so that data owners can easily access and update cloud data. In this paper, we propose a privacy-preserving multi-keyword ranked search scheme over encrypted cloud data that also provides data integrity, using a new authenticated data structure, the MIR-tree. The MIR-tree based index combines the widely used vector space model and the TF×IDF model in index construction and query generation. We use an inverted file index to store word digests, which enables efficient and fast relevance computation between the query and cloud data. We also design an authentication set (AS) to authenticate queries and to verify the top-k search results. Because of the tree-based index, our scheme achieves optimal search efficiency and reduces the communication overhead of verifying the search results. Our analysis shows that the scheme is secure and efficient.

Index Terms—Cloud computing, MIR-tree, multi-keyword ranked search, data integrity, privacy-preserving, trapdoor.

I. INTRODUCTION

CLOUD computing is one of the most popular of the newly emerging technologies in the field of the internet and computers.
A cloud computing environment provides vast computing and storage resources and easy, secure access to data, with high efficiency and low operating cost [1]. Enterprises and data owners therefore usually choose to outsource their data to the cloud in order to avoid managing it locally. As a result, more people are moving their data to the cloud. Despite these benefits, privacy is a major concern for cloud storage. Although cloud service providers offer strong security mechanisms, there is always a chance that confidential data (for example, personal emails or trade secrets) will leak, or that intruders will access users’ data without authorization [2]. Privacy preservation and secure storage are the two main concerns when outsourcing data to the cloud [3]. One popular way to preserve privacy is to encrypt the data before outsourcing it [4]. Data owners may also share their data with authorized users, who retrieve data files from the cloud using keyword search techniques. However, keyword search over ciphertext is challenging because only limited operations can be performed on ciphertext. In addition, users prefer to receive the files most relevant to their needs, so the retrieved files should be ranked by text relevance, and only the top-ranked files should be sent back to the user for fast and accurate results.

In recent years, researchers have proposed many keyword-based search schemes over ciphertext [6], [8], [9], [10], [11], [17], [18]. However, these methods lack data integrity, i.e., the guarantee that a query result is indeed generated from the outsourced data (the authenticity requirement) and contains all the data satisfying the query (the correctness requirement). This matters because server failures or storage corruption by an intruder may cause the server to return wrong search results. The schemes in [13], [14], [15] were proposed to achieve data integrity along with privacy.
However, these schemes take considerable verification time, which leads to higher communication cost. Thus, an efficient and low-cost mechanism should be provided for users to verify the integrity of the search results.

In this paper, we propose a solution to the above problems by considering a secure MIR-tree [12] based multi-keyword search over encrypted data together with search result verification. In our model, we create a MIR-tree based index, which provides authenticated text relevance by using word digests. The construction uses word digests that include term frequency (TF) and inverse document frequency (IDF), and the vector space model is used for multi-keyword search and query generation. A Greedy Depth-first Search (GDFS) algorithm is then run on the MIR index tree for fast search, and an authentication set is constructed for search result verification. Our contributions are summarized as follows:
• We design a MIR-tree based index to achieve efficient search.
• We ensure data integrity by addressing the completeness, correctness, and freshness of data, and we design an authentication set (AS) for verifying the top-k results.
• The security and performance analysis shows that our approach is secure and efficient.

The rest of this paper is organized as follows: Section II presents related work, then we have given a brief introduc-