Journal of Information Security and Applications 47 (2019) 335–352
Journal homepage: www.elsevier.com/locate/jisa

Secure string pattern query for open data initiative

Yong Wang (a), Abdelrhman Hassan (a, *), Fei Liu (a), Yuanfeng Guan (b), Zhiwei Zhang (b)

(a) School of Computer Science and Engineering, Center for Cyber Security, University of Electronic Science and Technology of China, Chengdu, China
(b) SI-TECH Information Technology Co., Ltd, Beijing, China

Keywords: Secure string pattern query; Open data initiative; Secure data sharing

Abstract

The open data initiative allows organizations from different countries to share and integrate their data to produce value-added information products and services. However, a trust relationship between any two organizations can hardly be established because of the sensitivity of the data and each organization's specific responsibilities. In this paper, we propose a secure string pattern query processing framework for multiple data owners. Within this framework, we design an efficient and secure index structure called the Secure String PAttern Search tree (S2PAStree), which executes secure string pattern queries with logarithmic time complexity. We further propose a set of secure index construction protocols for the scenario in which multiple data owners share and integrate their sensitive data. Finally, we conduct comprehensive experiments on three public datasets to evaluate the efficiency and scalability of our solutions.

© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

With the increasing popularity of cloud computing, outsourcing data and computation to cloud servers provides a cost-effective way to support large-scale data storage and query processing.
For example, Yelp¹, acting as a data owner, outsources its data and services to cloud servers provided by Amazon² to save costs. Indeed, many companies, organizations, and individual users have adopted cloud platforms to support their business operations, research, or everyday needs [1].

* Corresponding author. E-mail addresses: cla@uestc.edu.cn (A. Hassan), guanyf@si-tech.com.cn (Y. Guan).
¹ http://www.yelp.com.
² http://www.aws.amazon.com.

Despite these substantial business and technical advantages, security concerns require that sensitive data be protected from the cloud servers and from unauthorized users. Examples include financial and medical records, government-collected property ownership and crime records, and user profiles in social networks. A common approach is to encrypt both the data and the queries (e.g., [2–5]). The data owner outsources the encrypted data to the cloud server, which processes encrypted queries from authorized clients over the encrypted data and returns the query results to the client. During query processing, the cloud server should learn nothing about the original data, the client query, or the query results.

Researchers have studied this problem from many different angles, such as Private Information Retrieval (PIR) [6,7], Oblivious RAM [8–10], encrypted keyword search [11–14], deterministic and order-preserving encryption [15–17], fully homomorphic encryption [18,19], and Searchable Symmetric Encryption (SSE) [20]. Meanwhile, many techniques have been proposed to support specific secure queries, such as the secure skyline query [5], the secure top-k query [21], the secure continuous top-k query [22,23], and the secure k-nearest neighbor (kNN) query [2].

These works assume that the cloud servers follow the semi-honest model [24] and that the data owner is trustworthy. However, this assumption becomes invalid once secure data sharing among multiple data owners is considered.
Such data sharing is the goal of the open data initiative (e.g., Smart Government [25], the Human Genome Project [26]), which allows data owners to share and integrate their data to produce value-added information products and services. However, a trust relationship between any two data owners can hardly be established because of the sensitivity of the data and each owner's specific responsibilities.

In real-life applications, there is a demand for searching keywords that contain arbitrary substrings (e.g., user profile matching [27], DNA fragment searching [28]). This task is known as the string pattern query. A string pattern query is a sequence of characters, and a keyword matches the pattern if the query string is either identical to the keyword or contained in it as a substring. In this paper, assuming that the data owners follow the semi-honest adversary model, we focus on the problem of secure string pattern queries for sensitive data sharing among them.

https://doi.org/10.1016/j.jisa.2019.06.001 (ISSN 2214-2126)

Motivating example. Consider a police agent who wants to track down a fugitive using only a fragment of the vehicle's license plate (e.g., the first three characters of the plate, “6EH”). To successfully profile the vehicle, the agent needs to