Journal of Information Security and Applications 47 (2019) 335–352
Secure string pattern query for open data initiative
Yong Wang ᵃ, Abdelrhman Hassan ᵃ,∗, Fei Liu ᵃ, Yuanfeng Guan ᵇ, Zhiwei Zhang ᵇ
ᵃ School of Computer Science and Engineering, Center for Cyber Security, University of Electronic Science and Technology of China, Chengdu, China
ᵇ SI-TECH Information Technology Co., Ltd, Beijing, China
Article info
Keywords:
Secure string pattern query
Open data initiative
Secure data sharing
Abstract
The open data initiative allows organizations from different countries to share and integrate their data
to produce value-added information products and services. However, the trust relationship between any
two organizations can hardly be established due to the data sensitivity and their specific responsibilities.
In this paper, we propose a secure string pattern query processing framework for multiple data owners.
With this framework, we design an efficient and secure indexing structure called Secure String PAttern
Search tree (S2PAStree). S2PAStree allows the execution of the secure string pattern query with logarithmic
time complexity. We further propose a set of secure index construction protocols under the scenario
that multiple data owners share and integrate their sensitive data. Finally, we conduct comprehensive
experiments on three public datasets to evaluate the efficiency and the scalability of our solutions.
© 2019 Elsevier Ltd. All rights reserved.
1. Introduction
With the increasing popularity of cloud computing, outsourcing
data and computation to cloud servers provides a cost-effective
way to support large-scale data storage and query processing. For
example, Yelp,¹ acting as a data owner, outsources its data and
services to the cloud servers provided by Amazon² to save costs.
Indeed, many companies, organizations, and individual users have
adopted the cloud platform to facilitate their business operations,
research, or everyday needs [1]. Despite the tremendous business
and technical advantages, sensitive data need to be protected
from the cloud servers and unauthorized users due to the security
concerns. Examples may include financial and medical records,
government collected property ownership and crime records, and
user profiles in social networks.
A common approach is to encrypt the data and queries (e.g.,
[2–5]). That is, the data owner outsources its encrypted data to
the cloud server. The server processes encrypted queries from the
authorized client over the encrypted data and returns the query
results to the client. During the query processing, the cloud server
should not gain any knowledge about the original data, the client
query, or the query results. Researchers have studied this problem
from many different angles, such as Private Information Retrieval
(PIR) [6,7], Oblivious RAM [8–10], Encrypted Keyword Search
[11–14], Deterministic and Order-preserving encryption [15–17],
Fully Homomorphic Encryption [18,19], and Searchable Symmetric
Encryption (SSE) [20]. Meanwhile, many techniques have been proposed
to support specific secure queries, such as the secure skyline
query [5], the secure top-k query [21], the secure continuous top-k
query [22,23], and the secure k-nearest neighbor (kNN) query [2].

∗ Corresponding author.
E-mail addresses: cla@uestc.edu.cn (A. Hassan), guanyf@si-tech.com.cn (Y. Guan).
¹ http://www.yelp.com.
² http://www.aws.amazon.com.

These works assume that the cloud servers follow the semi-honest
model [24] and that the data owner is trustworthy. However, this
assumption no longer holds when data are shared securely among
multiple data owners. Such sharing is the goal of the open data
initiative (e.g., Smart Government [25], Human Genome Project [26]),
which allows data owners to share and integrate their data to
produce value-added information products and services. A trust
relationship between any two data owners can hardly be established
due to the data sensitivity and their specific responsibilities.
In real-life applications, there is a demand for searching keywords
that contain arbitrary substrings (e.g., user profile matching
[27], DNA fragment searching [28]). This is known as a string
pattern query. A string pattern query is a sequence of characters. A
keyword is said to match a string pattern if the query string is
either identical to the keyword or contained in it as a substring.
In this paper, assuming that the multiple data owners follow the
semi-honest adversary model, we focus on the problem of secure
string pattern queries for sensitive data sharing among them.
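
The matching rule defined above can be illustrated with a minimal plaintext sketch. It shows only the query semantics (a keyword matches if the pattern equals it or occurs in it as a substring). It is not the secure protocol or the S2PAStree index proposed in this paper. The function names and the sample plates are hypothetical, chosen purely for illustration, and the linear scan is an assumption made for simplicity.

```python
from typing import Iterable, List


def matches(keyword: str, pattern: str) -> bool:
    """Return True if the pattern equals the keyword or occurs in it as a substring."""
    # Python's `in` on strings is a substring test; equality is the special
    # case where the substring spans the whole keyword.
    return pattern in keyword


def string_pattern_query(keywords: Iterable[str], pattern: str) -> List[str]:
    """Plaintext linear scan: return the keywords that match the pattern, in input order."""
    return [k for k in keywords if matches(k, pattern)]


# Hypothetical license plates, invented for illustration only.
plates = ["6EHK215", "7ABC123", "3X6EH90"]
result = string_pattern_query(plates, "6EH")
```

Note that under this rule the pattern may occur anywhere in the keyword, not only as a prefix, and that an empty pattern trivially matches every keyword.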
Motivating example. As a typical example, a police agent wants
to track down a fugitive's information with only a fragment of the
vehicle's license plate (e.g., the first three characters of the plate,
"6EH"). To successfully profile the vehicle, the agent needs to
https://doi.org/10.1016/j.jisa.2019.06.001
2214-2126/© 2019 Elsevier Ltd. All rights reserved.