Percival: A Searchable Secret-Split Datastore

Joel C. Frank¹, Shayna M. Frank¹, Lincoln A. Thurlow¹, Thomas M. Kroeger², Ethan L. Miller¹, Darrell D. E. Long¹
¹ Storage Systems Research Center, University of California, Santa Cruz, CA
² Sandia National Laboratories, Livermore, CA

Abstract: Maintaining information privacy is challenging when sharing data across a distributed long-term datastore. In such applications, secret splitting the data across independent sites has been shown to be a superior alternative to fixed-key encryption; it improves reliability, reduces the risk of insider threat, and removes the issues surrounding key management. However, the inherent security of such a datastore normally precludes it from being directly searched without reassembling the data; reassembly is neither computationally feasible nor without risk, since it introduces a single point of compromise. As a result, the secret-split data must be pre-indexed in some way in order to facilitate searching. Previously, fixed-key encryption has also been used to securely pre-index the data, but in addition to its key management issues, it is not well suited for long-term applications. To meet these needs, we have developed Percival: a novel system that enables searching a secret-split datastore while maintaining information privacy. We leverage salted hashing, performed within hardware security modules, to access pre-recorded queries that have been secret split and stored in a distributed environment; this keeps the bulk of the work on each client, and keeps the data custodians blinded to both the contents of a query and its results. Furthermore, Percival does not rely on the datastore’s exact implementation. The result is a flexible design that can be applied to both new and existing secret-split datastores. When testing Percival on a corpus of approximately one million files, we found that the average search operation completed in less than one second.

978-1-4673-7619-8/15/$31.00 © 2015 IEEE

I. INTRODUCTION

Security is often a critical issue for long-term storage, particularly given recent incidents involving insiders releasing large amounts of private or classified information [1]. Much of this risk is due to traditional storage systems having a single point of compromise: the data server. If that one point is compromised at any time during the datastore’s lifespan, information can be leaked. This threat is magnified in a distributed environment since, by its nature, the data is stored in multiple locations.

In situations where a single location, or site, is not trusted, but the collection of sites as a whole is trusted, secret splitting mitigates this problem. By first dividing a data object into shares, and then distributing each share to an independent site in the distributed environment, no single site has enough information to perform reconstruction, because a single share reveals nothing about the original data.

However, due to the inherent information-theoretic security of a secret-split datastore, searching it is normally not possible without reconstructing the original data from its constituent shares. Reconstruction, however, is not only computationally infeasible; it also reintroduces the single point of compromise. As a result, the shares need to be pre-indexed in some way that facilitates searching them. Previously, this has been accomplished using fixed-key encryption, e.g., public-key encryption, to minimize, and ideally prevent, information leakage. However, in addition to key management issues, which are undesirable in long-term storage environments, fixed-key encryption typically suffers from a catastrophic release of information upon compromise.

To address the need to maintain information privacy while searching a secret-split datastore, we developed Percival: a novel system that accomplishes these tasks without relying on fixed-key encryption.
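To make the secret-splitting property described above concrete, the following is a minimal sketch of one well-known construction, an XOR-based n-of-n split. It is an illustrative assumption, not the scheme Percival or the underlying datastore necessarily uses; threshold schemes such as Shamir's secret sharing are a common alternative. The function names `split_secret` and `combine_shares` are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(data: bytes, n_shares: int) -> list[bytes]:
    """Split data into n_shares shares; ALL shares are needed to rebuild it.

    Assumption: a simple XOR n-of-n split, chosen only for illustration.
    """
    if n_shares < 2:
        raise ValueError("need at least two shares")
    # The first n-1 shares are uniformly random pads as long as the data.
    shares = [secrets.token_bytes(len(data)) for _ in range(n_shares - 1)]
    # The last share is the data XORed with every pad, so any single
    # share on its own is statistically independent of the data.
    shares.append(reduce(xor_bytes, shares, data))
    return shares


def combine_shares(shares: list[bytes]) -> bytes:
    """Rebuild the original data by XORing every share together."""
    return reduce(xor_bytes, shares)


if __name__ == "__main__":
    shares = split_secret(b"Moby Dick", 3)  # one share per independent site
    assert combine_shares(shares) == b"Moby Dick"
```

In this sketch each share would be sent to a different site; a site holding one share sees only random-looking bytes, which matches the property that a single share reveals nothing about the original data.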
Furthermore, Percival is completely agnostic with regard to the datastore’s implementation: whether the datastore is based on POTSHARDS [2] or Cleversafe [3], the user is left with some kind of identifier that can be used to retrieve the user’s data. Percival combines this collection of identifiers with each data object’s search term(s) in order to produce a set of reverse indexes; each reverse index is in essence a search result, since it maps a search term to the set of data objects that should be found using that search term. For our purposes, we define a search term as a single word that has been identified as relating to a particular data object. For example, to find the data object Moby Dick, one might use the search term ‘whale’ in order to retrieve the object from the datastore.

Once the set of reverse indexes is generated, each individual index is secret split, and each resulting share is sent to a different query server in the distributed environment. We define a query server as a hardware security module backed by one or more machines working together as a single, logical key-value store. A hardware security module (HSM) [4] is a commercially available physical device that protects and manages the secure pieces of this design by providing a place to handle sensitive data in a relatively non-secure location. It provides both tamper evidence and tamper resistance by logging intrusion attempts and by clearing its internal memory if it detects an intrusion attempt. Percival relies on the HSMs to process all secure messages from the client without exposing any information to the rest of the query server. Query servers and their interaction with clients are discussed in detail in Section IV, but for now they can be viewed as a secure key-value store whose job is to service search requests from authorized clients and to respond with the share of the correct reverse index. Once a client has retrieved the shares