On Information Leakage by Indexes over Data Fragments

Sabrina De Capitani di Vimercati¹, Sara Foresti¹, Sushil Jajodia², Stefano Paraboschi³, Pierangela Samarati¹

¹ DI - Università degli Studi di Milano, 26013 Crema - Italy, firstname.lastname@unimi.it
² CSIS - George Mason University, Fairfax, VA 22030-4444 - USA, jajodia@gmu.edu
³ Università degli Studi di Bergamo, 24044 Dalmine - Italy, parabosc@unibg.it

Abstract: Data fragmentation has recently emerged as a complementary approach to encryption for protecting the confidentiality of sensitive associations when storing data at external parties. In this paper, we discuss how the use of indexes, typically associated with the encrypted portion of the data, while desirable for effective and efficient query execution, can, combined with fragmentation, cause leakage of confidential (encrypted or fragmented) information. We illustrate how the exposure to leakage varies depending on the kind of index. These observations can be useful for designing approaches that assess information exposure and for defining safe (inference-free) indexes over fragmented data.

I. INTRODUCTION

Cloud computing is today a successful paradigm for delegating data storage and management to external services. To protect data confidentiality from the storing/processing servers themselves, which can be honest but curious, data are typically encrypted, so as to make them unintelligible to the storing servers, and queries are executed on associated indexes [1], [2]. Recent approaches have proposed the use of data fragmentation, possibly combined with encryption, to protect confidentiality when what is sensitive is the association among data values rather than the individual values themselves [3], [4], [5].
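To make the notion of a sensitive association concrete, the following minimal sketch checks whether a vertical fragmentation keeps the attributes of each confidentiality constraint apart. The function name `is_safe_fragmentation` and the constraint sets are illustrative assumptions, not definitions taken from this paper; the attribute names are only chosen to resemble a typical medical relation.

```python
from itertools import combinations


def is_safe_fragmentation(fragments, constraints):
    """Return True if the fragments are pairwise disjoint (so they cannot be
    joined on a common plaintext attribute) and no single fragment contains,
    in the clear, every attribute of some confidentiality constraint."""
    fragments = [frozenset(f) for f in fragments]
    constraints = [frozenset(c) for c in constraints]
    disjoint = all(a.isdisjoint(b) for a, b in combinations(fragments, 2))
    no_violation = all(not c <= f for f in fragments for c in constraints)
    return disjoint and no_violation


# Hypothetical constraints: SSN is sensitive by itself, the others are
# sensitive associations among attributes (assumed for illustration only).
constraints = [
    {"SSN"},
    {"Name", "Job"},
    {"Name", "Disease"},
    {"State", "Job", "Disease"},
]

safe = [{"Name", "State"}, {"Job"}, {"Disease"}]
unsafe = [{"Name", "Job"}, {"State", "Disease"}]

print(is_safe_fragmentation(safe, constraints))    # True
print(is_safe_fragmentation(unsafe, constraints))  # False
```

In this sketch the second fragmentation is rejected because one fragment stores Name and Job together in plaintext, which exposes an association assumed to be sensitive.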
Fragmentation allows the storage of plaintext values at the server side, thus providing more convenience in terms of data accessibility and query performance, since the server can evaluate selection conditions on plaintext attributes precisely. However, when operating on a fragment, the server is not able to evaluate conditions on attributes that do not appear in the clear. These conditions would then need to be evaluated by the client, who can access the whole dataset, at a potentially high cost in terms of communication and client-side computation. To avoid this limitation and exploit fragmentation effectively, it is important to couple fragments with indexes on the encrypted portion of the data, enabling some server-side evaluation of conditions on attributes that do not appear in the clear [6].

In this paper, we discuss the combined use of fragmentation and indexes and show how it may cause leakage of confidential information that would otherwise be protected by encryption or fragmentation. The contribution of our paper lies in pointing out these information leakage issues, together with observations on the strengths and weaknesses of different indexing strategies. Our paper represents a first step toward the definition of safe (i.e., inference-free) indexes for data fragments, and of metrics for assessing the risk of information leakage to which indexed fragments are exposed.

The analysis we present assumes that the adversary is not monitoring the queries and only has access to the static representation of the data. This assumption is consistent with a scenario where users are concerned about adversaries accessing the data stored on the server (e.g., due to authentication errors or the use of side channels that access the data at the physical level), rather than about adversaries with long-term control over the server processing the queries.

The remainder of this paper is organized as follows. Section II presents the basic concepts of fragmentation and indexing techniques.
Section III describes the inference exposure caused by the introduction of indexes in fragments. Section IV discusses the properties an index should have to limit exposure to inference. Section V illustrates related work. Finally, Section VI presents our conclusions.

II. FRAGMENTS AND INDEXES

Consistently with existing proposals, we consider the outsourced data to be represented as a single relational table, and confidentiality constraints to express the attributes, or combinations thereof, that are considered sensitive. Fragmentation avoids encrypting attributes whose values are not sensitive; it instead protects sensitive associations by vertically splitting the relation into different fragments that cannot be joined, so that the involved attributes are kept apart [3], [4], [5]. To illustrate, consider the relation and the confidentiality constraints in Figure 1(a) and in Figure 1(b), respectively. While SSN values are sensitive and need to be encrypted for external storage, for all the other attributes what is sensitive is their association. Fragmentation F = {{Name, State}, {Job}, {Disease}} protects these sensitive associations while allowing the plaintext representation of the four attributes in three different fragments. At the physical level [4], fragments are composed of tuples, each reporting a salt (used in the encryption of the tuple), the encrypted tuple (including all the