Information Leakage Analysis of Database Query Languages

Raju Halder
Indian Institute of Technology Patna, India
halder@iitp.ac.in

Matteo Zanioli
Università Ca’ Foscari Venezia, Italy
zanioli@unive.it

Agostino Cortesi
Università Ca’ Foscari Venezia, Italy
cortesi@unive.it

ABSTRACT
In this work, we extend language-based information-flow security analysis to the case of database applications embedding query languages. The analysis is performed by (i) computing an over-approximation of the variables’ dependences, in the form of propositional formulae, that occur up to each program point, (ii) checking the satisfiability of these formulae when truth values are assigned to the variables, (iii) analyzing the application over a numerical abstract domain, and finally, (iv) enhancing the analysis using the reduced product of the propositional formulae domain and the numerical abstract domain.

Categories and Subject Descriptors
F.3.2 [Semantics of Programming Languages]: Program Analysis; H.2.0 [General]: Security, integrity, and protection; H.2.3 [Languages]: Data manipulation languages (DML), Query languages

General Terms
Static Analysis, Abstract Interpretation, Databases, Security

Keywords
Information Flow Analysis, Query Languages

1. INTRODUCTION
Various language-based information flow security models have been proposed, aiming at preventing unauthorized leakage of sensitive data, directly or indirectly, while it propagates through an application [7, 8, 12, 14, 17]. Work in this direction began with the pioneering work of Denning in the 1970s [3].
To ensure end-to-end security, the notion of non-interference was introduced [14]: given a program $P$ and a set of states $\Sigma$, the non-interference policy states that $\forall \sigma_1, \sigma_2 \in \Sigma.\ \sigma_1 \equiv_L \sigma_2 \implies [\![P]\!]\sigma_1 \equiv_L [\![P]\!]\sigma_2$, where $[\![\cdot]\!]$ is the semantic function and $\equiv_L$ represents the low-equivalence relation between states. That is, a variation of confidential data does not cause any variation of public data.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC’14 March 24-28, 2014, Gyeongju, Korea. Copyright 2014 ACM 978-1-4503-2469-4/14/03 ...$10.00.

Existing static-analysis models in the literature for the verification of such a property can be classified as type system-based [14, 16], dependence graph-based [6, 7, 9], slicing-based [1, 8], etc. Notably, all these works refer only to imperative, object-oriented, or functional programming languages [7, 8, 12, 13, 14], while in information system scenarios most data-intensive applications embed SQL commands that extract or manipulate data from back-end databases. Although various access control mechanisms have proved very efficient at the database level, in practice the confidentiality of sensitive database information can still be compromised while it propagates through applications that access and process it legitimately. No attention has been given so far to this kind of leakage of database information through data-intensive applications.
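The non-interference policy stated above can be illustrated on a toy example. The following is only a minimal sketch under our own assumptions, not the analysis proposed in this paper: states are dictionaries over one public variable `l` and one confidential variable `h` (hypothetical names), values range over a small finite domain, and the policy is checked by brute-force enumeration of all pairs of low-equivalent states rather than by static analysis.

```python
from itertools import product

# Assumed toy setting (not from the paper): one low and one high variable,
# each ranging over a small finite domain so that all states can be enumerated.
LOW_VARS = ("l",)    # public variables
HIGH_VARS = ("h",)   # confidential variables
DOMAIN = range(3)


def all_states():
    """Enumerate every state, i.e. every mapping from variables to DOMAIN."""
    names = LOW_VARS + HIGH_VARS
    for values in product(DOMAIN, repeat=len(names)):
        yield dict(zip(names, values))


def low_equivalent(s1, s2):
    """Two states are low-equivalent if they agree on all public variables."""
    return all(s1[v] == s2[v] for v in LOW_VARS)


def is_noninterfering(program):
    """Check that low-equivalent inputs always yield low-equivalent outputs."""
    states = list(all_states())
    for s1, s2 in product(states, repeat=2):
        if low_equivalent(s1, s2) and not low_equivalent(program(s1), program(s2)):
            return False
    return True


def leaky(state):
    """Implicit flow: the branch on h determines the final value of l."""
    out = dict(state)
    out["l"] = 1 if state["h"] > 0 else 0
    return out


def safe(state):
    """The final value of l depends only on the initial value of l."""
    out = dict(state)
    out["l"] = state["l"] + 1
    out["h"] = state["h"] * 2
    return out
```

Here `is_noninterfering(safe)` holds, whereas `is_noninterfering(leaky)` fails because two states that differ only in `h` produce different values of `l`. Exhaustive enumeration is feasible only for such tiny state spaces, which is one reason why static over-approximations are used in practice.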
In [17, 18], the authors used logical formulae to represent variables’ dependences in the form $\bigwedge_{0 \le i \le n,\ 0 \le j \le m} \{ y_i \rightarrow z_j \}$, which means that the values of variable $z_j$ possibly depend on the values of variable $y_i$. The information leakage analysis on this domain of propositional formulae involves the following steps:
• Construction of a propositional formula $\psi$ representing an over-approximation of the variables’ dependences at each program point.
• Assignment of truth values to each variable, according to its sensitivity, by a truth-assignment function $\xi$. If $\xi$ does not satisfy $\psi$, then there could be some information leakage.
• Analysis of the program over a numerical abstract domain, using the reduced product of the propositional formulae domain and the numerical abstract domain to make the analysis more accurate by removing possible false positives.

In this paper, we aim to extend the full power of the model proposed in [17, 18] to the case of data-intensive applications embedding SQL statements, in order to identify possible leakage of sensitive database information as well. In particular,
• We define an abstract semantics of programs embedding SQL statements over the domain of propositional