Journal of Computer Security 12 (2004) 655–692
IOS Press

Cardinality-based inference control in data cubes

Lingyu Wang **, Duminda Wijesekera and Sushil Jajodia
Center for Secure Information Systems, George Mason University MSN4A4, 4400 University Drive, Fairfax, VA 22030-4444, USA
E-mail: {lwang3,dwijesek,jajodia}@gmu.edu

This paper addresses the inference problem in on-line analytical processing (OLAP) systems. The inference problem occurs when the exact values of sensitive attributes can be determined through answers to OLAP queries. Most existing inference control methods are computationally expensive for OLAP systems, because they ignore the special structures of OLAP queries. By exploiting such structures, we derive cardinality-based sufficient conditions for safe OLAP data cubes. Specifically, data cubes are safe from inferences if their core cuboids are dense enough, in the sense that the number of known values is under a tight bound. We then apply the sufficient conditions on the basis of a three-tier inference control model. The model introduces an aggregation tier between data and queries. The aggregation tier represents a collection of safe data cubes that are pre-computed over a partition of the data using the proposed sufficient conditions. The aggregation tier is then used to provide users with inference-free queries. Our approach mitigates the performance penalty of inference control, because partitioning the data yields smaller input to inference control algorithms, pre-computing the aggregation tier reduces on-line delay, and using cardinality-based conditions guarantees linear-time complexity.

1. Introduction

Decision support systems such as On-line Analytical Processing (OLAP) are becoming increasingly important in industry. OLAP systems assist users in exploring trends and patterns in large amounts of data by providing them with interactive results of statistical aggregations.
Contrary to this initial objective, inappropriate disclosure of sensitive data stored in the underlying data warehouses breaches individuals’ privacy and jeopardizes the organization’s interests. It is well known that access control alone is insufficient for controlling information disclosure, because information not released directly may be inferred indirectly by manipulating legitimate queries about aggregated information. This is known as the inference control problem in databases. OLAP systems are especially vulnerable to such unwanted inferences because of the aggregations used in OLAP queries. The subject of this paper is providing inference-free answers to data cube style OLAP queries without adversely impacting the response time of an OLAP system.

* This work was partially supported by the National Science Foundation under grant CCR-0113515.
** Corresponding author. Tel.: +1 703 993-1629; Fax: +1 703 993-1638; E-mail: lwang3@gmu.edu.
0926-227X/04/$17.00 © 2004 – IOS Press and the authors. All rights reserved

The inference problem has been investigated since the 1970s, and many inference control methods have been proposed, especially for statistical databases. However, most of those