Decentralized Query Planning in Coalition Networks

Mingyi Zhao*, Qiang Zeng*, Jorge Lobo, Peng Liu*, Fan Ye, Seraphin Calo and Tom Berman
*Pennsylvania State University, IBM Watson Research Center, IBM Hursley

Abstract—Previous distributed or federated database systems widely assume a central trusted server for query planning and optimization. However, in many coalition scenarios such a central server is hard to establish or maintain. In this paper, we propose a decentralized query planning service for data sharing in coalition networks. We also discuss the policy confidentiality issue that arises during collaboration.

I. INTRODUCTION

Information sharing across multiple independent parties is increasingly important for military, scientific research and industrial collaboration. In particular, access control should allow each party to autonomously disclose some of its data to specific parties at the granularity of tuples. To satisfy this requirement, our previous work [1] adopted pairwise authorizations and proposed a new approach to safe query processing for such policy specifications in coalition networks. However, that design, like other distributed database systems [2], [3], assumes a common trusted server for query planning and optimization.

This centralized query planning has several drawbacks. First, trust in the central server is hard to establish. Since each party in the coalition network is independent, it is usually impossible to designate a reliable party to run this server. In addition, establishing such trust could require a lot of effort, which is impractical for ad hoc emergency collaborations or fast-changing military scenarios. Second, the central server becomes a bottleneck and a single point of failure for the entire system. An attack or a server failure paralyzes the whole coalition network.
Moreover, since all parties’ authorizations and database meta-information are stored there, a compromise of this server leaks the confidential information of every party. In summary, just as query processing has been decentralized, we need to design a new protocol and system that support decentralized query planning. Figure 1 shows the four possible ways to process a query in coalition networks. Our problem, located in the bottom right, is the most challenging one.

Fig. 1. Four possible ways to process a query in coalition networks.

Motivated by the gateway idea in networking, we propose a gateway-based solution to this problem. The gateway protects the confidentiality of its party’s information and negotiates with other parties’ gateways during query planning. In this decentralized scheme, the malfunction or compromise of one gateway affects only its own party, not the whole coalition network. However, such a decentralized service creates new challenges for query planning. In this paper, we first introduce the gateway for decentralized query planning and then discuss the policy confidentiality issue that arises during information sharing.

II. GATEWAYS AND PROBLEM STATEMENT

A coalition network consists of multiple database servers that belong to different parties. We assume that servers owned by the same party fully trust each other, while servers owned by other parties are semi-trusted, i.e., they are curious about other parties’ sensitive information but will not launch malicious attacks. Each party in the coalition network owns a gateway server. The gateway knows how tables are distributed across its party’s servers as well as the authorizations that its party has granted to other parties. Each party’s gateway also has persistent connections to all hosts in its party and to the other parties’ gateways.

Fig. 2. An example of a coalition network with gateways.

Figure 2 shows an example of a coalition network with three parties X, Y and Z. Lines represent the persistent connections between servers.
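As an illustration only, the state that a gateway holds in this setting can be sketched in Python. The sketch is not the system's implementation: the class `Gateway`, its fields, the helper `build_coalition`, and the modelling of an authorization as a tuple-level predicate are all assumptions introduced here for exposition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, object]
# Assumption: an authorization is modelled as a predicate over single tuples.
Policy = Callable[[Row], bool]


@dataclass
class Gateway:
    """Hypothetical per-party gateway state (names are illustrative)."""
    party: str
    # Table name -> intra-party server that stores the table.
    table_location: Dict[str, str] = field(default_factory=dict)
    # Grantee party -> authorization this party has granted to that grantee.
    authorizations: Dict[str, Policy] = field(default_factory=dict)
    # Parties whose gateways are reachable over persistent connections.
    peers: Set[str] = field(default_factory=set)

    def known_policies(self) -> List[str]:
        """Labels of the policies this gateway holds, i.e. its own party's."""
        return sorted(f"{self.party}->{g}" for g in self.authorizations)

    def visible_rows(self, grantee: str, rows: List[Row]) -> List[Row]:
        """Tuples of this party that the grantee is allowed to see."""
        policy = self.authorizations.get(grantee)
        if policy is None:
            return []  # no authorization granted: disclose nothing
        return [r for r in rows if policy(r)]


def build_coalition(parties: List[str]) -> Dict[str, Gateway]:
    """Create one gateway per party, each connected to all other gateways."""
    gateways = {p: Gateway(party=p) for p in parties}
    for p, gw in gateways.items():
        gw.peers = {q for q in parties if q != p}
    return gateways


# Usage with the three parties of Figure 2; the table contents are made up.
gateways = build_coalition(["X", "Y", "Z"])
gateways["X"].table_location["reports"] = "x-server-1"
gateways["X"].authorizations["Y"] = lambda r: r["level"] <= 1
sample = [{"id": 1, "level": 0}, {"id": 2, "level": 3}]
shared_with_y = gateways["X"].visible_rows("Y", sample)
```

Each gateway stores only its own party's authorizations, so the failure or compromise of one gateway exposes that party's policies and table locations but not those of the rest of the coalition.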
There are six pairwise authorization policies: X→Y, X→Z, Y→X, Y→Z, Z→X and Z→Y. An authorization policy is essentially a view: the policy X→Y specifies the tuples in X’s tables that Y is allowed to see. Because these policies are sensitive and confidential, a party knows only the policies that it created itself. For example, party X knows only X→Y and X→Z.

Because query plan generation for selection and projection is straightforward, we only discuss joins across multiple databases that belong to different parties. Let’s