IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Skyline Processing on Distributed Vertical Decompositions

George Trimponias, Ilaria Bartolini, Dimitris Papadias, Yin Yang

Abstract—We assume a dataset that is vertically decomposed among several servers, and a client that wishes to compute the skyline while retrieving as few points as possible. Existing solutions for this problem are restricted to the case where each server maintains exactly one dimension. This paper proposes a general solution for vertical decompositions of arbitrary dimensionality. We first investigate some interesting problem characteristics regarding the pruning power of points. Then, we introduce VPS (vertical partition skyline), an algorithmic framework that consists of two phases. Phase 1 searches for an anchor point Panc that dominates, and hence eliminates, a large number of records. Starting with Panc, Phase 2 incrementally constructs a pruning area using an interesting union-intersection property of dominance regions. Servers do not transmit points that fall within the pruning area in their local subspace. Our experiments confirm the effectiveness of the proposed methods under various settings.

Keywords—Distributed Skyline, Vertical Partitioning, Query Processing.

1 INTRODUCTION

Given a data set DS of d-dimensional records/points, a record P ∈ DS dominates another record Q ∈ DS if P is no worse than Q on all d attributes/dimensions and better than Q on at least one dimension. The skyline SKY ⊆ DS consists of all points that are not dominated. In this paper, we assume that the dataset is vertically distributed among m servers, so that a server Ni stores a subset Di of the dimensions and the ID of each record. For every two servers Ni and Nj (1 ≤ i ≠ j ≤ m), Di ∩ Dj = ∅, i.e., the servers do not have common attributes except for the record ID.
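The definitions of dominance and skyline above can be illustrated with a minimal centralized sketch. It assumes that smaller values are preferred on every dimension and that records are held in memory as a dictionary from record ID to coordinate tuple. The names `dominates` and `skyline` are illustrative, and the quadratic scan is a naive baseline for exposition only, not the VPS framework.

```python
from typing import Dict, Sequence, Set

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q, assuming smaller values are preferred.

    p must be no worse than q on every dimension and strictly better
    on at least one dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Dict[str, Point]) -> Set[str]:
    """Return the IDs of all records not dominated by any other record.

    Naive O(n^2) scan over all pairs; illustrative only.
    """
    return {
        pid
        for pid, p in points.items()
        if not any(dominates(q, p) for qid, q in points.items() if qid != pid)
    }


# Hypothetical toy data: record ID -> (dim 1, dim 2).
toy = {"a": (1, 2), "b": (2, 1), "c": (2, 2)}
toy_skyline = skyline(toy)
```

In this toy input, `c` is dominated by both `a` and `b`, while `a` and `b` are incomparable, so only `a` and `b` remain in the skyline.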
As a real-world example, consider a mobile client that wishes to compute the skyline over a restaurant dataset based on the following criteria: quality, value, proximity to cinemas, and distance from the current location. The first two attributes are provided by a restaurant rating service, whereas the other two are obtained from an on-line map server. Similarly, in e-commerce applications, product prices may be provided by sites that find the lowest price (e.g., pricegrabber.com), while technical characteristics reside in specialized libraries (e.g., cnet).

Fig. 1 shows an instance with two servers N1, N2, and 8 points A-H. N1 (resp. N2) maintains the subspace D1 = {d1, d2} (resp. D2 = {d3, d4}). Without loss of generality, throughout our presentation we consider that smaller values are preferred on all dimensions. The local skyline SKY1 at N1 contains a single point B, which dominates all other records in D1 (Fig. 1a). Similarly, the local skyline SKY2 at N2 consists of B and E (Fig. 1b). The global skyline SKY over all dimensions comprises all points that appear in SKY1 or SKY2 (i.e., B, E), as well as additional records that are not dominated by a single point on all dimensions, i.e., SKY = {B, E, A, C}. For instance, A ∈ SKY because the records that dominate it in the two subspaces differ (e.g., B and E), so no single point dominates it on all dimensions. On the other hand, F, G, H and D are not in SKY because each of them is dominated by a single point (A, C, C, B, respectively) on all dimensions.

[Fig. 1. Running example. (a) Subspace D1 at server N1 (axes d1, d2). (b) Subspace D2 at server N2 (axes d3, d4). Each panel shows points A-H and marks the local skyline.]

In our setting, we assume that there is no central server to materialize SKY. Moreover, the skyline may change when updates occur at one or more servers (e.g., some restaurant ratings are altered), and it may depend on the particular user (e.g., the distance between the restaurant and the client's location). Hence, SKY must be computed on-demand.
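The relations stated for the running example can be checked with a small sketch. The coordinates below are hypothetical: they are not read from Fig. 1, and were chosen only so that the relations stated in the text hold (SKY1 = {B}, SKY2 = {B, E}, SKY = {A, B, C, E}, with F, G, H, D dominated by A, C, C, B, respectively). The helper names and the index tuples `D1`, `D2` are likewise assumptions, and the computation is a naive centralized one that ignores the communication cost the paper aims to minimize.

```python
from typing import Dict, Sequence, Set, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q when smaller values are preferred on all dimensions."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def project(p: Sequence[float], dims: Tuple[int, ...]) -> Tuple[float, ...]:
    """Restrict a point to the dimensions held by one server."""
    return tuple(p[i] for i in dims)


def skyline(points: Dict[str, Sequence[float]], dims: Tuple[int, ...]) -> Set[str]:
    """IDs of records not dominated by any other record within `dims`."""
    proj = {pid: project(p, dims) for pid, p in points.items()}
    return {
        pid
        for pid, p in proj.items()
        if not any(dominates(q, p) for qid, q in proj.items() if qid != pid)
    }


# Hypothetical coordinates (d1, d2, d3, d4); NOT taken from Fig. 1.
points = {
    "A": (2, 3, 6, 2),
    "B": (1, 1, 1, 5),
    "C": (3, 2, 7, 3),
    "D": (2, 2, 2, 6),
    "E": (5, 5, 5, 1),
    "F": (4, 4, 8, 4),
    "G": (6, 3, 9, 4),
    "H": (3, 6, 7, 6),
}

D1 = (0, 1)  # subspace {d1, d2}, held by server N1
D2 = (2, 3)  # subspace {d3, d4}, held by server N2

sky1 = skyline(points, D1)            # local skyline at N1
sky2 = skyline(points, D2)            # local skyline at N2
sky = skyline(points, D1 + D2)        # global skyline over all dimensions
```

With these assumed coordinates, A is dominated by B in D1 and by E in D2, but by neither on all four dimensions, which is why it belongs to the global skyline although it appears in no local skyline.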
The skyline algorithm should minimize the number of points retrieved from each server, because communication overhead is the dominant factor in battery consumption for mobile clients [7][17]. Moreover, retrieving more data increases the amount of computation required to process it.

G. Trimponias and D. Papadias are with the Dept. of Computer Science and Engineering, Hong Kong University of Science and Technology. E-mail: {trimponias, dimitris}@cse.ust.hk
I. Bartolini is with the Department of Electronics, Computer Science and Systems, University of Bologna. E-mail: i.bartolini@unibo.it
Y. Yang is with the Advanced Digital Sciences Center, Singapore. E-mail: yin.yang@adsc.com.sg
Manuscript received June 2011; revised October 2011.