IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 1
Skyline Processing on Distributed Vertical
Decompositions
George Trimponias, Ilaria Bartolini, Dimitris Papadias, Yin Yang
Abstract—We assume a dataset that is vertically decomposed among several servers, and a client that wishes to compute the
skyline by obtaining the minimum number of points. Existing solutions for this problem are restricted to the case where each
server maintains exactly one dimension. This paper proposes a general solution for vertical decompositions of arbitrary
dimensionality. We first investigate some interesting problem characteristics regarding the pruning power of points. Then, we
introduce VPS (vertical partition skyline), an algorithmic framework that includes two phases. Phase 1 searches for an anchor
point Panc that dominates, and hence eliminates, a large number of records. Starting with Panc, Phase 2 constructs incrementally
a pruning area using an interesting union-intersection property of dominance regions. Servers do not transmit points that fall
within the pruning area in their local subspace. Our experiments confirm the effectiveness of the proposed methods under
various settings.
Keywords—Distributed Skyline, Vertical Partitioning, Query Processing.
————————————————————
1 INTRODUCTION
Given a data set DS of d-dimensional records/points, a
record P ∈ DS dominates another record Q ∈ DS if P is no
worse than Q on all d attributes/dimensions and
better than Q on at least one dimension. The skyline SKY ⊆
DS consists of all points that are not dominated. In this
paper, we assume that the dataset is vertically distributed
among m servers, so that a server Ni stores a subset Di of
the dimensions and the ID of each record. For every two
servers Ni and Nj (1≤i≠j≤m), Di∩Dj = ∅, i.e., the servers do
not have common attributes except for the record ID. As a
real-world example, consider a mobile client that wishes
to compute the skyline over a restaurant dataset based on
the following criteria: quality, value, proximity to
cinemas, and distance from the current location. The
first two attributes are provided by a restaurant rating
service, whereas the other two are obtained from an on-line
map server. Similarly, in e-commerce applications,
product prices may be provided by sites that find the
lowest price (e.g., pricegrabber.com), while technical
characteristics reside in specialized libraries (e.g., cnet).
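The dominance and skyline definitions above can be illustrated with a minimal sketch. It assumes that smaller values are preferred on every dimension and that points are stored in a dictionary keyed by record ID; the function names `dominates` and `skyline` are illustrative and do not come from the paper. The quadratic scan is a baseline for exposition only, not the proposed method.

```python
from typing import Dict, List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q, assuming smaller values are preferred."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Dict[str, Sequence[float]]) -> List[str]:
    """Return the sorted IDs of all points that no other point dominates."""
    return sorted(
        pid
        for pid, p in points.items()
        if not any(dominates(q, p) for qid, q in points.items() if qid != pid)
    )
```

Note that a point never dominates an identical point, because the second condition (strictly better on at least one dimension) fails.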
Fig. 1 shows an instance with two servers N1, N2, and 8
points A-H. N1 (resp. N2) maintains the subspace D1 = {d1,
d2} (resp. D2 = {d3, d4}). Without loss of generality,
throughout our presentation we consider that smaller
values are preferred on all dimensions. The local skyline
SKY1 at N1 contains a single point B, which dominates all
other records in D1 (Fig. 1a). Similarly, the local skyline
SKY2 at N2 consists of B and E (Fig. 1b). The global skyline
SKY over all dimensions comprises all points that appear
in SKY1 or SKY2 (i.e., B, E), as well as additional records
that are not dominated by a single point on all
dimensions, i.e., SKY = {B, E, A, C}. For instance, A ∈ SKY
since it is dominated by different records (e.g., B and E) in
the two subspaces. On the other hand, F, G, H and D are
not in SKY because they are dominated by a single point
(A, C, C, B, respectively) on all dimensions.
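The relationship between local and global skylines can be sketched on a small vertically decomposed dataset. The coordinates below are hypothetical (they are not the values plotted in Fig. 1), and the record IDs P, Q, R, S as well as the names `server1`, `server2`, and `join` are assumptions made for illustration. The sketch fetches every record from both servers and joins them on the ID, which is exactly the transfer cost that a skyline algorithm in this setting tries to avoid.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q, assuming smaller values are preferred."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Dict[str, Point]) -> List[str]:
    """Return the sorted IDs of all points that no other point dominates."""
    return sorted(
        pid
        for pid, p in points.items()
        if not any(dominates(q, p) for qid, q in points.items() if qid != pid)
    )


def join(*subspaces: Dict[str, Point]) -> Dict[str, Point]:
    """Concatenate the per-server attribute tuples of each record ID."""
    return {
        pid: tuple(v for sub in subspaces for v in sub[pid])
        for pid in subspaces[0]
    }


# Hypothetical data: server1 holds (d1, d2), server2 holds (d3, d4).
server1 = {"P": (1, 1), "Q": (4, 4), "R": (2, 2), "S": (3, 3)}
server2 = {"P": (4, 4), "Q": (1, 1), "R": (2, 2), "S": (3, 3)}

local1 = skyline(server1)                  # local skyline in subspace D1
local2 = skyline(server2)                  # local skyline in subspace D2
global_sky = skyline(join(server1, server2))
```

In this hypothetical instance, R belongs to no local skyline (P dominates it in the first subspace and Q in the second), yet it is in the global skyline because no single record dominates it on all four dimensions. S is excluded because R dominates it everywhere.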
[Figure: two scatter plots of points A-H, each with the local skyline marked. (a) Subspace D1 at server N1 (axes d1, d2). (b) Subspace D2 at server N2 (axes d3, d4).]
Fig. 1. Running example.
In our setting, we assume that there is no central server
to materialize SKY. Moreover, the skyline may change
when updates occur to one or more servers (e.g., some
restaurant ratings are altered), and it may depend on the
particular user (e.g., the distance between the restaurant
and the client’s location). Hence, SKY must be computed
on-demand. The skyline algorithm should minimize the
points retrieved from each server because the
communication overhead constitutes the dominant factor
in battery consumption for mobile clients [7][17].
In addition, retrieving more data increases the amount of
computation required to process it.
————————————————
• G. Trimponias and D. Papadias are with the Dept. of Computer Science
and Engineering, Hong Kong University of Science and Technology.
E-mail: {trimponias, dimitris}@cse.ust.hk
• I. Bartolini is with the Department of Electronics, Computer Science and
Systems, University of Bologna.
E-mail: i.bartolini@unibo.it
• Y. Yang is with the Advanced Digital Sciences Center, Singapore.
E-mail: yin.yang@adsc.com.sg
Manuscript received: June 2011, revised October 2011.