Scalable Parallelization of Skyline Computation for Multi-core Processors

Sean Chester ∗, Darius Šidlauskas †, Ira Assent ∗, and Kenneth S. Bøgh ∗
∗ Data-Intensive Systems Group, Aarhus University, Denmark
† Data-Intensive Applications and Systems Laboratory, EPFL, Switzerland
schester@cs.au.dk darius.sidlauskas@epfl.ch ira@cs.au.dk ksb@cs.au.dk

Abstract—The skyline is an important query operator for multi-criteria decision making. It reduces a dataset to only those points that offer optimal trade-offs of dimensions. In general, it is very expensive to compute. Recently, multicore CPU algorithms have been proposed to accelerate the computation of the skyline. However, they do not sufficiently minimize dominance tests and so are not competitive with state-of-the-art sequential algorithms. In this paper, we introduce a novel multicore skyline algorithm, Hybrid, which processes points in blocks. It maintains a shared, global skyline among all threads, which is used to minimize dominance tests while maintaining high throughput. The algorithm uses an efficiently-updatable data structure over the shared, global skyline, based on point-based partitioning. Also, we release a large benchmark of optimized skyline algorithms, with which we demonstrate on challenging workloads a 100-fold speedup over state-of-the-art multicore algorithms and a 10-fold speedup with 16 cores over state-of-the-art sequential algorithms.

I. INTRODUCTION

Skyline computation, introduced in 2001 [4], is still an active research area with applications in route planning for road networks [14], [21], data exploration [5], web service composition [1], and many other multi-criteria decision-making domains wherein (possibly conflicting) preferences need to be balanced. Figure 1a illustrates the skyline over an example dataset.
If small values are preferred (e.g., the points represent fuel consumption and expected travel time), then q is clearly a worse option than (i.e., is dominated by) p, since it has larger values for both coordinates. The skyline consists of all non-dominated points (in this case, p, r, s, t).

However, the skyline is expensive to compute, especially when it is large relative to the input, because each skyline point (at least implicitly) needs to be compared to every other skyline point. This computational challenge has prompted the use of modern computing platforms, such as GPUs [3], [8] and multicore CPUs [13], [16], as well as distributed environments [12], including MapReduce [17], [19], [22], to accelerate the computation.

Of these, multicore CPUs are a particularly attractive option, because the cost of shared data structures is much lower and parallel work need not be isolated. Still, we demonstrate the surprising conclusion that current multicore skyline algorithms can be outperformed by at least an order of magnitude by sequential algorithms on modest workloads.

Fig. 1: (a) Skyline example: p, r, s, t are in the skyline, but not q, which has higher x- and y-values than p. (b) Partitioning: partitioning reveals incomparability and refines the probability of a point dominating another.

† Work done while in the MADALGO group at Aarhus University.

Current multicore algorithms adopt the same paradigm as distributed algorithms, a divide-and-conquer approach wherein the dataset is cut, local skylines are computed in isolation by each thread, and then local results are merged to produce a global result. This paradigm suffers from two principal drawbacks. First, if the local results are large, then the merging step becomes prohibitively expensive. Second, this partitioning hinders pruning capacity. If p and q in Figure 1a are allocated to separate threads, then the dominance cannot be detected until the more expensive merge phase, once all threads complete.
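The dominance relation and the divide-and-conquer paradigm described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard definition of dominance (no worse in every dimension and strictly better in at least one, with smaller values preferred). The coordinates given to p, q, r, s, t are hypothetical values chosen only to reproduce the relationships in Figure 1a, and the names `dominates`, `skyline`, and `divide_and_conquer_skyline` are ours. The chunks are processed in a plain loop that stands in for independent threads.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p dominates q, assuming smaller values are preferred.

    Assumed (standard) definition: p is no worse than q in every
    dimension and strictly better in at least one.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


def divide_and_conquer_skyline(points: List[Point], num_chunks: int) -> List[Point]:
    """Cut the dataset, compute local skylines in isolation, then merge."""
    # Round-robin cut of the input; each chunk would go to one thread.
    chunks = [points[i::num_chunks] for i in range(num_chunks)]
    # Local phase: no chunk can see points held by another chunk.
    local_skylines = [skyline(chunk) for chunk in chunks]
    # Merge phase: cross-chunk dominance is only detected here.
    candidates = [p for local in local_skylines for p in local]
    return skyline(candidates)


# Hypothetical coordinates for the points named in Figure 1a.
P, Q, R, S, T = (2, 4), (3, 5), (1, 7), (4, 2), (6, 1)
EXAMPLE = [P, Q, R, S, T]
```

With `num_chunks=2`, the round-robin cut places p and q in different chunks, so q survives its local skyline and is eliminated only in the merge step. This is the pruning weakness of the divide-and-conquer paradigm noted above.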
By contrast, we adopt a different paradigm, where all known skyline points are maintained in a global data structure. The skyline is updated at regular synchronization points and read by all threads. We order the skyline points in the data structure to maximize the probability of detecting new dominance relationships quickly so that subsequent dominance tests can be averted. The processing of points is done in ordered blocks that guarantee each point is compared to at most α more points than in a sequential algorithm, a guarantee that a divide-and-conquer approach cannot offer. As a consequence, we have no expensive merge phase, dominance relationships are determined early rather than being severed by file cutting, and, like state-of-the-art sequential algorithms, we can report results progressively. In all, we not only outperform state-of-the-art multicore algorithms by up to two orders of magnitude, but also outperform sequential algorithms on account of good multi-threaded scalability.

A. Contributions and Outline

We study parallel skyline computation for multicore architectures with a focus on parallel scalability and raw performance. In particular, after formally introducing the problem (Section II) and before concluding (Section VIII), we:

• discuss the overlap in key principles for skyline and multicore computation, elaborating on the challenges in