Parallel Computation of Skyline Queries

Louis Woods, Gustavo Alonso
Systems Group, Dept. of Computer Science
ETH Zurich, Switzerland
{firstname.lastname}@inf.ethz.ch

Jens Teubner
DBIS Group, Dept. of Computer Science
TU Dortmund University, Germany
jens.teubner@cs.tu-dortmund.de

Abstract. Due to stagnant clock speeds and the high power consumption of commodity microprocessors, database vendors have started to explore massively parallel co-processors such as FPGAs to further increase performance. A typical approach is to push simple but compute-intensive operations (e.g., pre-filtering, (de)compression) to FPGAs for acceleration. In this paper, we show how a significantly more complex operation, the computation of the skyline, can be holistically implemented on an FPGA. A skyline query computes the Pareto optimal set of multi-dimensional data points. These queries have been studied extensively in software over the last decade, but this paper is the first to examine skyline computation in hardware. We propose a methodology that interleaves data storage and computation, allowing multiple operations to be executed on the same working set in parallel while accounting for all data dependencies. Our experiments show that we achieve very promising results compared to CPU-based solutions.

Keywords: FPGA, database, Pareto optimal, skyline query

I. INTRODUCTION

Recently, a number of projects have proposed exploiting FPGAs for database processing, e.g., [1], [2], [3]. On the commercial side, so-called appliances such as [4], [5] successfully use FPGAs to both improve performance and save energy. However, while FPGAs provide high aggregate compute power, it is often difficult to turn their inherent parallelism into true performance for a given database task.
Thus, the state of the art is to push only relatively simple operations (e.g., projection/selection-based filtering, (de)compression) to configurable hardware and let commodity CPUs take care of the remaining processing, as is the case, e.g., in IBM/Netezza's data warehouse appliance [4]. Unfortunately, this approach tends to leave much of the true hardware potential unused.

A skyline query [6] is a good example of a complex database task that could greatly benefit from hardware acceleration due to its compute-intensive nature. Yet, as we will see, it is not at all obvious how to implement skyline computation on an FPGA efficiently.

Skyline queries reduce large multi-dimensional data sets to smaller sets of interest by eliminating items that are dominated by others, i.e., by computing the set of Pareto optimal items. Skyline queries are relevant in several areas, e.g., search pruning, decision making, and personalized services. Furthermore, they are related to several other well-known problems such as convex hull, top-K queries, and nearest-neighbor search.

The classical example of a two-dimensional skyline query is a search for hotels that are cheap and close to the beach. A hotel that is both more expensive and farther from the beach than some other hotel is said to be dominated and need not be inspected further by the user. Since any hotel could potentially dominate any other hotel, there exist data dependencies across the entire data set. This is a particular challenge for an FPGA implementation because it requires keeping track of a potentially large state, while on-chip storage resources on FPGAs are limited.

Contributions. We present a solution for solving skyline queries on FPGAs that can handle an arbitrary number of dependencies and has no restrictions on the size of intermediate results.
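To make the notion of dominance from the hotel example concrete, the following is a minimal software sketch, not the FPGA design and not the BNL algorithm. It assumes that smaller values are better in every dimension; the names `dominates` and `naive_skyline` and the hotel data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q (assumption: smaller is better).

    p dominates q when p is at least as good as q in every dimension
    and strictly better in at least one dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def naive_skyline(points: List[Point]) -> List[Point]:
    """Quadratic-time reference: keep every point no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price per night, distance to the beach in meters).
hotels = [
    (50.0, 900.0),
    (80.0, 200.0),
    (120.0, 100.0),
    (90.0, 950.0),   # dominated by (50.0, 900.0)
    (130.0, 300.0),  # dominated by (80.0, 200.0)
]

if __name__ == "__main__":
    print(naive_skyline(hotels))
```

Because every point may have to be compared against every other point, the sketch also shows why the data dependencies span the entire data set.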
In our approach, it is sufficient to keep a small working set of skyline candidate tuples inside the FPGA (with the actual size determined by the available FPGA resources), while the rest of the input tuples are treated as a data stream that propagates through the FPGA. We use pipeline parallelism and nearest-neighbor communication for concurrent manipulation of the active working set, combining data organization, computational power, and synchronization into a parallel processing model that naturally leverages the characteristics of FPGAs.

Our solution exhibits high throughput and very good scalability. In our experiments, we show that throughput scales linearly with the amount of FPGA resources allocated. Using a low-end Virtex-5 FPGA, we clearly outperform a single-threaded CPU-based skyline operator and achieve performance close to that of the fastest known parallel implementation [7], running on a high-performance 64-core server.

II. SKYLINE QUERIES

In this section, we define skyline queries, describe a popular software algorithm for computing them (the BNL algorithm [6]), and present our modified version of BNL for parallel execution on an FPGA. Our intention here is to illustrate our approach to parallelizing BNL at a high level before we discuss technical details later. To do so, we will take the liberty of digressing into the world of Lemmings¹.

¹ As in the video game “Lemmings”: http://www.dmadesign.org/

A. The Lemming Skyline

Lemmings are primitive creatures that migrate in masses. On Lemmings Planet every year a challenge takes