FOPA: A Final Object Pruning Algorithm to Efficiently Produce Skyline Points

Ana Alvarado, Oriana Baldizan, Marlene Goncalves, and Maria-Esther Vidal

Universidad Simón Bolívar, Venezuela
{aalvarado,obaldizan,mgoncalves,mvidal}@ldc.usb.ve

Abstract. We consider the problem of locating the best points in large multidimensional datasets. The goal is to efficiently generate all the points that meet a multi-objective query on data distributed in Vertically Partitioned Tables (VPTs). To compute the skyline on large VPTs, costly joins and comparisons may need to be executed, negatively impacting query execution time. We propose a new algorithm named FOPA (Final Object Pruning Algorithm), which efficiently produces the whole set of skyline points and scales up to large datasets. FOPA relies on ordered VPTs, information on the values seen so far, and indices on the VPTs to prune the space of dominated points and identify the skyline for large datasets in less time than state-of-the-art approaches. Empirically, we study the performance and scalability of FOPA on synthetic data and compare FOPA with existing approaches; our results suggest that FOPA outperforms existing solutions by up to two orders of magnitude.

1 Introduction

Web-based infrastructures now allow large datasets to be published and accessed from any node of the Internet. More than ever, users can retrieve data that satisfies their requests by simply searching or consulting any of the available data sources. Although this democratization of information provides the basis for discovering properties and relationships that could not be identified years ago, there are still applications where it is important to efficiently identify the best tuples that satisfy a query. That is, applications designed not only to ensure soundness and completeness of the answers, but also to provide a few relevant answers quickly.
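To make concrete what "the best points" over VPTs means, the following minimal sketch joins per-dimension tables on an object identifier and then keeps every point that no other point dominates (at least as good in every dimension, strictly better in at least one). This is the naive join-then-compare baseline whose cost motivates the work, not FOPA itself. The names `join_vpts`, `dominates`, and `naive_skyline`, the dictionary representation of a VPT, and the convention that lower values are better are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical VPT representation: one table per dimension, object id -> value.
# Assumption: lower values are better in every dimension.
VPT = Dict[str, float]


def join_vpts(vpts: List[VPT]) -> Dict[str, Tuple[float, ...]]:
    """Assemble full points by joining the per-dimension tables on object id."""
    ids = set(vpts[0])
    for table in vpts[1:]:
        ids &= set(table)  # keep only objects present in every table
    return {oid: tuple(table[oid] for table in vpts) for oid in ids}


def dominates(p: Tuple[float, ...], q: Tuple[float, ...]) -> bool:
    """p dominates q: no worse in any dimension, strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def naive_skyline(vpts: List[VPT]) -> List[str]:
    """Return the ids of all non-dominated objects (quadratic number of comparisons)."""
    points = join_vpts(vpts)
    return sorted(
        oid
        for oid, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != oid)
    )


if __name__ == "__main__":
    # Illustrative data only: two dimensions for four objects.
    price = {"h1": 50, "h2": 80, "h3": 60, "h4": 90}
    distance = {"h1": 3.0, "h2": 1.0, "h3": 4.0, "h4": 2.0}
    print(naive_skyline([price, distance]))  # h3 is dominated by h1, h4 by h2
```

The sketch performs a full join followed by a pairwise comparison of all points, which is exactly the kind of work that becomes expensive as the tables grow.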
We address this problem and, based on related work, devise a solution to this ranking problem in which the tuples that best meet a given user request correspond to the skyline tuples. A skyline is a set of tuples such that none of them is dominated by any other tuple, i.e., no other tuple is at least as good in every dimension and strictly better in at least one. Skyline techniques have gained great attention in the literature [7,9,17], and state-of-the-art approaches have focused on identifying the skyline tuples while minimizing the number of comparisons. In the database area, several techniques have been proposed to efficiently identify the best tuples in a skyline [3,11,19]; additionally, few approaches have considered this problem in the context of RDF, where data may be represented as VPTs [6]. In particular, the approaches proposed by Balke et al. [3] and Chen et al. [6] assume that the data is stored or distributed following a vertically partitioned table representation, i.e., for each data dimension or RDF property a, there exists a relation

H. Decker et al. (Eds.): DEXA 2013, Part II, LNCS 8056, pp. 334–348, 2013. © Springer-Verlag Berlin Heidelberg 2013