Faster Plan Generation through Consideration of Functional Dependencies and Keys

Marius Eich, University of Mannheim, marius.eich@uni-mannheim.de
Pit Fender, Oracle Labs, pit.fender@oracle.com
Guido Moerkotte, University of Mannheim, moerkotte@uni-mannheim.de

ABSTRACT

It has been a recognized fact for many years that query execution can benefit from pushing group-by operators down in the operator tree and applying them before a join. This so-called eager aggregation reduces the size(s) of the join argument(s), making join evaluation faster. Lately, the idea enjoyed a revival when it was applied to outer joins for the first time and incorporated in a state-of-the-art plan generator. However, this recent approach is highly dependent on the use of heuristics because of the exponential growth of the search space that goes along with eager aggregation. Finding an optimal solution for larger queries calls for effective optimality-preserving pruning mechanisms to reduce the search space size as far as possible. By a more thorough investigation of functional dependencies and keys, we provide a set of new pruning criteria and evaluate their effectiveness with respect to the runtime and memory consumption of the resulting plan generator.

1. INTRODUCTION

The idea of reordering group-by operators and joins was proposed already two decades ago ([12, 13, 14, 11, 1]) and has since been implemented in many commercial query optimizers. However, it was always limited to inner joins only.

In a recent paper, Eich and Moerkotte revived the topic by showing that the optimal placement of group-by operators is possible in the presence of non-inner joins as well, thus enabling query optimizers to apply this powerful optimization technique to a whole new class of queries. They describe a plan generator capable of reordering group-by and a wide range of different join operators.
While their approach performs well for small queries, queries with more than ten relations can only be handled by abandoning optimality and relying on heuristics [4].

The reason for this limitation is the lack of an effective optimality-preserving pruning criterion to limit the size of the search space and thereby allow the optimization of larger queries. A quick complexity analysis shows the importance of pruning in this context: A binary operator tree with $n$ relations contains $2n - 2$ edges, and we can attach a group-by to each of these edges and on top of the root, resulting in $2n - 1$ possible positions for a group-by. If one considers all valid combinations of these positions for every tree, the additional overhead caused by the optimal placement of group-by operators is in $O(2^{2n-1})$. On the other hand, if one can infer at a certain position in the operator tree that the grouping attributes constitute a superkey, then a group-by at this position does not need to be considered.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Proceedings of the VLDB Endowment, Vol. 9, No. 10. Copyright 2016 VLDB Endowment 2150-8097/16/06.

We give an in-depth analysis of four new optimality-preserving pruning criteria and an existing one that was proposed in [4]. They are derived by a careful investigation of keys and functional dependencies. We describe the pruning criteria with the help of some examples and evaluate them experimentally, thereby showing that they can speed up the plan generator by orders of magnitude. The correctness proofs for all pruning criteria discussed in this paper can be found in [3].
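
As a rough illustration of the counting argument and the superkey observation from the introduction, the following Python sketch computes the number of group-by positions for a tree over n relations and tests whether a set of grouping attributes is a superkey. All function names and the example schema are hypothetical and not taken from the paper. The superkey test uses the textbook attribute-closure algorithm over functional dependencies, which is an assumption made for illustration and not necessarily the mechanism used inside the plan generator.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A functional dependency lhs -> rhs (hypothetical representation).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def group_by_positions(n: int) -> int:
    """Positions for a group-by in a binary operator tree over n relations:
    one per edge (2n - 2) plus one on top of the root."""
    if n < 1:
        raise ValueError("need at least one relation")
    return (2 * n - 2) + 1


def placement_upper_bound(n: int) -> int:
    """Each position either carries a group-by or not, so there are at most
    2^(2n - 1) combinations per tree (not all of them need to be valid)."""
    return 2 ** group_by_positions(n)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(grouping: Iterable[str], all_attrs: Iterable[str],
                fds: List[FD]) -> bool:
    """True if the grouping attributes determine every attribute."""
    return set(all_attrs) <= closure(grouping, fds)


# Hypothetical example: orders(o_id, c_id, total) with o_id -> c_id, total.
attrs = {"o_id", "c_id", "total"}
fds = [(frozenset({"o_id"}), frozenset({"c_id", "total"}))]

print(group_by_positions(10))             # 19 positions for ten relations
print(placement_upper_bound(10))          # 2^19 combinations per tree
print(is_superkey({"o_id"}, attrs, fds))  # grouping on o_id: superkey
print(is_superkey({"c_id"}, attrs, fds))  # grouping on c_id: not a superkey
```

In this toy schema, a group-by on `o_id` would be a candidate for pruning in the sense described above, whereas a group-by on `c_id` would still have to be considered.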
Section 2 contains some preliminaries concerning the notation used in this paper and the basics of a bottom-up plan generator. In Section 3 we take a closer look at the information needed during plan generation which is captured in the form of interesting plan properties. Our main contribution is contained in Sections 4, 5, 6 and 7, where we discuss the different pruning criteria. In Section 8, we show the results of our experiments and subsequently conclude the paper in Section 9.

2. PRELIMINARIES

2.1 Algebraic Operators

In this section we provide definitions for the algebraic operators we will be using throughout the rest of the paper. We use standard set notation to denote bags. We define the group-by operator $\Gamma$ as

$$\Gamma_{G;\,a_1:f_1,\ldots,a_k:f_k}(e) := \{\, y \circ [a_1 : x_1, \ldots, a_k : x_k] \mid y \in \Pi^D_G(e),\ x_i = f_i(\{ z \mid z \in e,\ z.G = y.G \}) \,\},$$

for some set of grouping attributes $G$. The attributes $a_1, \ldots, a_k$ are created by applying the aggregation vector $F = (f_1, \ldots, f_k)$, consisting of $k$ aggregate functions, to the grouped tuples. We denote by $\Pi^D_A(e)$ the duplicate-removing projection onto the set of attributes $A$, applied to the expression $e$. The resulting relation only contains values for those attributes that are contained in $A$ and no duplicate values. The aggregate functions contained in $F$ are then applied to groups of tuples taken from this relation. The groups contain tuples with equal values in the grouping attributes. The results are stored in attributes $a_1, \ldots, a_k$.

Henceforth, we use a shorter notation for the group-by, where we abbreviate the specification of the aggregation vector and the