Fast Range Query Estimation by N-Level Tree Histograms

Francesco Buccafurri and Gianluca Lax

Dipartimento di Informatica, Matematica, Elettronica e Trasporti
Università “Mediterranea” di Reggio Calabria - Italy
e-mail: bucca@unirc.it, lax@ing.unirc.it

Abstract

Histograms are a lossy compression technique widely applied in various application contexts, such as query optimization, statistical and temporal databases, OLAP applications, and data streams. In most cases, the accuracy with which original information can be reconstructed from the histogram plays a crucial role. Thus, several proposals for constructing histograms that try to maximize accuracy have been made in the recent past. Besides bucket-based histograms (i.e., histograms whose construction is driven by the search for a “good” domain partition), there are newer histograms characterized by more complex structures (for instance, wavelet-based histograms). This paper presents a new histogram, called nLT, belonging to the latter class. It is based on a hierarchical decomposition of the original data distribution kept in a full binary tree. This tree, containing a set of pre-computed hierarchical queries, uses bit saving for representing integer numbers, so that the reduced storage space allows us to increase the tree resolution and, consequently, its accuracy. Experimental comparison shows the superiority of nLT w.r.t. the state-of-the-art histograms.

Key words: Query Optimization, Optimization and Performance, Range Query Estimation, Histograms, Data Reduction.

A shorter abridged version of this paper appeared in Proceedings of the Int. Conference on Data Warehousing and Knowledge Discovery, Y. Kambayashi, M. Mohania, W. Wöß (Eds.): DaWaK 2003, LNCS 2737, pp. 350-359, 2003. © Springer-Verlag Berlin Heidelberg 2003.

Corresponding author.

Preprint submitted to Elsevier Science 9 April 2004
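
The hierarchical decomposition summarized in the abstract can be illustrated with a minimal sketch. It is not the nLT construction itself: it assumes a heap-ordered full binary tree whose nodes store exact range sums over dyadic sub-ranges, it omits the bit-saving encoding entirely, and it assumes a uniform spread of frequencies inside a partially covered leaf. The names `build_tree` and `estimate` are hypothetical and chosen only for this illustration.

```python
from typing import List


def build_tree(freq: List[int], levels: int) -> List[float]:
    """Build a heap-ordered full binary tree of pre-computed range sums.

    Node 1 is the root and covers the whole domain; node i has children
    2*i and 2*i + 1, each covering half of its parent's range.
    Assumption: len(freq) is a multiple of the number of leaves.
    """
    leaves = 1 << (levels - 1)
    if len(freq) % leaves != 0:
        raise ValueError("domain size must be a multiple of the leaf count")
    width = len(freq) // leaves
    tree = [0.0] * (2 * leaves)  # index 0 is unused
    # Leaves hold the sums of equal-width slices of the domain.
    for j in range(leaves):
        tree[leaves + j] = float(sum(freq[j * width:(j + 1) * width]))
    # Each internal node holds the sum of its two children.
    for i in range(leaves - 1, 0, -1):
        tree[i] = tree[2 * i] + tree[2 * i + 1]
    return tree


def estimate(tree: List[float], n: int, lo: int, hi: int) -> float:
    """Estimate the sum of freq[lo:hi] (half-open range) from the tree alone."""
    leaves = len(tree) // 2

    def visit(node: int, start: int, end: int) -> float:
        if hi <= start or end <= lo:
            return 0.0  # no overlap with the query
        if lo <= start and end <= hi:
            return tree[node]  # node fully covered: use the stored sum
        if node >= leaves:
            # Partially covered leaf: assume frequencies are spread uniformly.
            overlap = min(hi, end) - max(lo, start)
            return tree[node] * overlap / (end - start)
        mid = (start + end) // 2
        return visit(2 * node, start, mid) + visit(2 * node + 1, mid, end)

    return visit(1, 0, n)
```

In this sketch a query that is aligned with node boundaries is answered exactly from the stored sums, while a query that cuts through a leaf is only approximated. Adding levels with the same storage budget, which is what the abstract attributes to bit saving, would shrink the leaves and so reduce the portion of a query that has to be approximated.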