Reducing TPC-H Benchmarking Time

Pedro Trancoso 1, Christodoulos Adamou 1, and Hans Vandierendonck 2

1 Department of Computer Science, University of Cyprus
75 Kallipoleos Ave., P.O. Box 20537, 1678 Nicosia, Cyprus
{pedro,cs98ca1}@cs.ucy.ac.cy

2 Dept. of Electronics and Information Systems, Ghent University
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
hvdieren@elis.UGent.be

Abstract. Benchmarking a system can be a time-consuming operation. Therefore, many researchers have developed kernels and micro-benchmarks. Nevertheless, these programs are not able to capture the details of a full application. Complex database applications are one such example. In this work we present a methodology based on a statistical method, Principal Component Analysis, to reduce the execution time of TPC-H, a decision support benchmark. This technique selects a subset of queries from the original set that are relevant and may be used to evaluate the systems. We use the subsets to determine the ranking of different computer systems. Our experiments show that with a small subset of 5 queries we are able to rank different systems with more than 80% accuracy in comparison with the original order, and that this result is achieved with as little as 20% of the original benchmark execution time.

1 Introduction

Benchmarking is a common practice for the evaluation of new computer systems. By executing certain benchmarks, manufacturers are able to highlight the characteristics of a given system and to rank it against other systems. In addition, benchmarking is widely used in computer architecture research, where the programs are executed on simulations of proposed architectures. As more realistic programs are used, the benchmarking process becomes more time-consuming. Common benchmark programs are grouped into suites, each of which represents a specific class of applications.
Common examples are the SPEC [1] benchmark suite for scientific applications, the EEMBC [2] benchmark suite, representative of applications for embedded systems, and TPC-C [3] and TPC-H [4], which represent database applications belonging to the classes of online transaction processing and decision support systems, respectively.

Reduced benchmark suites are valuable when analyzing computer performance, especially in cases where performance analysis is time-consuming, as it is in computer architecture research. For this purpose, the MinneSPEC reduced inputs were proposed for computer architecture research [5]. These inputs attempt to mimic the behavior of the SPEC benchmark suite while