THEORETICAL ADVANCES

Exploiting computer resources for fast nearest neighbor classification

José R. Herrero · Juan J. Navarro

Received: 28 November 2005 / Accepted: 27 January 2007 / Published online: 30 March 2007
© Springer-Verlag London Limited 2007

Abstract  Modern computers provide excellent opportunities for performing fast computations. They are equipped with powerful microprocessors and large memories. However, programs are not necessarily able to exploit those computer resources effectively. In this paper, we present the way in which we have implemented a nearest neighbor classification. We show how performance can be improved by exploiting the ability of superscalar processors to issue multiple instructions per cycle and by using the memory hierarchy adequately. This is accomplished by the use of floating-point arithmetic, which usually outperforms integer arithmetic, and block (tiled) algorithms, which exploit the data locality of programs, allowing for an efficient use of the data stored in the cache memory. Our results are validated with both an analytical model and empirical results. We show that regular codes can be performed faster than more complex irregular codes using standard data sets.

1 Introduction

The nearest neighbor (NN) classification procedure is a popular technique in pattern recognition, speech recognition, multitarget tracking, medical diagnosis tools, etc. A major concern in its implementation is the immense computational load required in practical problem environments. Other important issues are the amount of storage required and the data access time.

In this paper, we address these issues by using techniques widely used in linear algebra codes. We show that a simple code can be very efficient on commodity processors and can sometimes outperform complex codes which could prove more difficult to implement efficiently. Comparison of the NN with other methods in different application areas can be found elsewhere [1–3].
To find disquisitions about appropriate distance measures, the reader is referred to [3–6].

1.1 Computer resources

Computer architecture has evolved very quickly in the last decades, with important improvements in many areas. We will focus on two aspects which are essential to the execution of programs: the processor and the memory. Current microprocessors have very fast clocks and multiple functional units within the processor. Potentially, some processors can execute thousands of millions of operations per second. However, even general purpose processors are usually optimized for scientific computations, which require arithmetic with real numbers. This means that, on many processors, a multiplication of floating point numbers will be done much faster than the product of two integers. When multiple functional units are present, several arithmetic operations can be done at the same time. In addition, when those functional units are pipelined, a new arithmetic instruction can be started each cycle, with several operations proceeding through the pipeline. This can be achieved when the code is very regular and data is accessed quickly. On the other hand, integer arithmetic is usually slower.

J. R. Herrero (✉) · J. J. Navarro
Computer Architecture Department, Universitat Politècnica de Catalunya, Jordi Girona 1-3, Mòdul D6, 08034 Barcelona, Spain
e-mail: josepr@ac.upc.edu

J. J. Navarro
e-mail: juanjo@ac.upc.edu

Pattern Anal Applic (2007) 10:265–275
DOI 10.1007/s10044-007-0065-y
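To make the discussion concrete, the following is a minimal sketch (not the paper's actual implementation; all names are ours) of a brute-force 1-NN classifier whose inner loop is the kind of regular floating-point code described above: a multiply-add recurrence over contiguous data, well suited to pipelined floating-point units.

```c
#include <stddef.h>
#include <float.h>

/* Brute-force 1-NN with squared Euclidean distance in single
   precision.  Training vectors are stored row-major in 'train'
   (n rows of 'dim' floats).  The inner loop is regular and
   stride-1, so a pipelined FP unit can start a new multiply-add
   nearly every cycle. */
int nn_classify(const float *train, const int *labels,
                size_t n, size_t dim, const float *query)
{
    size_t best = 0;
    float best_dist = FLT_MAX;
    for (size_t i = 0; i < n; i++) {
        float dist = 0.0f;
        for (size_t d = 0; d < dim; d++) {
            float diff = train[i * dim + d] - query[d];
            dist += diff * diff;  /* regular FP multiply-add */
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return labels[best];
}
```

Note that the squared distance is used directly, since the square root is monotonic and unnecessary for comparison; keeping the kernel a pure floating-point recurrence avoids slower integer or branch-heavy alternatives.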