Efficient KNN Search by Linear Projection of Image Clusters

Zaher Al Aghbari¹, Ayoub Al-Hamadi²
¹ Department of Computer Science, University of Sharjah, Sharjah 27272, UAE
² Institute for Electronics, Signal Processing and Communications (IESK), University Magdeburg, Magdeburg 4210, Germany

K-nearest neighbors (KNN) search in a high-dimensional vector space is an important paradigm for a variety of applications. Despite continuous research efforts in recent years, algorithms that find the exact KNN answer set in high dimensions are still outperformed by a linear scan. In this paper, we propose a technique to find the exact KNN image objects of a given query object. First, the proposed technique clusters the images using a self-organizing map (SOM) algorithm. It then projects the resulting clusters onto points in a linear space based on the distance between each cluster and a selected reference point. These projected points are then organized in a simple, compact, yet fast index structure called the array-index. Unlike most indexes that support KNN search, the array-index requires storage space that is linear in the number of projected points. Experiments show that, owing to its simplicity and compactness, the proposed technique is more efficient and more robust to dimensionality than other well-known techniques.

© 2011 Wiley Periodicals, Inc.

1. INTRODUCTION

Content-based retrieval of similar objects, where each object is represented by a point in a d-dimensional feature space, is important to many database applications such as medical image databases [1–3], time-series databases [4], and data mining [5]. These applications must support similarity search: if an application manages a database of N objects, then, given a query object q, it should be able to return the K objects in the database that are most similar to q.
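To make the query definition concrete, the exact KNN answer set can always be computed by the linear scan that the abstract cites as the baseline. The following is a minimal Python sketch, assuming Euclidean distance and an in-memory list of feature vectors; the function name `knn_linear_scan` is hypothetical, and this is the brute-force baseline, not the authors' SOM/array-index technique.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def knn_linear_scan(
    db: Sequence[Sequence[float]], q: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Exact K nearest neighbours of q by brute-force linear scan.

    Returns (distance, index) pairs sorted by increasing Euclidean distance.
    Every object in db is compared with q exactly once.
    """
    if k <= 0:
        return []
    # Best k candidates seen so far, stored as (-distance, index) so that
    # heapq's min-heap keeps the current K-th (farthest) neighbour at heap[0].
    heap: List[Tuple[float, int]] = []
    for i, obj in enumerate(db):
        d = math.dist(obj, q)  # Euclidean distance in d-dimensional space
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            # Closer than the current K-th neighbour: replace it.
            heapq.heapreplace(heap, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in heap)


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (-1.0, -1.0)]
    print(knn_linear_scan(points, (0.0, 0.0), k=3))
```

For the five 2-D points above and k = 3, the sketch returns the objects at indices 0, 1, and 4. The scan performs N distance computations of O(d) each, plus O(N log K) heap operations, which is why its cost grows with both the database size and the dimensionality.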
Author to whom all correspondence should be addressed (Z. Al Aghbari); e-mail: zaher@sharjah.ac.ae.
e-mail (A. Al-Hamadi): Ayoub.Al-Hamadi@ovgu.de.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 26, 844–865 (2011). DOI 10.1002/int.20496

In a distance space, such as Euclidean space, these K objects are the K-nearest neighbors (KNN) of q. Finding the KNN objects is one of the most expensive, yet essential, operations in high-dimensional database applications. In large databases, finding the KNN answer set of a query q by a linear scan is prohibitively expensive, particularly if the database objects are high dimensional. Even with the existing indexing structures, the response time of finding