Efficient KNN Search by Linear Projection of Image Clusters

Zaher Al Aghbari¹,∗ and Ayoub Al-Hamadi²,†

¹ Department of Computer Science, University of Sharjah, Sharjah 27272, UAE
² Institute for Electronics, Signal Processing and Communications (IESK), University Magdeburg, Magdeburg 4210, Germany

K-nearest neighbors (KNN) search in a high-dimensional vector space is an important paradigm for a variety of applications. Despite continuous efforts over the past years, algorithms that find the exact KNN answer set in high dimensions are still outperformed by a linear scan. In this paper, we propose a technique to find the exact KNN image objects of a given query object. First, the proposed technique clusters the images using a self-organizing map (SOM) algorithm; it then projects the resulting clusters into points in a linear space based on the distance between each cluster and a selected reference point. These projected points are then organized in a simple, compact, yet fast index structure called the array-index. Unlike most indexes that support KNN search, the array-index requires storage space that is linear in the number of projected points. The experiments show that, owing to its simplicity and compactness, the proposed technique is more efficient and more robust to dimensionality than other well-known techniques.

© 2011 Wiley Periodicals, Inc.

1. INTRODUCTION

Content-based retrieval of similar objects, where each object is represented by a point in a d-dimensional feature space, is significant to many database applications, such as medical image databases,¹⁻³ time-series databases,⁴ and data mining.⁵ These applications must support similarity search for objects. That is, if an application manages a database of N objects, then, given a query object q, it should be able to return the K objects in the database that are most similar to q.
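To make the problem statement concrete, the following is a minimal Python sketch (standard library only, Python 3.8+ for `math.dist`) of exact KNN search under the Euclidean distance. `knn_linear_scan` is the linear-scan baseline that the abstract refers to. `knn_pivot_filter` is a hypothetical illustration of the general idea behind projecting objects onto a line by their distance to one reference point: by the triangle inequality, |d(q, r) − d(x, r)| ≤ d(q, x), so the projected gap is a lower bound on the true distance, and the scan can stop as soon as that bound exceeds the current K-th best distance without losing exactness. This sketch projects individual points rather than SOM clusters and is not the authors' array-index; all function names and the choice of reference point are assumptions made for illustration.

```python
import bisect
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Result = List[Tuple[float, int]]  # (distance to q, object index), ascending


def knn_linear_scan(data: List[Vector], q: Vector, k: int) -> Result:
    """Exact KNN by computing the distance from q to every object: O(N*d) per query."""
    return heapq.nsmallest(k, ((math.dist(x, q), i) for i, x in enumerate(data)))


def build_projection(data: List[Vector], ref: Vector) -> Result:
    """Hypothetical 1-D index: every object's distance to the reference point, sorted."""
    return sorted((math.dist(x, ref), i) for i, x in enumerate(data))


def knn_pivot_filter(data: List[Vector], proj: Result, ref: Vector,
                     q: Vector, k: int) -> Result:
    """Exact KNN that scans the 1-D projection outward from q's own projection.

    By the triangle inequality, |d(q, ref) - d(x, ref)| <= d(q, x), so the gap
    in projected values is a lower bound on the true distance to q.
    """
    if k <= 0:
        return []
    dq = math.dist(q, ref)
    keys = [p for p, _ in proj]
    right = bisect.bisect_left(keys, dq)  # first projected value >= dq
    left = right - 1
    best: List[Tuple[float, int]] = []  # max-heap of (-distance, -index)
    while left >= 0 or right < len(proj):
        # Visit the side with the smaller lower bound, so bounds never decrease.
        gap_l = dq - proj[left][0] if left >= 0 else math.inf
        gap_r = proj[right][0] - dq if right < len(proj) else math.inf
        if gap_l <= gap_r:
            bound, idx = gap_l, proj[left][1]
            left -= 1
        else:
            bound, idx = gap_r, proj[right][1]
            right += 1
        if len(best) == k and bound > -best[0][0]:
            break  # no unvisited object can be closer than the current K-th
        d = math.dist(data[idx], q)
        if len(best) < k:
            heapq.heappush(best, (-d, -idx))
        elif (d, idx) < (-best[0][0], -best[0][1]):
            heapq.heapreplace(best, (-d, -idx))  # evict the current worst
    return sorted((-nd, -ni) for nd, ni in best)
```

Both functions return the same answer set (ties are broken by object index); the filtered version simply avoids computing some full d-dimensional distances. How many it avoids depends on how well the single reference point spreads the data along the line, and in high dimensions distances tend to concentrate, so one reference point usually prunes less.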
∗ Author to whom all correspondence should be addressed; e-mail: zaher@sharjah.ac.ae.
† e-mail: Ayoub.Al-Hamadi@ovgu.de.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 26, 844–865 (2011)
© 2011 Wiley Periodicals, Inc. View this article online at wileyonlinelibrary.com. • DOI 10.1002/int.20496

In a distance space, such as the Euclidean space, the K objects most similar to q are its K-nearest neighbors (KNN). Finding the KNN objects is one of the most expensive, yet essential, operations in high-dimensional database applications. In large databases, finding the KNN answer set for a query q by a linear scan is prohibitively expensive, particularly if the database objects are high-dimensional. Even with the existing indexing structures, the response time of finding