IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 10, NO. 3, APRIL 2008, p. 409

Batch Nearest Neighbor Search for Video Retrieval

Jie Shao, Zi Huang, Heng Tao Shen, Xiaofang Zhou, Senior Member, IEEE, Ee-Peng Lim, Senior Member, IEEE, and Yijun Li

Abstract—To retrieve videos similar to a query clip from a large database, each video is often represented by a sequence of high-dimensional feature vectors. Typically, given a query video containing $m$ feature vectors, an independent nearest neighbor (NN) search is first performed for each feature vector. After all the NN searches are completed, an overall similarity is computed, i.e., a single content-based video retrieval usually involves $m$ individual NN searches. Since nearby feature vectors in a video are normally similar, a large number of expensive random disk accesses are expected to occur repeatedly, which critically affects the overall query performance. Batch nearest neighbor (BNN) search is defined as a batch operation that performs a number of individual NN searches. This paper presents a novel approach to efficient high-dimensional BNN search, called dynamic query ordering (DQO), for advanced optimization of both I/O and CPU costs. Observing that the overlapped candidates (or search space) of a previous query may help to further reduce the candidate sets of subsequent queries, DQO aims at progressively finding a query order such that the common candidates among queries are fully utilized to maximally reduce the total number of candidates. Modelling the candidate set relationship of queries by a candidate overlapping graph (COG), DQO iteratively selects the next query to be executed based on its estimated pruning power over the rest of the queries, using the dynamically updated COG. Extensive experiments conducted on real video datasets show the significance of our BNN query processing strategy.

Index Terms—Content-based retrieval, high-dimensional indexing, multimedia databases, query processing.

Manuscript received January 26, 2007; revised August 30, 2007. This work was supported by the Australian Research Council under Grant DP0663272. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yong Rui.
J. Shao, H. T. Shen, and X. Zhou are with the School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane QLD 4072, Australia (e-mail: jshao@itee.uq.edu.au; shenht@itee.uq.edu.au; zxf@itee.uq.edu.au).
Z. Huang is with the Knowledge Media Institute, The Open University, Milton Keynes, MK7 6AA, U.K. (e-mail: h.huang@open.ac.uk).
E.-P. Lim is with the Division of Information Systems, School of Computer Engineering, Nanyang Technological University, Singapore (e-mail: aseplim@ntu.edu.sg).
Y. Li is with Nielsen Media Research, Brisbane QLD 4000, Australia (e-mail: yijun.li@nielsen.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMM.2008.917339

I. INTRODUCTION

Recently, with the rapid increase of both centralized video archives and distributed video resources on the World Wide Web, research on content-based video retrieval (CBVR) has become very active. As shown in Fig. 1, besides the database and graphical user interface (GUI), a generic CBVR system contains three major modules: video segmentation and feature extraction, feature vector organization, and video search engine. Videos have to be preprocessed to realize the functionality of retrieval. A video is first divided into a number of elementary segments, and the feature of each segment is then extracted [1]. Videos can have both spatial and temporal features. Spatial features include still image features, such as color, shape, and texture. Temporal features consist of object motion, camera operation, etc. Normally, all the features are represented by high-dimensional vectors.
In short, each video is translated to a sequence of high-dimensional feature vectors in video segmentation and feature extraction. For a large video database, scanning all these vectors is strongly undesirable due to the high complexity of video features. By video summarization [2], [3], the complexity can be reduced to a level that can be efficiently managed. In feature vector organization, compact video representations are organized in a way such that fast retrieval is feasible. The video search engine searches the underlying indexing structure with some high-dimensional access method [4]–[11] to speed up query processing.

In conventional content-based similarity search, a query requires a single NN search, performed by traversing the indexing structure once. However, a distinguishing characteristic of video retrieval is that each video is described by a sequence of feature vectors, and so is the query. Denote a query clip as $Q$ and a database video as $P$, where $Q$ and $P$ have $m$ and $n$ feature vectors (or representatives, if some summarization is applied), respectively. To identify whether $P$ is similar to or contains $Q$, typically for each $q_i \in Q$, a search is first performed in $P$ to retrieve the feature vectors similar to $q_i$. A typical video similarity measure is to compute the percentage of similar feature vectors shared by two videos [2], [3], [12], [13]. Given $Q$ and $P$, their similarity is defined as

$$sim(Q, P) = \frac{\sum_{q_i \in Q} \mathbf{1}(q_i)}{|Q|} \qquad (1)$$

where $\mathbf{1}(q_i) = 1$ if $q_i$ is relevant to some $p_j \in P$, and $\mathbf{1}(q_i) = 0$ otherwise. Retrieval of similar feature vectors is processed as a range or $k$NN search in high-dimensional space. The answers of all query vectors are then integrated to determine the final result. The problem is that the similarity search has to be performed $m$ times in total, since there are $m$ feature vectors in $Q$. For large video databases, this is a great challenge due to the large number of feature vectors and the high dimensionality.
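The similarity measure described above can be illustrated with a minimal sketch. It is not the indexing-based method of this paper: it assumes that "relevant" means lying within a Euclidean distance threshold (a range search), and it scans the database video by brute force. The names `is_relevant`, `video_similarity`, and `epsilon` are hypothetical and chosen only for illustration.

```python
import math


def is_relevant(q, database_video, epsilon):
    """Brute-force range search (assumed relevance test): True if some
    database vector lies within Euclidean distance epsilon of q."""
    return any(math.dist(q, p) <= epsilon for p in database_video)


def video_similarity(query, database_video, epsilon):
    """Fraction of query feature vectors that have at least one relevant
    vector in the database video. One search is issued per query vector."""
    if not query:
        return 0.0
    matches = sum(1 for q in query if is_relevant(q, database_video, epsilon))
    return matches / len(query)


# Toy 2-D feature vectors; real video features are high-dimensional.
Q = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
P = [(0.1, 0.0), (1.0, 0.9), (9.0, 9.0)]
similarity = video_similarity(Q, P, epsilon=0.5)
```

In this toy example two of the three query vectors have a match, so the similarity is 2/3. The loop in `video_similarity` makes the cost structure explicit: the number of individual searches grows with the number of query feature vectors, which is the overhead that a batch operation aims to reduce.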
We address this problem as batch nearest neighbor (BNN) search, which is defined as a batch operation that can efficiently perform a number of individual NN searches on the same database simultaneously.¹ Content-based video retrieval is one of its typical applications.

¹ For ease of illustration, we use NN search ($k = 1$ in $k$NN search) for discussion. The extension to $k$NN search is straightforward.

1520-9210/$25.00 © 2008 IEEE

Given a query clip, a series of separate NN searches will incur a high I/O cost for random disk