IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 10, NO. 3, APRIL 2008 409
Batch Nearest Neighbor Search for Video Retrieval
Jie Shao, Zi Huang, Heng Tao Shen, Xiaofang Zhou, Senior Member, IEEE, Ee-Peng Lim, Senior Member, IEEE,
and Yijun Li
Abstract—To retrieve similar videos to a query clip from a large
database, each video is often represented by a sequence of
high-dimensional feature vectors. Typically, given a query video
containing $m$ feature vectors, an independent nearest neighbor (NN)
search for each feature vector is first performed. After
completing all the NN searches, an overall similarity is then computed,
i.e., a single content-based video retrieval usually involves $m$
individual NN searches. Since normally nearby feature vectors in a
video are similar, a large number of expensive random disk
accesses are expected to repeatedly occur, which crucially affects the
overall query performance. Batch nearest neighbor (BNN) search
is defined as a batch operation that performs a number of individual
NN searches. This paper presents a novel approach towards
efficient high-dimensional BNN search called dynamic query ordering
(DQO) for advanced optimizations of both I/O and CPU costs.
Observing that the overlapped candidates (or search space) of a previous
query may help to further reduce the candidate sets of subsequent
queries, DQO aims at progressively finding a query order such that
the common candidates among queries are fully utilized to
maximally reduce the total number of candidates. Modelling the
candidate set relationship of queries by a candidate overlapping graph
(COG), DQO iteratively selects the next query to be executed based
on its estimated pruning power over the rest of the queries with the
dynamically updated COG. Extensive experiments are conducted on
real video datasets and show the significance of our BNN query
processing strategy.
Index Terms—Content-based retrieval, high-dimensional in-
dexing, multimedia databases, query processing.
Manuscript received January 26, 2007; revised August 30, 2007. This work
was supported by the Australian Research Council under Grant DP0663272.
The associate editor coordinating the review of this manuscript and approving
it for publication was Dr. Yong Rui.
J. Shao, H. T. Shen, and X. Zhou are with the School of Information
Technology and Electrical Engineering, The University of Queensland, Brisbane
QLD 4072, Australia (e-mail: jshao@itee.uq.edu.au; shenht@itee.uq.edu.au;
zxf@itee.uq.edu.au).
Z. Huang is with the Knowledge Media Institute, The Open University,
Milton Keynes, MK7 6AA, U.K. (e-mail: h.huang@open.ac.uk).
E.-P. Lim is with the Division of Information Systems, School of
Computer Engineering, Nanyang Technological University, Singapore (e-mail:
aseplim@ntu.edu.sg).
Y. Li is with Nielsen Media Research, Brisbane QLD 4000, Australia (e-mail:
yijun.li@nielsen.com).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMM.2008.917339
I. INTRODUCTION
Recently, with the rapid increase of both centralized
video archives and distributed video resources on the
World Wide Web, research on content-based video retrieval
(CBVR) has become very active. As shown in Fig. 1, besides
the database and graphical user interface (GUI), a generic CBVR
system contains three major modules: video segmentation
and feature extraction, feature vector organization, and video
search engine. Videos have to be preprocessed to realize the
functionality of retrieval. A video is first divided into a number
of elementary segments, and the feature of each segment is
then extracted [1]. Videos can have both spatial and temporal
features. Spatial features include still image features, such as
color, shape, texture, etc. Temporal features consist of object
motion, camera operation, etc. Normally, all the features are
represented by high-dimensional vectors. In short, each video
is translated to a sequence of high-dimensional feature vectors
in video segmentation and feature extraction. For a large video
database, scanning all these vectors is highly undesirable
due to the high complexity of video features. By video
summarization [2], [3], the complexity can be reduced to a level
that can be efficiently managed. In feature vector organization,
compact video representations are organized in such a way
that fast retrieval is feasible. The video search engine searches the
underlying indexing structure with some high-dimensional
access method [4]–[11] to speed up query processing.
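As a minimal illustration of how a video becomes a sequence of high-dimensional feature vectors, the following Python sketch maps each key frame of a video to a normalized joint RGB color histogram. The choice of a color histogram, the bin count, the representation of a frame as a list of (r, g, b) tuples, and the function names `color_histogram` and `video_to_feature_sequence` are all assumptions made for illustration; the text above does not prescribe a particular feature or segmentation method.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized joint RGB histogram for one key frame.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    The result has bins_per_channel ** 3 dimensions and sums to 1.
    """
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
        count += 1
    if count == 0:
        raise ValueError("frame has no pixels")
    return [h / count for h in hist]


def video_to_feature_sequence(key_frames, bins_per_channel=4):
    """Translate a video (one key frame per segment) into feature vectors."""
    return [color_histogram(frame, bins_per_channel) for frame in key_frames]
```

With the default of 4 bins per channel, every segment is described by a 64-dimensional vector, so a video with many segments already yields a long sequence of high-dimensional points.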
In conventional content-based similarity search, a query
consumes a single NN search by traversing the indexing structure
once. However, a distinguishing characteristic of video retrieval
is that each video is described by a sequence of feature vectors,
and so is the query. Denote a query clip as $Q = \{q_1, q_2, \ldots, q_m\}$
and a database video as $V = \{v_1, v_2, \ldots, v_n\}$, i.e., $Q$ and $V$
have $m$ and $n$ feature vectors (or representatives, if some
summarization is applied), respectively. To identify whether $V$ is similar to or
contains $Q$, typically for each $q_i \in Q$, a search is first performed
in $V$ to retrieve the feature vectors similar to $q_i$. A typical video
similarity measure is to compute the percentage of similar
feature vectors shared by two videos [2], [3], [12], [13]. Given $Q$
and $V$, their similarity is defined as
$$ \mathrm{sim}(Q, V) = \frac{1}{m} \sum_{i=1}^{m} 1_{q_i} \tag{1} $$
where $1_{q_i} = 1$ if $q_i$ is relevant to some $v_j \in V$ and $1_{q_i} = 0$
otherwise. Retrieval of similar feature vectors is processed as a
range or $k$NN search in high-dimensional space. The answers
of all query vectors are then integrated to determine the final
result.
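The measure described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: "relevant" is interpreted as lying within a Euclidean distance threshold `epsilon` (i.e., a range search), the score is taken as the fraction of query vectors that have at least one relevant database vector, and a linear scan stands in for the index traversal. The names `video_similarity` and `epsilon` are hypothetical.

```python
import math


def video_similarity(query, video, epsilon):
    """Fraction of query vectors with at least one video vector within epsilon.

    `query` and `video` are sequences of equal-length numeric vectors.
    Assumption: relevance means Euclidean distance <= epsilon.
    """
    if not query:
        raise ValueError("query must contain at least one feature vector")
    hits = 0
    for q in query:
        # One independent range search per query vector (linear scan here).
        if any(math.dist(q, v) <= epsilon for v in video):
            hits += 1
    return hits / len(query)
```

Note that the loop issues one independent search per query vector, so the cost grows with the number of feature vectors in the query clip.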
Now the problem arises: in total, the similarity
search has to be performed $m$ times, once for each of the $m$ feature
vectors in the query $Q$. For large video databases, this is a great
challenge due to the large number of feature vectors and their high
dimensionality. We address this problem as batch nearest neighbor (BNN)
search, which is defined as a batch operation that can efficiently
perform a number of individual NN searches on the same
database simultaneously.
(For ease of illustration, we use NN search, i.e., $k = 1$ in $k$NN
search, for discussion. The extension to $k$NN search is straightforward.)
Content-based video retrieval is one of the typical applications of
BNN search. Given a query clip, a series of separate NN
searches will incur lots of expensive I/O cost for random disk