Semantic Image Classification with Hierarchical Feature Subset Selection

Yuli Gao
Dept of Computer Science, UNC-Charlotte, Charlotte, NC 28223, USA
ygao@uncc.edu

Jianping Fan
Dept of Computer Science, UNC-Charlotte, Charlotte, NC 28223, USA
jfan@uncc.edu

ABSTRACT
High-dimensional visual features for image content characterization enable effective image classification. However, training accurate image classifiers in a high-dimensional feature space suffers from the curse of dimensionality and thus requires a large number of labeled images. To achieve accurate classifier training in a high-dimensional feature space, we propose a hierarchical feature subset selection algorithm for semantic image classification, in which the feature subset selection procedure is seamlessly integrated with the underlying classifier training procedure in a single algorithm. First, our hierarchical feature subset selection framework partitions the high-dimensional feature space into multiple homogeneous feature subspaces and forms a two-level feature hierarchy. Second, a weak image classifier is trained for each homogeneous feature subspace at the lower level of the feature hierarchy, where traditional feature subset selection techniques such as principal component analysis (PCA) can be used for dimension reduction. Finally, these weak classifiers are boosted to determine an optimal image classifier, and the higher-level feature subset selection is realized by selecting the most effective weak classifiers and their corresponding homogeneous feature subsets. Our experiments on a specific domain of natural images have obtained very positive results.
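The three steps outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the weak learner (a weighted nearest-centroid classifier), the AdaBoost-style reweighting, the function names, and the toy feature groups `color` and `texture` are all assumptions made for illustration, and the optional per-subspace PCA step is omitted for brevity.

```python
import math

def fit_centroids(X, y, w, dims):
    """Weak learner (assumed): weighted class centroids on one feature subset."""
    cents = {}
    for label in (-1, 1):
        total = sum(wi for wi, yi in zip(w, y) if yi == label)
        cents[label] = [
            sum(wi * x[d] for x, wi, yi in zip(X, w, y) if yi == label) / total
            for d in dims
        ]
    return cents

def predict_centroids(cents, x, dims):
    """Assign the label of the nearer centroid, using only the subset's dims."""
    dist = {
        label: sum((x[d] - c) ** 2 for d, c in zip(dims, cents[label]))
        for label in cents
    }
    return 1 if dist[1] < dist[-1] else -1

def boost_over_subsets(X, y, subsets, rounds=5):
    """AdaBoost-style loop: each round keeps the best per-subset weak classifier."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # entries are (alpha, subset_name, centroids)
    for _ in range(rounds):
        best = None
        for name, dims in subsets.items():
            cents = fit_centroids(X, y, w, dims)
            preds = [predict_centroids(cents, x, dims) for x in X]
            err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, name, cents, preds)
        err, name, cents, preds = best
        if err >= 0.5:  # no subset does better than chance; stop
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect learner
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, name, cents))
        if err <= 1e-10:  # training set already separated; stop
            break
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, subsets, x):
    """Weighted vote of the selected weak classifiers."""
    score = sum(a * predict_centroids(c, x, subsets[name], )
                for a, name, c in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): dims 0-1 form a "color" group, dims 2-3 a "texture" group.
subsets = {"color": [0, 1], "texture": [2, 3]}
X = [[0.9, 0.8, 0.5, 0.1], [0.8, 0.9, 0.1, 0.5],
     [0.1, 0.2, 0.5, 0.1], [0.2, 0.1, 0.1, 0.5]]
y = [1, 1, -1, -1]
ensemble = boost_over_subsets(X, y, subsets)
selected_subsets = [name for _, name, _ in ensemble]
```

In this sketch, `selected_subsets` plays the role of the higher-level feature subset selection: only feature groups whose weak classifiers are picked during boosting contribute to the final classifier.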
Categories and Subject Descriptors
I.4.7 [Image Processing and Computer Vision]: Feature Measurement—feature representation; I.5 [Artificial Intelligence]: Learning—concept learning

General Terms
Algorithms, Experimentation

Keywords
Feature selection, classifier training, semantic image classification

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MIR’05, November 10–11, 2005, Singapore.
Copyright 2005 ACM 1-59593-244-5/05/0011 ...$5.00.

1. INTRODUCTION
As high-resolution digital cameras become more affordable and widespread, personal collections of digital images are growing exponentially. Semantic image classification is therefore becoming increasingly important for supporting automatic image annotation and keyword-based semantic image retrieval [22, 20, 25, 10, 34, 1]. However, the performance of image classifiers depends largely on two interrelated issues: (1) the quality of the features [19] that are used for image content representation; and (2) the effectiveness of the classifier training algorithm.

Many image classification systems have been proposed in the literature [27, 32, 2, 29, 26, 19], and they can generally be divided into two categories based on the underlying framework for image content representation [4, 35, 28, 18]: (a) The first category segments the image into meaningful components and uses them as semantic elements to characterize image content [3, 21]. For example, Carson et al. proposed a blob-based image representation that calculates image similarities based on the visual similarities of the image blobs [8].
(b) The second category treats the image as a whole and characterizes image content by using image-based global visual features [11, 23]. A well-known example is the system developed by Torralba and Oliva, which uses discriminant structural templates to represent the global visual properties of natural scene images [31].

Despite the different natures of the underlying image content representation frameworks, most existing semantic image classification techniques rely on high-dimensional visual features. Ideally, using more visual features enhances the classifier's ability to identify different semantic image concepts and thus results in higher classification accuracy. However, learning an image classifier in such a high-dimensional feature space requires a large number of labeled samples, and this number generally grows exponentially with the feature dimension [17]. Thus, automatically selecting a low-dimensional feature subset with high discrimination power and training the image classifier on that relatively low-dimensional subset is one promising solution to the curse of dimensionality.

To select the optimal feature subset for classifier training, many algorithms have been proposed, and they can generally be divided into two categories: filter and wrapper. A filter algorithm separates the procedures for feature subset selection and classifier training by merely calculating ranking information for each feature dimension based on its correlation score with the prediction variable. A wrapper algo-