Information Sciences 348 (2016) 209–226
Scene classification using local and global features with collaborative representation fusion
Jinyi Zou a, Wei Li a,∗, Chen Chen b, Qian Du c
a College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
b Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75080, USA
c Department of Electrical and Computer Engineering, Mississippi State University, MS 39762, USA
Article info
Article history:
Received 2 November 2015
Revised 4 February 2016
Accepted 8 February 2016
Available online 13 February 2016
Keywords:
Scene classification
Locality-constrained linear coding
Spatial pyramid matching
Collaborative representation-based
classification
Abstract
This paper presents an effective scene classification approach based on collaborative representation fusion of local and global spatial features. First, a visual-word codebook is constructed by partitioning an image into dense regions, followed by standard k-means clustering. Locality-constrained linear coding (LLC) is applied to the dense regions via the visual codebook, and a spatial pyramid matching (SPM) strategy is then used to combine the local features of the entire image. For global feature extraction, multiscale completed local binary patterns (MS-CLBP) are applied to both the original gray-scale image and its Gabor feature images. Finally, kernel collaborative representation-based classification (KCRC) is employed on the extracted local and global features, and the class label of a test image is assigned according to the minimal approximation residual after fusion. The proposed method is evaluated on four commonly used datasets: two remote sensing image datasets, an indoor and outdoor scene dataset, and a sports action dataset. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art methods.
© 2016 Elsevier Inc. All rights reserved.
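The final classification step described above assigns a label by the minimal class-wise approximation residual. As a rough illustration, the following sketch implements the linear (non-kernel) variant of collaborative representation-based classification; the paper itself uses a kernel version (KCRC) and fuses residuals from local and global features, which this simplified example omits. All function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def crc_classify(X_train, y_train, x_test, lam=1e-2):
    """Collaborative representation-based classification (CRC), linear sketch.

    X_train: (D, N) dictionary whose columns are training samples,
    y_train:  (N,) class labels, x_test: (D,) query feature vector.
    """
    D, N = X_train.shape
    # Ridge-regularized collaborative code over ALL classes jointly:
    # alpha = argmin ||x - X alpha||^2 + lam ||alpha||^2.
    alpha = np.linalg.solve(X_train.T @ X_train + lam * np.eye(N),
                            X_train.T @ x_test)
    residuals = {}
    for c in np.unique(y_train):
        idx = (y_train == c)
        # Class-wise approximation residual using only class-c atoms.
        residuals[c] = np.linalg.norm(x_test - X_train[:, idx] @ alpha[idx])
    # Assign the label with the minimal approximation residual.
    return min(residuals, key=residuals.get)
```

The joint code `alpha` is what distinguishes collaborative representation from per-class regression: every training sample competes to reconstruct the query, and only the residual computation is split by class.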
1. Introduction
In the last decade, scene classification has drawn increasing attention in both academia and industry [14,37,45,46,57,63]. The task is to automatically classify an image by feature extraction and label assignment. Although great effort has been made in extracting features (e.g., hash codes [10,15,17], manifold structures [24,56,58], etc.), it remains a challenging task due to the many factors involved, such as variations in spatial position, illumination, and scale.
Early scene classification methods concentrated mainly on scene modeling [25] and on global spatial features such as color and texture histograms [41]. Global features are simple to implement and computationally cheap but offer limited performance. In [11,36,48,50,59,61], the popular bag-of-visual-words (BoVW) model was adopted, which represents an image as an orderless collection of local features. In this model, an image is treated as a document and its local features as “words”. The image is usually partitioned into patches and represented via a codebook. This involves three steps: (i) feature detection (commonly via key-point detection [30,31]), (ii) feature description based on the detected key points [9,27], and (iii) codebook generation. However, the BoVW model ignores the spatial layout of features.
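The three BoVW steps above can be sketched in a few lines: cluster local descriptors into a codebook with k-means, then describe an image by the histogram of its descriptors' nearest visual words. This is a minimal, self-contained illustration (a naive k-means rather than an optimized library routine); descriptor extraction itself, e.g. dense SIFT, is assumed to have happened already, and all names are illustrative.

```python
import numpy as np

def build_codebook(descriptors, k=256, n_iter=20, seed=0):
    """Naive k-means over local descriptors -> (k, D) visual-word codebook."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct random descriptors.
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(n_iter):
        # Assign each descriptor to its nearest center (squared Euclidean).
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # Update each center as the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

def bovw_histogram(descriptors, codebook):
    """Orderless BoVW representation: normalized histogram of word counts."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

Because the histogram discards where each patch came from, it exhibits exactly the limitation noted above: the spatial layout of features is lost, which is what spatial pyramid matching later addresses.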
∗ Corresponding author. Tel.: +86 10 64413467, +86 18146529853; fax: +86 10 64434726.
E-mail addresses: liwei089@ieee.org, leewei36@gmail.com (W. Li).
http://dx.doi.org/10.1016/j.ins.2016.02.021
0020-0255/© 2016 Elsevier Inc. All rights reserved.