Information Sciences 348 (2016) 209–226
Scene classification using local and global features with collaborative representation fusion
Jinyi Zou a, Wei Li a,∗, Chen Chen b, Qian Du c
a College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
b Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75080, USA
c Department of Electrical and Computer Engineering, Mississippi State University, MS 39762, USA
Article info
Article history:
Received 2 November 2015
Revised 4 February 2016
Accepted 8 February 2016
Available online 13 February 2016
Keywords:
Scene classification
Locality-constrained linear coding
Spatial pyramid matching
Collaborative representation-based
classification
Abstract
This paper presents an effective scene classification approach based on collaborative representation fusion of local and global spatial features. First, a visual-word codebook is constructed by partitioning an image into dense regions, followed by standard k-means clustering. Locality-constrained linear coding (LLC) is applied to the dense regions via the visual codebook, and a spatial pyramid matching (SPM) strategy is then used to combine the local features of the entire image. For global feature extraction, multiscale completed local binary patterns (MS-CLBP) are applied to both the original gray-scale image and its Gabor feature images. Finally, kernel collaborative representation-based classification (KCRC) is employed on the extracted local and global features, and the class label of a test image is assigned according to the minimal approximation residual after fusion. The proposed method is evaluated on four commonly used datasets: two remote sensing image datasets, an indoor and outdoor scene dataset, and a sports action dataset. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art methods.
© 2016 Elsevier Inc. All rights reserved.
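The final classification step described above assigns a label by the minimal class-wise approximation residual. As a rough illustration, the following sketch implements the linear (non-kernel) variant of collaborative representation-based classification; the paper itself uses a kernel version (KCRC) and fuses residuals from local and global features, which this simplified example omits. All function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def crc_classify(X_train, y_train, x_test, lam=1e-2):
    """Collaborative representation-based classification (CRC), linear sketch.

    X_train: (D, N) dictionary whose columns are training samples,
    y_train:  (N,) class labels, x_test: (D,) query feature vector.
    """
    D, N = X_train.shape
    # Ridge-regularized collaborative code over ALL classes jointly:
    # alpha = argmin ||x - X alpha||^2 + lam ||alpha||^2.
    alpha = np.linalg.solve(X_train.T @ X_train + lam * np.eye(N),
                            X_train.T @ x_test)
    residuals = {}
    for c in np.unique(y_train):
        idx = (y_train == c)
        # Class-wise approximation residual using only class-c atoms.
        residuals[c] = np.linalg.norm(x_test - X_train[:, idx] @ alpha[idx])
    # Assign the label with the minimal approximation residual.
    return min(residuals, key=residuals.get)
```

The joint code `alpha` is what distinguishes collaborative representation from per-class regression: every training sample competes to reconstruct the query, and only the residual computation is split by class.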
1. Introduction
In the last decade, scene classification has drawn increasing attention in both academia and industry [14,37,45,46,57,63]. The task is to automatically classify an image by feature extraction and label assignment. Although great effort has been made in extracting features (e.g., hash codes [10,15,17], manifold structures [24,56,58], etc.), it remains a challenging task due to the many factors involved, such as variations in spatial position, illumination, and scale.
Early scene classification methods concentrated mainly on scene modeling [25] and on global spatial features such as color and texture histograms [41]. Global features are simple to implement and computationally cheap but offer limited performance. In [11,36,48,50,59,61], the popular bag-of-visual-words (BoVW) model was adopted, which represents an image as an orderless collection of local features. In this model, an image is treated as a document and its local features as “words”. The image is usually partitioned into patches and represented via a codebook. This involves three steps: (i) feature detection (commonly via key-point detection [30,31]), (ii) feature description based on the detected key points [9,27], and (iii) codebook generation. However, the BoVW model ignores the spatial layout of features.
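The three BoVW steps above can be sketched in a few lines: cluster local descriptors into a codebook with k-means, then describe an image by the histogram of its descriptors' nearest visual words. This is a minimal, self-contained illustration (a naive k-means rather than an optimized library routine); descriptor extraction itself, e.g. dense SIFT, is assumed to have happened already, and all names are illustrative.

```python
import numpy as np

def build_codebook(descriptors, k=256, n_iter=20, seed=0):
    """Naive k-means over local descriptors -> (k, D) visual-word codebook."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct random descriptors.
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(n_iter):
        # Assign each descriptor to its nearest center (squared Euclidean).
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # Update each center as the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

def bovw_histogram(descriptors, codebook):
    """Orderless BoVW representation: normalized histogram of word counts."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

Because the histogram discards where each patch came from, it exhibits exactly the limitation noted above: the spatial layout of features is lost, which is what spatial pyramid matching later addresses.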
∗ Corresponding author. Tel.: +86 10 64413467, +86 18146529853; fax: +86 10 64434726.
E-mail addresses: liwei089@ieee.org, leewei36@gmail.com (W. Li).
http://dx.doi.org/10.1016/j.ins.2016.02.021
0020-0255/© 2016 Elsevier Inc. All rights reserved.