Feature selection for improved classification accuracy targeting riverine sand mapping

Virat Arora 1 • S. Srinivasa Rao 2 • E. Amminedu 1 • P. Jagadeeswara Rao 1

Received: 22 April 2020 / Revised: 10 August 2020 / Accepted: 18 August 2020
© Korean Spatial Information Society 2020

Abstract Regular monitoring of riverine sand is crucial for its sustainable management. To readily generate spatial information on the extent of riverine sand, this study uses satellite-based remote sensing data as input features and applies a Support Vector Machine (SVM) classifier to discern the inherent land cover classes. The input features comprised spectral bands, derived spectral indices, derived textural features, and ancillary information such as elevation, slope and aspect. The objective of this study was to identify a set of features that improves the SVM classification accuracy relative to the benchmark accuracy achieved by using only the spectral bands. Apart from testing a few commonly used logical combinations of features, the study focused on employing the Correlation-based Feature Selection (CFS) method along with the best-first search algorithm to generate a feature-set comprising the most relevant and least correlated features. This feature-set was found to improve the SVM classification accuracy by 2.9% with respect to the benchmark value. Similar results were achieved when the approach was tested with a limited number of training samples. This study shows that it is beneficial to apply the CFS method for feature selection prior to SVM classification and recommends a set of 19 remote sensing derived features that are relevant to riverine sand mapping.
Keywords Riverine sand mapping · Sentinel-2 · SVM · Correlation-based feature selection (CFS) · Google Earth Engine (GEE) · Waikato environment for knowledge analysis (WEKA)

Corresponding author: Virat Arora, aroravirat@gmail.com
1 Department of Geo-Engineering, Andhra University College of Engineering, Visakhapatnam, Andhra Pradesh, India
2 National Remote Sensing Centre (ISRO), Hyderabad, Telangana, India

Spat. Inf. Res. https://doi.org/10.1007/s41324-020-00359-1

1 Introduction

Sand sourced from the riverine environment is a valuable raw material for the construction industry. With rapid infrastructural development, the demand for sand has grown manyfold in the past few decades [1]. This has caused unabated sand extraction, leading to degradation of riverine ecosystems [2, 3]. To promote sustainable sand mining practices, environmentalists recommend regular auditing of within-channel riverine sand [2]. Remote sensing assisted surveys have shown the potential to be used effectively in this regard [4].

Remotely sensed multispectral images capture surface radiance spectra pixel by pixel, which can be harnessed to derive information about the surficial spread of riverine sand [5]. Supervised image-classification approaches are widely used to determine the inherent land cover classes within a study area [6, 7]. In supervised classification, unclassified data are assigned the most appropriate class label based on what the algorithm learns from a sampled training dataset with known labels obtained from ground reference.

One such supervised classification method is the Support Vector Machine (SVM), which is widely used in pattern-recognition applications, particularly in remote sensing [8, 9]. SVM is a non-parametric algorithm, meaning that no assumption about the data model needs to be made during its execution,