2492 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 52, NO. 5, MAY 2014

Active Learning in the Spatial Domain for Remote Sensing Image Classification

André Stumpf, Nicolas Lachiche, Jean-Philippe Malet, Norman Kerle, and Anne Puissant

Abstract—Active learning (AL) algorithms have been proven useful in reducing the number of required training samples for remote sensing applications; however, most methods query samples pointwise without considering spatial constraints on their distribution. This may often lead to a spatially dispersed distribution of training points unfavorable for visual image interpretation or field surveys. The aim of this study is to develop region-based AL heuristics to guide user attention toward a limited number of compact spatial batches rather than distributed points. The proposed query functions are based on a tree ensemble classifier and combine criteria of sample uncertainty and diversity to select regions of interest. Class imbalance, which is inherent to many remote sensing applications, is addressed through stratified bootstrap sampling. Empirical tests of the proposed methods are performed with multitemporal and multisensor satellite images capturing, in particular, sites recently affected by large-scale landslide events. The assessment includes an experimental evaluation of the labeling time required by the user and the computational runtime, and a sensitivity analysis of the main algorithm parameters. Region-based heuristics that consider sample uncertainty and diversity are found to outperform pointwise sampling and region-based methods that consider only uncertainty. Reference landslide inventories from five different experts enable a detailed assessment of the spatial distribution of remaining errors and the uncertainty of the reference data.

Manuscript received September 16, 2012; revised January 18, 2013 and March 19, 2013; accepted April 18, 2013.
Date of publication July 12, 2013; date of current version March 3, 2014. This work was supported in part by the project FOSTER “Spatio-temporal data mining: application to the understanding and monitoring of soil erosion” funded by the French Research Agency under Contract ANR Cosinus, 2011–2013; by the project SafeLand “Living with landslide risk in Europe: assessment, effects of global change, and risk management strategies” under Grant Agreement 226479 funded by the 7th Framework Programme of the European Commission; and by the project “Landslide mapping at various spatial scales” funded by the EUR-OPA Major Hazards Open Partial Agreement of the Council of Europe.

A. Stumpf is with the Laboratoire Image, Ville, Environnement, Centre National de la Recherche Scientifique UMR 7362, University of Strasbourg, 67000 Strasbourg, France, and also with the École et Observatoire des Sciences de la Terre—Institut de Physique du Globe de Strasbourg, Centre National de la Recherche Scientifique UMR 7516, University of Strasbourg, 67084 Strasbourg Cedex, France (e-mail: andre.stumpf@live-cnrs.unistra.fr).

N. Lachiche is with the Image Sciences, Computer Sciences and Remote Sensing Laboratory, Centre National de la Recherche Scientifique UMR 7005, University of Strasbourg, 67412 Strasbourg Cedex, France (e-mail: nicolas.lachiche@unistra.fr).

J.-P. Malet is with the École et Observatoire des Sciences de la Terre—Institut de Physique du Globe de Strasbourg, Centre National de la Recherche Scientifique UMR 7516, University of Strasbourg, 67084 Strasbourg Cedex, France (e-mail: jeanphilippe.malet@unistra.fr).

N. Kerle is with the Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, 7500 Enschede, The Netherlands (e-mail: kerle@itc.nl).

A. Puissant is with the Laboratoire Image, Ville, Environnement, Centre National de la Recherche Scientifique UMR 7362, University of Strasbourg, 67000 Strasbourg, France (e-mail: anne.puissant@live-cnrs.unistra.fr).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2013.2262052

Index Terms—Active learning (AL), batch-mode, class imbalance, ground truth uncertainty, image classification, landslide inventory mapping, spatial information.

TABLE OF SYMBOLS

$H$               Vote entropy.
$p_i$             Fraction of votes for the $i$th class.
$W^{\times}$      Queried region.
$w$               Edge length of the square query region.
$M_H$             Entropy map.
$\mu_H$           Mean local vote entropy.
$x, y$            $x$, $y$ coordinates on the regular search grid.
$G$               Search grid.
$g$               Cell size of the search grid.
$X$               Training set.
$n$               Number of iterations.
$U$               Unlabeled pool.
$S$               Set of queried samples.
$s$               Individual samples in different sets.
fun               Selected diversity function.
$c$               Individual samples in a candidate set.
$\sigma_d$        Standard deviation of the feature space distances between the candidate batch and the training set.
$|W_m|$           Cardinality of the candidate set.
$\rho_k(X, c)$    Euclidean distance between a candidate sample and its nearest training point in feature space.
$m$               Number of candidate regions.
$W_m$             Candidate set.
$H^{\times}$      Cross-entropy.
$v_D$             Volume of the unit ball.
$\mathbb{R}^D$    $D$-dimensional feature space.
$D$               Number of features.
$\psi$            Digamma function.
$|X|$             Cardinality of the training set.
$k$               Order of the nearest neighbor search.
$t_i$             Minimum variable importance threshold.
$F$               F-measure.

I. INTRODUCTION

Machine learning algorithms have become important tools for the extraction of environmental information from remote sensing images.
State-of-the-art supervised algorithms, such as support vector machines (SVMs), artificial neural networks, and ensemble-based learning methods [1], have already been developed, among others, for land cover analysis [2], [3], biophysical parameter estimation [4], [5], change and anomaly detection [6], [7], and geomorphological mapping [8], [9].

0196-2892 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.