2492 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 52, NO. 5, MAY 2014
Active Learning in the Spatial Domain for
Remote Sensing Image Classification
André Stumpf, Nicolas Lachiche, Jean-Philippe Malet, Norman Kerle, and Anne Puissant
Abstract—Active learning (AL) algorithms have been proven useful in
reducing the number of required training samples for remote sensing
applications; however, most methods query samples pointwise without
considering spatial constraints on their distribution. This may often
lead to a spatially dispersed distribution of training points
unfavorable for visual image interpretation or field surveys. The aim
of this study is to develop region-based AL heuristics to guide user
attention toward a limited number of compact spatial batches rather
than distributed points. The proposed query functions are based on a
tree ensemble classifier and combine criteria of sample uncertainty and
diversity to select regions of interest. Class imbalance, which is
inherent to many remote sensing applications, is addressed through
stratified bootstrap sampling. Empirical tests of the proposed methods
are performed with multitemporal and multisensor satellite images
capturing, in particular, sites recently affected by large-scale
landslide events. The assessment includes an experimental evaluation of
the labeling time required by the user and the computational runtime,
and a sensitivity analysis of the main algorithm parameters.
Region-based heuristics that consider sample uncertainty and diversity
are found to outperform pointwise sampling and region-based methods
that consider only uncertainty. Reference landslide inventories from
five different experts enable a detailed assessment of the spatial
distribution of remaining errors and the uncertainty of the reference
data.
Manuscript received September 16, 2012; revised January 18, 2013 and
March 19, 2013; accepted April 18, 2013. Date of publication July 12, 2013;
date of current version March 3, 2014. This work was supported in part
by the project FOSTER “Spatio-temporal data mining: application to the
understanding and monitoring of soil erosion” funded by the French Research
Agency under Contract ANR Cosinus, 2011–2013; by the project SafeLand
“Living with landslide risk in Europe: assessment, effects of global change,
and risk management strategies” under Grant Agreement 226479 funded by the
7th Framework Programme of the European Commission; and by the project
“Landslide mapping at various spatial scales” funded by the EUR-OPA Major
Hazards Open Partial Agreement of the Council of Europe.
A. Stumpf is with the Laboratoire Image, Ville, Environnement, Centre
National de la Recherche Scientifique UMR 7362, University of Strasbourg,
67000 Strasbourg, France, and also with the École et Observatoire des Sciences
de la Terre—Institut de Physique du Globe de Strasbourg, Centre National de la
Recherche Scientifique UMR 7516, University of Strasbourg, 67084 Strasbourg
Cedex, France (e-mail: andre.stumpf@live-cnrs.unistra.fr).
N. Lachiche is with the Image Sciences, Computer Sciences and Remote
Sensing Laboratory, Centre National de la Recherche Scientifique UMR 7005,
University of Strasbourg, 67412 Strasbourg Cedex, France
(e-mail: nicolas.lachiche@unistra.fr).
J.-P. Malet is with the École et Observatoire des Sciences de la Terre—
Institut de Physique du Globe de Strasbourg, Centre National de la Recherche
Scientifique UMR 7516, University of Strasbourg, 67084 Strasbourg Cedex,
France (e-mail: jeanphilippe.malet@unistra.fr).
N. Kerle is with the Faculty of Geo-Information Science and Earth
Observation (ITC), University of Twente, 7500 Enschede, The Netherlands
(e-mail: kerle@itc.nl).
A. Puissant is with the Laboratoire Image, Ville, Environnement, Centre
National de la Recherche Scientifique UMR 7362, University of Strasbourg,
67000 Strasbourg, France (e-mail: anne.puissant@live-cnrs.unistra.fr).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2013.2262052
Index Terms—Active learning (AL), batch-mode, class imbalance,
ground truth uncertainty, image classification, landslide
inventory mapping, spatial information.
TABLE OF SYMBOLS
H            Vote entropy.
p_i          Fraction of votes for the ith class.
W×           Queried region.
w            Edge length of the squared query region.
M_H          Entropy map.
μ_H          Mean local vote entropy.
x, y         x, y coordinates on the regular search grid.
G            Search grid.
g            Cell size of the search grid.
X            Training set.
n            Number of iterations.
U            Unlabeled pool.
S            Set of queried samples.
s            Individual samples in different sets.
fun          Selected diversity function.
c            Individual samples in a candidate set.
σ_d          Standard deviation of the feature space distances
             between the candidate batch and the training set.
|W_m|        Cardinality of the candidate set.
ρ_k(X, c)    Euclidean distance between a candidate sample and
             its nearest training point in feature space.
m            Number of candidate regions.
W_m          Candidate set.
H×           Cross-entropy.
v_D          Volume of the unit ball.
R^D          D-dimensional feature space.
D            Number of features.
ψ            Digamma function.
|X|          Cardinality of the training set.
k            Order of the nearest neighbor search.
t_i          Minimum variable importance threshold.
F            F-measure.
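As an informal illustration of how the first few symbols relate to each other, the following Python sketch computes a vote entropy from the fractions of ensemble votes per class and then scans a regular search grid for the square window with the highest mean local vote entropy. It is a minimal sketch under stated assumptions, not the procedure of this paper: the entropy is assumed to take the standard form H = -sum_i p_i log p_i with the natural logarithm, w and g are assumed to be given in pixels, (x, y) is assumed to denote the upper-left corner of a window, and the function names are hypothetical. Only the uncertainty criterion is covered; the diversity criterion is not sketched.

```python
import numpy as np

def vote_entropy(votes, n_classes):
    """Vote entropy per sample (assumed form: H = -sum_i p_i * ln p_i).

    votes: array of shape (n_trees, n_samples) holding the integer class
    label predicted by each ensemble member for each sample.
    """
    votes = np.asarray(votes)
    n_trees = votes.shape[0]
    # p[i, j] is the fraction of votes for class i on sample j.
    p = np.stack([(votes == i).sum(axis=0) / n_trees for i in range(n_classes)])
    # Replace p = 0 by 1 inside the log so that 0 * log(0) contributes 0.
    terms = p * np.log(np.where(p > 0, p, 1.0))
    return -terms.sum(axis=0)

def best_window(entropy_map, w, g):
    """Scan a grid with cell size g and return (mean entropy, x, y) of the
    w-by-w window with the highest mean local vote entropy.

    Assumption: (x, y) is the upper-left corner of the window, in pixels.
    """
    rows, cols = entropy_map.shape
    best = None
    for y in range(0, rows - w + 1, g):
        for x in range(0, cols - w + 1, g):
            mu = float(entropy_map[y:y + w, x:x + w].mean())
            if best is None or mu > best[0]:
                best = (mu, x, y)
    return best
```

With per-pixel ensemble votes reshaped to the image dimensions, `vote_entropy` would yield an entropy map, and `best_window` would then point to one compact region to present to the user instead of scattered individual pixels.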
I. INTRODUCTION
Machine learning algorithms have become important
tools for the extraction of environmental information
from remote sensing images. State-of-the-art supervised
algorithms, such as support vector machines (SVMs), artificial neural
networks, and ensemble-based learning methods [1], have
already been developed, among others, for land cover analysis [2],
[3], biophysical parameter estimation [4], [5], change and
anomaly detection [6], [7], and geomorphological mapping [8], [9].
0196-2892 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.