Accounting for spatial autocorrelation from model selection to statistical inference:
Application to a national survey of a diurnal raptor
Kévin Le Rest ⁎, David Pinaud, Vincent Bretagnolle
Centre d'Etudes Biologiques de Chizé (CEBC), CNRS UPR 1934, 79360 Beauvoir-Sur-Niort, France
Article history:
Received 26 January 2012
Accepted 30 November 2012
Available online 12 December 2012
Keywords:
Generalized Linear Models
Spatial cross-validation
Population size
Residual spatial autocorrelation
Spatial filtering
Species distribution
Planning actions for species conservation involves working at both an ecologically meaningful spatial scale
and a scale suitable for implementing management or conservation plans. Animal populations and conservation
policies often operate across wide areas. Large-extent spatial datasets are thus often used, but their analyses
rarely deal with problems inherent to spatial datasets such as residual spatial autocorrelation, which can
bias or even reverse results. Here we propose a procedure for analysing a large-scale count dataset that
integrates residual spatial autocorrelation in a Generalized Linear Model framework by combining and extending
previously published methods. The first step selects the environmental variables with a modified
cross-validation procedure that allows for residual spatial autocorrelation. The second step evaluates
the spatial effect of the model using a spatial filtering approach based on the variogram parameters.
We apply this method to the Black kite (Milvus migrans) to estimate the distribution and population size
of this species in France. We found some divergence between spatial and non-spatial models in estimated
population size, as well as in the distribution map. We also found that the uncertainty of the model was
underestimated when residual spatial autocorrelation was not accounted for. Our analysis confirms previous
results showing that residual spatial autocorrelation should always be accounted for, especially in
conservation, where false results may lead to poor management decisions.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
Animal populations and conservation policies often operate across
wide areas. Large-extent spatial datasets (Scheiner et al., 2000) can
therefore be extremely valuable to determine population parameters
for conservation purposes, e.g. the geographical distribution of a species,
its population size or trends. However, the statistical analyses
used often ignore issues that may bias conclusions. In particular,
they rarely deal with inference problems inherent to spatial datasets
such as residual spatial autocorrelation (hereafter RSA), which
may actually reverse observed patterns (Kühn, 2007).
Spatial autocorrelation arises when the measurements of a variable of
interest in multiple sample units are not independent of each other
(Griffith, 1987), which often occurs in ecological data. Such spatial
patterns are usually explained by environmental features (e.g. climatic
variables or habitat structure) that are themselves spatially structured.
Therefore, including all spatially structured environmental variables
may be sufficient to remove RSA from a regression model
(Diniz-Filho et al., 2003). However, it is often impossible to measure
all spatially structured variables: for instance, variables accounting for
social behaviour or for the availability of food resources are very difficult
to measure and are often missing from the dataset. In such cases, the inclusion
of all available variables does not fully remove RSA and the
important assumption of independence of residuals is violated (see
Dormann et al., 2007). It is well known that this problem mostly affects
the uncertainty of statistical models (Legendre, 1993; Legendre et al.,
2002), i.e. the confidence interval around the regression coefficients,
which is commonly measured by the standard error. Under positive RSA, i.e.
when closer locations have more similar residual values than distant ones,
the true standard errors of parameters tend to be underestimated, which leads to
an over-precise estimation of the regression coefficients. In turn this
can lead to an erroneously low p-value, a wrong R² and a wrong likelihood
(Legendre, 1993; Legendre et al., 2002; Lennon, 2000).
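The underestimation of standard errors under positive RSA can be illustrated with a small simulation. The sketch below is not the procedure used in this study: it assumes a Gaussian linear model rather than a GLM for counts, a regular 12 × 12 grid, and an exponential spatial covariance shared by the predictor and the errors. The function name `simulate_slope_se` and all parameter values are hypothetical choices made for illustration only. It compares the nominal standard error reported by ordinary least squares, which assumes independent residuals, with the empirical spread of the slope estimates over repeated simulations.

```python
import numpy as np


def simulate_slope_se(n_side=12, corr_range=3.0, n_sim=300, seed=0):
    """Compare nominal OLS standard errors with the true spread of the slope.

    Illustrative assumptions: regular grid, exponential covariance
    exp(-d / corr_range) for both the predictor and the errors.
    """
    rng = np.random.default_rng(seed)

    # Coordinates of the sample units on a regular grid.
    gx, gy = np.meshgrid(np.arange(n_side), np.arange(n_side))
    coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
    n = len(coords)

    # Exponential spatial covariance and its Cholesky factor.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    cov = np.exp(-dist / corr_range)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n))

    slopes = np.empty(n_sim)
    nominal_se = np.empty(n_sim)
    for i in range(n_sim):
        # Spatially structured predictor and spatially autocorrelated errors.
        x = chol @ rng.standard_normal(n)
        e = chol @ rng.standard_normal(n)
        y = 1.0 + 0.5 * x + e

        # Ordinary least squares fit that ignores the spatial structure.
        X = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - 2)
        cov_beta = sigma2 * np.linalg.inv(X.T @ X)

        slopes[i] = beta[1]
        nominal_se[i] = np.sqrt(cov_beta[1, 1])

    # Empirical standard deviation of the slope vs. mean nominal standard error.
    return slopes.std(ddof=1), nominal_se.mean()


if __name__ == "__main__":
    empirical_sd, mean_nominal_se = simulate_slope_se()
    print("empirical SD of slope estimates:", empirical_sd)
    print("mean nominal OLS standard error:", mean_nominal_se)
```

Under these assumptions the empirical spread of the slope should exceed the nominal standard error, so confidence intervals built from the latter would be too narrow. Setting `corr_range` close to zero makes the residuals nearly independent, in which case the two quantities should roughly agree.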
RSA raises two main concerns. The first relates to model selection,
since classical criteria such as the Akaïke information criterion (hereafter
AIC) are biased in the presence of RSA (see Cassemiro et al., 2007;
Diniz-Filho et al., 2008; Hoeting et al., 2006). The most common strategy
Ecological Informatics 14 (2013) 17–24
Abbreviations: AIC, Akaïke Information Criterion; GLM, Generalized Linear Model;
PCA, Principal Component Analysis; RMSEP, Root Mean Squared Error of Prediction;
RSA, Residual Spatial Autocorrelation.
⁎ Corresponding author. Tel.: +33 5 49 09 35 13; fax: +33 5 49 09 65 26.
E-mail addresses: lerest.k@gmail.com (K. Le Rest), pinaud@cebc.cnrs.fr (D. Pinaud),
breta@cebc.cnrs.fr (V. Bretagnolle).
1574-9541/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.ecoinf.2012.11.008