Accounting for spatial autocorrelation from model selection to statistical inference: Application to a national survey of a diurnal raptor

Kévin Le Rest, David Pinaud, Vincent Bretagnolle

Centre d'Etudes Biologiques de Chizé (CEBC), CNRS UPR 1934, 79360 Beauvoir-Sur-Niort, France

Article info

Article history: Received 26 January 2012; Accepted 30 November 2012; Available online 12 December 2012

Keywords: Generalized Linear Models; Spatial cross-validation; Population size; Residual spatial autocorrelation; Spatial filtering; Species distribution

Abstract

Planning actions for species conservation involves working at both an ecologically meaningful spatial scale and a scale suitable for implementing management or conservation plans. Animal populations and conservation policies often operate across wide areas. Large-extent spatial datasets are thus often used, but their analyses rarely deal with problems inherent to spatial datasets such as residual spatial autocorrelation, which can bias or even reverse results. Here we propose a procedure for analysing a large-scale count dataset that integrates residual spatial autocorrelation in a Generalized Linear Model framework by combining and extending previously published methods. The first step selects the environmental variables with a modified cross-validation procedure that allows for residual spatial autocorrelation. The second step evaluates the spatial effect of the model using a spatial filtering approach based on the variogram parameters. We apply this method to the Black kite (Milvus migrans) to estimate the distribution and population size of this species in France. We found some divergence between spatial and non-spatial models in estimated population size, as well as in the distribution map. We also found that the uncertainty of the model was underestimated when residual spatial autocorrelation was ignored.
Our analysis confirms previous results: residual spatial autocorrelation should always be accounted for, especially in conservation, where false results may lead to poor management decisions.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Animal populations and conservation policies often operate across wide areas. Large-extent spatial datasets (Scheiner et al., 2000) can therefore be extremely valuable for determining population parameters for conservation purposes, e.g. the geographical distribution of a species, its population size or its trends. However, the statistical analyses used often ignore issues that may bias conclusions. In particular, they rarely deal with inference problems inherent to spatial datasets such as residual spatial autocorrelation (hereafter RSA), which may actually reverse observed patterns (Kühn, 2007).

Spatial autocorrelation arises when measurements of a variable of interest in multiple sample units are not independent of each other (Griffith, 1987), which often occurs in ecological data. Such spatial patterns are usually explained by environmental features (e.g. climatic variables or habitat structure) that are themselves spatially structured. Therefore, including all environmental variables that are spatially structured may be sufficient to remove RSA from a regression model (Diniz-Filho et al., 2003). However, it is often impossible to measure all spatially structured variables: for instance, variables accounting for social behaviour or for the availability of food resources are very difficult to measure and are often missing from the dataset. In such cases, the inclusion of all available variables does not fully remove RSA and the important assumption of independence of residuals is violated (see Dormann et al., 2007). It is well known that this problem mostly affects the uncertainty of statistical models (Legendre, 1993; Legendre et al., 2002), i.e.
the confidence interval around the regression coefficients, which is commonly measured by the standard error. A positive RSA, i.e. closer locations having more similar residual values than more distant ones, leads to underestimation of the true standard errors of parameters, and hence to over-precise estimates of the regression coefficients. In turn this can lead to an erroneously low p-value, a wrong R² and a wrong likelihood (Legendre, 1993; Legendre et al., 2002; Lennon, 2000).

Ecological Informatics 14 (2013) 17–24
Abbreviations: AIC, Akaike Information Criterion; GLM, Generalized Linear Model; PCA, Principal Component Analysis; RMSEP, Root Mean Squared Error of Prediction; RSA, Residual Spatial Autocorrelation.
Corresponding author. Tel.: +33 5 49 09 35 13; fax: +33 5 49 09 65 26.
E-mail addresses: lerest.k@gmail.com (K. Le Rest), pinaud@cebc.cnrs.fr (D. Pinaud), breta@cebc.cnrs.fr (V. Bretagnolle).
http://dx.doi.org/10.1016/j.ecoinf.2012.11.008

RSA raises two main concerns. The first relates to model selection, since classical criteria such as the Akaike information criterion (hereafter AIC) are biased in the presence of RSA (see Cassemiro et al., 2007; Diniz-Filho et al., 2008; Hoeting et al., 2006). The most common strategy