Evaluating geo-environmental variables using a clustering based areal model

Bulent Tutmez a,*, Uzay Kaymak b, A. Erhan Tercan c, Christopher D. Lloyd d

a Department of Mining Engineering, Inonu University, Malatya 44280, Turkey
b School of Industrial Engineering, Eindhoven University of Technology, P.O. Box 513, 5600 MB, Eindhoven, The Netherlands
c Department of Mining Engineering, Hacettepe University, Ankara 06800, Turkey
d School of Geography, Archaeology and Palaeoecology, Queen’s University, Belfast, UK

Article info

Article history: Received 9 November 2011; received in revised form 18 February 2012; accepted 21 February 2012; available online 1 March 2012

Keywords: Spatial relationship; GWR; Fuzzy clustering; Local analysis; Geo-environmental

Abstract

Global regression models do not accurately reflect the spatial heterogeneity which characterises most geo-environmental variables. Analysing the relationships between such variables requires an approach which allows the model parameters to vary spatially. This paper proposes a new framework for exploring local relationships between geo-environmental variables. The method is based on extended objective function based fuzzy clustering, with the environmental parameters estimated through a locally weighted regression analysis. The case studies and prediction evaluations show that the fuzzy algorithm yields well-fitted models and accurate predictions. In addition to an increased accuracy of prediction relative to the widely used geographically weighted regression (GWR), the proposed algorithm provides the search radius (bandwidth) and weights for local estimation directly from the data. The results suggest that the method could be employed effectively in tackling real world kernel-based modelling problems.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

The environmental and geological sciences deal with the spatial behaviours of natural phenomena.
Geo-environmental data exhibit complex spatial patterns at different scales owing to a combination of several spatial phenomena or various influencing factors of different origins (Kanevski et al., 2004). Uncertainty and irregularity are common properties of measurements of these phenomena. It is common to assume that measurements are independent and identically distributed, but this assumption may not hold for spatial data (Cressie, 1993; Bivand et al., 2008). Spatially varying relationships between environmental variables are common and are a result of complex processes. These spatial relationships can be evaluated with different statistical modelling approaches. Recently, a variety of models have been proposed to explore variations in relationships between variables (Schabenberger and Gotway, 2005; Gao et al., 2006).

The exploration of relationships between multiple variables is often approached in a regression framework. Multiple linear regression analysis allows the assessment of the strength and nature of the relationship between one dependent variable and a set of independent variables. Such approaches are very widely used in the analysis of environmental variables, and more generally throughout the physical and social sciences. However, limitations of the standard approaches have led to the development of more robust methods which are appropriate in particular contexts. In cases where the variables are spatially referenced, a standard ordinary least squares approach may not be suitable because positive spatial autocorrelation means that the assumption of independence of the samples is violated. Alternative approaches such as generalised least squares exist which account for the spatial structure in variables (Lloyd, 2006).

Spatial measures cover both attribute and location information. By their nature, areal analyses concentrate on differences across space, whereas global analyses concentrate on similarities across space.
Areal identification has been used widely for several decades in many disciplines, such as image processing (local filters). However, in some disciplines, such as the geosciences, the environmental sciences, ecology and geography, interest in approaches that account for areal variation has been a comparatively recent development. For the purposes of prediction of a dependent variable given a set of independent variables, local regression approaches can offer considerable benefits in terms of an increase in prediction accuracy over standard global models (Lloyd, 2006). Geographically Weighted Regression (GWR) is one approach which is being used increasingly widely to explore areal spatial variations in relationships (Fotheringham et al., 1998). It is an adaptive and effective method for modelling relationships locally by calibrating a spatially varying coefficient regression model.

* Corresponding author. Tel.: +90 422 3774773; fax: +90 422 3410046. E-mail address: bulent.tutmez@inonu.edu.tr (B. Tutmez).

Computers & Geosciences 43 (2012) 34–41, ISSN 0098-3004, doi:10.1016/j.cageo.2012.02.019
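To make the idea of a spatially varying coefficient regression concrete, the following is a minimal sketch of a GWR-style local fit, not the fuzzy clustering method proposed in this paper. It assumes a Gaussian distance-decay kernel with a fixed, user-chosen bandwidth; the function names `gwr_fit_point` and `demo`, the synthetic data and the bandwidth value are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gwr_fit_point(coords, X, y, target, bandwidth):
    """Weighted least squares fit centred on one target location.

    Assumption: Gaussian kernel with a fixed bandwidth (illustrative choice).
    Returns the local coefficients [intercept, slope(s)].
    """
    # Distances from the target location to every observation.
    d = np.linalg.norm(coords - target, axis=1)
    # Gaussian kernel weights: nearby observations count more.
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    # Solve the weighted normal equations (A^T W A) beta = A^T W y.
    AtW = A.T * w
    return np.linalg.solve(AtW @ A, AtW @ y)


def demo():
    """Fit local models at two locations and one global model (synthetic data)."""
    rng = np.random.default_rng(0)
    n = 400
    coords = rng.uniform(0.0, 10.0, size=(n, 2))
    x = rng.normal(size=n)
    # Hypothetical process: the slope grows with the easting coordinate.
    slope = 1.0 + 0.5 * coords[:, 0]
    y = 2.0 + slope * x + rng.normal(scale=0.1, size=n)

    west = gwr_fit_point(coords, x, y, np.array([1.0, 5.0]), bandwidth=1.0)
    east = gwr_fit_point(coords, x, y, np.array([9.0, 5.0]), bandwidth=1.0)

    # Global ordinary least squares returns a single slope for the whole area.
    A = np.column_stack([np.ones(n), x])
    global_beta = np.linalg.lstsq(A, y, rcond=None)[0]
    return west, east, global_beta


if __name__ == "__main__":
    west, east, global_beta = demo()
    print("local slope (west):", west[1])
    print("local slope (east):", east[1])
    print("global slope:", global_beta[1])
```

In this sketch the global fit averages over the spatial trend in the slope, while the two local fits recover different slopes in the west and east of the synthetic study area. The bandwidth is fixed by hand here; in practice it must be selected, for example by cross-validation.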