European Journal of Operational Research 249 (2016) 517–524
Journal homepage: www.elsevier.com/locate/ejor

Spatial dependence in credit risk and its improvement in credit scoring

Guilherme Barreto Fernandes a,b,*, Rinaldo Artes a

a Insper Institute of Education and Research, Rua Quatá, 300, Vila Olímpia, São Paulo, Brazil
b Serasa Experian, Alameda dos Quinimuras, 187, Planalto Paulista, CEP 04068-900 São Paulo, Brazil

Article info
Article history: Received 15 March 2014; Accepted 6 July 2015; Available online 29 July 2015
Keywords: Risk analysis; Spatial dependence; SME credit risk; Ordinary kriging; Credit scoring

Abstract
Credit scoring models are important tools in the credit granting process. These models measure the credit risk of a prospective client based on idiosyncratic variables and macroeconomic factors. However, small and medium-sized enterprises (SMEs) are subject to the effects of the local economy. From a data set with the location and default information of 9 million Brazilian SMEs, provided by Serasa Experian (the largest Brazilian credit bureau), we propose a measure of the local risk of default based on the application of ordinary kriging. This variable has been included in logistic credit scoring models as an explanatory variable. These models have shown better performance than models without this variable. A gain of around 7 percentage points in KS and Gini was observed.

© 2015 Elsevier B.V. and Association of European Operational Research Societies (EURO) within the International Federation of Operational Research Societies (IFORS). All rights reserved.

1. Introduction

The correct evaluation of credit risk is an important issue in the Basel agreements. In this context, the probability of default (PD) has a central role. Statistical and mathematical models have been widely employed to estimate the PD for companies or contracts.
These models, called credit scoring models, usually determine the risk of default conditional on exogenous factors. The Basel agreements require conservative estimates of PD for loan portfolios, and retail customers, such as small and medium-sized enterprises (SMEs), must be addressed from the perspective of mass risk evaluation by means of statistical models. In the present paper, logistic models (Hosmer & Lemeshow, 2000) are used to predict the PD of SMEs.

Information on payment history and financial capacity is naturally understood to be a relevant risk factor in these models. It also seems reasonable to assume that firm location adds information to credit scoring models, particularly those aimed at predicting the default risk of SMEs. Often the main customers of these firms are the population and other companies located in the region where they operate. Thus, when an SME is located in a region facing an economic downturn that affects the performance of nearby businesses, the risk of default of this firm is expected to increase.

In principle, the inclusion of a spatial factor in credit scoring models could be replaced by characteristics of the local economy. However, gathering this information is very difficult when the area of investigation is large, since information on small localities in such regions can be scarce or unavailable. Similar problems were reported by Gerkman (2011) in a study of real estate prices.

* Corresponding author at: Insper and Serasa Experian, Alameda dos Quinimuras, 187, Analytics, São Paulo, SP, Brazil. Tel.: +55 11 98642 8017; fax: +55 11 3805 4168. E-mail address: gbfernandes2002@gmail.com (G.B. Fernandes).

In this context, the analysis of spatial dependence is justified in a comprehensive study of the credit risk of SMEs; few studies on credit scoring, however, consider this effect.
The aim of this paper is to incorporate information on the spatial behavior of default into credit scoring models for SMEs.

Using a ZIP-code-related independent variable is a classical way to introduce spatial information into credit scoring models. However, it is a qualitative variable with a potentially large number of categories, which produces a non-parsimonious model and brings the risk of multicollinearity. Moreover, regions with few individuals would not receive a good risk assessment. The large number of ZIP-code categories can also lead to overfitting and may make the model unstable over time. Finally, economic phenomena do not necessarily respect this territorial division.

In this paper, spatial dependence is taken into account by including in the model a quantitative variable, obtained by ordinary kriging (Matheron, 1963), which may be considered a measure of the spatial risk of default. This risk factor is used as an explanatory variable in logistic credit scoring models. Two alternatives for including this factor in the logistic model have been considered. The first, and simplest, is to treat it as a fixed variable (without measurement error). The other is to admit that the observed value, $\hat{Z}$, is in fact a proxy for an unobservable variable $\tau$ that expresses the spatial risk factor, such that $\hat{Z} = \tau + \varepsilon$, where $\varepsilon$ is a random measurement error (logistic model with errors in variables) (Clark, 1982).

http://dx.doi.org/10.1016/j.ejor.2015.07.013
ISSN 0377-2217
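To make the first alternative (kriged risk treated as a fixed covariate) more concrete, the following Python sketch shows one way an ordinary kriging estimate could be computed and then passed to a logistic score. It is only an illustration under stated assumptions, not the authors' implementation: the exponential variogram and its parameters, the toy coordinates and default rates, and the names `exp_variogram`, `solve_linear`, `ordinary_kriging`, and `pd_logistic` are all hypothetical, and the coefficients in the logistic score are placeholders rather than estimates from the paper.

```python
import math


def exp_variogram(h, sill=1.0, range_=1.0):
    """Exponential semivariogram with no nugget (assumed form and parameters)."""
    return sill * (1.0 - math.exp(-h / range_))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(coords, values, target, variogram=exp_variogram):
    """Return the ordinary kriging estimate at `target` and the weights.

    Solves the system [[Gamma, 1], [1', 0]] [w, mu] = [gamma0, 1], where
    Gamma holds semivariances between observed points and gamma0 holds
    semivariances between each observed point and the target.
    """
    n = len(values)
    a = [[variogram(math.dist(p, q)) for q in coords] + [1.0] for p in coords]
    a.append([1.0] * n + [0.0])  # constraint row: weights sum to one
    b = [variogram(math.dist(p, target)) for p in coords] + [1.0]
    weights = solve_linear(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights


def pd_logistic(x, beta0, beta, gamma, z_hat):
    """Logistic PD with the kriged risk z_hat as one more fixed covariate."""
    eta = beta0 + sum(b * xi for b, xi in zip(beta, x)) + gamma * z_hat
    return 1.0 / (1.0 + math.exp(-eta))


# Toy data: four locations with observed local default rates (made up).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rates = [0.10, 0.20, 0.15, 0.30]

z_hat, w = ordinary_kriging(coords, rates, (0.5, 0.5))
# Placeholder coefficients, for illustration only.
print(z_hat, pd_logistic([1.2], beta0=-3.0, beta=[0.4], gamma=5.0, z_hat=z_hat))
```

The constraint row forces the weights to sum to one, which is what distinguishes ordinary kriging from simple kriging. With no nugget, the estimate at an observed location reproduces the observed value. The errors-in-variables alternative would additionally model the measurement error $\varepsilon$ and is not covered by this sketch.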