Paper 492-2013

The Value of Neighborhood Information in Prospect Selection Models: Investigating the Optimal Level of Granularity

Philippe Baecke, Area Marketing, Vlerick Business School, Belgium
Dirk Van den Poel, Faculty of Economics and Business Administration, Department of Marketing, Ghent University, Belgium

ABSTRACT

Within analytical customer relationship management (CRM), customer acquisition models suffer the most from poor data quality, because the information about potential customers is mostly limited to socio-demographic and lifestyle variables obtained from external data vendors. In this situation in particular, taking advantage of the spatial correlation between customers can improve the predictive performance of these models. This study compares the predictive performance of an autoregressive technique and a hierarchical technique in an application that identifies potential new customers for 25 products and brands. In addition, this study shows that the predictive improvement can vary significantly depending on the level of granularity at which the neighborhoods are composed. Therefore, a model is introduced that simultaneously incorporates multiple levels of granularity, resulting in even more accurate predictions.

1. INTRODUCTION

As markets become increasingly saturated and highly competitive, companies have shifted their marketing strategies from transactional marketing to relationship marketing (Coussement et al., 2010; Pai & Tu, 2011). This is reflected in an explosion of interest in customer relationship management (CRM) among both academics and business practitioners (Ngai et al., 2009). Due to the information revolution and the drop in data warehousing costs, many companies have collected vast amounts of socio-demographic and transactional data about their customers. In addition, computing power is increasing rapidly, and data mining techniques are used to exploit these data optimally (Hosseini et al., 2010; Kamakura et al., 2005).
This has resulted in the development of a wide range of software tools that enable companies to transform the collected data into useful information for marketing decision makers. Besides the data mining technique, the success of a CRM model also depends on the quality of the information used as input for the model (Baecke & Van den Poel, 2011). Traditional CRM models often ignore neighborhood information and rely on the assumption of independent observations, which implies that a customer's purchasing behavior is entirely unrelated to the behavior of others. In reality, however, customer preferences depend not only on the customers' own characteristics but are often also related to the behavior of other customers in their neighborhood. Using neighborhood information to incorporate this spatial autocorrelation into the model can remedy this shortcoming and significantly improve the model's predictive performance.

Of all CRM fields, customer acquisition is often the one in which good predictive results are most difficult to obtain, because obtaining information about potential customers is not straightforward (Thorleuchter et al., 2012). As a result, acquisition models that identify possible prospects are often estimated using only a limited number of variables obtained from external data vendors (Baecke & Van den Poel, 2011). Especially in such a context, where data availability is limited, incorporating neighborhood effects can be very valuable.

In the academic literature, two important studies specifically focus on incorporating spatial interdependence to improve customer identification, each using a different predictive technique. On the one hand, Yang & Allenby (2003) used an autoregressive approach to incorporate both geographic and demographic proximity between customers in a CRM model that predicts customers' preference for Japanese-made cars.
That study indicated that geographic reference groups still have a larger impact than demographic reference groups. On the other hand, Steenburgh et al. (2003) used a hierarchical model to include a massively categorical variable, such as zip codes, in order to improve the acquisition of new students at a private university. This paper contributes to the previous literature by comparing the predictive performance of these two techniques across multiple product categories.

Statistics and Data Analysis, SAS Global Forum 2013
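To make the idea of neighborhood information more concrete, the following Python sketch derives simple neighborhood features. It is neither the autoregressive model of Yang & Allenby (2003) nor the hierarchical model of Steenburgh et al. (2003); it is a simplified stand-in under stated assumptions. For each customer, it computes the product adoption rate among the other customers in the same neighborhood (leaving the customer's own outcome out to avoid target leakage) and shrinks sparse neighborhoods toward the overall adoption rate using the posterior mean under a Beta prior, which loosely mimics the partial pooling of a hierarchical model. The function names, the pseudo-count prior_strength, and the use of four-digit postcodes with their two-character prefix as a coarser granularity level are hypothetical choices for illustration, not details taken from this paper.

```python
from collections import defaultdict


def neighborhood_rates(neighborhoods, owns, prior_strength=10.0):
    """Leave-one-out adoption rate of each customer's neighborhood,
    shrunk toward the overall adoption rate (illustrative sketch).

    neighborhoods  -- one neighborhood id per customer (hypothetical ids)
    owns           -- one 0/1 flag per customer (1 = owns the product)
    prior_strength -- hypothetical pseudo-count m of the Beta prior (> 0)
    """
    if not owns or len(neighborhoods) != len(owns):
        raise ValueError("expected non-empty inputs of equal length")
    if prior_strength <= 0:
        raise ValueError("prior_strength must be positive")
    global_rate = sum(owns) / len(owns)

    # Count customers and product owners per neighborhood.
    totals = defaultdict(int)
    owners = defaultdict(int)
    for hood, y in zip(neighborhoods, owns):
        totals[hood] += 1
        owners[hood] += y

    rates = []
    for hood, y in zip(neighborhoods, owns):
        # Remove the customer's own outcome so the feature does not leak the target.
        k = owners[hood] - y
        n = totals[hood] - 1
        # Posterior mean under Beta(m * p, m * (1 - p)) with p = overall rate:
        # neighborhoods with few other customers stay close to the overall rate.
        rates.append((k + prior_strength * global_rate) / (n + prior_strength))
    return rates


def multilevel_features(postcodes, owns, prior_strength=10.0):
    """Neighborhood rates at two granularity levels: the full postcode and
    a coarser region given by its first two characters (an assumption)."""
    fine = neighborhood_rates(postcodes, owns, prior_strength)
    coarse = neighborhood_rates([pc[:2] for pc in postcodes], owns, prior_strength)
    return list(zip(fine, coarse))


if __name__ == "__main__":
    postcodes = ["9000", "9000", "9000", "9050", "1000", "1000"]
    owns = [1, 1, 0, 1, 0, 0]
    for pc, (fine, coarse) in zip(postcodes, multilevel_features(postcodes, owns, 2.0)):
        print(pc, round(fine, 3), round(coarse, 3))
```

With prior_strength=2.0 in this toy example, the customer in postcode 9050 has no other customers in that postcode, so the fine-level feature falls back to the overall rate of 0.5, whereas the coarser region "90" still contributes information (0.6). This is one plausible illustration of why neighborhoods defined at different levels of granularity can carry different amounts of signal, and why feeding several levels into one model may help.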