Landscape and Urban Planning 107 (2012) 293–306

Journal homepage: www.elsevier.com/locate/landurbplan

Variable selection for hedonic model using machine learning approaches: A case study in Onondaga County, NY

Sanglim Yoo a,*, Jungho Im a,b,1,2, John E. Wagner a,3

a College of Environmental Science and Forestry, State University of New York, Syracuse, NY 13210-2778, USA
b School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 689-798, South Korea

Highlights

Application of machine learning regression methods to the hedonic price function to select variables.
Comparison of the selection results of machine learning methods with those of the traditional ordinary least squares method.
Proposal of more practical approaches for selecting important variables for the hedonic price function.

Article info

Article history: Received 22 September 2011; received in revised form 5 June 2012; accepted 7 June 2012; available online 3 July 2012

Keywords: Hedonic model; Variable selection; Machine learning; Cubist; Random Forest; Environmental amenities

Abstract

Based on the theoretical foundation of hedonic methods, positive relationships between various types of environmental amenities and house sales price have been investigated. However, hedonic theory does not provide any arguments in favor of specific sets of independent variables, and this lack of theoretical support has led researchers to select independent variables based on the empirical results and intuitive information of previous studies. In previous hedonic studies, the most widely used selection criterion was stepwise selection for multiple regression, with ordinary least squares (OLS) regression used for model fitting. The objective of this study is to apply machine learning approaches to hedonic variable selection and house sales price modeling.
Two rule-based machine learning regression methods, Cubist and Random Forest (RF), were compared with traditional OLS regression for hedonic modeling. Each regression method was applied to data from 4469 house transactions in Onondaga County, NY (USA) with two different neighborhood configurations (i.e., 100 m and 1 km radius buffers). Results showed that RF achieved the highest accuracy in hedonic price modeling, followed by Cubist and the traditional OLS method. Each regression method selected a different set of environmental variables for each neighborhood configuration. Because the variables selected by the RF method supported an in-depth hypothesis reflecting the preferences of house buyers, RF may prove useful for selecting important variables for the hedonic price equation as well as for enhancing model performance. © 2012 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +1 315 430 8209; fax: +1 315 470 6535.
E-mail addresses: sayoo@syr.edu (S. Yoo), ersgis@unist.ac.kr, imj@esf.edu (J. Im), jewagner@esf.edu (J.E. Wagner).
1 Tel.: +82 52 217 2824.
2 Tel.: +1 315 470 4709.
3 Tel.: +1 315 470 6971.

1. Introduction

A major purpose of modern urban planning is the orderly arrangement of the parts of the city, so that each part can perform its functions with minimum economic cost and conflict. In urban areas, the demand for the services provided by environmental amenities is much higher than in rural or suburban areas. Therefore, the issue of measuring the demand for environmental amenities has attracted attention from policy decision makers. Specifically, for open space, three questions have become major concerns: what kinds of environmental amenities it provides, how to measure these amenities, and how to estimate their economic value. The first question has been primarily investigated by the discipline of ecology, while the second and third questions have been discussed by the discipline of economics.
Economists have applied various methodologies for measuring the amenities provided by open space and estimating their economic values. One of the traditional ways to answer these questions is to look for clues in related property values. The use of property value differentials arising from the heterogeneity around each property is called the hedonic property method. Applied to open space valuation, this method measures the increases in the values of houses in neighborhoods near open space parcels (Loomis, Rameker, & Seidl, 2004).

0169-2046/$ see front matter. http://dx.doi.org/10.1016/j.landurbplan.2012.06.009
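
As a rough illustration of the hedonic property method described in the Introduction, the sketch below fits an OLS price equation to synthetic house data and reads the coefficient on an open-space indicator as the implicit price of proximity. It is not the authors' model or data: the variable names, the noise-free price structure in `TRUE_BETA`, and the helper functions `solve_linear_system` and `fit_ols` are all assumptions made for this example.

```python
# Hypothetical illustration only: every name and number below is made up.
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, prices):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * y for xi, y in zip(x, prices)) for p in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic houses: (living area in m^2, 1 if within 100 m of open space else 0).
rng = random.Random(0)
houses = [(rng.uniform(80, 250), rng.choice([0, 1])) for _ in range(200)]

# Assumed price structure used only to generate the toy data (no noise added):
# intercept, price per m^2, premium for being near open space.
TRUE_BETA = [50000.0, 900.0, 12000.0]
prices = [TRUE_BETA[0] + TRUE_BETA[1] * area + TRUE_BETA[2] * near
          for area, near in houses]

beta = fit_ols(houses, prices)
open_space_premium = beta[2]  # estimated implicit price of open-space proximity
print(f"Estimated open-space premium: {open_space_premium:.2f}")
```

Because the toy prices are generated without noise, the fitted coefficients recover the assumed values up to floating-point error. With real transaction data the estimate would depend on which structural, neighborhood, and environmental variables enter the equation, which is the selection problem this study addresses.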