Landscape and Urban Planning 107 (2012) 293–306
Journal homepage: www.elsevier.com/locate/landurbplan
Variable selection for hedonic model using machine learning approaches: A case study in Onondaga County, NY
Sanglim Yoo (a,∗), Jungho Im (a,b,1,2), John E. Wagner (a,3)
(a) College of Environmental Science and Forestry, State University of New York, Syracuse, NY 13210-2778, USA
(b) School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 689-798, South Korea
Highlights
◮ Application of machine learning regression methods to the hedonic price function to select variables.
◮ Comparison of the variables selected by machine learning methods with those selected by the traditional ordinary least squares method.
◮ Proposal of more practical approaches for selecting important variables for the hedonic price function.
Article info
Article history:
Received 22 September 2011
Received in revised form 5 June 2012
Accepted 7 June 2012
Available online 3 July 2012
Keywords:
Hedonic model
Variable selection
Machine learning
Cubist
Random Forest
Environmental amenities
Abstract
Based on the theoretical foundation of hedonic methods, positive relationships between various types of environmental amenities and house sales prices have been investigated. However, because hedonic theory does not argue in favor of any specific set of independent variables, this lack of theoretical support has led researchers to select independent variables based on the empirical results and intuitive reasoning of previous studies. In previous hedonic studies, the most widely used selection criterion was stepwise selection for multiple regression, with ordinary least squares (OLS) regression used for model fitting. The objective of this study is to apply machine learning approaches to hedonic variable selection and house sales price modeling. Two rule-based machine learning regression methods, Cubist and Random Forest (RF), were compared with traditional OLS regression for hedonic modeling. Each regression method was applied to data on 4469 house transactions from Onondaga County, NY (USA) with two different neighborhood configurations (i.e., 100 m and 1 km radius buffers). Results showed that RF yielded the highest accuracy in hedonic price modeling, followed by Cubist and then the traditional OLS method. Each regression method selected a different set of environmental variables for each neighborhood configuration. Because the variables selected by the RF method supported more in-depth hypotheses reflecting the preferences of house buyers, RF may prove useful both for selecting important variables for the hedonic price equation and for enhancing model performance.
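The traditional approach named in the abstract, stepwise selection with OLS fitting, can be sketched as below. This is a minimal forward-only variant under stated assumptions: candidate models are scored by AIC (p-value or F-test thresholds are equally common stepwise criteria), OLS is solved in pure Python via the normal equations, and the variable names (`sqft_k`, `park_share`, `junk`) and synthetic data are hypothetical. It is not the authors' procedure or data, and Cubist and RF are not shown.

```python
import math
import random

def ols_rss(X, y):
    """Fit OLS by solving the normal equations (X'X) b = X'y with
    Gauss-Jordan elimination; return the residual sum of squares.
    Each row of X must already contain a leading 1.0 (intercept)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        # Partial pivoting for numerical stability
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k = number of coefficients."""
    return n * math.log(rss / n) + 2 * k

def forward_stepwise(data, y, names):
    """Forward-only stepwise selection: repeatedly add the candidate variable
    that lowers AIC the most; stop when no addition lowers it."""
    n = len(y)
    selected = []
    best = aic(ols_rss([[1.0] for _ in range(n)], y), n, 1)  # intercept only
    while True:
        scores = []
        for name in names:
            if name in selected:
                continue
            cols = selected + [name]
            X = [[1.0] + [data[c][i] for c in cols] for i in range(n)]
            scores.append((aic(ols_rss(X, y), n, len(cols) + 1), name))
        if not scores:
            break
        score, name = min(scores)
        if score >= best:
            break
        best, selected = score, selected + [name]
    return selected

# Hypothetical synthetic data: log price driven by living area (1000 sq ft)
# and the share of open space in a buffer; "junk" is pure noise.
random.seed(0)
n = 200
data = {
    "sqft_k": [random.uniform(0.8, 3.0) for _ in range(n)],
    "park_share": [random.random() for _ in range(n)],
    "junk": [random.gauss(0, 1) for _ in range(n)],
}
log_price = [11 + 0.4 * data["sqft_k"][i] + 0.5 * data["park_share"][i]
             + random.gauss(0, 0.1) for i in range(n)]
selected = forward_stepwise(data, log_price, ["sqft_k", "park_share", "junk"])
print(selected)
```

A full stepwise procedure would also test removing previously added variables at each step; the forward-only loop keeps the sketch short while showing the greedy, criterion-driven character of this kind of selection.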
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
A major purpose of modern urban planning is the orderly arrangement of the parts of a city so that each part can perform its functions with minimal economic cost and conflict. In urban areas, demand for the services provided by environmental amenities is much higher than in rural or suburban areas. Consequently, measuring the demand for environmental amenities has attracted the attention of policy makers. For open space in particular, three questions have become major concerns: what kinds of environmental amenities it provides, how to measure these amenities, and how to estimate their economic values.

∗ Corresponding author. Tel.: +1 315 430 8209; fax: +1 315 470 6535.
E-mail addresses: sayoo@syr.edu (S. Yoo), ersgis@unist.ac.kr, imj@esf.edu (J. Im), jewagner@esf.edu (J.E. Wagner).
1 Tel.: +82 52 217 2824.
2 Tel.: +1 315 470 4709.
3 Tel.: +1 315 470 6971.
The first question has been investigated primarily in ecology, while the second and third have been addressed in economics. Economists have applied various methodologies for measuring the amenities provided by open space and estimating their economic values. One traditional way to answer these questions is to look for clues in related property values. The use of property value differentials arising from the heterogeneity of the surroundings of each property is called the hedonic property method. Applied to open space valuation, this method measures the increase in the values of houses in neighborhoods near open space parcels (Loomis, Rameker, & Seidl, 2004).
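For concreteness, a hedonic price function is often written in semi-log form, as below. This is a common specification assumed here for illustration, not the equation estimated in this paper; the grouping of regressors into structural (S), neighborhood (N), and environmental (E) attributes and all symbols are introduced only for this example.

```latex
\ln P_i = \beta_0 + \sum_{j} \beta_j S_{ij} + \sum_{k} \gamma_k N_{ik}
        + \sum_{m} \delta_m E_{im} + \varepsilon_i,
\qquad
\frac{\partial P_i}{\partial E_{im}} = \delta_m P_i
```

Under this form, a hypothetical coefficient of 0.02 on an environmental attribute (for example, the percentage of open space within a buffer around house i) would imply that a one-unit increase in that attribute raises the expected price by roughly 2%, holding the other attributes fixed.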
http://dx.doi.org/10.1016/j.landurbplan.2012.06.009