Computers, Environment and Urban Systems
journal homepage: www.elsevier.com/locate/ceus
Comparison of Iterative Proportional Fitting and Simulated Annealing as
synthetic population generation techniques: Importance of the rounding
method
Durán-Heras Alfonso⁎, García-Gutiérrez Isabel, Castilla-Alcalá Guillermo
School of Engineering, Universidad Carlos III de Madrid, Spain
ARTICLE INFO
Keywords:
Synthetic population
Iterative Proportional Fitting
Simulated annealing
Small area
IPF rounding
ABSTRACT
Approaches to space-related problems that model decision-making and interactions at the level of individuals,
and thus require disaggregated population data (i.e. specifying all attributes for each individual) are increasingly
being used in various research domains. Actual population data is generally unavailable due to confidentiality
and cost constraints. Therefore, synthetic population generation techniques based on aggregated marginal
constraints and a random sample are often used. The two sample-based techniques most frequently used are
Iterative Proportional Fitting (IPF) coupled with integerization, and Simulated Annealing (SA), a special
case of Combinatorial Optimization (CO). Several authors have emphasized the need for further research on
comparing their relative performance. Thus, a methodology encompassing statistical analysis to compare IPF
and SA is presented here. Technique performance is evaluated through the percentage classification error of the
generated population against the reference population. Two cases are analyzed using the 2001 census microdata
in Andalusia (Spain) and the 2000 Swiss Public Use Sample as reference populations, encompassing 6 socio-
demographic attributes plus geographic location (municipalities and cantons). Aggregated marginal constraints
and random samples are calculated from the reference population. A set of synthetic small area populations is
generated using both techniques for various scenarios within each case, corresponding to different combinations
of sample sizes, number of categories and number of generated populations. Results reveal the great importance
of the integerization process applied to IPF's output. IPF coupled with marginal distributions-controlled
rounding outperforms SA in all scenarios, whereas SA generally outperforms IPF
coupled with the commonly used Monte Carlo rounding.
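As an illustration of the pipeline summarized above, the following is a minimal sketch of IPF on a two-attribute table followed by Monte Carlo rounding. It is not the authors' implementation: the function names (`ipf`, `monte_carlo_round`), the seed table and the marginal targets are hypothetical, and the study itself uses 6 attributes rather than 2. The marginal distributions-controlled rounding is not shown.

```python
import random

def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Scale a 2-D seed table until its row and column sums match the targets."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Row step: rescale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [cell * target / total for cell in table[i]]
        # Column step: rescale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns are exact after the column step, so only rows need checking.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table

def monte_carlo_round(table, n, rng):
    """Draw n individuals; each cell is chosen with probability proportional to its weight."""
    cells = [(i, j) for i, row in enumerate(table) for j in range(len(row))]
    weights = [table[i][j] for i, j in cells]
    counts = [[0] * len(row) for row in table]
    for i, j in rng.choices(cells, weights=weights, k=n):
        counts[i][j] += 1
    return counts

# Hypothetical data: sample cross-tabulation of two attributes
# (2 categories by 3 categories) and the known small-area marginals.
seed = [[3.0, 1.0, 2.0], [1.0, 4.0, 1.0]]
row_targets = [40, 60]
col_targets = [30, 50, 20]

weights = ipf(seed, row_targets, col_targets)                         # fractional cell weights
synthetic = monte_carlo_round(weights, n=100, rng=random.Random(0))  # integer cell counts
```

The fractional weights returned by `ipf` reproduce both marginals, but the integer counts produced by `monte_carlo_round` match them only in expectation, so an individual draw can deviate from the targets. This is one reason the choice of integerization step can matter.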
1. Introduction
There is a growing body of literature on approaches to space-related
problems that model decision-making and interactions at the level of
individuals, such as spatial microsimulation and agent-based simulation
models, and that therefore rely on population microdata. Spatial
microsimulation can be defined as “… an approach to the analysis of
individual-level phenomena over geographical space that involves the
creation, analysis and modelling of spatial microdata” (Lovelace &
Dumont, 2016).
Applications can be found in a wide variety of fields: transportation
planning using travel demand models (Frick & Axhausen, 2004;
MATSim, 2017; TRANSIMS, 2017); study of environmental problems
linked to gas emissions in cities (Ma, Heppenstall, Harland, & Mitchell,
2014); population evolution used for demographic forecasting (Wu,
Birkin, & Rees, 2008); healthcare regional planning (Morrissey, Clarke,
Ballas, Hynes, & O'Donoghue, 2008); and numerous other fields, such as
marketing (Hanaoka & Clarke, 2007), tourism (Van Leeuwen &
Nijkamp, 2010), urban planning (Marois & Bélanger, 2015), criminology
(Malleson & Birkin, 2012) or mobility (Lenormand, Huet, &
Gargiulo, 2014). Ballas, Rossiter, Thomas, Clarke, and Dorling (2005),
Birkin and Clarke (2011) and Ye, Wang, Chen, Lin, and Wang (2016)
provide general reviews of microsimulation and its applications. These
studies are typically carried out at the spatial scale of municipalities
or small urban areas such as wards or districts.
These approaches require populations of “agents”, such as
households, families or individuals, each of which is characterized
by the specific values assigned to a set of relevant, correlated
spatial and socio-economic attributes (Farooq, Bierlaire, Hurtubia, &
Flötteröd, 2013). Synthetically generated populations are generally
used, since comprehensive, fully disaggregated data is rarely available
(e.g., due to privacy issues in census-based data and due to sample
size limitations in survey-based analysis) (Cho et al., 2014). A synthetic
https://doi.org/10.1016/j.compenvurbsys.2017.11.001
Received 22 March 2017; Received in revised form 6 November 2017; Accepted 8 November 2017
⁎ Corresponding author at: School of Engineering, Universidad Carlos III de Madrid, Avda. Universidad, 30, 28911, Leganés, Madrid, Spain.
E-mail address: duran@ing.uc3m.es (A. Durán-Heras).
0198-9715/ © 2017 Elsevier Ltd. All rights reserved.