Computers, Environment and Urban Systems
journal homepage: www.elsevier.com/locate/ceus

Comparison of Iterative Proportional Fitting and Simulated Annealing as synthetic population generation techniques: Importance of the rounding method

Durán-Heras Alfonso ⁎, García-Gutiérrez Isabel, Castilla-Alcalá Guillermo
School of Engineering, Universidad Carlos III de Madrid, Spain

Keywords: Synthetic population; Iterative Proportional Fitting; Simulated annealing; Small area; IPF rounding

ABSTRACT

Approaches to space-related problems that model decision-making and interactions at the level of individuals, and thus require disaggregated population data (i.e. specifying all attributes for each individual) are increasingly being used in various research domains. Actual population data is generally unavailable due to confidentiality and cost constraints. Therefore, synthetic population generation techniques based on aggregated marginal constraints and a random sample are often used. The two sample-based techniques most frequently used are Iterative Proportional Fitting (IPF) coupled with integerization and Simulated Annealing (SA) (SA is a special case of Combinatorial Optimization, CO). Several authors have emphasized the need for further research on comparing their relative performance. Thus, a methodology encompassing statistical analysis to compare IPF and SA is presented here. Technique performance is evaluated through the percentage classification error of the generated population against the reference population. Two cases are analyzed using the 2001 census microdata in Andalusia (Spain) and the 2000 Swiss Public Use Sample as reference populations, encompassing 6 socio-demographic attributes plus geographic location (municipalities and cantons). Aggregated marginal constraints and random samples are calculated from the reference population.
A set of synthetic small area populations is generated using both techniques for various scenarios within each case, corresponding to different combinations of sample sizes, number of categories and number of generated populations. Results reveal the great importance of the integerization process applied to IPF's output. IPF coupled with marginal-distribution-controlled rounding outperforms SA in all scenarios, whereas SA generally outperforms IPF coupled with the commonly used Monte Carlo rounding.

1. Introduction

There is a growing body of literature on approaches to space-related problems that model decision-making and interactions at the level of individuals, and thus rely on population microdata. Examples include agent-based simulation models and spatial microsimulation, which can be defined as “… an approach to the analysis of individual-level phenomena over geographical space that involves the creation, analysis and modelling of spatial microdata” (Lovelace & Dumont, 2016). Applications can be found in a wide variety of fields: transportation planning using travel demand models (Frick & Axhausen, 2004; MATSim, 2017; TRANSIMS, 2017); study of environmental problems linked to gas emissions in cities (Ma, Heppenstall, Harland, & Mitchell, 2014); population evolution used for demographic forecasting (Wu, Birkin, & Rees, 2008); healthcare regional planning (Morrissey, Clarke, Ballas, Hynes, & O'Donoghue, 2008); and numerous other fields, such as marketing (Hanaoka & Clarke, 2007), tourism (Van Leeuwen & Nijkamp, 2010), urban planning (Marois & Bélanger, 2015), criminology (Malleson & Birkin, 2012) or mobility (Lenormand, Huet, & Gargiulo, 2014). Ballas, Rossiter, Thomas, Clarke, and Dorling (2005), Birkin and Clarke (2011) and Ye, Wang, Chen, Lin, and Wang (2016) provide general reviews of microsimulation and its applications.
These studies are typically carried out at the spatial scale level of municipalities or small urban areas such as wards or districts.

⁎ Corresponding author at: School of Engineering, Universidad Carlos III de Madrid, Avda. Universidad, 30, 28911, Leganés, Madrid, Spain. E-mail address: duran@ing.uc3m.es (A. Durán-Heras).
https://doi.org/10.1016/j.compenvurbsys.2017.11.001
Received 22 March 2017; Received in revised form 6 November 2017; Accepted 8 November 2017
0198-9715/ © 2017 Elsevier Ltd. All rights reserved.

These approaches require populations of individuals (“agents”), such as households, families or individuals, each of which is characterized by the specific values assigned to a set of relevant, correlated spatial and socio-economic attributes (Farooq, Bierlaire, Hurtubia, & Flötteröd, 2013). Synthetically generated populations are generally utilized, since comprehensive, fully disaggregated data is rarely available (e.g., due to privacy issues in census-based data and due to sample size limitations in survey-based analysis) (Cho et al., 2014). A synthetic