Using Machine Learning to Emulate Agent-Based Simulations

Claudio Angione 1,3,4,*, Eric Silverman 2, Elisabeth Yaneske 1

1 School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
2 Institute for Health and Wellbeing, University of Glasgow, Glasgow, UK
3 Healthcare Innovation Centre, Teesside University, Middlesbrough, UK
4 Centre for Digital Innovation, Teesside University, Middlesbrough, UK

These authors contributed equally to this work.
* c.angione@tees.ac.uk

Abstract

In this proof-of-concept work, we evaluate the performance of multiple machine-learning methods as statistical emulators for use in the analysis of agent-based models (ABMs). Analysing ABM outputs can be challenging: the relationships between input parameters can be non-linear or even chaotic, even in relatively simple models, and each model run can require significant CPU time. Statistical emulation, in which a statistical model of the ABM is constructed to facilitate detailed model analyses, has been proposed as an alternative to computationally costly Monte Carlo methods. Here we compare multiple machine-learning methods for ABM emulation in order to determine the approaches best suited to emulating the complex behaviour of ABMs. Our results suggest that, in most scenarios, artificial neural networks (ANNs) and gradient-boosted trees outperform Gaussian process emulators, currently the most commonly used method for the emulation of complex computational models. ANNs produced the most accurate model replications in scenarios with high numbers of model runs, although their training times were longer than those of the other methods. We propose that agent-based modelling would benefit from using machine-learning methods for emulation, as this can facilitate more robust sensitivity analyses for the models while also reducing CPU time consumption when calibrating and analysing the simulation.
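As a minimal illustration of the emulation workflow summarised above, the sketch below runs a small simulation at sampled parameter values and then fits a cheap statistical model to the resulting input-output pairs. Everything in it is an assumption made for illustration: `toy_abm` is a hypothetical stand-in for an expensive ABM, not one of the models studied here, and a k-nearest-neighbour regressor is used only to keep the example free of dependencies; it is not presented as one of the methods evaluated in this work.

```python
import math
import random


def toy_abm(contact_rate, recovery_prob, n_agents=100, steps=30, seed=0):
    """Hypothetical stand-in for a costly ABM: a minimal agent-level epidemic.

    Returns the fraction of agents that were ever infected.
    """
    rng = random.Random(seed)
    SUSCEPTIBLE, INFECTED, RECOVERED = 0, 1, 2
    state = [SUSCEPTIBLE] * n_agents
    state[0] = INFECTED  # one initial case
    for _ in range(steps):
        n_infected = state.count(INFECTED)
        if n_infected == 0:
            break
        # Chance that a susceptible agent becomes infected during this step.
        p_infect = 1.0 - (1.0 - contact_rate / n_agents) ** n_infected
        next_state = list(state)
        for i, s in enumerate(state):
            if s == SUSCEPTIBLE and rng.random() < p_infect:
                next_state[i] = INFECTED
            elif s == INFECTED and rng.random() < recovery_prob:
                next_state[i] = RECOVERED
        state = next_state
    return sum(s != SUSCEPTIBLE for s in state) / n_agents


def build_training_set(n_runs, seed=1):
    """Sample input parameters uniformly and record one ABM run per sample."""
    rng = random.Random(seed)
    inputs, outputs = [], []
    for run in range(n_runs):
        x = (rng.uniform(0.0, 3.0), rng.uniform(0.05, 1.0))
        inputs.append(x)
        outputs.append(toy_abm(*x, seed=run))
    return inputs, outputs


def knn_emulator(inputs, outputs, k=5):
    """Return a predictor that averages the outputs of the k nearest training runs.

    Inputs are compared on their raw scale; a real analysis would normally
    rescale them first.
    """
    def predict(x):
        ranked = sorted(range(len(inputs)), key=lambda i: math.dist(inputs[i], x))
        nearest = ranked[:k]
        return sum(outputs[i] for i in nearest) / len(nearest)
    return predict


# Train once on a batch of simulation runs, then query the emulator cheaply.
train_x, train_y = build_training_set(n_runs=200)
emulate = knn_emulator(train_x, train_y, k=5)
print(emulate((2.0, 0.2)))  # approximates the ABM output without a new run
```

Once trained, `emulate` can be evaluated at many parameter settings for the cost of a distance search rather than a full simulation, which is what makes dense sensitivity analyses affordable.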
arXiv:2005.02077v2 [cs.MA] 24 Jul 2021

Introduction

In this paper, we investigate the use of machine-learning-based surrogate modelling for the analysis of agent-based models (ABMs). In this approach, machine-learning methods are used to generate statistical models that replicate the behaviour of the original ABM to a high degree of accuracy; these surrogates are substantially faster to run than the original model, enabling complex sensitivity analyses to be performed much more efficiently. This proof-of-concept work demonstrates that these methods are applicable and useful even in time- and resource-limited modelling contexts, and that these surrogates are capable of closely replicating the behaviour of the original model even when minimal hyperparameter optimisation is performed. We propose that incorporating such methods into standard ABM practice may allow a significant improvement in the standard of results reporting in certain disciplines, particularly in