Journal of Machine Learning Research 10 (2009) 2039-2078
Submitted 12/08; Revised 3/09; Published 9/09

Evolutionary Model Type Selection for Global Surrogate Modeling

Dirk Gorissen    DIRK.GORISSEN@UGENT.BE
Tom Dhaene    TOM.DHAENE@UGENT.BE
Filip De Turck    FILIP.DETURCK@UGENT.BE

Ghent University - IBBT
Department of Information Technology (INTEC)
Gaston Crommenlaan 8 bus 201
9050 Gent, Belgium

Editor: Melanie Mitchell

Abstract

Due to the scale and computational complexity of currently used simulation codes, global surrogate models (metamodels) have become indispensable tools for exploring and understanding the design space. Because of their compact formulation, they are cheap to evaluate and thus readily facilitate visualization, design space exploration, rapid prototyping, and sensitivity analysis. They can also be used as accurate building blocks in design packages or larger simulation environments. Consequently, there is great interest in techniques that facilitate the construction of such approximation models while minimizing the computational cost and maximizing model accuracy. Many surrogate model types exist (Support Vector Machines, Kriging, Neural Networks, etc.) but no type is optimal in all circumstances, nor is there any hard theory available that can help make this choice. In this paper we present an automatic approach to the model type selection problem. We describe an adaptive global surrogate modeling environment with adaptive sampling, driven by speciated evolution. Different model types are evolved cooperatively using a Genetic Algorithm (heterogeneous evolution) and compete to approximate the iteratively selected data. In this way the optimal model type and complexity for a given data set or simulation code can be dynamically determined. The utility and performance of the approach are demonstrated on a number of problems where it outperforms traditional sequential execution of each model type.
Keywords: model type selection, genetic algorithms, global surrogate modeling, function approximation, active learning, adaptive sampling

© 2009 Gorissen, Dhaene and De Turck.

1. Introduction

For many problems from science and engineering it is impractical to perform experiments on the physical world directly (e.g., airfoil design, earthquake propagation). Instead, complex, physics-based simulation codes are used to run experiments on computer hardware. While allowing scientists more flexibility to study phenomena under controlled conditions, computer experiments require a substantial investment of computation time. One simulation may take many minutes, hours, days or even weeks. A simpler approximation of the simulator is needed to make sensitivity analysis, visualization, design space exploration, etc. feasible (Forrester et al., 2008; Simpson et al., 2008). As a result researchers have turned to various approximation methods that mimic the behavior of the simulation model as closely as possible while being computationally cheap(er) to evaluate. Different types of approximation methods exist, each with their relative strengths. This work con-