Embedded Python: bridging the gap between domain scientists and HPC

David A. Ham 1,2
1. Department of Earth Science and Engineering
2. Grantham Institute for Climate Change
Imperial College London

1 Abstract

This paper advocates embedding Python in high performance simulation software and demonstrates the advantages this brings for domain scientists. Simulation software is a powerful tool but, being implemented in compiled languages on performance grounds, it tends towards inflexibility and difficulty of use. In contrast, interpreted systems such as Python and Matlab are relatively easy for domain scientists to use but have limited applicability in high performance, batch-processed contexts. The solution advocated here is to embed the Python language in the simulation software, so that the public, configurable interface of the software benefits from the usability of that language while the computational back end remains unchanged. The embedded Python interface in the Fluidity high performance fluids package is presented as an illustration of the successful application of this approach.

2 The simulation configuration problem

Typically, simulation software embodies some mathematical model of a real-world system. That model is then driven with input data, and the consequences for the model system are calculated. The development of modelling software, particularly software developed in academia, typically focusses on this calculation phase, which may be exceptionally computationally expensive and is frequently also mathematically complex. The calculation part of the model, which we will call the model core, is usually written in a compiled language such as C, C++ or Fortran and in many cases is parallelised using MPI and/or OpenMP. The model core is typically written by domain scientists with computational science interests, or by computational scientists in collaboration with domain scientists.
However, the users of simulation software are typically domain scientists who lack particular computational science expertise but who have complex requirements of the model core. Every scenario to be simulated will have different input data and different required outputs. Users frequently wish to extend the model by taking computed outputs and feeding them back into the model at runtime, a requirement which is hard to meet with a conventional model core. For example, in a fluid flow simulation, the initial fluid velocity might be known as a mathematical function or might be available as a set of experimental or observational data. However, the model core needs to be provided with the initial velocity vector at each discrete node in the domain, so the user must convert the available description into nodal values before the simulation can run. For a complex simulation, this preprocessing may need to be carried out for many fields and for boundary as well as initial conditions. If the desired outputs are not within the scope of the model, the situation is worse still, as the user may need to edit the model source code itself to ensure that the correct data is output. This may require programming skills beyond those available to the user and also carries an increased risk of the user introducing bugs which invalidate the results of the simulation. The challenge, therefore, is to provide a mechanism for supplying inputs and specifying outputs which reconciles the information and skills available to the user with the requirements of the model core.

3 Why Python?

Python is an interpreted language with a clean and clear syntax which is increasingly popular among scientists. The SciPy scientific packages (Jones et al., 2001–) provide facilities similar to those of Matlab, and the language has begun to be used in textbooks on scientific computing (Langtangen, 2009, for example). Python also has a very straightforward C interface for embedding, and tools exist to generate glue layers for C/C++ (Beazley, 1996) and Fortran (Peterson, 2009) code.
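To illustrate what embedding buys the user in the initial velocity example of the previous section, the following is a minimal sketch of the pattern an embedding host follows. It is written in pure Python so that it is self-contained. The user's initial condition arrives as Python source text in the configuration, the host executes it once, and the host then evaluates the resulting function at every discrete node. The entry-point name `val`, its `(X, t)` signature, the helper names `load_user_function` and `evaluate_at_nodes`, and the parabolic velocity profile are all assumptions made for this sketch, not a description of any particular model's interface.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Velocity = Tuple[float, float]

# Hypothetical user input. In an embedded interface this text would be read
# from the simulation's configuration file; the name `val` and the (X, t)
# signature are assumptions made for this sketch.
USER_SOURCE = '''
def val(X, t):
    # Initial velocity as a mathematical function of position X = (x, y)
    # and time t: a parabolic channel-flow profile (illustrative only).
    x, y = X
    return (4.0 * y * (1.0 - y), 0.0)
'''


def load_user_function(source: str, name: str = "val") -> Callable[[Point, float], Velocity]:
    """Execute user-supplied source once in a fresh namespace and return the named function."""
    namespace: Dict[str, object] = {}
    exec(compile(source, "<user configuration>", "exec"), namespace)
    func = namespace.get(name)
    if not callable(func):
        raise ValueError(f"configuration does not define a callable named {name!r}")
    return func


def evaluate_at_nodes(func: Callable[[Point, float], Velocity],
                      nodes: Sequence[Point], t: float) -> List[Velocity]:
    """Sample the user function at every discrete node, giving the nodal values the model core needs."""
    return [tuple(func(node, t)) for node in nodes]


if __name__ == "__main__":
    # A tiny hypothetical set of mesh nodes standing in for a real mesh.
    mesh_nodes = [(0.0, 0.0), (0.0, 0.5), (1.0, 0.5), (0.0, 1.0)]
    initial_velocity = load_user_function(USER_SOURCE)
    for node, u in zip(mesh_nodes, evaluate_at_nodes(initial_velocity, mesh_nodes, t=0.0)):
        print(node, u)
```

In a compiled model core the equivalent steps would be carried out through the Python/C API, for instance by running the source with `PyRun_String` and then calling the retrieved function object, with the nodal results copied into the core's own arrays. Because `exec` runs arbitrary code, this pattern assumes that the configuration comes from the user running the simulation rather than from an untrusted source.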
Embedding Python in an application is as simple as linking against the Python library and calling suitable initialisation routines. Python is installed by default on almost all Unix machines and is available for Windows and Mac. Also of critical importance is that Python is open source. This distinguishes