Volume XXX (2018), Number XXX pp. 1–14
COMPUTER GRAPHICS forum

Making Sense of Scientific Simulation Ensembles with Semantic Interaction

M. Dahshan 1, N. F. Polys 1, R. S. Jayne 2, and R. M. Pollyea 2

1 Department of Computer Science, Virginia Tech, USA
2 Department of Geosciences, Virginia Tech, USA

Abstract

In the study of complex physical systems, scientists use simulations to examine the effects of different models and parameters. As they seek to understand the influence of, and relationships among, multiple dimensions, they typically run many simulations with varying initial conditions; such collections of runs are known as 'ensembles'. An ensemble is thus a set of runs, each of which is multidimensional and multivariate. In order to understand the connections between simulation parameters and patterns in the output data, we have been developing an approach to the visual analysis of scientific data that merges human expertise and intuition with machine learning and statistics. Our approach is manifested in a new visualization tool, GLEE (Graphically-Linked Ensemble Explorer), that allows scientists to explore, search, filter, and make sense of their ensembles. The tool uses visualization and semantic interaction techniques to enable scientists to find similarities and differences between runs, find correlations between different parameters, and explore relations and correlations between different runs and parameters. Our approach supports scientists in selecting interesting subsets of runs to investigate and in summarizing factors and statistics to show variations and consistencies across different runs. In this paper, we evaluate our tool with experts to understand its strengths and weaknesses for optimization and inverse problems.

CCS Concepts
Scientific Visualization Ensembles, Sensemaking;

1.
Introduction

Recent advances in computing power and the availability of high-performance computing have made it feasible to run complex real-world simulations in an acceptable amount of time. Scientists usually need to run their simulations multiple times using different input conditions, simulation parameters, and simulation models. Doing so helps them interpret the variability in the system and gain insights by alternating between models. Through these multiple runs, they can gain a more complete understanding of the simulated phenomenon and model, and refine their hypotheses and methods for actual physical experiments. A set of simulation runs is known as an ensemble: it represents a parameter study or a set of studies using different computational models and parameters. Scientists from a variety of disciplines, such as aerodynamics, weather forecasting, climate modeling, and computational fluid dynamics, use ensembles to simulate complex systems, explore unknowns in initial conditions, evaluate extreme cases, compare structural characteristics of their models, and investigate parameter sensitivity to assess the confidence in their findings. In other words, ensembles guide the scientist in interpreting the distributions within the data, investigating the sensitivity of outputs to certain input parameters, and understanding the similarities and dissimilarities between ensemble members.

The analysis of ensemble data is a challenging task due to its high dimensionality, complexity, and size. Ensemble visualization is therefore an essential component of the analysis process: it facilitates knowledge discovery and helps the scientist see the characteristic features of the data through graphical representations. Such analysis of ensembles can help scientists find appropriate models and parameter ranges for hypothesized relationships and outcomes.
Moreover, ensemble visualization helps in measuring the variability and sensitivity of the model with respect to its inputs and outputs, and in showing how output parameters react to input changes. Therefore, the focus of this paper is the visual exploration and comparison of the behaviors of simulations and their parameters.

Current research in the visual analysis of ensembles relies on multiple techniques for showing the variability of the ensemble members, major trends, and outliers. Some of these techniques focus on studying the parameter space and measuring the correlation between different parameters. Summary statistics [PKRJ10, BPFG11, PWB*09a, MWK14, WMK13, SEG*15], spaghetti plots [DNCP10, Det05], and probabilistic features such as multivariate Gaussian distributions, histograms, and kernel density estimates (KDE) are examples of these techniques [PPH12, PW12]. Additionally, conventional visualization solutions such as glyphs [HLNW11, PKRJ10, PMW13, SZD*10] and visual variables

© 2018 The Author(s)
Computer Graphics Forum © 2018 The Eurographics Association and John Wiley & Sons Ltd. Published by John Wiley & Sons Ltd.