pgFMU: Integrating Data Management with Physical System Modelling

Olga Rybnytska, Laurynas Šikšnys, Torben Bach Pedersen, and Bijay Neupane
Department of Computer Science, Aalborg University
olga@cs.aau.dk, siksnys@cs.aau.dk, tbp@cs.aau.dk, bj21.neupane@gmail.com

ABSTRACT
By expressing physical laws and control strategies, interoperable physical system models such as Functional Mock-up Units (FMUs) are playing a major role in designing, simulating, and evaluating complex (cyber-)physical systems. However, existing FMU simulation software environments require significant user/developer effort when such models need to be tightly integrated with actual data from a database and/or model simulation results need to be stored in a database, e.g., as a part of larger user analytical workflows. Hence, users encounter substantial complexity and overhead when using such physical models to solve analytical problems based on real data. To address this issue, this paper proposes pgFMU - an extension to the relational database management system PostgreSQL for integrating and conveniently using FMU-based physical models inside a database environment. pgFMU reduces the complexity in specifying (and executing) analytical workflows based on such simulation models (requiring on average 22x fewer code lines) while maintaining improved overall execution performance (up to 8.43x faster for multi-instance scenarios) due to the optimization techniques and integration between database and an FMU library. With pgFMU, cyber-physical data scientists are able to develop a typical FMU workflow up to 11.74x faster than using the standard FMU software stack. When combined with an existing in-DBMS analytics tool, pgFMU can increase the accuracy of Machine Learning models by up to 21.1%.

1 INTRODUCTION
Cyber-physical system experts, cyber-physical data scientists, and cyber-physical developers often need to analyze, predict, and simulate physical systems.
For this purpose, physical system models are often used to capture the time-dependent behaviour and dynamics of such systems [1]. They offer powerful, rigorous, and cost-effective means to characterize and reason about such systems, without the need to build, interact with, and/or interfere with them. Physical system models (physical models for short) are well supported by a number of physical system modelling software tools and environments. However, each such modelling environment often uses a specialized form and format of a physical model, with limited possibilities to utilize such models across different tools and environments. To mitigate this problem, the Functional Mock-up Interface (FMI) [2] has emerged as a de-facto standard [3] to facilitate physical model exchange and co-simulation across a large number of modelling tools. In FMI, physical models are compiled into a standard representation, denoted as functional mock-up units (FMUs). FMUs reflect real physical systems composed of physical and digital components interacting in complex ways according to a set of pre-defined control strategies and physical laws. FMUs allow system behaviour to be defined accurately even in physical states that are not observable in the real world, unlike traditional AI/ML models (e.g., artificial neural networks), which require such observations. Due to these advantages, FMUs continue gaining popularity in the relevant physical modelling communities [5].

© 2020 Copyright held by the owner/author(s). Published in Proceedings of the 23rd International Conference on Extending Database Technology (EDBT), March 30-April 2, 2020, ISBN 978-3-89318-083-7 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.
FMI has already been broadly adopted and is supported by 130+ software tools, including the well-known simulation environments Simulink (Matlab) [4] and EnergyPlus [6] (80,000+ downloads), as well as JModelica [8]/OpenModelica [10] (more than 20 companies and 30 universities in the consortium), with 1600+ model components and 1350+ functions from many domains available in a standard library alone.

Despite comprehensive physical modelling support, existing FMI-based simulation environments and tools offer poor database (DB) integration and lack built-in support for conveniently including physical models in user-defined analytical workflows. Thus, user data (e.g., model parameters, measurements, control inputs) cannot be conveniently supplied to the model from a DB, and model results cannot be effectively used in larger analytical workflows (e.g., those encompassing multiple simulations). Nor do these tools offer convenient declarative approaches to managing such physical models within a common data management and physical modelling environment. Without these capabilities, model-driven analytical tasks become complicated and slow in terms of both development time and execution, and less usable for users working with, e.g., prescriptive analytics applications [12].

As a running example, consider a prediction problem from the energy field. The aim is to predict and analyze indoor temperatures inside a house that is heated by an electric heat pump (HP) under different heating scenarios (e.g., no heating, heating at max power). For this task, a physical model represented as an FMU needs to be calibrated and simulated using measurements and weather data stored in the database. Predictions need to be stored in the database for further analysis and visualization.
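As a rough illustration of the running example, the following Python sketch calibrates a model against stored measurements, simulates the two heating scenarios, and writes the predictions back to a database. Everything in it is an assumption made for illustration: an in-memory SQLite database stands in for PostgreSQL, a hypothetical first-order thermal model (`simulate`) stands in for the FMU, a naive grid search (`calibrate`) stands in for a calibration tool, and the table names, the parameters `a` and `b`, and the synthetic data are invented. It uses no FMI library and does not show pgFMU's interface.

```python
import sqlite3

def simulate(t_start, inputs, a, b, dt=1.0):
    """Hypothetical stand-in for an FMU: dT/dt = a*(T_out - T) + b*u (forward Euler)."""
    temps, t = [], t_start
    for t_out, u in inputs:
        t += dt * (a * (t_out - t) + b * u)
        temps.append(t)
    return temps

def calibrate(t_start, inputs, measured, a_grid, b_grid):
    """Naive grid search for the (a, b) pair minimising the squared simulation error."""
    def sse(params):
        sim = simulate(t_start, inputs, *params)
        return sum((s - m) ** 2 for s, m in zip(sim, measured))
    return min(((a, b) for a in a_grid for b in b_grid), key=sse)

# In-memory SQLite stands in for the PostgreSQL database (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE measurements (step INTEGER, t_out REAL, u REAL, t_in REAL)")
db.execute("CREATE TABLE predictions (scenario TEXT, step INTEGER, t_in REAL)")

# Synthetic "historical" data generated with known parameters a=0.1, b=2.0.
T_START = 20.0
history_inputs = [(5.0, float(k % 2)) for k in range(24)]  # (outdoor temp, heating on/off)
history_temps = simulate(T_START, history_inputs, a=0.1, b=2.0)
db.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [(k, t_out, u, t_in)
     for k, ((t_out, u), t_in) in enumerate(zip(history_inputs, history_temps))],
)

# Read measurements and control inputs back from the database.
rows = db.execute("SELECT t_out, u, t_in FROM measurements ORDER BY step").fetchall()
inputs = [(t_out, u) for t_out, u, _ in rows]
measured = [t_in for _, _, t_in in rows]

# Calibrate the model parameters against the measurements.
best_a, best_b = calibrate(T_START, inputs, measured, [0.05, 0.1, 0.2], [1.0, 2.0, 3.0])

# Simulate each heating scenario for 12 steps and store the predictions.
scenarios = {"no_heating": 0.0, "max_power": 1.0}
for name, u in scenarios.items():
    forecast = simulate(measured[-1], [(5.0, u)] * 12, best_a, best_b)
    db.executemany(
        "INSERT INTO predictions VALUES (?, ?, ?)",
        [(name, k, t) for k, t in enumerate(forecast)],
    )

# Further analysis happens in SQL: indoor temperature at the end of the horizon.
final = dict(db.execute("SELECT scenario, t_in FROM predictions WHERE step = 11").fetchall())
print(best_a, best_b, final)
```

Even in this toy form, the data crosses the boundary between the database and the modelling code three times (reading measurements, writing predictions, querying results), which is the kind of glue code the running example is meant to highlight.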
In this case, as shown in Figure 1, the user has to pick relevant FMU-compliant software tools (e.g., JModelica [8] or Python [7] + PyFMI [14] + ModestPy [3]) and then use these tools to (1) load a pre-generated FMU file or manually build an FMU file from a model specification file, (2) read historical measurements and (future) control inputs from a database, (3) recalibrate the model (e.g., using ModestPy) in case the model cannot ensure a good fit with the historical measurements, (4) validate the model against real measurements, and update the model and/or parameter values, (5) simulate the updated model to generate temperature predictions for different scenarios (e.g., using PyFMI), (6) export predicted values back to the database, and (7) perform further analysis utilizing a DBMS. This imposes limitations on the user in terms of the number of software tools and libraries to use, the overall complexity, and the ability to effectively utilize physical models in larger analytical workflows where both simulation and optimization are required.

Series ISSN: 2367-2005, page 109, DOI: 10.5441/002/edbt.2020.11

In this and similar user workflows, interleaved data exchange between a database and a