arXiv:2107.03863v2 [stat.ML] 23 Aug 2021

Benchpress: A Scalable and Platform-Independent Workflow for Benchmarking Structure Learning Algorithms for Graphical Models

Felix L. Rios, University of Basel
Giusi Moffa, University of Basel
Jack Kuipers, ETH Zürich

Abstract

Describing the relationships between the variables in a study domain and modelling the data-generating mechanism is a fundamental problem in many empirical sciences. Probabilistic graphical models are one common approach to tackling this problem. Learning the graphical structure is computationally challenging and a fervent area of current research, with a plethora of algorithms being developed. To facilitate the benchmarking of different methods, we present a novel automated workflow, called benchpress, for producing scalable, reproducible, and platform-independent benchmarks of structure learning algorithms for probabilistic graphical models. Benchpress is interfaced via a simple JSON file, which makes it accessible to all users, while the code is designed in a fully modular fashion to enable researchers to contribute additional methodologies. Benchpress currently provides an interface to a large number of state-of-the-art algorithms from libraries such as BDgraph, BiDAG, bnlearn, GOBNILP, pcalg, r.blip, scikit-learn, TETRAD, and trilearn, as well as a variety of methods for data-generating models and performance evaluation. Alongside user-defined models and randomly generated datasets, the software tool also includes a number of standard datasets and graphical models from the literature, which may be included in a benchmarking workflow. We demonstrate the applicability of this workflow for learning Bayesian networks in four typical data scenarios. The source code and documentation are publicly available from http://github.com/felixleopoldo/benchpress.

Keywords: reproducibility, scalable benchmarking, probabilistic graphical models.

1. Introduction

Probabilistic graphical models play a central role in modern statistical data analysis. Their compact and elegant way of visualising and representing complex dependence structures in multivariate probability distributions has proven successful in many scientific domains, ranging from the social sciences and image analysis to biology, medical diagnosis and epidemiology (see e.g. Elwert 2013; Friedman, Linial, Nachman, and Pe'er 2000; Friedman 2004; Moffa, Catone, Kuipers, Kuipers, Freeman, Marwaha, Lennox, Broome, and Bebbington 2017; Kuipers, Thurnherr, Moffa, Suter, Behr, Goosen, Christofori, and Beerenwinkel 2018b; Kuipers, Moffa, Kuipers, Freeman, and Bebbington 2019). One of the main advantages of graphical models is that they provide a tool for experts and researchers from non-statistical fields to easily specify their assumptions in a specific problem