DSMC Evaluation Stages: Fostering Robust and Safe Behavior in Deep Reinforcement Learning – Extended Version

TIMO P. GROS, JOSCHKA GROß, and DANIEL HÖLLER, Saarland University, Saarland Informatics Campus, Germany
JÖRG HOFFMANN, Saarland University and German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus, Saarbrücken, Germany
MICHAELA KLAUCK, HENDRIK MEERKAMP, NICOLA J. MÜLLER, and LUKAS SCHALLER, Saarland University, Saarland Informatics Campus, Germany
VERENA WOLF, Saarland University and German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus, Germany

Neural networks (NN) are gaining importance in sequential decision-making. Deep reinforcement learning (DRL), in particular, is extremely successful in learning action policies in complex and dynamic environments. Despite this success, however, DRL technology is not without its failures, especially in safety-critical applications: (i) the training objective maximizes average rewards, which may disregard rare but critical situations and hence lack local robustness; (ii) optimization objectives targeting safety typically yield degenerated reward structures, which, for DRL to work, must be replaced with proxy objectives. Here, we introduce a methodology that can help to address both deficiencies. We incorporate evaluation stages (ES) into DRL, leveraging recent work on deep statistical model checking (DSMC), which verifies NN policies in Markov decision processes. Our ES apply DSMC at regular intervals to determine state space regions with weak performance. We adapt the subsequent DRL training priorities based on the outcome, (i) focusing DRL on critical situations and (ii) allowing us to foster arbitrary objectives. We run case studies on two benchmarks. One of them is the Racetrack, an abstraction of autonomous driving that requires navigating a map without crashing into a wall. The other is MiniGrid, a widely used benchmark in the AI community.
Our results show that DSMC-based ES can significantly improve both (i) and (ii).

CCS Concepts: • Computing methodologies → Artificial intelligence; Markov decision processes; Neural networks; Machine learning algorithms; • Theory of computation → Design and analysis of algorithms

This work was partially supported by the German Research Foundation (DFG) under grant No. 389792660, as part of TRR 248, see https://perspicuous-computing.science, and by the European Regional Development Fund (ERDF).

Authors' addresses: T. P. Gros, J. Groß, D. Höller, M. Klauck, H. Meerkamp, N. J. Müller, L. Schaller, Saarland University, Saarland Informatics Campus, Building E1.3, Saarbrücken, Saarland, Germany, 66123; emails: {timopgros, jgross, hoeller, klauck, meerkamp, nmueller, lschaller}@cs.uni-saarland.de; J. Hoffmann, Saarland University and German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus, Building E1.3, Saarbrücken, Saarland, Germany, 66123; email: hoffmann@cs.uni-saarland.de; V. Wolf, Saarland University and German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus, Building E1.3, Saarbrücken, Saarland, Germany, 66123; email: wolf@cs.uni-saarland.de.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
1049-3301/2023/10-ART17 $15.00
https://doi.org/10.1145/3607198

ACM Transactions on Modeling and Computer Simulation, Vol. 33, No. 4, Article 17. Publication date: October 2023.
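The feedback loop sketched in the abstract — periodically estimating per-region goal probabilities and reweighting subsequent training toward weak regions — can be illustrated in miniature. The sketch below is not the paper's implementation: real DSMC verifies the NN policy on the actual MDP with statistical guarantees, whereas here a hypothetical `policy` map (region name → true success probability) and Monte Carlo sampling stand in for both the environment and the checker; all names (`dsmc_evaluate`, `training_priorities`, the Racetrack-like region labels) are illustrative.

```python
import random

def dsmc_evaluate(policy, regions, n_runs=500, seed=0):
    """Toy stand-in for a DSMC evaluation stage: estimate each region's
    goal-reachability probability from n_runs simulated episodes.
    `policy` is a hypothetical map from region to true success probability."""
    rng = random.Random(seed)
    return {r: sum(rng.random() < policy[r] for _ in range(n_runs)) / n_runs
            for r in regions}

def training_priorities(estimates, floor=0.05):
    """Turn estimated failure probabilities into normalized sampling weights,
    so the next DRL training phase draws initial states from weakly
    performing regions more often."""
    weights = {r: max(1.0 - p, floor) for r, p in estimates.items()}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

# Example: a Racetrack-like setting with an easy and a hard map region
# (success probabilities are made-up illustrative values).
policy = {"straight": 0.95, "sharp_turn": 0.40}
estimates = dsmc_evaluate(policy, policy)
prio = training_priorities(estimates)
```

After this evaluation stage, `prio` assigns the hard `"sharp_turn"` region a larger share of training episodes than the already-mastered `"straight"` region, which is the intuition behind using ES output to steer DRL toward critical situations.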