DSMC Evaluation Stages: Fostering Robust and Safe
Behavior in Deep Reinforcement Learning – Extended
Version
TIMO P. GROS, JOSCHKA GROß, and DANIEL HÖLLER, Saarland University, Saarland
Informatics Campus, Germany
JÖRG HOFFMANN, Saarland University and German Research Center for Artificial Intelligence (DFKI),
Saarland Informatics Campus Saarbrücken, Germany
MICHAELA KLAUCK, HENDRIK MEERKAMP, NICOLA J. MÜLLER, and
LUKAS SCHALLER, Saarland University, Saarland Informatics Campus, Germany
VERENA WOLF, Saarland University and German Research Center for Artificial Intelligence (DFKI),
Saarland Informatics Campus, Germany
Neural networks (NNs) are gaining importance in sequential decision-making. Deep reinforcement learning
(DRL), in particular, is extremely successful in learning action policies in complex and dynamic environments.
Despite this success, however, DRL technology is not without failures, especially in safety-critical
applications: (i) the training objective maximizes average rewards, which may disregard rare but critical
situations and hence lack local robustness; (ii) optimization objectives targeting safety typically yield
degenerate reward structures, which, for DRL to work, must be replaced with proxy objectives. Here, we introduce
a methodology that can help to address both deficiencies. We incorporate evaluation stages (ES) into DRL,
leveraging recent work on deep statistical model checking (DSMC), which verifies NN policies in Markov decision
processes. Our ES apply DSMC at regular intervals to determine state space regions with weak performance.
We adapt the subsequent DRL training priorities based on the outcome, (i) focusing DRL on critical situations
and (ii) enabling the training to target arbitrary objectives.
We run case studies on two benchmarks. One of them is the Racetrack, an abstraction of autonomous
driving that requires navigating a map without crashing into a wall. The other is MiniGrid, a widely used
benchmark in the AI community. Our results show that DSMC-based ES can significantly improve both (i)
and (ii).
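To illustrate the idea behind an evaluation stage, the following is a minimal, self-contained sketch on a hypothetical toy MDP (an agent performing a biased walk toward a goal position). All names here (`rollout_success`, `evaluation_stage`, the toy dynamics) are illustrative assumptions, not the paper's implementation: a DSMC-style Monte Carlo evaluation estimates the goal probability per start region, and the resulting weights shift subsequent training toward weakly performing regions.

```python
import random

# Toy MDP: the agent walks on the integer line; reaching position 10 is the
# goal, falling below 0 is a crash. policy(s) gives the probability of
# stepping right in state s. This stands in for an NN policy.

def rollout_success(start, policy, n=20, horizon=50):
    """DSMC-style Monte Carlo estimate of the goal probability from `start`."""
    wins = 0
    for _ in range(n):
        s = start
        for _ in range(horizon):
            s += 1 if random.random() < policy(s) else -1
            if s >= 10:     # goal reached
                wins += 1
                break
            if s < 0:       # crash
                break
    return wins / n

def evaluation_stage(starts, policy):
    """One ES: estimate per-region performance, then return start-state
    sampling weights that prioritize weak regions in the next training phase."""
    probs = {s: rollout_success(s, policy) for s in starts}
    # Lower estimated goal probability -> higher sampling weight.
    weights = {s: 1.0 - p + 1e-3 for s, p in probs.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}
```

For example, under a policy that is reliable only near the goal (say, always right for `s >= 5` but a coin flip below), `evaluation_stage([0, 9], policy)` assigns a much larger weight to start state 0 than to 9, so the next training phase samples the weak region more often.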
CCS Concepts: • Computing methodologies → Artificial intelligence; Markov decision processes;
Neural networks; Machine learning algorithms; • Theory of computation → Design and analysis of
algorithms;
This work was partially supported by the German Research Foundation (DFG) under grant No. 389792660, as part of TRR
248, see https://perspicuous-computing.science, and by the European Regional Development Fund (ERDF).
Authors’ addresses: T. P. Gros, J. Groß, D. Höller, M. Klauck, H. Meerkamp, N. J. Müller, L. Schaller, Saarland University, Saar-
land Informatics Campus, Building E1.3, Saarbrücken, Saarland, Germany, 66123; emails: {timopgros, jgross, hoeller, klauck,
meerkamp, nmueller, lschaller}@cs.uni-saarland.de; J. Hoffmann, Saarland University and German Research Center for Ar-
tificial Intelligence (DFKI), Saarland Informatics Campus Saarbrücken, Building E1.3, Saarbrücken, Saarland, Germany,
66123; email: hoffmann@cs.uni-saarland.de; V. Wolf, Saarland University and German Research Center for Artificial Intel-
ligence (DFKI), Saarland Informatics Campus, Building E1.3, Saarbrücken, Saarland, Germany, 66123; email: wolf@cs.uni-
saarland.de.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be
honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
1049-3301/2023/10-ART17 $15.00
https://doi.org/10.1145/3607198
ACM Transactions on Modeling and Computer Simulation, Vol. 33, No. 4, Article 17. Publication date: October 2023.