Testing Stochastic Processes through Reinforcement Learning

François Laviolette
Département IFT-GLO
Université Laval
Québec, Canada
francois.laviolette@ift.ulaval.ca

Sami Zhioua
Département IFT-GLO
Université Laval
Québec, Canada
sami.zhioua@ift.ulaval.ca

Abstract

We propose a new approach to verification of probabilistic processes for which the model may not be available. We show how to use a technique from Reinforcement Learning to approximate how far apart two processes are by solving a Markov Decision Process. The key idea of the approach is to define the MDP out of the processes to be tested, in such a way that the optimal value is interpreted as a divergence between the processes. This divergence can therefore be estimated by Reinforcement Learning methods; moreover, if the two systems are not equivalent, the algorithm returns the test(s) witnessing the non-equivalence. We show how the approach can be adapted to (1) several equivalence notions (trace, ready, etc.) but, more importantly, to (2) other stochastic formalisms, in particular to MDPs themselves.

1 Introduction

In program verification, the goal is typically to check automatically whether a system (program, physical device, protocol, etc.) conforms to its pre-established specification. For non-probabilistic systems, one usually expects equivalence between the two, and most of the time this equivalence is chosen to be bisimulation. In the verification of probabilistic systems, the comparison between the program and the specification should not be based on equivalences [7]: one reason is that the probabilities involved often come from approximations of the actual numbers. Hence a slight difference in the probabilities between two processes should not necessarily be interpreted as non-equivalence. Instead, one is interested in a notion of distance or divergence¹ to quantify how far apart the processes are.
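To make the role of Reinforcement Learning concrete, the following sketch shows tabular Q-learning estimating the optimal value of a small MDP. The paper's actual construction builds the MDP out of the two processes under test so that this optimal value is a divergence between them; that construction is not reproduced here, and the toy MDP, its rewards, and all parameter values below are illustrative assumptions only.

```python
import random

# Toy MDP (hypothetical, for illustration): state 0 (start) and a terminal
# state 1; actions 'a' and 'b'.  Action 'a' yields reward 1.0 and terminates;
# action 'b' yields reward 0.05 and stays in state 0.
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.1   # learning rate (assumed)

def step(state, action):
    """Transition function of the hypothetical toy MDP."""
    if action == 'a':
        return 1, 1.0        # terminal state, reward 1.0
    return 0, 0.05           # stay in state 0, reward 0.05

def q_learning(episodes=5000, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimates the optimal action values of the MDP."""
    rng = random.Random(seed)
    Q = {(0, 'a'): 0.0, (0, 'b'): 0.0}
    for _ in range(episodes):
        state = 0
        while state != 1:
            # epsilon-greedy exploration
            if rng.random() < epsilon:
                action = rng.choice(['a', 'b'])
            else:
                action = max(['a', 'b'], key=lambda a: Q[(state, a)])
            nxt, reward = step(state, action)
            future = 0.0 if nxt == 1 else max(Q[(nxt, 'a')], Q[(nxt, 'b')])
            Q[(state, action)] += ALPHA * (reward + GAMMA * future
                                           - Q[(state, action)])
            state = nxt
    return Q

Q = q_learning()
# The optimal value of state 0 is 1.0 (take 'a' immediately);
# the learned estimate converges to it.
print(round(max(Q[(0, 'a')], Q[(0, 'b')]), 2))
```

In the paper's setting, the quantity estimated this way (the optimal value of the start state) is exactly what gets interpreted as the divergence between the two processes.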
When defining a distance, we have two concerns: its computability, of course, but also the relation induced by zero distance. The actual value of the distance is usually not relevant, but the derived relation, for example bisimulation or trace equivalence, is a guide to evaluate the power or adequacy of the distance.

In real scenarios, the model of the implementation is rarely known and the available information can only be gathered by interacting with the system. Consequently, verification in this setting has to be based on some form of sampling (or testing). In their famous paper on probabilistic transition systems [12], Larsen and Skou defined a test language that corresponds to probabilistic bisimulation: two processes are bisimilar if and only if they accept the same tests with the same probabilities. From the maximal difference over the probabilities on these tests, van Breugel et al. [2] have defined a divergence (in fact a pseudo-metric) between processes. However, the fact that this divergence is based on bisimulation, a strong notion of equivalence, makes it hard to compute. In [6], we introduced K-moment equivalence, and we showed how to compute a divergence whose zero value is K-moment equivalence. Key properties of this new equivalence are that (1) it stands strictly between bisimulation and trace equivalence and (2) it is testable. The divergence is, as for van Breugel et al.'s pseudo-metric, the maximal difference over the probabilities on tests.

¹A divergence is a distance that may not satisfy the triangle inequality and symmetry.
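The idea of a divergence defined as the maximal difference over the probabilities on tests can be illustrated by sampling. The sketch below, a simplified illustration rather than the paper's method, uses two hypothetical labelled processes and estimates, over observed traces (the simplest kind of test), the largest gap between the probabilities the two processes assign to each trace; all process definitions and parameters are invented for the example.

```python
import random
from collections import Counter

# Hypothetical labelled processes over actions {'a', 'b'}: each maps a state
# to (probability, label, next_state) triples; next_state None terminates.
P1 = {0: [(0.5, 'a', 1), (0.5, 'b', None)],
      1: [(1.0, 'a', None)]}
P2 = {0: [(0.6, 'a', 1), (0.4, 'b', None)],
      1: [(1.0, 'a', None)]}

def sample_trace(process, rng):
    """Run the process from state 0 and record the labels it emits."""
    state, trace = 0, []
    while state is not None:
        r, acc = rng.random(), 0.0
        for p, label, nxt in process[state]:
            acc += p
            if r < acc:
                trace.append(label)
                state = nxt
                break
    return tuple(trace)

def estimate_divergence(p1, p2, n=20000, seed=0):
    """Monte-Carlo estimate of max over traces of |Pr_1(trace) - Pr_2(trace)|."""
    rng = random.Random(seed)
    c1 = Counter(sample_trace(p1, rng) for _ in range(n))
    c2 = Counter(sample_trace(p2, rng) for _ in range(n))
    return max(abs(c1[t] - c2[t]) / n for t in set(c1) | set(c2))

# Here the true maximal gap is 0.1 (trace 'b': 0.5 vs 0.4), so the two
# processes are detected as non-equivalent; the maximizing trace is the
# witness of non-equivalence.
print(round(estimate_divergence(P1, P2), 1))
```

A brute-force maximization over sampled traces like this only works for tiny examples; the point of the paper's approach is that Reinforcement Learning searches the space of tests for the maximizing one instead of enumerating it.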