Randomized Shortest-Path Problems: Two related models

Marco Saerens 1*, Youssef Achbany 1, François Fouss 1,2 & Luh Yen 1

1 Information Systems Unit (ISYS) & Machine Learning Group (MLG), Université catholique de Louvain (UCL), Belgium
{marco.saerens, youssef.achbany, luh.yen}@uclouvain.be

2 Management Sciences Department, Facultés Universitaires Catholiques de Mons (FUCaM), Belgium
francois.fouss@fucam.ac.be

January 23, 2009

Abstract

This letter addresses the problem of designing the transition probabilities of a finite Markov chain (the policy) in order to minimize the expected cost for reaching a destination node from a source node while maintaining a fixed level of entropy spread throughout the network (the exploration). It is motivated by the following scenario. Suppose you have to route agents through a network in some optimal way, for instance by minimizing the total travel cost. Nothing special so far: you could use a standard shortest-path algorithm. Suppose, however, that you want to avoid purely deterministic routing policies in order, for instance, to allow some continual exploration of the network, to avoid congestion, or to avoid complete predictability of your routing strategy. In other words, you want to introduce some randomness or unpredictability into the routing policy, i.e., the routing policy is randomized. This problem, which will be called the randomized shortest-path problem (RSP), is investigated in this work. The global level of randomness of the routing policy is quantified by the expected Shannon entropy spread throughout the network, and is provided a priori by the designer. Then, necessary conditions for the optimal randomized policy, the one minimizing the expected routing cost, are derived. Iterating these necessary conditions, which are reminiscent of Bellman's value iteration equations, yields an optimal policy, that is, a set of transition probabilities in each node.
Interestingly and surprisingly enough, this first model, while formulated in a totally different framework, is equivalent to Akamatsu's model (Akamatsu, 1996), which appears in transportation science, for a special choice of the entropy constraint. We therefore revisit Akamatsu's model by recasting it into a sum-over-paths statistical-physics formalism that makes it easy to derive all the quantities of interest in an elegant, unified way. For instance, it is shown that the unique optimal policy can be obtained by solving a simple linear system of equations.

* Marco Saerens is also a Research Fellow of the IRIDIA Laboratory, Université Libre de Bruxelles.

This second