Simple O(m log n) Time Markov Chain Lumping

Antti Valmari¹ and Giuliana Franceschinis²

¹ Tampere University of Technology, Department of Software Systems
P.O. Box 553, FI-33101 Tampere, Finland
Antti.Valmari@tut.fi

² Dip. di Informatica, Univ. del Piemonte Orientale
viale Teresa Michel 11, 15121 Alessandria, Italy
Giuliana.Franceschinis@mfn.unipmn.it

Abstract. In 2003, Derisavi, Hermanns, and Sanders presented a complicated O(m log n) time algorithm for the Markov chain lumping problem, where n is the number of states and m the number of transitions in the Markov chain. They speculated on the possibility of a simple algorithm and wrote that it would probably need a new way of sorting weights. In this article we present an algorithm of that kind. In it, the weights are sorted with a combination of the so-called possible majority candidate algorithm with any O(k log k) sorting algorithm. This works because, as we prove in the article, the weights consist of two groups, one of which is sufficiently small and all weights in the other group have the same value. We also point out an essential problem in the description of the earlier algorithm, prove the correctness of our algorithm in detail, and report some running time measurements.

1 Introduction

Markov chains are widely used to analyze the behaviour of dynamic systems and to evaluate their performance or dependability indices. One of the problems that limit the applicability of Markov chains to realistic systems is state space explosion. Among the methods that can be used to keep this problem under control, lumping consists of aggregating states of the Markov chain into “macrostates”, hence obtaining a smaller Markov chain while preserving the ability to check desired properties on it. We refer to [4,8] for different lumpability concepts and their use in the analysis of systems.
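The weight-sorting idea summarised in the abstract can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the authors' procedure: the helper names `possible_majority_candidate` and `group_by_weight` are hypothetical, the task is assumed to be grouping (state, weight) pairs by equal weight (as when a block of states is split), and weights are compared with exact equality. The possible majority candidate is computed with the classic Boyer–Moore voting scan in linear time; only the weights that differ from it are passed to a general O(s log s) sort, so the cost is O(k + s log s) for k pairs of which s differ from the candidate.

```python
import bisect


def possible_majority_candidate(values):
    """Boyer–Moore voting scan, O(k) time.

    Returns the only value that *could* occur in more than half of the
    positions. For non-empty input it is always an element of `values`,
    but it is not guaranteed to be a true majority.
    """
    candidate, count = None, 0
    for v in values:
        if count == 0:
            candidate, count = v, 1
        elif v == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def group_by_weight(pairs):
    """Group (state, weight) pairs by weight, groups in increasing weight order.

    Hypothetical helper for illustration. Weights are compared with exact
    equality (ints or fractions.Fraction; floats only if computed
    identically). Cost: O(k + s log s), where s is the number of pairs
    whose weight differs from the possible majority candidate.
    """
    if not pairs:
        return []
    pmc = possible_majority_candidate([w for _, w in pairs])
    # Single-valued group: collected in one linear pass, never sorted.
    pmc_states = [s for s, w in pairs if w == pmc]
    # Remaining group: sorted with any O(s log s) sort (here Python's stable sort).
    rest = sorted((p for p in pairs if p[1] != pmc), key=lambda p: p[1])
    groups = []
    for s, w in rest:
        if groups and groups[-1][0] == w:
            groups[-1][1].append(s)
        else:
            groups.append((w, [s]))
    # Insert the single-valued group at its place in the weight order.
    i = bisect.bisect_left([w for w, _ in groups], pmc)
    groups.insert(i, (pmc, pmc_states))
    return groups


if __name__ == "__main__":
    pairs = [("a", 0.5), ("b", 2.0), ("c", 0.5), ("d", 0.5), ("e", 1.0), ("f", 0.5)]
    print(group_by_weight(pairs))
    # prints [(0.5, ['a', 'c', 'd', 'f']), (1.0, ['e']), (2.0, ['b'])]
```

In the example, the four states with weight 0.5 form the single-valued group and never reach `sorted`. If the single-valued group is a strict majority, the voting scan is guaranteed to return its value; otherwise the output is still correctly grouped, and the only effect is that more weights end up in the sorted part.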
For the purposes of this article, it suffices to know that at the heart of their use lies the problem of constructing the coarsest lumping quotient of a Markov chain. We define this problem formally in Section 2 and, for brevity, call it “the lumping problem”. Let n denote the number of states and m the number of transitions in the Markov chain. An O(n + m log n) time algorithm for the lumping problem was given in [6,5]. It is (loosely) based on the Paige–Tarjan relational coarsest partition algorithm [10] of similar complexity. Unless the input is pathological, with many isolated states, we have n = O(m), which implies O(n + m log n) = O(m log n). Therefore, it is common practice to call these algorithms O(m log n).

J. Esparza and R. Majumdar (Eds.): TACAS 2010, LNCS 6015, pp. 38–52, 2010. © Springer-Verlag Berlin Heidelberg 2010