Performance Evaluation of a Parallel Dynamic Programming Algorithm for Solving the Matrix Chain Product Problem

Bchira BEN MABROUK (1), Hamadi HASNI (2)
(1) Higher Institute of Applied Sciences and Technologies, University of Carthage, 7030, Mateur, Tunisia
benmabrouk_bchira@yahoo.fr
(2) National School of Computer Science, University Campus of Manouba, 2010, Manouba, Tunisia
Hamadi.Hasni@ensi.rnu.tn

Zaher MAHJOUB
University of Tunis El Manar, Faculty of Sciences of Tunis, University Campus, 2092, Manar II, Tunis, Tunisia
Zaher.Mahjoub@fst.rnu.tn

Abstract—We address in this paper a particular combinatorial optimization problem (COP), namely the matrix chain product problem (MCPP). We consider in particular the parallelization of the dynamic programming algorithm (DPA) for solving the MCPP, which is structured as a DO loop nest of depth 3. Our approach is based on a three-phase procedure. The first phase transforms the DPA into a perfect loop nest (PLN). The second applies a dependency analysis to the initial PLN in order to determine the type of each loop (serial or parallel). The third applies the loop interchange technique to the initial PLN in order to increase the degree of parallelism. We focus in this paper on an experimental study, carried out on a parallel multicore machine, that validates our theoretical contribution.

Keywords— combinatorial optimization problem; dependence analysis; DO loop nest; dynamic programming; loop interchange; matrix chain product; multicore machine; parallelization; performance evaluation; polyhedral algorithm.

I. INTRODUCTION

Dynamic programming (DP) is an efficient paradigm for the design of algorithms solving a large class of combinatorial optimization problems (COP). DP algorithms (DPA) have the particular structure of DO loop nests and are, in most cases, of polynomial complexity. Such algorithms are also polyhedral algorithms.
Given an input COP, the DP paradigm adopts a bottom-up approach: sub-problems are solved first, and their solutions are then used to solve sub-problems of larger size. The procedure is iterated until the solution of the input problem is determined. The key idea is to express, through a recurrence formula, the solution of the initial problem in terms of the solutions of its child sub-problems [10].

We are particularly interested in this paper in the DP paradigm for solving the matrix chain product problem (MCPP). This problem has diverse real-world applications, e.g., in robotics, process control, and computer animation [31].

Our aim here is to use several versions of the DPA for the MCPP and study their parallelization. A detailed theoretical study, based on a previous brief presentation [4], is first given. In addition, we focus on an experimental study targeting a multicore machine in order to achieve an accurate performance evaluation of the designed parallel DPAs and thereby validate our contribution.

The remainder of the paper is organized as follows. In Section II, we first present the MCPP and the associated DPA for solving it, then the state of the art on previous works, including sequential and parallel algorithms. Section III is devoted to a description of our parallelization approach. An experimental study is described in Section IV. Finally, we conclude our work in Section V and propose some perspectives.

II. THE MATRIX CHAIN PRODUCT PROBLEM (MCPP)

A. Presentation

The matrix chain product problem (MCPP) is a combinatorial optimization problem (COP) that consists in finding an optimal parenthesization of a chain of rectangular matrices, i.e., one that minimizes the total number of required multiplications [10]. Indeed, if we consider a chain of three or more matrices to be multiplied, the total number of multiplications may vary depending on the chain parenthesization.
To see this, consider for instance three matrices A, B, and C of sizes 5×1, 1×5, and 5×1, respectively. The product ABC may be computed in two ways, i.e., according to two parenthesizations: (AB)C or A(BC). The first requires 5×1×5 + 5×5×1 = 50 multiplications, while the second requires only 1×5×1 + 5×1×1 = 10 multiplications. For a chain of n matrices, n being large, we cannot afford to try all the possible parenthesizations, since their number, given by a Catalan number, is $\Omega(4^n / n^{3/2})$ [10][22].

978-1-4799-7100-8/14/$31.00 ©2014 Crown
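The multiplication counts of the three-matrix example above, and the growth in the number of parenthesizations, can be checked with a short sketch. It is only an illustration and is not taken from the paper: the helper names `mult_cost` and `count_parenthesizations` are hypothetical, and the count relies on the standard fact that a chain of n matrices admits C(n-1) full parenthesizations, where C(m) is the m-th Catalan number.

```python
from math import comb


def mult_cost(rows, inner, cols):
    """Scalar multiplications for a (rows x inner) by (inner x cols) product."""
    return rows * inner * cols


# Sizes from the example: A is 5x1, B is 1x5, C is 5x1.
# (AB)C: AB is a 5x5 matrix, which is then multiplied by C.
cost_ab_c = mult_cost(5, 1, 5) + mult_cost(5, 5, 1)
# A(BC): BC is a 1x1 matrix, which is then multiplied by A on the left.
cost_a_bc = mult_cost(1, 5, 1) + mult_cost(5, 1, 1)


def count_parenthesizations(n):
    """Number of full parenthesizations of a chain of n matrices.

    Assumes the standard closed form: the Catalan number C(n - 1),
    with C(m) = binomial(2m, m) / (m + 1).
    """
    m = n - 1
    return comb(2 * m, m) // (m + 1)


if __name__ == "__main__":
    print("(AB)C:", cost_ab_c, "multiplications")
    print("A(BC):", cost_a_bc, "multiplications")
    for n in (3, 4, 10, 20):
        print(n, "matrices:", count_parenthesizations(n), "parenthesizations")
```

For n = 3 the count is 2, matching the two orderings (AB)C and A(BC). The count grows quickly with n, which is why exhaustive enumeration is ruled out for large n.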