A New Parallel Paradigm for Block-based Gauss-Jordan Algorithm

Ling Shang, Serge Petiton and Maxime Hugues
LIFL, University of Science and Technology of Lille
Grand-Large Team, INRIA Futurs
Lille, France
(ling.shang, serge.petiton, maxime.hugues)@lifl.fr

Abstract—The block-based Gauss-Jordan algorithm, a classical method for large-scale matrix inversion, offers two kinds of parallelism: intra-step and inter-step. The existing parallel paradigm for the block-based Gauss-Jordan algorithm targets only intra-step parallelism, so it cannot generate enough simultaneously executable tasks for high performance platforms that can harness more and more computing resources. To overcome this problem, this paper presents a hybrid parallel paradigm that exploits all the parallelizable parts of the Gauss-Jordan algorithm. In this hybrid paradigm, 1) the divide-and-conquer paradigm decomposes large-granularity tasks into as many sub-tasks as possible; 2) the single program, multiple data (SPMD) paradigm handles the intra-step parallelism of the algorithm; 3) the data pipelining paradigm handles the inter-step parallelism. Finally, experiments comparing the hybrid parallel paradigm with the existing parallel paradigm show the good performance of our paradigm.

Keywords-Gauss-Jordan algorithm; parallel paradigm; data dependence; parallelism

I. INTRODUCTION

A good parallel programming paradigm should help to maximize the parallel execution of an algorithm and thus achieve better performance. The choice of paradigm is determined by the available parallel computing resources and by the type of parallelism inherent in the problem [14].
Exploiting the significant computational capability available in the internet-based Grid environment has gained enthusiastic acceptance within the high performance computing community, and the current tendency favors this sort of commodity supercomputing [10]. Multi-core Architectures (MCAs) give applications an opportunity to achieve much higher performance, and the number of cores on MCAs is likely to continue growing, increasing their performance potential [12]. These technologies are motivated mainly by the desire of most scientific communities to minimize economic risk and to rely on consumer-based, off-the-shelf technology. Grid computing and multi-core have been recognized as the wave of the future for solving large scientific problems. However, realizing this performance potential requires an application to expose a significant amount of thread-level parallelism. It is therefore important for researchers to extract the maximal parallelism from a given algorithm, so that the computing resources of Grid platforms or MCAs are exploited as fully as possible.

The block-based Gauss-Jordan algorithm [1][2][4], a classical method for large-scale matrix inversion, can be used in weather prediction, aircraft design, graphic transformation and so on. Its wide applicability has made it the focus of many researchers. S. Petiton presents a parallel version of the algorithm adapted to MIMD machines [1]. N. Melab et al. not only give a parallel version tailored to MARS but also analyze all the possible parallelism in the algorithm [2][4]. L. M. Aouad et al. present a parallel programming paradigm for it based on SPMD [7]. A paradigm is commonly understood as a class of algorithms that share the same control structure, and it can easily be tailored to execution models such as MPI, PVM and other middleware suited to high performance computing.
A good programming paradigm is very important for an algorithm to achieve better performance. However, the programming paradigms given by N. Melab and L. M. Aouad do not take inter-step parallelism into account. To improve the efficiency of the algorithm, it is necessary to find a solution that exploits both the inter-step and the intra-step parallelism in the algorithm, thus generating more tasks and allowing them to execute simultaneously. We therefore analyzed the sequential block-based Gauss-Jordan algorithm, and its characteristics can be summarized as follows: 1) all operands are data blocks, and the sequence of operations in the algorithm is decided by the data dependence between different blocks; 2) the number of steps the algorithm executes equals the number of data blocks the matrix is divided into; 3) the parallelism of the basic operations in the algorithm depends on the data write-operations; 4) the number of data write-operations is the same in each iterative step. This analysis shows that the data dependence between different blocks plays a very important role in the algorithm. This paper therefore focuses on analyzing the data dependence between blocks, and a table is used to simulate the actual matrix manipulation. A formal description based on this table simulation is then given to demonstrate the data dependences that exist between different blocks.
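As a point of reference for the characteristics listed above, a sequential block-based Gauss-Jordan inversion can be sketched as follows. This is only a minimal illustration under stated assumptions, not the formulation used in [1][2][4]: it keeps a full identity-augmented matrix B rather than a compact in-place scheme, it assumes that every diagonal block met during elimination is invertible (no block pivoting), it uses exact rational arithmetic for readability, and all function names (`block_gauss_jordan`, `split_blocks`, `join_blocks`, `invert_block`) are hypothetical.

```python
from fractions import Fraction


def identity(n):
    """n x n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain dense product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def matsub(x, y):
    """Element-wise difference x - y."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def invert_block(m):
    """Element-wise Gauss-Jordan inversion of one small block (with row swaps)."""
    n = len(m)
    aug = [list(map(Fraction, row)) + identity(n)[i] for i, row in enumerate(m)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # fails if block is singular
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def split_blocks(m, b):
    """Cut a square matrix into a q x q grid of b x b blocks (len(m) divisible by b)."""
    q = len(m) // b
    return [[[[Fraction(v) for v in row[j * b:(j + 1) * b]]
              for row in m[i * b:(i + 1) * b]]
             for j in range(q)] for i in range(q)]


def join_blocks(blocks):
    """Inverse of split_blocks: reassemble the grid into one matrix."""
    return [sum((blk[r] for blk in block_row), [])
            for block_row in blocks for r in range(len(block_row[0]))]


def block_gauss_jordan(m, b):
    """Invert m by Gauss-Jordan elimination on b x b blocks (assumption: no pivoting)."""
    q = len(m) // b
    A = split_blocks(m, b)
    B = split_blocks(identity(len(m)), b)
    for k in range(q):                      # one step per diagonal block
        P = invert_block(A[k][k])           # pivot block must be invertible
        A[k] = [matmul(P, blk) for blk in A[k]]
        B[k] = [matmul(P, blk) for blk in B[k]]
        for i in range(q):                  # block rows i != k do not depend on each other
            if i == k:
                continue
            F = A[i][k]
            A[i] = [matsub(blk, matmul(F, piv)) for blk, piv in zip(A[i], A[k])]
            B[i] = [matsub(blk, matmul(F, piv)) for blk, piv in zip(B[i], B[k])]
    return join_blocks(B)
```

In this sketch the outer loop performs one step per diagonal block, and every operation reads and writes whole blocks. Within a step, the updates of different block rows `i` only read block row `k` and write their own row, which is where intra-step parallelism would come from; the order in which blocks are written from one step to the next is what constrains any overlap between steps.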
At the same time, write-operation which controls the data dependence 2009 Eighth International Conference on Grid and Cooperative Computing 978-0-7695-3766-5/09 $25.00 © 2009 IEEE DOI 10.1109/GCC.2009.75 201