Software Engineering and Applications 软件工程与应用, 2013, 2, 15-19
http://dx.doi.org/10.12677/sea.2013.21003 Published Online February 2013 (http://www.hanspub.org/journal/sea.html)
Performance Analysis of Large Scale Parallel Matrix Multiplication
Zhi Shang^1, Shuo Chen^2
^1 Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore
^2 School of Aerospace Engineering and Applied Mechanics, Tongji University, Shanghai
Email: shangz@ihpc.a-star.edu.sg, shangzhi@tsinghua.org.cn, schen_tju@mail.tongji.edu.cn
Received: Nov. 17th, 2012; revised: Dec. 3rd, 2012; accepted: Dec. 9th, 2012
Abstract: Large scale computing has become unavoidable given the demands of modern scientific research and practical engineering applications, and such computations inevitably involve the processing of massive data. Parallel computing is therefore employed to address both fast computation and large-scale data processing. MPI-based parallel computing readily realizes distributed computing by scattering massive data across a cluster supercomputer, so that each single processor handles only a small portion of the data, thereby achieving fast, large-scale computation. Based on MPI parallel programming, a large-scale matrix multiplication operation was developed. Through tests of the parallel performance of point-to-point communication under blocking, non-blocking, and mixed communication schemes, a complete fast communication scheme that prevents the occurrence of deadlock was established. The results are significant for future practical applications.
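The row-partitioning idea behind the MPI matrix multiplication described above can be sketched serially. This is a minimal illustration under assumptions, not the authors' code: the function name and structure are hypothetical, and the scatter/compute/gather steps stand in for what the MPI version would do with point-to-point or collective communication.

```python
# Serial sketch of row-block partitioning for parallel matrix multiplication:
# a root process splits the rows of A among num_procs workers, each worker
# multiplies its row block by the full B, and the partial products are
# gathered back into the full result C = A x B.

def multiply_by_row_blocks(A, B, num_procs):
    n = len(A)
    # Split A's rows into num_procs contiguous blocks (last block may be smaller).
    block = (n + num_procs - 1) // num_procs
    row_blocks = [A[r:r + block] for r in range(0, n, block)]

    # Each "process" computes its block of C independently (the parallel step).
    partial_results = []
    for rows in row_blocks:
        C_block = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
                   for row in rows]
        partial_results.append(C_block)

    # Gather: concatenate the row blocks back into the full product.
    return [row for C_block in partial_results for row in C_block]

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0], [0, 1]]
print(multiply_by_row_blocks(A, B, 2))  # B is the identity, so this returns A
```

Because each row block of C depends only on that block of A and on all of B, the workers need no communication with each other during the compute step; only the scatter and gather phases involve message passing, which is where the choice of blocking versus non-blocking communication matters.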
Keywords: Large Scale Computing; Massive Data; Parallel Computing; Distributed Computing; Matrix Multiplication
1. Introduction
Matrices are widely used in scientific computing, and many scientific computing problems ultimately reduce to operations and computations on matrices. For example, in multiple signal classification (
Copyright © 2013 Hanspub 15