J Supercomput
DOI 10.1007/s11227-009-0296-3

Performance analysis and optimization of MPI collective operations on multi-core clusters

Bibo Tu · Jianping Fan · Jianfeng Zhan · Xiaofang Zhao

© Springer Science+Business Media, LLC 2009

Abstract The memory hierarchy on multi-core clusters has two dimensions: a vertical memory hierarchy and a horizontal memory hierarchy. This paper proposes a new parallel computation model that abstracts the memory hierarchy of multi-core clusters at both the vertical and horizontal levels in a unified way. Experimental results show that the new model predicts communication costs for message passing on multi-core clusters more accurately than previous models, which incorporate only the vertical memory hierarchy. The new model provides the theoretical underpinning for the optimal design of MPI collective operations. Targeting the horizontal memory hierarchy, our methodology for optimizing collective operations on multi-core clusters focuses on a hierarchical virtual topology and cache-aware intra-node communication, both incorporated into the existing collective algorithms in MPICH2. As a case study, a multi-core aware broadcast algorithm has been implemented and evaluated. The performance evaluation results show that this methodology for optimizing collective operations on multi-core clusters is efficient.

Keywords Parallel computation model · Multi-core clusters · Memory hierarchy · MPI collective operations · Data tiling

B. Tu (✉) · J. Fan · J. Zhan · X. Zhao
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
e-mail: tbb@ncic.ac.cn

J. Fan
e-mail: fan@ict.ac.cn

J. Zhan
e-mail: jfzhan@ncic.ac.cn

X. Zhao
e-mail: zhaoxf@ict.ac.cn

J. Fan
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518067, China