J Supercomput
DOI 10.1007/s11227-009-0296-3
Performance analysis and optimization of MPI
collective operations on multi-core clusters
Bibo Tu · Jianping Fan · Jianfeng Zhan ·
Xiaofang Zhao
© Springer Science+Business Media, LLC 2009
Abstract The memory hierarchy of multi-core clusters has two dimensions: a
vertical memory hierarchy and a horizontal memory hierarchy. This paper proposes a
new parallel computation model that provides a unified abstraction of the memory
hierarchy on multi-core clusters at both the vertical and horizontal levels.
Experimental results show that the new model predicts communication costs for
message passing on multi-core clusters more accurately than previous models,
which incorporate only the vertical memory hierarchy. The new
model provides the theoretical underpinning for the optimal design of MPI collective
operations. Targeting the horizontal memory hierarchy, our methodology for optimizing
collective operations on multi-core clusters focuses on hierarchical virtual topology
and cache-aware intra-node communication, both incorporated into the existing
collective algorithms in MPICH2. As a case study, a multi-core aware broadcast
algorithm has been implemented and evaluated. The performance evaluation shows that
this methodology for optimizing collective operations on multi-core clusters is
efficient.
Keywords Parallel computation model · Multi-core clusters · Memory hierarchy ·
MPI collective operations · Data tiling
B. Tu (✉) · J. Fan · J. Zhan · X. Zhao
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
e-mail: tbb@ncic.ac.cn
J. Fan
e-mail: fan@ict.ac.cn
J. Zhan
e-mail: jfzhan@ncic.ac.cn
X. Zhao
e-mail: zhaoxf@ict.ac.cn
J. Fan
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518067,
China