An Experimental Study on How to Build
Efficient Multi-Core Clusters for High Performance Computing
Luiz Carlos Pinto, Luiz H. B. Tomazella, M. A. R. Dantas
Distributed Systems Research Laboratory ( LaPeSD )
Department of Informatics and Statistics ( INE )
Federal University of Santa Catarina ( UFSC )
{ luigi, tomazella, mario }@inf.ufsc.br
Abstract
Multi-core technology produces a new scenario for
communicating processes in an MPI cluster
environment, and the trade-offs involved therefore
need to be uncovered. This motivation guided our
research and led to a new approach for setting up
more efficient clusters built from commodity
components. As an alternative to non-commodity
interconnects such as Myrinet and Infiniband, we
present a proposal based on leaving cores idle
with respect to application processing in order to
build more affordable commodity clusters
with higher performance. Executing the fine-grained IS
algorithm from the NAS Parallel Benchmarks revealed a
speedup of up to 25%. Interestingly, a cluster
organized according to the proposed setup was able to
outperform a single multi-core SMP host in which all
processes communicate inside the host. Overall, the
empirical results indicate that our proposal is
successful for medium- and fine-grained algorithms.
1. Introduction
Scientific applications used to be executed
mainly on expensive, proprietary massively
parallel processing (MPP) machines. As processing
power and communication speed increasingly
become off-the-shelf products, clusters built from
commodity components [27] have been taking a large
share of the high performance computing (HPC) world [5].
Not long ago, identical single-processor computing
nodes used to be aggregated in order to form a cluster,
also known as NoW (Network of Workstations). Such a
parallel architecture design demands a distributed
memory programming interface like MPI [1] for
inter-process communication. As each computing node has
its own memory subsystem and path to the
interconnect fabric, each MPI process executes
roughly independently of the others.
Nowadays, multi-processor (SMP) and, more recently,
multi-core (CMP) technologies are increasingly finding
their way into cluster computing. Inevitably, clusters
built using SMP and CMP-SMP nodes will become
more and more common. Lacking a widely accepted
term for the CMP-SMP cluster design, we refer to both
architectures as CLUMPS, a usual term for
a cluster of SMP nodes.
Traditional MPI programs follow the SPMD (Single
Program, Multiple Data) parallel programming model,
which was designed basically for cluster architectures
built using nodes with a single processing unit, that is
to say single-core nodes. In a modern cluster built
with multi-core multi-processor nodes, by contrast,
access to the interconnect fabric is shared by
locally executing processes. Main memory accesses
and the usually deeper cache memory hierarchy of
CMP-SMP nodes might also slow down inter-process
communication, since the bus and memory subsystem of
each node are shared. Thus, in such a modern cluster,
moving data between communicating cores is a
function not only of their physical distance (inside a
processor socket, inside a node or inter-node) but also
of shared memory and network bandwidth limitations.
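The dependence on both distance and sharing can be illustrated
with a toy model. The sketch below is not the model used in this
study; it only assumes that a transfer costs a fixed latency plus
the message size divided by the bandwidth, and that the bandwidth
of a shared resource (memory bus or network interface) is split
evenly among the processes using it at the same time. The names
Core, Distance, distance and transfer_time are hypothetical and
introduced for illustration only.

```python
from enum import Enum
from typing import NamedTuple


class Core(NamedTuple):
    """Location of a core in the cluster (hypothetical naming)."""
    node: int
    socket: int
    core: int


class Distance(Enum):
    """The three physical distances named in the text."""
    SAME_SOCKET = "inside a processor socket"
    SAME_NODE = "inside a node"
    INTER_NODE = "inter-node"


def distance(a: Core, b: Core) -> Distance:
    """Classify how far apart two communicating cores are."""
    if a.node != b.node:
        return Distance.INTER_NODE
    if a.socket != b.socket:
        return Distance.SAME_NODE
    return Distance.SAME_SOCKET


def transfer_time(nbytes: int, latency: float, bandwidth: float,
                  sharers: int = 1) -> float:
    """Estimate the time in seconds to move nbytes over one link.

    Assumption (not from the paper): the link bandwidth, in bytes
    per second, is divided evenly among `sharers` processes that
    use the shared resource simultaneously.
    """
    if sharers < 1:
        raise ValueError("sharers must be at least 1")
    return latency + nbytes / (bandwidth / sharers)
```

Under this assumption, the bandwidth term of a transfer doubles
when a second local process shares the same network interface,
which is the kind of contention that fewer application processes
per node would reduce. Latency and bandwidth values for each
distance have to be measured on the target cluster; none are
supplied here.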
Our motivation is the importance of recognizing,
from an architectural designer's point of view, that
modern multi-core cluster designs create a different
scenario for predicting performance. This pressing need
to understand the trade-offs between these architectural
cluster designs guided our research and finally led to a
new approach for setting up more efficient commodity
clusters. Thus, an alternative to the
use of non-commodity interconnects, such as
Myrinet and Infiniband, is proposed in order to build
more affordable commodity clusters
with higher performance.
2008 11th IEEE International Conference on Computational Science and Engineering
978-0-7695-3193-9/08 $25.00 © 2008 IEEE
DOI 10.1109/CSE.2008.63
33