J. Parallel Distrib. Comput. 65 (2005) 994–1006
www.elsevier.com/locate/jpdc
A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs)
David A. Bader (a,∗), Guojing Cong (b)
(a) College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
(b) IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Received 7 February 2003; received in revised form 3 August 2004; accepted 22 March 2005
Available online 20 June 2005
Abstract
The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer
to the ideal PRAM parallel computer. Many PRAM algorithms can be adapted to SMPs with few modifications. Yet there are few studies
that deal with the implementation and performance issues of running PRAM-style algorithms on SMPs. Our study in this paper focuses
on implementing parallel spanning tree algorithms on SMPs. The spanning tree problem is important both because it is a building
block for many other parallel graph algorithms and because it is representative of a large class of irregular combinatorial problems
that have simple and efficient sequential implementations and fast PRAM algorithms but often have no known
efficient parallel implementations. Experimental studies have been conducted on related problems (minimum spanning tree and connected
components) using parallel computers, but they achieved reasonable speedup only on regular graph topologies that can be implicitly partitioned
with good locality features or on very dense graphs with limited numbers of vertices. In this paper we present a new randomized algorithm
and implementation with superior performance that for the first time achieves parallel speedup on arbitrary graphs (both regular and
irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several
techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n > p^2). As
the spanning tree problem is notoriously hard to parallelize with reasonable speedup, our study may shed new
light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are:
1. a new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and
irregular topologies; and
2. an experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with
the previous algorithms.
The source code for these algorithms is freely available from our web site.
© 2005 Elsevier Inc. All rights reserved.
Keywords: Parallel graph algorithms; Connectivity; Shared memory; High-performance algorithm engineering
∗ Corresponding author.
E-mail addresses: bader@cc.gatech.edu (D.A. Bader), gcong@us.ibm.com (G. Cong).
0743-7315/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.jpdc.2005.03.011
1. Introduction
Finding a spanning tree of a graph is an important building
block for many graph algorithms, for example, biconnected
components and ear decomposition [32], and can
be used in graph planarity testing [28]. The best sequential
algorithm for finding a spanning tree of a graph G = (V, E),
where n = |V| and m = |E|, uses depth- or breadth-first
graph traversal and runs in O(m + n) time. The implementations
of the sequential algorithms are very efficient (linear
time with a very small hidden constant), and the only data
structure used is a stack or queue, which has good locality
features. However, graph traversal using depth-first search
(DFS) is inherently sequential and known not to parallelize