J. Parallel Distrib. Comput. 65 (2005) 994–1006
www.elsevier.com/locate/jpdc

A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs)

David A. Bader a,*, Guojing Cong b

a College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
b IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

Received 7 February 2003; received in revised form 3 August 2004; accepted 22 March 2005
Available online 20 June 2005

Abstract

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. Many PRAM algorithms can be adapted to SMPs with few modifications, yet few studies deal with the implementation and performance issues of running PRAM-style algorithms on SMPs. This paper focuses on implementing parallel spanning tree algorithms on SMPs. The spanning tree problem is important both because it is a building block for many other parallel graph algorithms and because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. Experimental studies have been conducted on related problems (minimum spanning tree and connected components) using parallel computers, but they achieved reasonable speedup only on regular graph topologies that can be implicitly partitioned with good locality features, or on very dense graphs with limited numbers of vertices. In this paper we present a new randomized algorithm and implementation with superior performance that, for the first time, achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree.
This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n > p^2). Because it is notoriously hard for any parallel implementation of the spanning tree problem to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are:
1. a new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and
2. an experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms.
The source code for these algorithms is freely available from our web site.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Parallel graph algorithms; Connectivity; Shared memory; High-performance algorithm engineering

* Corresponding author.
E-mail addresses: bader@cc.gatech.edu (D.A. Bader), gcong@us.ibm.com (G. Cong).
0743-7315/$ - see front matter. doi:10.1016/j.jpdc.2005.03.011

1. Introduction

Finding a spanning tree of a graph is an important building block for many graph algorithms, for example, biconnected components and ear decomposition [32], and can be used in graph planarity testing [28]. The best sequential algorithm for finding a spanning tree of a graph G = (V, E), where n = |V| and m = |E|, uses depth- or breadth-first graph traversal and runs in O(m + n) time. The implementations of the sequential algorithms are very efficient (linear time with a very small hidden constant), and the only data structure used is a stack or queue, which has good locality features. However, graph traversal using depth-first search (DFS) is inherently sequential and known not to parallelize