Engineering a Topological Sorting Algorithm for Massive Graphs
Deepak Ajwani
*
Adan Cosgaya-Lozano
†
Norbert Zeh
‡
Abstract
We present an I/O-efficient algorithm for topologically
sorting directed acyclic graphs (DAGs). No provably
I/O-efficient algorithm for this problem is known. Sim-
ilarly, the performance of our algorithm, which we call
IterTS, may be poor in the worst case. However, our
experiments show that IterTS achieves good perfor-
mance in practise.
The strategy of IterTS can be summarized as
follows. We call an edge satisfied if its tail has a smaller
number than its head. A numbering satisfying at least
half the edges in the DAG is easy to find: a random
numbering is expected to have this property. IterTS
starts with such a numbering and then iteratively
corrects the numbering to satisfy more and more edges
until all edges are satisfied.
To evaluate IterTS, we compared its running time
to those of three competitors: PeelTS, an I/O-efficient
implementation of the standard strategy of iteratively
removing sources and sinks; ReachTS, an I/O-efficient
implementation of a recent parallel divide-and-conquer
algorithm based on reachability queries; and SeTS,
standard DFS-based topological sorting built on top
of a semi-external DFS algorithm. In our evaluation
on various types of input graphs, IterTS consistently
outperformed PeelTS and ReachTS, by at least an
order of magnitude in most cases. SeTS outperformed
IterTS on most graphs whose vertex sets fit in memory.
However, IterTS often came close to the running time
of SeTS on these inputs and, more importantly, SeTS
was not able to process graphs whose vertex sets were
beyond the size of main memory, while IterTS was
able to process such inputs efficiently.
∗
MADALGO Center for Massive Data Algorithmics, Depart-
ment of Computer Science, Aarhus University, Aarhus, Denmark.
Email: ajwani@madalgo.au.dk. Research was supported in part
by the Danish National Research Foundation. Travel to the con-
ference was supported by IRCSET/IBM.
†
Faculty of Computer Science, Dalhousie University, Halifax,
NS, Canada. Email: acosgaya@cs.dal.ca. Supported in part by
NSERC.
‡
Faculty of Computer Science, Dalhousie University, Halifax,
NS, Canada. Email: nzeh@cs.dal.ca. This research was
supported in part by NSERC and the Canada Research Chairs
programme.
1 Introduction
Let G =(V,E) be a directed acyclic graph (DAG) with
n := |V | vertices and m := |E| edges. Topological
sorting is the problem of finding a linear ordering
of the vertices in V such that the tail of each edge
in E precedes its head in the ordering. Linear-time
algorithms for this problem are covered in standard
undergraduate texts, as topological sorting captures the
problem of finding a linear order of items or activities
consistent with a set of pairwise ordering constraints,
which arises in a number of applications. The problem
of topologically sorting large DAGs arises, for example,
in the application of recent multiple sequence alignment
algorithms [20,21] to large collections of DNA sequences.
Topologically sorting large DAGs is also an impor-
tant building block for other I/O-efficient algorithms,
mostly due to a technique called time-forward pro-
cessing [9], which has proven useful in obtaining I/O-
efficient solutions to a number of problems but requires
the vertices of the graph to be given in topologically
sorted order. Time-forward processing solves the fol-
lowing “graph evaluation” problem: given a DAG each
of whose vertices has a label φ(x), process its vertices in
topologically sorted order and, for each vertex x, com-
pute a new label ψ(x) from φ(x) and the ψ-labels of x’s
in-neighbours. A simple example of this type of prob-
lem is the evaluation of a Boolean circuit represented as
a DAG: φ(·) assigns a Boolean function to each vertex,
turning it into a logical gate; ψ(x) is the output of the
gate represented by vertex x, given the inputs it receives
from its in-neighbours. Since time-forward processing
requires the vertices of the DAG to be given in topo-
logically sorted order and no general I/O-efficient topo-
logical sorting algorithm is known to date, time-forward
processing has been applied only in situations where a
topological ordering of the vertices can be obtained by
using secondary information about the structure of the
DAG (e.g., [3, 4, 13, 15]). A general topological sorting
algorithm for massive graphs would greatly increase the
applicability of this technique.
Two simple linear-time algorithms for topological
sorting are to repeatedly number and remove sources
(in-degree-0 vertices) or to perform a depth-first search
(DFS) of the graph and number the vertices in reverse
Copyright © 2011 by SIAM
Unauthorized reproduction is prohibited.
139