Testing acyclicity and topological sorting in external memory

Andrew R. Curtis* Bryan L. Shader†

October 2, 2004

1 Introduction

There has been a considerable amount of recent work on developing external-memory algorithms for fundamental graph problems, driven by the increasing size of the data sets in many modern applications. In external-memory algorithm design, memory is managed as part of the algorithm in order to minimize the number of inputs/outputs (I/Os) performed between internal memory and secondary storage. The number of operations performed by the CPU is generally ignored in the analysis of external-memory algorithms, since a typical transfer of data from disk is about one million times slower than a transfer from internal memory. For recent surveys, see [11, 12].

In this paper, we present the first I/O-efficient algorithm to determine whether a general directed graph is acyclic. In addition to checking acyclicity, our algorithm returns a topological ordering of the vertices. I/O-efficient topological sorting has been a long-standing open problem, and our results help further progress on this problem. Our acyclicity-testing algorithm has an I/O complexity of $O\left((V + \frac{E}{B})\log_2\frac{V}{B} + \mathrm{sort}(E)\right)$. Note that this bound matches the best known bound for topologically sorting a general digraph using DFS. While our results do not improve on the previously known algorithm of [7], they provide further insight into the external-memory topological sorting problem.

1.1 I/O Model

We use the standard I/O model of Aggarwal and Vitter [1], with the following notation:
• M = internal memory size (the number of data items that fit in memory)
• B = block transfer size (the number of data items transferred in a single I/O)
We assume that M is less than the input size N and that $1 < B \le M/2$. The two fundamental I/O operations are (1) scanning a file of N contiguous data items and (2) sorting N contiguous data items.
Reading a file containing N data items thus requires $\mathrm{scan}(N) = N/B$ I/Os, and sorting a file of N data items requires $\mathrm{sort}(N) = \Theta\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os. Generally, $\mathrm{scan}(N) < \mathrm{sort}(N) \ll N$. For a graph $G = (V, E)$, we also use V to denote the number of vertices and E to denote the number of arcs; the intended meaning will be clear from context.

* andyc@uwyo.edu, Mathematics Department, University of Wyoming, Laramie, WY, 82070. Funded by the Wyoming NASA Space Grant Consortium, NGT-40102 and NCC5-578.
† bshader@uwyo.edu, Mathematics Department, University of Wyoming, Laramie, WY, 82070.
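The I/O-model cost formulas can be evaluated numerically to get a feel for the gap between scan(N), sort(N) and N. The following Python sketch drops the constants hidden by the O and Θ notation. The helper names (`scan_ios`, `sort_ios`, `acyclicity_bound`) are hypothetical, and the sketch assumes, as a simplification, that sorting always costs at least one full pass over the data.

```python
import math

# Hypothetical helpers evaluating the I/O-model cost formulas with the
# constants hidden by O(.) and Theta(.) dropped.

def scan_ios(n: float, b: float) -> float:
    """scan(N) = N / B: I/Os needed to read N contiguous items, B per block."""
    return n / b

def sort_ios(n: float, m: float, b: float) -> float:
    """sort(N) = (N/B) * log_{M/B}(N/B), with the Theta constant dropped."""
    blocks = n / b
    passes = math.log(blocks) / math.log(m / b)  # log base M/B of N/B
    # Simplifying assumption: sorting costs at least one full pass.
    return blocks * max(1.0, passes)

def acyclicity_bound(v: float, e: float, m: float, b: float) -> float:
    """(V + E/B) * log2(V/B) + sort(E), with the O(.) constant dropped."""
    return (v + e / b) * math.log2(v / b) + sort_ios(e, m, b)

if __name__ == "__main__":
    n, m, b = 2**30, 2**20, 2**10  # N items, M-item memory, B-item blocks
    print("scan(N) =", scan_ios(n, b))
    print("sort(N) =", sort_ios(n, m, b))
    print("bound   =", acyclicity_bound(2**24, 2**28, m, b))
```

For example, with N = 2^30 items, B = 2^10 and M = 2^20 (so that M < N and B ≤ M/2), scan(N) is 2^20 I/Os and sort(N) is about 2^21 I/Os, both far below N.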
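To make the problem statement concrete, the following is an internal-memory sketch of what an acyclicity test that also returns a topological ordering computes. It uses Kahn's algorithm (repeatedly removing vertices of in-degree zero), a standard in-memory technique swapped in purely for illustration; it is neither the authors' external-memory algorithm nor the DFS-based approach mentioned above. The function name `topological_order` and the arc-list input format are assumptions. When the graph does not fit in memory, each adjacency-list access in this loop may cost a separate I/O, which is the kind of unstructured access that external-memory algorithms are designed to avoid.

```python
from collections import deque

def topological_order(num_vertices, arcs):
    """Hypothetical in-memory helper (Kahn's algorithm).

    Vertices are 0..num_vertices-1 and arcs is a list of (u, v) pairs.
    Returns a topological order, or None if the digraph contains a cycle.
    """
    indegree = [0] * num_vertices
    out = [[] for _ in range(num_vertices)]
    for u, v in arcs:
        out[u].append(v)
        indegree[v] += 1
    # Start from every vertex that has no incoming arcs.
    queue = deque(x for x in range(num_vertices) if indegree[x] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        # Removing u lowers the in-degree of each of its out-neighbours.
        for v in out[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Vertices on a cycle never reach in-degree zero, so they are never output.
    return order if len(order) == num_vertices else None
```

For example, `topological_order(4, [(0, 1), (1, 2), (0, 3), (3, 2)])` returns `[0, 1, 3, 2]`, while `topological_order(3, [(0, 1), (1, 2), (2, 0)])` returns `None` because the three arcs form a cycle.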