Accidents always Come in Threes:
A Case Study of Data-intensive Programs in Parallel Haskell

P.W. Trinder, University of Glasgow, Glasgow, Scotland
K. Hammond, University of St Andrews, St. Andrews, Scotland
H-W. Loidl, S.L. Peyton Jones, University of Glasgow, Glasgow, Scotland
J. Wu, Centre for Transport Studies, University College London, London, England

Abstract

Accidents happen: "An invisible car came out of nowhere, struck my vehicle and vanished." "I pulled away from the side of the road, glanced at my mother-in-law, and headed for the embankment." "As I approached the intersection a sign suddenly appeared in a place where no stop sign had ever appeared before."

Luckily, we don't normally have to deal with problems as bizarre as these. One interesting application that does arise at the Centre for Transport Studies consists of matching police reports of several accidents so as to locate accident blackspots. The application provides an interesting, data-intensive test-bed for the persistent functional language PFL. We report here on an approach aimed at improving the performance of this application using Glasgow Parallel Haskell.

The accident application is one of several large parallel Haskell programs under development at Glasgow. Our objective is to achieve wall-clock speedups over the best sequential implementations, and we report modest wall-clock speedups for a demonstration program. From experience with these and other programs the group is developing a methodology for parallelising large functional programs. We have also developed strategies, a mechanism to separately specify a function's algorithm and its dynamic behaviour.

1 Introduction

It has often been claimed that pure functional languages are highly suitable for parallel programming, but as yet there is little hard performance data to support such a contention, especially for non-trivial applications.
The dearth of such programs is partly due to the lack of robust parallel implementations, and partly due to the absence of tools and techniques that support parallelisation. We do now, however, have the publicly-available GUM runtime system for Haskell [21]. GUM stands for Graph-reduction for a Unified Machine-model, and has been ported to several parallel platforms, including the CM5 [6], Sun SPARCServer shared-memory multiprocessors and networks of Suns and Alphas. We also have the highly tunable, and well-instrumented, GranSim simulator that is based on the same compiler technology as GUM. This combination has allowed us to achieve reasonable and verifiable performance gains on a range of parallel platforms for relatively low implementation effort.

This paper describes work in progress that aims to explore the problems involved in writing non-trivial parallel programs, especially ones which are data-intensive — an area of special interest to our group. To date we have written a small demonstration program that achieves modest wall-clock speedups. We have also written a program to identify traffic-accident blackspots from real data, and are in the process of parallelising this program. We hope eventually to demonstrate real absolute performance gains for this program over its optimised sequential equivalent.

The traffic accident program is only one of several large Haskell programs that the group has written or parallelised, one of which (LOLITA, a natural language processing system [14]) approaches 100,000 lines of source. Our experiences

Glasgow Functional Programming Workshop, 1996
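To make the strategies idea from the abstract concrete, the following is a minimal sketch of how a strategy separates a function's algorithm from its dynamic (evaluation) behaviour. It is not the paper's implementation: the names (r0, rwhnf, using, parList) follow the early strategies work but are simplified here, and the sketch uses only `seq` and `par` from GHC's base libraries rather than the full strategies module.

```haskell
-- A Strategy describes how much of a value to evaluate, and whether to
-- evaluate it in parallel, independently of the code that builds the value.
import GHC.Conc (par)

type Strategy a = a -> ()

r0 :: Strategy a             -- evaluate nothing
r0 _ = ()

rwhnf :: Strategy a          -- evaluate to weak head normal form
rwhnf x = x `seq` ()

using :: a -> Strategy a -> a   -- apply a strategy to a value
using x s = s x `seq` x

parList :: Strategy a -> Strategy [a]   -- spark each list element in parallel
parList _ []     = ()
parList s (x:xs) = s x `par` parList s xs

main :: IO ()
main = do
  -- Algorithm: square the numbers and sum them.
  -- Dynamic behaviour: evaluate the list elements in parallel.
  -- The two concerns are specified separately.
  let squares = map (^ 2) [1 .. 10 :: Int] `using` parList rwhnf
  print (sum squares)   -- 385
```

Without a threaded runtime the sparks created by `par` are simply ignored, so the program's result is identical either way; only its parallel behaviour changes. This is the separation of concerns the paper's strategies mechanism is designed to provide.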