QuakeTM: Parallelizing a Complex Sequential Application Using Transactional Memory

Vladimir Gajinov †∗, Ferad Zyulkyarov †∗, Osman S. Unsal †, Adrian Cristal †, Eduard Ayguade †, Tim Harris ‡, Mateo Valero †∗

† Barcelona Supercomputing Center  ∗ Universitat Politècnica de Catalunya  ‡ Microsoft Research Cambridge

vladimir.gajinov@bsc.es, ferad.zyulkyarov@bsc.es, osman.unsal@bsc.es, adrian.cristal@bsc.es, eduard.ayguade@bsc.es, tharris@microsoft.com, mateo.valero@bsc.es

ABSTRACT

"Is transactional memory useful?" is a question that cannot be answered until substantial applications are available to evaluate its capabilities. While existing TM applications partially answer this question, and are useful in that they provide a first-order TM experimentation framework, they serve only as proofs of concept and fail to make a conclusive case for wide adoption by the general computing community. This paper presents QuakeTM, a multiplayer game server: a complex real-life TM application parallelized from the sequential version with TM-specific considerations in mind. QuakeTM consists of 27,600 lines of code spread across 49 files and exhibits irregular parallelism for which a task-parallel model fits well. We provide a coarse-grained TM implementation characterized by eight large transactional blocks, as well as a fine-grained implementation consisting of 58 different critical sections, and we compare these two approaches. Although QuakeTM scales, we show that more effort is needed to decrease the overhead and abort rate of current software transactional memory systems in order to achieve good performance. We give insights into development challenges, suggest techniques to solve them, and provide an extensive analysis of the transactional behavior of QuakeTM, with an emphasis on the TM promise of making parallel programming easier.
Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming – Parallel Programming.

General Terms: Design, Experimentation, Performance.

Keywords: Game Server, Transactional Memory

1. INTRODUCTION

Recently, processor manufacturers have turned sharply away from increasing single-core frequency and complexity. Diminishing returns from instruction-level parallelism (ILP) and problems with power and heat density have led to the appearance of multi-core processors that leverage thread-level parallelism (TLP). In this new era of multi-core architectures, coordinating the work done by the multiple threads that cooperate in a parallel execution is a challenging issue, both for programming productivity and for execution performance. Transactional memory (TM) is a technology that may help here, by aiming to provide the performance of fine-grained locking with the ease of programming of coarse-grained critical sections. In this paper we assess the extent to which this is true of current TM implementations, based on code descriptions and examples as well as through performance evaluation. As a case study we started from a sequential version of Quake, a complex multiplayer game. Using OpenMP and software transactional memory (STM) we built QuakeTM, a parallel version consisting of 27,600 lines of code spread across 49 files. Developed in 10 man-months, QuakeTM exhibits irregular parallelism and long transactions contained within eight different atomic blocks with large read and write sets. Our intention was not to pursue performance per se, but to examine whether it is possible to achieve good results with a coarse-grained parallelization approach. This decision was driven by one of the hopes for TM: to make parallel programming easier by abstracting away the complexities of fine-grained locking, while still achieving good scalability.
When parallelizing an application from scratch using TM, this kind of coarse-grained approach is likely to be popular with programmers. Consequently, it needs to be tested on a highly complex application in order to see how well it works in practice. This paper makes the following contributions:

• We describe how we developed QuakeTM and discuss the challenges we encountered.

• We show that our implementation scales reasonably well, despite the use of coarse-grained transactions. However, this scalability is unable to compensate for the high overhead and abort rate of the software transactional memory system.

• Further on, we have adapted the fine-grained TM implementation described in our previous work on Atomic

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICS'09, June 8–12, 2009, Yorktown Heights, New York, USA. Copyright 2009 ACM 978-1-60558-498-0/09/06...$5.00.