Variable Granularity Access Tracking Scheme for Improving the Performance of Software Transactional Memory

Sandya S. Mannarswamy
CSA, IISc and Hewlett Packard, Bangalore, India
sandya@hp.com

Ramaswamy Govindarajan
SERC, Indian Institute of Science, Bangalore, India
govind@serc.iisc.ernet.in

Abstract— Software transactional memory (STM) has been proposed as a promising programming paradigm for shared memory multi-threaded programs, as an alternative to conventional lock-based synchronization primitives. Typical STM implementations employ a conflict detection scheme that works with a uniform access granularity, tracking shared data accesses either at the word/cache line level or at the object level. It is well known that a single fixed access tracking granularity cannot meet the conflicting goals of reducing false conflicts without impacting concurrency adversely. A fine-grained granularity, while improving concurrency, can have an adverse impact on performance due to lock aliasing, lock validation overheads, and additional cache pressure. On the other hand, a coarse-grained granularity can impact performance due to reduced concurrency. Thus, in general, a fixed or uniform granularity access tracking (UGAT) scheme is application-unaware and rarely matches the access patterns of individual applications or parts of an application, leading to sub-optimal performance for different parts of the application(s). To mitigate the disadvantages associated with the UGAT scheme, we propose a Variable Granularity Access Tracking (VGAT) scheme in this paper. We propose a compiler-based approach wherein the compiler uses inter-procedural whole-program static analysis to select the access tracking granularity for different shared data structures of the application, based on the application’s data access pattern. We describe our prototype VGAT scheme, using TL2 as our STM implementation.
Our experimental results reveal that the VGAT-STM scheme can improve the performance of STAMP benchmarks by between 1.87% and 21.2%.

Keywords-compiler, software transactional memory

1. Introduction

An atomic section is a programmer-specified region of source code that executes atomically (other concurrent code sees either none or all of the updates it makes to program state) and in isolation from other concurrent code. Replacing locks by atomic sections relieves the programmer of the cumbersome task of identifying particular locks to protect particular data structures. The atomic section is an abstraction, likely to be a programming language construct. Atomic sections simplify the task of writing concurrent software, since programmers can specify the code region that needs to execute atomically simply by enclosing it with the keyword ‘atomic’.

One way of supporting atomic sections is through transactional memory [1, 2, 20, 21]. Transactional Memory (TM) can be implemented in hardware, in software, or as a combination of the two. In order to be widely adopted, a TM system must support transactions of unbounded size and duration, and allow transactions to be integrated with a language environment [12]. Since Software Transactional Memory (STM) helps to achieve these objectives, there has been considerable interest in developing high-performance implementations of STMs.

STM implementations can be broadly classified as lock-based and obstruction-free. Lock-based STMs typically employ a variant of the two-phase locking protocol [30]. Obstruction-free STMs [2] do not use any blocking synchronization mechanisms (such as locks), and guarantee progress even when some of the transactions are delayed. Lock-based STM implementations [4, 17, 18, 29] have been shown to have lower validation overhead and hence exhibit better performance than non-blocking ones. Therefore, we focus our attention on lock-based STMs in this paper.
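To make the granularity trade-off for lock-based STMs concrete, the following is a minimal sketch of how a shared address might be mapped to a lock table entry at two tracking granularities. It is not taken from TL2 or from our prototype: the word size, cache line size, lock table size, and the names `lock_index`, `same_lock`, and `locks_touched` are all assumptions chosen for illustration.

```python
# Illustrative only: all sizes and function names below are assumptions.
WORD_BYTES = 8             # assumed machine word size
CACHE_LINE_BYTES = 64      # assumed cache line size
LOCK_TABLE_SIZE = 1 << 10  # hypothetical number of lock table slots


def lock_index(addr, granularity_bytes, table_size=LOCK_TABLE_SIZE):
    """Map a byte address to a lock table slot at the given granularity."""
    return (addr // granularity_bytes) % table_size


def same_lock(addr_a, addr_b, granularity_bytes):
    """True if two addresses are tracked by the same lock table slot."""
    return lock_index(addr_a, granularity_bytes) == lock_index(
        addr_b, granularity_bytes
    )


def locks_touched(start, length, granularity_bytes):
    """Number of distinct lock slots covered by a word-by-word scan."""
    return len(
        {
            lock_index(a, granularity_bytes)
            for a in range(start, start + length, WORD_BYTES)
        }
    )


# Two distinct words that share one cache line.
field_a, field_b = 0x1000, 0x1008
false_conflict_word = same_lock(field_a, field_b, WORD_BYTES)
false_conflict_line = same_lock(field_a, field_b, CACHE_LINE_BYTES)

# Two unrelated words far apart that wrap onto the same slot (lock aliasing)
# when the fixed-size table is indexed at word granularity.
far_away = field_a + WORD_BYTES * LOCK_TABLE_SIZE
aliased_word = same_lock(field_a, far_away, WORD_BYTES)
aliased_line = same_lock(field_a, far_away, CACHE_LINE_BYTES)

# Locks a transaction would have to track when scanning a 4 KB array.
scan_word = locks_touched(0, 4096, WORD_BYTES)
scan_line = locks_touched(0, 4096, CACHE_LINE_BYTES)
```

Under these assumptions, cache line granularity reports a conflict between two different fields that merely share a line, while word granularity avoids that false conflict but covers the same 4 KB scan with eight times as many lock slots and, for a fixed-size table, can alias unrelated words onto one slot.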
STMs allow for optimistic execution by permitting multiple atomic sections to run concurrently, assuming they will not conflict. If a conflict does occur, STMs employ a mechanism to detect and recover from it. Most STMs employ the single-writer-multiple-readers strategy. Two concurrent transactions conflict when they access the same location and at least one of the accesses is a write (update). In order to commit, a transaction must eventually acquire write locks for every memory location that it writes. Locks can be acquired eagerly, i.e., at the time of the first update operation by the transaction on the memory location, or lazily, i.e., when the transaction is about to commit. Reads of shared data can be either visible or invisible [20] to other transactions accessing the same data. In an STM that supports invisible reads, a transaction reading a shared datum x needs to detect any possible conflicts on x with other transactions that write x concurrently, i.e., it must validate its read set.

2011 IEEE International Parallel & Distributed Processing Symposium 1530-2075/11 $26.00 © 2011 IEEE DOI 10.1109/IPDPS.2011.51 455
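The combination of lazy lock acquisition and invisible reads with read set validation can be sketched as follows. This is a single-threaded toy model meant only to illustrate the commit-time protocol; it is not TL2's implementation. The class names (`ToySTM`, `Transaction`, `VersionedLock`), the per-address locks, and the use of a global version counter for validation are assumptions made for this sketch.

```python
class Conflict(Exception):
    """Raised when a read observes a location changed by another transaction."""


class VersionedLock:
    def __init__(self):
        self.version = 0   # value of the global counter at the last commit
        self.owner = None  # transaction currently holding the write lock


class ToySTM:
    def __init__(self):
        self.clock = 0     # hypothetical global version counter
        self.memory = {}
        self.locks = {}

    def lock_for(self, addr):
        return self.locks.setdefault(addr, VersionedLock())


class Transaction:
    def __init__(self, stm):
        self.stm = stm
        self.start = stm.clock  # snapshot taken when the transaction begins
        self.read_set = set()
        self.write_set = {}

    def read(self, addr):
        if addr in self.write_set:          # read our own buffered write
            return self.write_set[addr]
        lock = self.stm.lock_for(addr)
        if lock.owner is not None or lock.version > self.start:
            raise Conflict(addr)
        self.read_set.add(addr)             # invisible read: only we record it
        return self.stm.memory.get(addr, 0)

    def write(self, addr, value):
        self.write_set[addr] = value        # lazy: buffer, take no lock yet

    def commit(self):
        """Return True on success, False if the transaction must abort."""
        stm, acquired = self.stm, []
        try:
            for addr in self.write_set:     # 1. acquire write locks lazily
                lock = stm.lock_for(addr)
                if lock.owner is not None:
                    return False
                lock.owner = self
                acquired.append(lock)
            for addr in self.read_set:      # 2. validate the read set
                lock = stm.lock_for(addr)
                if lock.version > self.start or lock.owner not in (None, self):
                    return False
            stm.clock += 1                  # 3. publish buffered writes
            for addr, value in self.write_set.items():
                stm.memory[addr] = value
                stm.lock_for(addr).version = stm.clock
            return True
        finally:
            for lock in acquired:           # 4. release locks either way
                lock.owner = None


stm = ToySTM()
t1 = Transaction(stm)
x_seen_by_t1 = t1.read("x")      # t1 reads x invisibly
t2 = Transaction(stm)
t2.write("x", 1)
t2_committed = t2.commit()       # t2 updates x before t1 commits
t1.write("y", x_seen_by_t1 + 1)
t1_committed = t1.commit()       # validation of x fails, so t1 aborts
```

In this interleaving t2 commits, and t1 fails validation because the version of x advanced after t1 started; t1's buffered write to y is discarded and its lock on y is released. An eager-acquisition variant would instead take the lock inside `write`.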