Variable Granularity Access Tracking Scheme for
Improving the Performance of Software
Transactional Memory
Sandya S. Mannarswamy
CSA, IISc and Hewlett Packard
Bangalore, India
sandya@hp.com
Ramaswamy Govindarajan
SERC, Indian Institute of Science,
Bangalore, India
govind@serc.iisc.ernet.in
Abstract— Software transactional memory (STM) has been
proposed as a promising programming paradigm for shared
memory multi-threaded programs as an alternative to conventional
lock-based synchronization primitives. Typical STM
implementations employ a conflict detection scheme that works
with a uniform access granularity, tracking shared data accesses
at the word/cache-line level or at the object level. It is well known
that a single fixed access tracking granularity cannot meet the
conflicting goals of reducing false conflicts without adversely
impacting concurrency. A fine-grained granularity, while improving
concurrency, can hurt performance through lock aliasing,
lock validation overheads, and additional cache pressure. On the
other hand, a coarse-grained granularity can hurt performance
through reduced concurrency. Thus, in general, a fixed or uniform
granularity access tracking (UGAT) scheme is application-unaware
and rarely matches the access patterns of individual
applications or parts of an application, leading to sub-optimal
performance for different parts of the application(s). In order to
mitigate the disadvantages associated with the UGAT scheme, we
propose a Variable Granularity Access Tracking (VGAT) scheme
in this paper. Our approach is compiler-based: the
compiler uses inter-procedural whole-program static analysis to
select the access tracking granularity for different shared data
structures of the application based on the application’s data
access pattern. We describe our prototype VGAT scheme, using
TL2 as our STM implementation. Our experimental results show
that the VGAT-STM scheme improves the performance
of STAMP benchmarks by 1.87% to 21.2%.
Keywords: compiler, software transactional memory
1. Introduction
An atomic section is a programmer-specified
region of source code that executes atomically (other
concurrent code sees either none or all of the updates it
makes to program state) and in isolation from other
concurrent code. Replacing locks by atomic sections
relieves the programmer of the cumbersome task of
identifying particular locks to protect particular data
structures. The atomic section is an abstraction, likely to be
a programming language construct. Atomic sections
simplify the task of writing concurrent software since
programmers can specify the code region which needs to
execute atomically by simply enclosing the code region with
the keyword ‘atomic’.
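To make the abstraction concrete, the following is a minimal sketch in Python. The `atomic()` context manager, the `transfer` function, and the `run_transfers` driver are hypothetical names introduced only for illustration; they are not part of any STM discussed in this paper. The sketch backs `atomic()` with a single global lock, which provides the atomicity and isolation semantics described above but none of the concurrency that a transactional memory implementation aims for.

```python
import threading
from contextlib import contextmanager

# Assumption: one global re-entrant lock stands in for the 'atomic' keyword.
# A real implementation could run atomic sections optimistically instead.
_global_lock = threading.RLock()


@contextmanager
def atomic():
    """Run the enclosed block atomically and in isolation (coarse sketch)."""
    with _global_lock:
        yield


def transfer(accounts, src, dst, amount):
    # The programmer only marks the region; no per-account lock is chosen.
    with atomic():
        accounts[src] -= amount
        accounts[dst] += amount


def run_transfers(n_threads=4, per_thread=250):
    """Move one unit at a time from 'a' to 'b' from several threads."""
    accounts = {"a": n_threads * per_thread, "b": 0}

    def worker():
        for _ in range(per_thread):
            transfer(accounts, "a", "b", 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return accounts
```

The point of the sketch is the call site in `transfer`: the code states which region must be atomic, and the choice of synchronization mechanism is left to the implementation behind `atomic()`.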
One way of supporting atomic sections is through
transactional memory [1, 2, 20, 21]. Transactional
Memory (TM) can be implemented in hardware, in
software, or in a combination of the two. In order to be widely
adopted, a TM system must support transactions of
unbounded size and duration, and allow transactions to be
integrated with a language environment [12]. Since
Software Transactional Memory (STM) helps to achieve
these objectives, there has been considerable interest in
developing high performance implementations of STMs.
STM implementations can be broadly classified as lock-based
or obstruction-free. Lock-based STMs typically
employ a variant of the two-phase locking protocol [30].
Obstruction-free STMs [2] do not use any blocking
synchronization mechanisms (such as locks), and guarantee
progress even when some of the transactions are delayed.
Lock-based STM implementations [4, 17, 18, 29] have been
shown to have lower validation overhead and hence exhibit
better performance than non-blocking ones. Therefore, we
focus on lock-based STMs in this paper.
STMs allow for optimistic execution by permitting
multiple atomic sections to run concurrently assuming they
will not conflict. However, in case a conflict does occur,
STMs employ a mechanism to detect and recover from such
conflicts. Most STMs employ the single-writer-multiple-
readers strategy. Two concurrent transactions conflict when
they access the same location and at least one of the
accesses is a write (update). In order to commit, a
transaction must eventually acquire write locks for every
memory location that is written by it. Locks can be acquired
eagerly, i.e., at the time of the first update operation by the
transaction on the memory location, or lazily, i.e., when the
transaction is about to commit. Reads of shared data can
be either visible or invisible [20] to other transactions
accessing the same data. In an STM that supports
invisible reads, a transaction reading a shared datum x must
itself detect any conflicts on x with other transactions
that write x concurrently, i.e., it must validate its read set.
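The interplay of lazy lock acquisition, invisible reads, and read-set validation can be sketched as follows. This is a single-threaded simulation in Python, loosely modeled on versioned-lock designs; it is not the implementation used in this paper, and the names `Memory`, `Transaction`, `Conflict`, and `demo` are assumptions made for illustration. Transactions are interleaved by hand, so the code is not itself thread-safe.

```python
class Conflict(Exception):
    """Raised when a transaction must abort (and would normally retry)."""


class Memory:
    def __init__(self):
        self.clock = 0      # global version clock
        self.values = {}    # location -> committed value
        self.versions = {}  # location -> clock value of last committed write
        self.owners = {}    # location -> transaction holding its write lock


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.start = mem.clock  # snapshot of the clock at begin
        self.read_set = set()
        self.write_set = {}     # buffered updates: location -> new value

    def read(self, loc):
        if loc in self.write_set:
            return self.write_set[loc]
        mem = self.mem
        # Invisible read: nothing is published to other transactions,
        # so the reader itself checks for a concurrent writer.
        if loc in mem.owners or mem.versions.get(loc, 0) > self.start:
            raise Conflict(loc)
        self.read_set.add(loc)
        return mem.values.get(loc, 0)

    def write(self, loc, value):
        # Lazy acquisition: only buffer the update; lock at commit time.
        self.write_set[loc] = value

    def commit(self):
        mem, acquired = self.mem, []
        try:
            for loc in self.write_set:          # acquire write locks
                if loc in mem.owners:
                    raise Conflict(loc)
                mem.owners[loc] = self
                acquired.append(loc)
            for loc in self.read_set:           # validate the read set
                owner = mem.owners.get(loc)
                locked_by_other = owner is not None and owner is not self
                if locked_by_other or mem.versions.get(loc, 0) > self.start:
                    raise Conflict(loc)
            mem.clock += 1
            for loc, value in self.write_set.items():  # publish updates
                mem.values[loc] = value
                mem.versions[loc] = mem.clock
        finally:
            for loc in acquired:                # release write locks
                del mem.owners[loc]


def demo():
    mem = Memory()
    t1, t2 = Transaction(mem), Transaction(mem)
    t1.read("x")       # t1 reads x invisibly
    t2.write("x", 5)
    t2.commit()        # t2 commits a write to x after t1's read
    t1.write("y", 1)
    try:
        t1.commit()    # validation finds x changed since t1 began
        return "committed"
    except Conflict:
        return "aborted"
```

In `demo`, the first transaction's read of `x` and the second transaction's write to `x` form a read-write conflict, and the sketch detects it only when the reader validates its read set at commit. An eager-acquisition variant would instead take the write lock inside `write`.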
2011 IEEE International Parallel & Distributed Processing Symposium
1530-2075/11 $26.00 © 2011 IEEE
DOI 10.1109/IPDPS.2011.51
455