HARP: Adaptive Abort Recurrence Prediction
for Hardware Transactional Memory
Adrià Armejach∗†, Anurag Negi‡, Adrián Cristal∗§, Osman Unsal∗, Per Stenstrom‡, Tim Harris⋄1
∗Barcelona Supercomputing Center
‡Chalmers University of Technology
⋄Oracle Labs, Cambridge
†Universitat Politècnica de Catalunya
§IIIA - CSIC - Spanish National Research Council
Abstract—Hardware Transactional Memory (HTM) exposes
parallelism by allowing possibly conflicting sections of code,
called transactions, to execute concurrently in multithreaded
applications. However, conflicts among concurrent transactions
result in wasted computation and expensive rollbacks. Under
high contention, HTM protocol overheads can, in many cases,
amount to several times the useful work done. Blindly scheduling
transactions in the presence of contention is therefore clearly
suboptimal from a resource utilization standpoint, especially in
situations where several scheduling options exist.
This paper presents HARP (Hardware Abort Recurrence
Predictor), a hardware-only mechanism to avoid speculation
when it is likely to fail. Inspired by branch prediction strategies
and prior work on contention management and scheduling in
HTM, HARP uses past behavior of transactions and locality
in conflicting memory references to accurately predict conflicts.
The prediction mechanism adapts to variations in workload
characteristics and enables better utilization of computational
resources. We show that an HTM protocol that integrates
HARP exhibits reductions in both wasted execution time and
serialization overheads when compared to prior work, leading
to a significant increase in throughput (~30%) in both single-
application and multi-application scenarios.
I. INTRODUCTION
The problem of extracting thread-level parallelism through
speculative execution has received considerable attention from
both industry and academia [13, 18]. In particular, Hardware
Transactional Memory (HTM) [14] offers performance com-
parable to fine-grained locks while, simultaneously, enhancing
programmer productivity by largely eliminating the burden of
managing access to shared data. Recent usability studies sup-
port this thesis [8, 19], suggesting that Transactional Memory
(TM) can be an important tool for building parallel applications.
For these reasons, HTM is receiving increasing attention
from industry [9, 10, 11], and IBM has released its
first chip with built-in HTM support, the BlueGene/Q [23].
More recently, Intel has published ISA extensions (TSX) that
provide support for basic HTM and lock elision, with the
intention of supporting these in upcoming products [16].
An HTM system allows concurrent speculative execution
of blocks of code, called transactions, that may access and
update shared data. However, in the presence of data conflicts
transactions may abort, i.e., the results of speculative execution
are discarded. This results in wasted work, expensive rollbacks
of application state, and inefficient utilization of computational
resources. While conflicts due to concurrent accesses to shared
data cannot be completely eliminated, mechanisms to avoid
starting a transaction when it is likely to fail are necessary for
maximizing computational throughput. Moreover, in scenarios
where multiple scheduling options are available, having such
mechanisms can expose additional parallelism and improve
resource utilization.
1 Work done while at Microsoft Research, Cambridge.
While single-application performance is still important,
systems where multiple parallel applications coexist are ex-
pected to become increasingly common in the near future. The
performance of HTM in scenarios with abundant transactional
threads is still an open question, and solutions that provide
efficient utilization of computational resources and good per-
formance are required for TM to gain wide acceptance. In
the past, considerable work has been done on contention
management, but mostly in the field of Software TM (STM) [1,
12, 20]. These proposals typically react after aborts happen,
without trying to avoid future conflicts. Conversely, a few
HTM proposals exist that try to avoid execution of possibly
conflicting transactions [3, 5, 24]. However, these solutions
do not provide full hardware support and rely on expensive
and specialized software runtime routines and data structures.
Moreover, the efficacy of these proposals in scenarios with
multiple concurrently executing applications is unclear.
In this paper, we introduce the Hardware Abort Recurrence
Predictor (HARP), a comprehensive hardware proposal that
identifies groups of transactions that are likely to be executed
concurrently without conflicts. Our proposal allows other
threads or applications to utilize computational resources when
the expected duration of contention is long, providing better
throughput when running several applications, and potentially
higher parallelism when several threads of the same applica-
tion are available for scheduling. Moreover, HARP dynam-
ically chooses a contention avoidance mechanism based on
the expected duration of contention, to maximize resource
utilization, while minimizing the amount of wasted work
due to transaction aborts. HARP avoids software overheads
by using simple hardware structures to record transactional
characteristics. More specifically, we notice strong temporal
locality in contended addresses in transactional applications.
By detecting when conflicting locations change, we can iden-
tify when contention is likely to dissipate.
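The idea in this paragraph can be illustrated with a small software model. The sketch below is not the hardware design described in this paper; it only assumes a per-transaction table holding a saturating confidence counter (in the spirit of a branch predictor) and the most recent conflicting address. All names (`AbortRecurrencePredictor`, `Decision`, `busy_addrs`, `expected_wait_cycles`), the counter width, and the thresholds are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    RUN = "run"      # no conflict predicted: start the transaction
    STALL = "stall"  # conflict predicted, short wait expected: hold the core
    YIELD = "yield"  # conflict predicted, long wait expected: free the core


@dataclass
class Entry:
    confidence: int = 0              # saturating counter (assumed 2 bits)
    last_addr: Optional[int] = None  # most recent conflicting address


class AbortRecurrencePredictor:
    """Toy model: predict whether an abort is likely to recur."""

    def __init__(self, max_conf=3, threshold=2, long_wait_cycles=1000):
        # All three parameters are illustrative assumptions.
        self.max_conf = max_conf
        self.threshold = threshold
        self.long_wait_cycles = long_wait_cycles
        self.table = {}  # transaction id -> Entry

    def record_abort(self, tx_id, conflict_addr):
        entry = self.table.setdefault(tx_id, Entry())
        if entry.last_addr == conflict_addr:
            # Same location again: temporal locality, raise confidence.
            entry.confidence = min(entry.confidence + 1, self.max_conf)
        else:
            # Conflicting location changed: earlier contention has likely
            # dissipated, so restart from a single observation.
            entry.confidence = 1
            entry.last_addr = conflict_addr

    def record_commit(self, tx_id):
        entry = self.table.get(tx_id)
        if entry is not None:
            entry.confidence = max(entry.confidence - 1, 0)

    def decide(self, tx_id, busy_addrs, expected_wait_cycles):
        """busy_addrs: addresses in use by running transactions (assumed known)."""
        entry = self.table.get(tx_id)
        if (entry is None
                or entry.confidence < self.threshold
                or entry.last_addr not in busy_addrs):
            return Decision.RUN
        if expected_wait_cycles > self.long_wait_cycles:
            return Decision.YIELD
        return Decision.STALL
```

In this model, two aborts on the same address are enough to predict a recurrence, while an abort on a different address resets the prediction. The stall-versus-yield split mirrors the choice of avoidance mechanism by expected contention duration, with an arbitrary cycle threshold standing in for whatever estimate a real design would use.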
To evaluate HARP, we compare it against “Bloom Filter
Guided Transaction Scheduling” (BFGTS) [3], a state-of-
the-art transaction scheduling technique, and LogTM [17], a
well established HTM design. Our evaluation includes single-
application setups, comprising a scenario with the same num-
ber of threads as cores, and a scenario with more threads than
cores. We provide insights on when using more threads can
extract additional parallelism, and show that HARP outperforms
LogTM and BFGTS on average by 109.7% and 30.5%,
respectively. Moreover, we are the first to study the perfor-
mance implications of a transactional multi-application setup
where, again, our technique outperforms the other evaluated
© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future
media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI 10.1109/HiPC.2013.6799100