Hardware Signature Designs to Deal with Asymmetry in Transactional Data Sets

Ricardo Quislant, Eladio Gutierrez, Oscar Plata, and Emilio L. Zapata

Abstract: Transactional Memory (TM) systems must track memory accesses made by concurrent transactions in order to detect conflicts. Many TM implementations use signatures for this purpose, which summarize reads and writes in fixed-size bit registers at the cost of false positives (detection of nonexistent conflicts). Signatures are commonly implemented as two separate same-sized Bloom filters, one for reads and the other for writes. In contrast, transactions frequently exhibit read and write sets of uneven cardinality. This mismatch between data sets and filter storage introduces inefficiencies in the use of signatures that have some impact on performance. This paper presents different signature designs as alternatives to the common scheme to deal with the asymmetry in transactional data sets in an effective way. Basically, we analyze two classes of new signatures, called multiset and reconfigurable asymmetric signatures. The first class uses only one Bloom filter to track both read and write sets, while the second class uses Bloom filters of configurable size for reads and writes. The main focus of this paper is a thorough study of these alternative signature designs, including a statistical analysis of false positives and an experimental evaluation, providing performance results and hardware area, time, and energy requirements.

Index Terms: Hardware transactional memory, Bloom filter, signatures, conflict detection, locality, multiset, asymmetric

1 INTRODUCTION

At the beginning of the past decade, chip manufacturers started to turn to single-chip parallel processors (CMPs) [1], due to the power, memory, and ILP constraints of single-core architectures. CMPs include multiple processor cores with a shared-memory internal architecture.
Today, multicore processors have become mainstream, and have quickly made multithreaded parallel programming widespread. In general, multithreaded programming is a challenging task, which makes it difficult to exploit multicore processors. Parallelism introduces nondeterminism that must be controlled by a careful design of the computational threads and their coordination through explicit synchronization. Thus, mutual exclusion mechanisms must ensure correct concurrent access to shared data. Low-level primitives like locks have traditionally been used for that purpose. However, as locks serialize multithreaded execution, most expert programmers resort to fine-grained locking to improve performance. This adds complexity to parallel programming and requires great effort to achieve both high performance and deadlock avoidance. Locks also have other drawbacks that are difficult to solve, such as convoying and priority inversion [2], and they provide ineffective mechanisms for abstraction and composition.

Nonexpert parallel programmers, though, seek productivity and performance at low programming complexity, which has generated great interest in alternative models to lock-based multithreaded programming. Transactional Memory (TM) [2], [3] is one such alternative. It inherits the concept of transaction from the database field, with the aim of easing the writing of concurrent programs. A transaction is a block of computations that appears to be executed in an atomic and isolated way. TM systems execute transactions in parallel, committing nonconflicting ones. A conflict occurs when a memory location is concurrently accessed by two or more transactions, and at least one access is a write. TM systems can be classified into software (STM) and hardware (HTM) systems, as well as hybrid and hardware-accelerated implementations.
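The conflict definition given above can be made concrete with a minimal sketch. It assumes that each transaction's accesses are recorded exactly, as plain sets of addresses, so there is no summarization and no false positive. The names `Transaction` and `conflicts` are hypothetical and chosen for illustration only; they do not correspond to any particular TM system.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical record of the addresses one transaction has accessed."""
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

    def read(self, addr: int) -> None:
        self.read_set.add(addr)

    def write(self, addr: int) -> None:
        self.write_set.add(addr)


def conflicts(a: Transaction, b: Transaction) -> bool:
    """Return True if some address is accessed by both transactions
    and at least one of the two accesses is a write."""
    return bool(
        a.write_set & (b.read_set | b.write_set)  # a writes what b reads or writes
        or b.write_set & a.read_set               # b writes what a reads
    )


# Illustrative addresses only (assumption, not taken from the paper).
t1, t2, t3 = Transaction(), Transaction(), Transaction()
t1.read(0x10)
t1.write(0x20)
t2.read(0x20)   # t2 reads an address that t1 writes
t3.read(0x10)   # t3 only reads an address that t1 also reads

print(conflicts(t1, t2))  # True: read-write overlap on 0x20
print(conflicts(t1, t3))  # False: read-read sharing is not a conflict
```

Exact sets of this kind grow with the number of accesses, which is why hardware proposals resort to fixed-size summaries instead.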
In this paper, the interest lies in hardware implementations of TM, which include systems that implement most of the required TM mechanisms in hardware at the core level [4], [5], [6], [7], [8], [9], [10], [11], as well as systems that provide hardware support to speed up parts of STM systems [12], [13], [14].

The systems above must track all data read and written by each transaction in order to detect data races (conflicts) among them. Bloom filters [15] were proposed to summarize transactional accesses into two fixed-size bit registers, called signatures, at the cost of false positives (detection of nonexistent conflicts). The two signatures store, respectively, the memory addresses that are read (read set, RS) and written (write set, WS) inside transactions. Some TM proposals that include signatures are FlexTM [14], BulkSC [16], LogTM-SE [17], SigTM [18], STMlite [19] (software signatures), and DynTM [20].

(IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 3, March 2013. The authors are with the Department of Computer Architecture, ETSI Informatica, Campus de Teatinos, Universidad de Malaga, Malaga 29071, Spain. E-mail: {quislant, eladio, oplata, zapata}@uma.es. Manuscript received 30 Nov. 2011; revised 18 Apr. 2012; accepted 19 Apr. 2012; published online 2 May 2012. Recommended for acceptance by M.E. Acacio. IEEECS Log Number TPDS-2011-11-0870. Digital Object Identifier no. 10.1109/TPDS.2012.138. © 2013 IEEE. Published by the IEEE Computer Society.)

Read and write signatures are usually implemented as two separate, same-sized Bloom filters. In contrast, transactions frequently exhibit read and write sets of uneven cardinality. In addition, the two sets are not disjoint, as data can be both read and written. This mismatch between data