Embedding Semantics of the Single-Producer/Single-Consumer Lock-Free Queue into a Race Detection Tool

Manuel F. Dolz, David del Rio Astorga, Javier Fernández, J. Daniel García, Félix García-Carballeira
Department of Computer Science, University Carlos III of Madrid, 28911–Leganés, Spain
drio@pa.uc3m.es, {mdolz,jfmunoz,jdgarcia,fgcarbal}@inf.uc3m.es

Marco Danelutto, Massimo Torquati
Department of Computer Science, University of Pisa, 56127–Pisa, Italy
{marcod,torquati}@di.unipi.it

ABSTRACT
The rapid progress of multi-/many-core architectures has left many data-intensive parallel applications not yet fully suited to obtaining maximum performance. The advent of parallel programming frameworks offering structured patterns has eased the developers' burden of adapting such applications to parallel platforms. For example, the use of synchronization mechanisms in multithreaded applications is essential on shared-cache multi-core architectures. However, ensuring appropriate use of their interfaces can be challenging, since different memory models, together with instruction reordering at the compiler and processor levels, may influence the occurrence of data races. Race detectors are of great benefit in this respect; nevertheless, when lock-free data structures with no high-level atomics are used, they may emit false positives. In this paper, we extend the ThreadSanitizer race detection tool to support the semantics of the general Single-Producer/Single-Consumer (SPSC) lock-free parallel queue and to detect benign data races where the queue is used correctly. To perform our analysis, we leverage the FastFlow SPSC bounded lock-free queue implementation to test our extensions over a set of μ-benchmarks and real applications on a dual-socket Intel Xeon CPU E5-2695 platform. We demonstrate that this approach can reduce the number of data race warning messages by 30% on average.
Categories and Subject Descriptors
H.4 [Multi-/Many-core processors]: Parallel programming structures; D.1.3 [Lock-free programming]: Race detectors

Keywords
Parallel programming; Wait-/lock-free parallel structures; Data race detectors; Semantics

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
PMAM'16, PMAM 2016: The Seventh International Workshop on Programming Models and Applications for Multicores and Manycores
© 2016 ACM. ISBN 978-1-4503-4196-7/16/03...$15.00
DOI: http://dx.doi.org/10.1145/2883404.2883406

1. INTRODUCTION
As we pave the way towards Exascale computing, the use of multi- and many-core architectures, possibly with one or more co-processors/accelerators, working together to efficiently solve scientific problems has become a complex challenge that the HPC community needs to face [7, 5]. The adoption of parallel programming frameworks that execute multiple processes and/or threads simultaneously relieves developers of the burden of designing and implementing efficient parallel applications from scratch. Despite this, much current software is not yet fully adapted to run on recent parallel platforms. In most cases, hardware design progresses faster than the parallelization and optimization of existing software. To deal with this issue, the use of building blocks implementing core functionalities has been a well-accepted approach in the HPC area [9].
Indeed, many scientific parallel applications leverage efficient parallel kernels from highly tuned libraries at the bottom of their food chain. However, these kernels must guarantee correctness and thread safety in order to generate correct global results.

While parallel programming techniques have been broadly adopted in implementing large scientific applications, concurrency bugs, especially data races, have become more frequent. The difficulty of finding data races and deadlocks is a well-known problem [8]. Detecting catastrophic errors has been recognized as an arduous task, given that errors may occur only during low-probability sequences of events and may also depend on external factors such as the current machine load. These facts make data races extremely sensitive to timing, the presence of print statements, compiler options, or differences in memory models. Data races are especially difficult to observe, since they often quietly violate data structure invariants rather than cause immediate crashes. Although data race detectors ease the developer's task of finding these issues, they are still not perfect [8, 4]. In particular, non-blocking or lock-free structures in which no high-level atomic instructions are used can still cause race detectors to report false positives, thus blurring the developer's view when looking for real data races and making the debugging process harder when tracing back the root cause of a problem.

The contribution of this paper focuses on embedding the semantics of the general Single-Producer/Single-Consumer (SPSC) bounded lock-free queue into ThreadSanitizer, a well-known race detector that is part of the LLVM infrastructure. The formalization and implementation process of the seman-