Employing Transactional Memory and Helper Threads to Speedup Dijkstra's Algorithm

Konstantinos Nikas, Nikos Anastopoulos, Georgios Goumas and Nectarios Koziris
National Technical University of Athens
School of Electrical and Computer Engineering
Computing Systems Laboratory
Members of HiPEAC
{knikas,anastop,goumas,nkoziris}@cslab.ece.ntua.gr

Abstract—In this paper we work on the parallelization of the inherently serial Dijkstra's algorithm on modern multicore platforms. Dijkstra's algorithm is a greedy algorithm that computes Single Source Shortest Paths for graphs with non-negative edge weights and is based on the iterative extraction of nodes from a priority queue. This property limits the explicit parallelism of the algorithm, and any attempt to exploit the remaining parallelism results in significant slowdowns due to synchronization overheads. To deal with these problems, we employ the concept of Helper Threads (HT) to extract parallelism in a non-traditional fashion, and Transactional Memory (TM) to efficiently orchestrate the concurrent threads' accesses to shared data structures. Results demonstrate that the proposed implementation is able to achieve performance speedups (reaching up to 1.84 for 14 threads), indicating that the two paradigms can be efficiently combined.

I. INTRODUCTION

Parallel programming is an intricate, yet increasingly important, task now that we have entered the multicore era and more cores are made available to the programmer. Although separate applications, or independent tasks within a single application, can be easily mapped onto multicore platforms, the same is not true for applications that do not expose parallelism in a straightforward way. Dijkstra's algorithm [1] is a challenging example of such an application, one that is difficult to accelerate when executed in a multithreaded fashion.
It is a fundamental algorithm used to compute single source shortest paths (SSSP) for graphs with non-negative edges and appears in a variety of applications, such as network routing and VLSI design. Dijkstra's algorithm iteratively extracts one node from a min-priority queue and performs relaxations on this node's neighbors. To preserve the semantics of the algorithm, the extractions must be performed sequentially, a fact that greatly hinders efficient parallelization [2], [3]. Straightforward parallelism can be sought in the relaxation of the neighbors, but this approach leads to significant performance slowdowns, since the threads need to synchronize their concurrent accesses to shared data very frequently [4].

Its fundamentally serial nature has led researchers to seek performance through significant modifications of the algorithm [3], [5], [6], [7]. In this work, however, we adhere to the original version and attempt to improve its performance by utilizing the capabilities of modern multicore processors. To this end, we need to face the two major issues inherent to the algorithm: limited explicit parallelism and excessive synchronization.

Since Dijkstra's algorithm does not favor the utilization of multiple symmetric threads in any standard parallelization scheme (e.g. data-parallel, task-parallel, pipeline), we elaborate on the concept of Helper Threads (HT) [8], [9] and test whether their incorporation is a good strategy for providing performance speedups. The key idea is to employ a number of threads that offload operations from the main thread in a transparent way. To amortize the cost of excessive synchronization, we employ Transactional Memory (TM) [10], [11]. TM is a novel programming model for multicore architectures that allows concurrency control over multiple threads and is being adopted by industry, as demonstrated by Sun's upcoming Rock processor [12] and Intel's STM [13].
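To make the serial bottleneck concrete, the following is a minimal sketch of the textbook algorithm described above (not the paper's parallel implementation): one node at a time is extracted from a min-priority queue, and its outgoing edges are relaxed. The graph representation and the use of Python's heapq with lazy deletion of stale queue entries are our own illustrative choices.

```python
import heapq

def dijkstra(graph, source):
    """graph: dict mapping node -> list of (neighbor, weight) pairs,
    with non-negative weights. Returns shortest distances from source."""
    dist = {node: float('inf') for node in graph}
    dist[source] = 0
    pq = [(0, source)]                      # min-heap of (distance, node)
    while pq:
        d, u = heapq.heappop(pq)            # the inherently serial extract-min
        if d > dist[u]:
            continue                        # stale entry; skip it
        for v, w in graph[u]:               # relax u's neighbors
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

The relaxation loop is the only obvious source of parallelism, and each relaxation touches the shared distance array and priority queue, which is exactly why the frequent synchronization noted above dominates.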
The programmer is offered the capability to envelop parts of the code within a transaction, indicating that some of the memory accesses in this code segment may be performed by other threads as well. The TM system monitors the transactions of the threads and, if two or more perform conflicting memory accesses, decides how to handle the conflict. The common case is to allow one thread to commit its transaction and restart the transaction(s) of the other conflicting thread(s). In the case of non-conflicting transactions, TM systems perform the appropriate accesses with (almost) no overhead. TM thus appears to be a promising approach that increases programmability while being capable of providing performance gains through the concept of optimistic parallelism. Therefore, if for a given problem the threads only rarely access the same memory location, locking seems a pessimistic exaggeration, making TM a more appropriate approach. Lately, TM's usage in the parallelization of specific algorithms has attracted scientific attention [14], [15], [16], as its potential to speed up real-world applications is still under investigation. The evaluation of our scheme demonstrates that the combination of the aforementioned approaches can provide speedups, while requiring only a few extensions to the original source code.

The rest of the paper is organized as follows: Section II discusses the basics of Dijkstra's algorithm. Section III presents our scheme, while Section IV presents its evaluation. Related work is presented in Section V and Section VI
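The execute-validate-commit-or-restart cycle described above can be illustrated with a deliberately simplified toy software-TM sketch (this is our own didactic construction, not the TM system evaluated in this paper, and the names TVar and atomic are hypothetical): a transaction runs speculatively against private read/write logs, and at commit time validates that nothing it read was changed by a concurrent commit; on conflict it is rolled back and retried.

```python
import threading

class TVar:
    """A transactional cell holding a value and a version counter."""
    def __init__(self, value):
        self.value, self.version = value, 0

_commit_lock = threading.Lock()   # serializes commits only, not execution

def atomic(txn_body):
    """Run txn_body(read, write) speculatively until it commits cleanly."""
    while True:
        reads, writes = {}, {}
        def read(tv):
            if tv in writes:
                return writes[tv]           # read-your-own-writes
            reads.setdefault(tv, tv.version)  # record version seen
            return tv.value
        def write(tv, val):
            writes[tv] = val                # buffer the write privately
        result = txn_body(read, write)
        with _commit_lock:
            # validate: did any location we read change since we read it?
            if all(tv.version == v for tv, v in reads.items()):
                for tv, val in writes.items():  # publish atomically
                    tv.value, tv.version = val, tv.version + 1
                return result
        # conflict detected: another thread committed a conflicting
        # update, so discard the logs and restart the transaction
```

Non-conflicting transactions commit after a cheap validation, while conflicting ones are restarted, mirroring the optimistic behavior that makes TM attractive when, as discussed above, actual data conflicts are rare.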