A Scalable Lock-free Stack Algorithm

Danny Hendler
School of Computer Science, Tel-Aviv University, Tel Aviv, Israel 69978
hendlerd@post.tau.ac.il

Nir Shavit
Tel-Aviv University & Sun Microsystems Laboratories
shanir@sun.com

Lena Yerushalmi
School of Computer Science, Tel-Aviv University, Tel Aviv, Israel 69978
lenay@post.tau.ac.il

ABSTRACT

The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is non-blocking, linearizable, and scales well throughout the concurrency range has thus remained open.

This paper presents such a concurrent stack algorithm. It is based on the following simple observation: a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting elimination-backoff stack performs as well as the simple stack at low loads, and increasingly outperforms all other methods (lock-based and non-blocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.

Categories and Subject Descriptors

C.1.4.1 [Computer Systems Organization]: Processor Architectures—Parallel Architectures, Distributed Architectures; E.1.4.1 [Data]: Data Structures—lists, stacks and queues

General Terms

Algorithms, theory, lock-freedom, scalability

1. INTRODUCTION

Shared stacks are widely used in parallel applications and operating systems.
As shown in [21], LIFO-based scheduling not only reduces excessive task creation, but also prevents threads from attempting to dequeue and execute a task which depends on the results of other tasks.

This work was supported in part by a grant from Sun Microsystems.

SPAA'04, June 27–30, 2004, Barcelona, Spain.
Copyright 2004 Sun Microsystems, Inc. All rights reserved. ACM 1-58113-840-7/04/0006.

A concurrent shared stack is a data structure that supports the usual push and pop operations with linearizable LIFO semantics. Linearizability [11] guarantees that operations appear atomic and can be combined with other operations in a modular way.

When threads running a parallel application on a shared memory machine access the shared stack object simultaneously, a synchronization protocol must be used to ensure correctness. It is well known that concurrent access to a single object by many threads can lead to a degradation in performance [1, 9]. Therefore, in addition to correctness, synchronization methods should offer efficiency in terms of scalability and robustness in the face of scheduling constraints. Scalability at high loads should not, however, come at the price of good performance in the more common low-contention cases.

Unfortunately, the two known methods for parallelizing shared stacks do not meet these criteria. The combining funnels of Shavit and Zemach [20] are linearizable [11] LIFO stacks that offer scalability through combining, but perform poorly at low loads because of the combining overhead. They are also blocking and thus not robust in the face of scheduling constraints [12]. The elimination trees of Shavit and Touitou [17] are non-blocking and thus robust, but the stack they provide is not linearizable, and it too has large overheads that cause it to perform poorly at low loads.
On the other hand, the results of Michael and Scott [15] show that the best known low-load method, the simple linearizable lock-free stack of Treiber [22], scales poorly due to contention and an inherent sequential bottleneck.

This paper presents the elimination backoff stack, a new concurrent stack algorithm that overcomes the combined drawbacks of all the above methods. The algorithm is linearizable and thus easy to combine modularly with other algorithms; it is lock-free and hence robust; it is parallel and hence scalable; and it utilizes its parallelization construct adaptively, which allows it to perform well at low loads. The elimination backoff stack is based on the following simple observation: a single elimination array [17], used as a backoff scheme for a lock-free stack [22], is both lock-free and linearizable. The introduction of elimination into the backoff process serves the dual purpose of adding parallelism and reducing contention, which, as our empirical results show, allows the elimination-backoff stack to outperform all algorithms in the literature at both high and low loads. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
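To make the above observation concrete, Treiber's stack [22] can be sketched in a few lines of Java. This is a minimal illustrative sketch, not the paper's own code: the class and method names are ours, and we use `AtomicReference` where the original uses a hardware compare-and-swap on a top-of-stack pointer.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of Treiber's lock-free stack: a singly linked list whose
// top pointer is updated only via compare-and-set (CAS).
class TreiberStack<T> {
    private static class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    public void push(T value) {
        Node<T> node = new Node<>(value);
        Node<T> oldTop;
        do {
            oldTop = top.get();
            node.next = oldTop;
            // CAS fails if another thread changed top in the meantime;
            // the elimination-backoff scheme hooks in exactly here.
        } while (!top.compareAndSet(oldTop, node));
    }

    public T pop() {
        Node<T> oldTop, newTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null;  // empty stack
            newTop = oldTop.next;
        } while (!top.compareAndSet(oldTop, newTop));
        return oldTop.value;
    }
}
```

The retry loops are the sequential bottleneck: every operation must CAS the single `top` pointer. In the elimination-backoff scheme, a thread whose CAS fails does not simply retry; it backs off to a collision array, where a concurrent push and pop that meet can exchange values and both complete without ever touching `top`.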