Performance Evaluation of Hierarchy Annotation and Credit Distribution Quiescence Mechanisms

Ronald F. DeMara, Kenneth Drake, Abdel Ejnioui
Department of Electrical and Computer Engineering
University of Central Florida
4000 Central Florida Blvd
Orlando, Florida 32816-2450 U.S.A.

Abstract: This paper evaluates the execution characteristics of two high-capability software-based approaches for detecting termination in distributed environments. The Tiered Algorithm relies on a global invariant of equality between process production and consumption at each level of process nesting. The Credit Algorithm relies on the distribution of a unit value from the initial parent process that can only be reconstituted if the barrier is complete. Both strategies detect termination correctly regardless of execution ordering, avoid potential race conditions caused by unpredictable transit times, and support arbitrary run-time binding of logical processes to physical processors. However, they differ in message count complexity, message bit complexity, controller overhead, and detection delay. These metrics were assessed in 100 randomly generated trials of 101 to 703 tasks each, under a variety of task creation and termination orderings. On average, the Tiered Algorithm exchanged 12.9% fewer synchronization messages and 24.1% fewer total bits than the Credit Algorithm. The results also indicate that while the Tiered Algorithm required 80% fewer controller operations, the Credit Algorithm incurred an average of 3.9 fewer operations affecting detection latency, with less variability.

I. INTRODUCTION

Efficient detection of process termination is essential for throughput optimization in distributed computer architectures and networks. An ensemble of Processing Elements (PEs) is said to be synchronized, or to have reached a quiescent state, upon completion of each interval of concurrent activity [1-3].
Points at which synchronization occurs are referred to as barriers [4-10]. The quiescence detection problem [1, 3], also called the termination detection problem [2, 11-13], has been studied extensively at several levels of detail and sophistication. In conjunction with such studies, various detection algorithms have been proposed, both as software [1-4, 11, 13-15] and as hardware [7-10, 12, 16-18] approaches. Termination detection performance can significantly influence overall throughput, since idle PEs cannot proceed to subsequent operations in the current thread until completion of the barrier has been signaled. Even when PEs are reactivated to perform tasks from another thread in order to use these otherwise idle cycles, significant overhead can be incurred. In addition to execution overhead, the interchange of synchronization messages during the detection process can reduce the message transmission capacity available to the underlying computation [12]. In this paper, we evaluate the performance of two algorithms whose correct operation is provable under the most demanding conditions, namely those of Dijkstra’s diffusing computation model [17]. These algorithms can efficiently detect termination without requiring a priori knowledge of the mapping of processes to physical processors, without assumptions about message transit times or delivery order, and without any global clock or time reference. As described below, the Credit Algorithm [1] relies on the distribution of a unit value from the root process that can only be reconstituted if the barrier is complete. In contrast, the Tiered Algorithm [12, 14, 19] relies on an invariant of equality between process production and consumption at all levels of process nesting. While trivial examples can indicate the relative behaviors of these two algorithms for specific task creation scenarios, it is more useful to study their performance across a wide distribution of scenarios.
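To make the two mechanisms more concrete, the following Python sketch simulates both on a randomly grown task tree. The Credit part follows the description above under one assumption: halving is used as the split rule, so each spawn gives half of the parent's current credit to the child, and the controller declares termination when the returned credit sums back to exactly 1. The Tiered part is only a simplified, hypothetical reading of the production/consumption invariant (each terminated task reports its nesting level and the number of children it produced); it is not the authors' published algorithm. All names (`Process`, `run_computation`, `credit_controller`, `tiered_controller`) are illustrative, and reports are shuffled to mimic arbitrary transit times.

```python
import random
from collections import defaultdict
from fractions import Fraction


class Process:
    """One task in a simulated diffusing computation (illustrative only)."""

    def __init__(self, depth, credit):
        self.depth = depth      # nesting level; the root task is level 0
        self.credit = credit    # share of the unit value held by this task
        self.children = 0       # number of tasks this task produced


def run_computation(num_tasks, seed=0):
    """Grow a random task tree, then return one termination report per task.

    Each report is (depth, children, credit). Reports are shuffled to model
    arbitrary message transit times, i.e. any arrival order at the controller.
    """
    rng = random.Random(seed)
    tasks = [Process(0, Fraction(1))]  # the root starts with the whole unit
    while len(tasks) < num_tasks:
        parent = rng.choice(tasks)
        parent.credit /= 2             # assumed split rule: keep half, give half
        parent.children += 1
        tasks.append(Process(parent.depth + 1, parent.credit))
    reports = [(t.depth, t.children, t.credit) for t in tasks]
    rng.shuffle(reports)
    return reports


def credit_controller(reports):
    """Credit recovery: detect termination once returned credit sums to 1."""
    recovered = Fraction(0)
    for i, (_, _, credit) in enumerate(reports):
        recovered += credit            # one exact fraction addition per report
        if recovered == 1:
            return i                   # index of the report that completed it
    return None                        # some credit is still outstanding


def tiered_controller(reports):
    """Hypothetical per-level counting: detect termination when the number of
    tasks produced equals the number consumed at every nesting level."""
    produced = defaultdict(int)
    consumed = defaultdict(int)
    produced[0] = 1                    # the root task is produced at level 0
    for i, (depth, children, _) in enumerate(reports):
        consumed[depth] += 1           # this task has terminated
        produced[depth + 1] += children
        levels = set(produced) | set(consumed)
        if all(produced[d] == consumed[d] for d in levels):
            return i
    return None                        # some level is still unbalanced


reports = run_computation(num_tasks=101, seed=42)
print(credit_controller(reports), tiered_controller(reports), len(reports))
# prints: 100 100 101
```

In this sketch both controllers signal termination only on the last report, whatever the arrival order, because neither can balance while any task is unreported. Counting per level is what keeps the Tiered sketch safe: with a single global produced/consumed pair, a grandchild whose report arrives before those of its parent and the root would already balance the root's one initial production against one consumption. `Fraction` keeps credit shares exact, since repeated halving in floating point would eventually underflow to zero.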
The relative costs of the two algorithms depend on the scenario. For example, the Credit Algorithm can return less data to the controller PE in terms of synchronization message length; however, it may do so more frequently. Furthermore, the data returned by the Credit Algorithm can require more processing at the controller, increasing the controller’s workload for detecting termination. Depending on the computing resources of the controller and its workload, the time spent detecting termination may vary to a greater or lesser degree, and trivial cases do not indicate the extent of this variation. While