On-Line Detection and Resolution of Communication Deadlocks*

Wee K. Ng
Chinya V. Ravishankar
Department of Electrical Engineering and Computer Science
The University of Michigan
Ann Arbor, MI 48109-2122

*This work was supported in part by the Consortium for International Earth Science Information Networking.

Abstract

We present a new distributed algorithm that detects and resolves communication deadlocks on-line, i.e., it detects and resolves deadlocks as communication requests are made, at no additional message traffic overhead, and with bounded delay between the occurrence and detection of a deadlock. This is achieved via a novel technique for detecting knots, which suffice for the existence of communication deadlocks. Current distributed deadlock detection algorithms lack these features. Thus the algorithm is suitable for soft real-time systems and large distributed systems. We also prove that the algorithm detects communication deadlocks and that it is able to deal with false deadlocks.

1 Introduction

Deadlocks have been categorized into two types in the literature [12, 19, 21]. In the resource model (AND-model), a process that has multiple outstanding requests for resources suspends itself until all of them are serviced. Resources usually cannot be duplicated. In the communication model (OR-model), a process may proceed as soon as at least one of its outstanding requests is serviced. The process may thereupon discard the other requests. Some software resources are usually managed this way.

Many distributed algorithms have been proposed to detect deadlocks in each of these categories [3, 4, 5, 7, 13, 16, 17, 18, 20, 21]. Deadlock detection algorithms fall into four categories [12]: path-pushing, edge-chasing, diffusing computations, and global state detection. In this paper, we restrict our attention to edge-chasing (or probe-based) algorithms [5, 13, 16, 18, 20, 21].
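To make the two deadlock models concrete, the following Python sketch contrasts them on a small waits-for graph. It is a centralized illustration only, not the distributed algorithm presented in this paper; the dictionary encoding of the graph and the helper names `reachable`, `on_cycle`, and `in_knot` are assumptions introduced here. It relies on the standard conditions that a cycle implies deadlock under the AND-model, while a knot suffices for deadlock under the OR-model.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of nodes reachable from start via one or more edges."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, ()))
    return seen


def on_cycle(graph, node):
    """AND-model condition: node can reach itself, so it lies on a cycle."""
    return node in reachable(graph, node)


def in_knot(graph, node):
    """OR-model condition: node waits on at least one process, and every
    process reachable from node can reach node back (no unblocked exit)."""
    reach = reachable(graph, node)
    return bool(reach) and all(node in reachable(graph, other) for other in reach)


# Hypothetical waits-for graphs: an edge A -> B means "A waits for B".
ring = {"A": ["B"], "B": ["C"], "C": ["A"]}
ring_with_exit = {"A": ["B", "D"], "B": ["C"], "C": ["A"], "D": []}
```

In `ring`, process A lies on a cycle and is also in a knot, so it is deadlocked under either model. In `ring_with_exit`, A still lies on a cycle and would be deadlocked under the AND-model, but it is not in a knot: D is not blocked and may still service A's request, so the OR-model condition does not hold.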
Edge-chasing algorithms are elegant because they do not require the construction of a WFG (Waits-For Graph) as in path-pushing algorithms. Furthermore, probes are simply special messages exchanged among processes, and communication deadlocks are caused by the sending and receiving of messages. However, these algorithms exhibit the following shortcomings:

1. Message traffic overhead is high because probe messages are required to perform detection. Most algorithms are evaluated on the basis of the number of messages exchanged to detect deadlocks, omitting the messages exchanged when there are no deadlocks. The algorithms are therefore more expensive than they seem.

2. Deadlock detection is usually initiated when a process has waited long enough [21] or when a higher priority process is blocked by a lower priority process [3, 13, 18]. These are arbitrary criteria.

3. Deadlock detection and resolution are performed separately, usually duplicating effort [19]. While a deadlock persists, it wastes resources and increases response time to user requests. Unfortunately, deadlock resolution is sometimes neglected [1, 9] or is not handled properly [14, 17].

4. These inefficiencies and complications suggest that current algorithms do not scale to large distributed systems. Message overhead adds unnecessary traffic to the network, and delayed deadlock detection and resolution wastes resources and reduces throughput.

We propose a new algorithm that overcomes these shortcomings. Our algorithm performs deadlock detection and resolution concurrently on-line at no additional message traffic overhead (see Section 5). The rest of the paper is organized as follows: The next section describes the algorithm. Section 3 shows that the algorithm is correct. Section 4 is an analysis of the

Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Sciences, 1994. 1060-3425/94 $3.00 © 1994 IEEE