IEEE TRANSACTIONS ON COMPUTERS, VOL. C-31, NO. 9, SEPTEMBER 1982

[13] J. B. Dennis, "Notes on computation structure," M.I.T., Cambridge, MA, 1969.
[14] H. A. Sholl and T. L. Booth, "Software performance modeling using computation structures," IEEE Trans. Software Eng., pp. 414-420, Dec. 1977.
[15] T. L. Booth, Sequential Machines and Automata Theory. New York: Wiley, 1967, ch. VI.
[16] R. Graham, "Bounds on multiprocessor timing anomalies," SIAM J. Appl. Math., vol. 17, no. 2, pp. 416-429.
[17] E. Horowitz and S. Sahni, Fundamentals of Computer Algorithms. Woodland Hills, CA: Comput. Sci. Press, 1978, pp. 567-572.
[18] Special Issue on Supercomputers, IEEE Comput. Mag., Nov. 1980.

Martin C. Wei (S'69-A'73-S'75-M'77) was born in Canton, China. He received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer science from the University of Connecticut, Storrs, in 1971, 1973, and 1978, respectively. From 1971 to 1973 he was a Research Assistant at the University of Connecticut. From 1973 to 1974 he was an Associate Engineer with the I.B.M. System Product Division, E. Fishkill, NY. From 1974 to 1978 he was an Instructor and Research Fellow at the University of Connecticut. Since 1978 he has been with the Customer Systems Laboratory, Bell Laboratories, Holmdel, NJ, where he is now a Technical Supervisor. His research interests are in the areas of computer architecture, distributed and multiprocessing systems, computer networking, distributed operating systems, and software reliability. Dr. Wei is serving in the Communication Software Technology Committee of IEEE ComSoc.

Howard A. Sholl (M'55-M'68) was born in Northampton, MA, on October 14, 1938. He received the B.S. and M.S. degrees in electrical engineering from Worcester Polytechnic Institute, Worcester, MA, in 1960 and 1963, respectively, and the Ph.D. degree in computer science from the University of Connecticut, Storrs, in 1970.
From 1961 to 1963 he was a Graduate Teaching Assistant at Worcester Polytechnic Institute. He worked as a Senior Engineer in computer and logic design for Sylvania Electric Company, Needham, MA, from 1963 to 1966. Since 1966 he has been with the Department of Electrical Engineering and Computer Science at the University of Connecticut, where he is now Associate Professor. He was a Leverhulme Visiting Fellow at the University of Edinburgh in 1973-1974, and a Fulbright Senior Research Fellow at the Technical University of Munich in 1981-1982. His research interests are in the areas of digital system design and software engineering. Dr. Sholl is a member of Eta Kappa Nu, Sigma Xi, and Tau Beta Pi.

A Fault-Tolerant Communication Architecture for Distributed Systems

DHIRAJ K. PRADHAN, SENIOR MEMBER, IEEE, AND SUDHAKAR M. REDDY, MEMBER, IEEE

Abstract: A communication architecture for distributed processors is presented here. This architecture is based on a new topology we have developed, one which interconnects n nodes by using rn links, where the maximum internode distance is log_r n and where each node has at most 2r I/O ports. It is also shown that this network is fault-tolerant, being able to tolerate up to (r - 1) node failures.

One of the particularly attractive features of this network is that it allows for simple routing as well as for easy distributed fault-diagnosis. Algorithms are also developed here for routing messages from node to node; these are useful both with and without faults present in the network. A procedure is also developed whereby each fault-free node can diagnose the faulty nodes independently, without the use of any central observer.

Index Terms: Distributed architecture, distributed fault diagnosis, fault-tolerant communication networks, graph connectivity, interconnection network graph diameters, self-diagnosis, store and forward networks.
Manuscript received December 30, 1980; revised December 1, 1981. This work was supported by U.S. Air Force Office of Scientific Research Grants 80-0217 and 78-3482.
D. K. Pradhan is with the School of Engineering, Oakland University, Rochester, MN 55901.
S. M. Reddy is with the Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA 52240.

INTRODUCTION

THE increased emphasis on fault-tolerance and reliability has made distributed processor architecture particularly attractive. This emphasis has provided the impetus for the development of certain distributed fault-tolerant computers, which have recently been surveyed by Rennels [17]. Since the processing, control, and database are distributed in these systems, they enjoy certain natural advantages over centralized systems from the reliability viewpoint. Simple, graceful degradation is thus allowed for, as well as the elimination of certain critical components from the system. Furthermore, the potential exists for incorporating distributed diagnosis and recovery directly into the system. If the processors in the distributed system are provided with the capacity to test and reconfigure faulty processors, then it is possible to design diagnosis and recovery algorithms which are distributed.

An important component of a distributed system is the system topology. The system topology defines the interprocessor communication architecture. Also, there are well-defined relationships between the system topology and the message delay, the routing algorithms, fault-tolerance, and fault-diagnosis. Specifically, the message delay may be directly proportional to the internode distance [2]. The complexity of a routing algorithm may be determined by the regularity of the topology. A highly regular structure may allow simple routing; on the other hand, a highly irregular structure may require extensive hardware/software support.
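The relationship between topology and internode distance can be made concrete with a small sketch. The Python code below is not taken from the paper; it is a minimal illustration, under the assumption that a topology is given as an undirected adjacency map, of how the maximum internode distance can be computed by breadth-first search. The function names and the two sample topologies (a ring and a fully connected network) are hypothetical choices made for this example only.

```python
from collections import deque


def max_internode_distance(adj):
    """Return the largest shortest-path hop count between any two nodes.

    adj maps each node to a list of its neighbours (undirected links).
    Raises ValueError if some node cannot be reached from another.
    """
    diameter = 0
    for source in adj:
        # Breadth-first search gives hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("network is disconnected")
        diameter = max(diameter, max(dist.values()))
    return diameter


def link_count(adj):
    """Count undirected links (each link appears in two adjacency lists)."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2


def ring(n):
    """Illustrative topology (assumes n >= 3): node i links to i-1 and i+1."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def fully_connected(n):
    """Illustrative topology: every node links to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


if __name__ == "__main__":
    for name, topology in (("ring", ring(8)), ("full", fully_connected(8))):
        print(name, link_count(topology), max_internode_distance(topology))
```

For eight nodes, the ring uses 8 links and has a maximum internode distance of 4, whereas the fully connected network uses 28 links and has a maximum distance of 1. The choice of topology governs this trade-off between link count and distance.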
The fault-tol-

0018-9340/82/0900-0863$00.75 © 1982 IEEE