FABRICATION-TIME AND RUN-TIME FAULT-TOLERANT ARRAY PROCESSORS USING SINGLE-TRACK SWITCHES

S.Y. Kung*, S.N. Jean, and C.W. Chang

*Princeton University
Department of Electrical Engineering
Princeton, NJ 08544

University of Southern California
Department of Electrical Engineering
Los Angeles, CA 90089

Abstract

This paper addresses the important issue of fault tolerance for arrays with a large number of processors. An array grid model based on single-track switches is adopted. A single-track design requires less hardware overhead and suffers less from possible faults on the switches. More significantly, we are able to establish a very useful necessary and sufficient condition for the reconfigurability of the array. This condition is the theoretical footing for two reconfiguration algorithms: one adopts global control for (fabrication-time) yield enhancement, and the other is a distributed scheme for (run-time) reliability improvement.

For the fabrication-time reconfiguration algorithm, the task can be reformulated as a maximum independent set problem. An existing algorithm from graph theory is adopted to solve this problem effectively. The simulations conducted indicate that the algorithm is computationally very efficient; therefore, it is also well suited to compile-time fault tolerance.

In contrast, for the real-time reconfiguration algorithm, a distributed method is more suitable for (asynchronous) array processors. The algorithm has several important features: (1) it is executed in a distributed manner by the processor elements (PEs); (2) no global information is required by the individual PEs; (3) the time overhead for reconfiguration is independent of the array size; (4) transient faults are handled by retries or by deactivating and reactivating the temporarily failed PE.

Based on simulations, we evaluate the performance of the algorithms and the tradeoffs between fault-tolerance capability and hardware complexity for various kinds of spare PE distributions.
1 Introduction

Two popular array processors are systolic and wavefront arrays. They feature the important properties of modularity, regularity, local interconnection, and a high degree of pipelining, and they are well suited to most signal and image processing algorithms. According to Kung and Leiserson [6], "A systolic system is a network of processors which rhythmically compute and pass data through the system". The wavefront array does not employ global synchronization; instead, each PE has its own local clock (self-timed) and exchanges data with neighboring PEs by asynchronous handshaking. Thus the requirement of correct timing in the systolic array is replaced by a requirement of correct sequencing in the wavefront array, which opens up new flexibility in fault-tolerant designs [9].

281 I. Koren (ed.), Defect and Fault Tolerance in VLSI Systems © Plenum Press, New York 1989
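The replacement of timing by sequencing in a wavefront array can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes that each PE can be modeled as a thread, that each link between neighbors is a one-slot blocking queue standing in for the asynchronous handshake, and that `None` serves as an end-of-stream marker. The names `pe` and `run_chain` are hypothetical. Each PE waits a random amount of time before computing, which mimics an independent local clock.

```python
import queue
import random
import threading
import time


def pe(inbox, outbox, weight, seed):
    """One self-timed PE: wait for data, compute, hand the result on."""
    rng = random.Random(seed)
    while True:
        item = inbox.get()            # blocks until the neighbor has sent data
        if item is None:              # end-of-stream marker (an assumption)
            outbox.put(None)
            return
        time.sleep(rng.uniform(0, 0.002))  # arbitrary local "clock" delay
        outbox.put(item + weight)     # blocks until the next PE has taken the slot


def run_chain(samples, weights, seed=0):
    """Push samples through a linear chain of PEs and collect the outputs."""
    # One-slot queues: a sender cannot proceed until the receiver has accepted.
    links = [queue.Queue(maxsize=1) for _ in range(len(weights) + 1)]
    workers = [
        threading.Thread(target=pe, args=(links[i], links[i + 1], w, seed + i))
        for i, w in enumerate(weights)
    ]

    def feeder():
        for s in samples:
            links[0].put(s)
        links[0].put(None)

    workers.append(threading.Thread(target=feeder))
    for t in workers:
        t.start()

    results = []
    while True:
        out = links[-1].get()
        if out is None:
            break
        results.append(out)
    for t in workers:
        t.join()
    return results


if __name__ == "__main__":
    # Different seeds give different PE delays but the same output sequence.
    print(run_chain([1, 2, 3], [10, 20, 30], seed=0))
    print(run_chain([1, 2, 3], [10, 20, 30], seed=7))
```

Because every transfer blocks until both sides are ready, the output depends only on the order in which data arrive, not on how fast any individual PE runs. In this toy model that is the property that lets a slow or temporarily stalled PE delay the computation without corrupting it.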