FABRICATION-TIME AND RUN-TIME FAULT-TOLERANT
ARRAY PROCESSORS USING SINGLE-TRACK SWITCHES
S.Y. Kung*, S.N. Jean, and C.W. Chang
*Princeton University
Department of Electrical Engineering
Princeton, NJ 08544
University of Southern California
Department of Electrical Engineering
Los Angeles, CA 90089
Abstract
This paper addresses the important issue of fault tolerance in arrays with a large number of
processors. We adopt an array grid model based on single-track switches. The single-track model
requires less hardware overhead and is less susceptible to faults in the switches. More
significantly, we establish a useful necessary and sufficient condition for the
reconfigurability of the array. This condition provides the theoretical footing for two
reconfiguration algorithms: one adopts global control for (fabrication-time) yield
enhancement, and the other is a distributed scheme for (run-time) reliability improvement. For
the fabrication-time reconfiguration algorithm, the task can be reformulated as a maximum
independent set problem. An existing algorithm from graph theory is adopted to
effectively solve this problem. Simulations indicate that the algorithm is computationally
very efficient; therefore, it is also well suited to compile-time fault tolerance. In contrast,
the real-time reconfiguration algorithm adopts a distributed method, which is better suited to
(asynchronous) array processors. The algorithm
has several important features: (1) it is distributively executed by the processor elements
(PEs); (2) no global information is required by the individual PEs; (3) the time overhead
for reconfiguration is independent of the array size; (4) transient faults are handled by
retries or by deactivating/reactivating the temporarily failed PE. Simulations are used to
evaluate the performance of the algorithms and the tradeoffs between fault-tolerance capability
and hardware complexity for various kinds of spare-PE distributions.
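To make the reformulation concrete: a maximum independent set of a graph is a largest set of vertices no two of which are joined by an edge. The sketch below illustrates only that problem definition and is not the authors' method. It uses exhaustive search (exponential time) rather than the efficient graph-theoretic algorithm the paper adopts. The function name `maximum_independent_set` and the small "conflict graph" in the usage line are hypothetical: its vertices stand for assumed reconfiguration choices, and its edges join choices assumed unable to coexist.

```python
from itertools import combinations


def maximum_independent_set(vertices, edges):
    """Return one maximum independent set of an undirected graph.

    Hypothetical helper: exhaustive search over vertex subsets, so it is
    exponential in len(vertices) and only meant for tiny illustrative graphs.
    """
    conflicts = {frozenset(e) for e in edges}
    # Try subset sizes from largest to smallest; the first independent
    # subset found therefore has maximum size.
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) not in conflicts
                   for pair in combinations(subset, 2)):
                return set(subset)
    return set()  # only reached for a graph with no vertices


# Hypothetical conflict graph: a path a-b-c-d.
print(sorted(maximum_independent_set(["a", "b", "c", "d"],
                                     [("a", "b"), ("b", "c"), ("c", "d")])))
# prints ['a', 'c']
```

On the path a-b-c-d, no three vertices avoid every edge, so the maximum independent set has size two. Any practical reconfiguration tool would replace the brute-force loop with a polynomial-time algorithm for the graph class that actually arises.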
1 Introduction
Two popular types of array processors are systolic arrays and wavefront arrays. Both feature
the important properties of modularity, regularity, local interconnection, and a high degree of
pipelining, and both are well suited to most signal and image processing algorithms. According to Kung
and Leiserson [6], "A systolic system is a network of processors which rhythmically compute
and pass data through the system". The wavefront array, by contrast, does not employ global
synchronization; instead, each PE has its own local clock (self-timed) and exchanges data with
neighboring PEs by asynchronous handshaking. Thus the requirement of correct timing in
the systolic array is replaced by that of correct sequencing in the wavefront array, opening
up new flexibility in fault-tolerant design [9].
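As a rough software analogy (an assumption for illustration, not a hardware model from the paper), the shift from correct timing to correct sequencing can be sketched with threads. Each hypothetical `wavefront_pe` runs on its own schedule with random local delays and no shared clock. A one-place `queue.Queue` between neighbors stands in for the handshake, so a PE can pass data on only after its neighbor has taken the previous datum. The names `wavefront_pe`, `run_linear_array`, and the add-a-weight operation are illustrative choices.

```python
import queue
import random
import threading
import time

STOP = None  # end-of-stream token passed down the array


def wavefront_pe(weight, inbox, outbox, rng):
    """Hypothetical self-timed PE: fires whenever an input datum arrives."""
    while True:
        x = inbox.get()                    # wait until the left neighbor supplies data
        if x is STOP:
            outbox.put(STOP)               # propagate termination, then exit
            return
        time.sleep(rng.uniform(0, 0.002))  # arbitrary local delay: no global clock
        outbox.put(x + weight)             # blocks while the right neighbor's slot is full


def run_linear_array(weights, stream, seed=0):
    """Push `stream` through a linear array of PEs; each PE adds its weight."""
    master = random.Random(seed)
    # links[i] feeds PE i; maxsize=1 models a one-place handshake buffer.
    links = [queue.Queue(maxsize=1) for _ in range(len(weights) + 1)]
    pes = [
        threading.Thread(
            target=wavefront_pe,
            args=(w, links[i], links[i + 1], random.Random(master.random())),
        )
        for i, w in enumerate(weights)
    ]

    def feed():
        # A separate feeder thread so the main thread can drain the output.
        for x in stream:
            links[0].put(x)
        links[0].put(STOP)

    feeder = threading.Thread(target=feed)
    for t in pes + [feeder]:
        t.start()

    results = []
    while True:
        y = links[-1].get()
        if y is STOP:
            break
        results.append(y)

    for t in pes + [feeder]:
        t.join()
    return results


print(run_linear_array([1, 2, 3], range(5)))  # prints [6, 7, 8, 9, 10]
```

Although the delays vary from PE to PE and from run to run, the outputs are always correct and in order, because each PE acts only when data arrive and FIFO links preserve the sequence. This is the property the wavefront array relies on.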
I. Koren (ed.), Defect and Fault Tolerance in VLSI Systems
© Plenum Press, New York 1989