Microelectron. Reliab., Vol. 32, No. 1/2, pp. 49-65, 1992. 0026-2714/92/$5.00 + .00
Printed in Great Britain. © 1991 Pergamon Press plc
A UNIFIED PERFORMANCE RELIABILITY ANALYSIS OF A
SYSTEM WITH A CUMULATIVE DOWN TIME CONSTRAINT
VICTOR NICOLA
IBM Thomas J. Watson Research Center,
P.O. Box 704, Yorktown Heights, NY 10598, U.S.A.
ANDREA BOBBIO*
Istituto Elettrotecnico Nazionale Galileo Ferraris,
Strada delle Cacce 91, 10135 Torino, Italy
and
KISHOR TRIVEDI†
Department of Computer Science,
Duke University, Durham, NC 27706, U.S.A.
(Received for publication 6 November 1990)
Abstract
We discuss unified performance and reliability analysis of a system which operates in
a critical environment, in the sense that a catastrophic condition is reached when the
accumulated down time exceeds a given threshold. Assuming that the system must process
a task with a specified work requirement, we evaluate the probability that the task will be
completed at a given time before the system reaches the catastrophic state.
We show that several other important measures (like the distribution of the lifetime, the
distribution of the interval availability, and the instantaneous availability) can be derived
from the knowledge of the distribution of the completion time. A numerical example, based
on the use of Phase (PH) type distributed random variables, concludes the paper.
1 Introduction
We consider unified performance and reliability analysis [16, 15] of a system that alternates
between an up state and a down state, and that reaches a catastrophic condition when the
accumulated downtime exceeds a critical threshold (which can be either a constant or a random
variable). A task that requires a specified amount of work is processed by the system, and
we evaluate the probability that at a given time the task will be completed before the system
reaches the critical state. Upon occurrence of a failure the task is preempted, and we consider
two possible situations [10, 16]:
• the work done on the task until the time of failure is saved and the task is resumed when
the system is repaired (preemptive-resume failure);
• the work done on the task until the time of failure is lost, and the task must be restarted
when the system is repaired (preemptive-repeat failure).
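The two failure disciplines above can be contrasted with a small Monte Carlo sketch. This is an illustration only, not the analytical method of the paper. It assumes exponentially distributed up and down periods, a unit work rate while the system is up, a constant downtime threshold, and a system that starts in the up state. All function and parameter names are hypothetical.

```python
import random


def completes_in_time(work, deadline, down_limit, fail_rate, repair_rate,
                      policy, rng):
    """Simulate one sample path; return True if the task finishes by
    `deadline` before the accumulated downtime exceeds `down_limit`.

    Assumptions (not from the paper): exponential up/down periods,
    unit work rate while up, system starts in the up state.
    """
    clock = 0.0        # elapsed time
    down_total = 0.0   # accumulated downtime so far
    done = 0.0         # work retained from earlier up periods
    while True:
        up = rng.expovariate(fail_rate)
        remaining = work - done
        if up >= remaining:
            # The task finishes during this up period.
            return clock + remaining <= deadline
        clock += up
        if policy == "resume":
            done += up  # preemptive-resume: work done so far is saved
        # preemptive-repeat: done stays 0.0, the task restarts from scratch
        down = rng.expovariate(repair_rate)
        down_total += down
        clock += down
        if down_total > down_limit or clock > deadline:
            # Catastrophic state reached, or the deadline has passed.
            return False


def estimate_completion_probability(work, deadline, down_limit, fail_rate,
                                    repair_rate, policy, runs=20000, seed=1):
    """Estimate the completion probability as a fraction of sample paths."""
    rng = random.Random(seed)
    hits = sum(
        completes_in_time(work, deadline, down_limit, fail_rate,
                          repair_rate, policy, rng)
        for _ in range(runs)
    )
    return hits / runs


if __name__ == "__main__":
    for policy in ("resume", "repeat"):
        p = estimate_completion_probability(
            work=1.0, deadline=5.0, down_limit=1.0,
            fail_rate=2.0, repair_rate=5.0, policy=policy)
        print(policy, p)
```

For the same parameters, the estimate under the preemptive-resume discipline should not fall below the one under preemptive-repeat, since saved work can only shorten the completion time.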
*This work was partially supported by the Italian National Research Council CNR under the Project "Materials
and Devices for Solid State Electronics", Grant 88.01657.61
†This research was sponsored in part by the SDIO Innovative Science and Technology Office and managed
by the Office of Naval Research under contract N3014-88-K-0623