Reliability and Performance of Component Based Software Systems with
Restarts, Retries, Reboots and Repairs
Vibhu Saujanya Sharma
Dept. of Computer Science and Engineering,
Indian Institute of Technology Kanpur,
Kanpur, UP, INDIA 208016
vsharma@cse.iitk.ac.in
Kishor S. Trivedi
Dept. of Electrical and Computer Engineering,
Duke University,
Durham, NC 27708-0291, USA
kst@ee.duke.edu
Abstract
High reliability and performance are vital for soft-
ware systems handling diverse mission critical applica-
tions. Such software systems are usually component based
and may possess multiple levels of fault recovery. A number
of parameters, including the software architecture, behav-
ior of individual components, underlying hardware, and the
fault recovery measures, affect the behavior of such systems,
and there is a need for an approach to study them. In this
paper we present an integrated approach for modeling and
analysis of component based systems with multiple levels
of failures and fault recovery at both the software and the
hardware levels. The approach is useful to analyze
attributes such as overall reliability, performance, and ma-
chine availabilities for such systems, wherein failures may
happen at the software components, the operating system,
or at the hardware, and corresponding restarts, retries, reboots,
or repairs are used for mitigation. Our approach encompasses
Markov chain and queueing network modeling for estimating
system reliability, machine availabilities, and performance.
The approach is helpful both for designing and building better
systems and for improving existing ones.
1 Introduction
Software systems today are used in diverse
fields and handle many mission- and time-critical jobs. It
is important for such systems to be highly reliable and re-
sponsive. As these systems are mostly component based,
important attributes like reliability and performance depend
on the characteristics of the individual components, the way
they interact with each other, and upon the underlying hard-
ware infrastructure on which the components are deployed.
Moreover, as failures can happen at the software components
as well as the hardware, the way in which these failures
are resolved also has a direct bearing on the overall
reliability and performance.
Failures at software components are usually resolved by
rebooting their respective machines and restarting the system.
However, this adversely affects the performance and
also makes the system unavailable. Recent empirical stud-
ies [2, 3] show that successfully restarting just the software
components (as opposed to rebooting the machines) is an
effective way to handle transient software failures and in-
crease system reliability, and simultaneously reduce the per-
formance overhead. Other levels of fault recovery can also
be present [27], and these affect the performance as well as
the reliability of the system.
As the overall behavior of such complex component
based systems depends on a number of different factors,
modeling and analyzing such systems for attributes like re-
liability and performance has become important to ensure
their efficient and sound operation. If such an analysis can
be performed early in the software life cycle, it can facilitate
key decisions regarding the software design so that the final
product performs better. Similarly, this activity is equally
important for existing systems, where it helps improve them.
In general, questions such as the following become pertinent
while studying such systems:
• How does the system perform if one or more software
components are unreliable?
• How does unreliable underlying hardware affect the
system, and where should it be improved?
• How do multiple fault recovery measures, such as
restarts, retries, reboots, and repairs, affect the system
reliability and performance?
• What are the various tradeoffs that exist?
Answering such questions requires an approach that
takes into account the software architecture and deployment
17th International Symposium on Software Reliability Engineering (ISSRE'06)
0-7695-2684-5/06 $20.00 © 2006