International Journal of Engineering Trends and Technology (IJETT) – Volume 6 Number 6 - Dec 2013 ISSN: 2231-5381 http://www.ijettjournal.org Page 302

An Approach to Build Software Based on Fault Tolerance Computing Using Uncertainty Factor

Mrityunjay Brahma
Department of Computer Science, MIPS, MITS Rayagada, Odisha 765017, India

Abstract— This work begins with an overview of fault-tolerant systems. In software fault tolerance based on design diversity, we observe that uncertainty remains an important factor. With this factor in mind, we discuss applying Bayes’ theorem and a probabilistic mathematical model to handle uncertainty. We expect that the complete model, once developed, will yield better efficiency. The rest of the paper is a literature review of other types of fault tolerance systems and their approaches, covering fault-tolerant computing schemes that rely on a single design as well as those that rely on multiple designs. For single-design schemes, we discuss recovery blocks, N-version programming, and N self-checking programming. For multiple-design schemes, we discuss software engineering aspects, error detection mechanisms, and fault tolerance by fault injection. The paper ends with a general conclusion.

Keywords— Fault tolerance, Software fault tolerance, Bayes’ Theorem, Uncertainty.

I. INTRODUCTION

Most fault-tolerant computer systems are designed to handle several kinds of failure: hardware-related faults such as hard disk failures, input or output device failures, or other temporary or permanent failures; software bugs and errors; interface errors between the hardware and software, including driver failures; operator errors, such as erroneous keystrokes, bad command sequences, or installation of unexpected software; and physical damage or other flaws introduced to the system from an outside source.
Hardware fault tolerance is the most common application of these systems and is designed to prevent failures caused by hardware components. Typically, components have multiple backups and are separated into smaller "segments" that contain a fault, and extra redundancy is built into physical connectors, power supplies, fans, and so on. Special software and instrumentation packages are designed to detect failures. A related technique is fault masking, which hides a fault by keeping a backup component ready to execute each instruction as soon as it is issued and applying a voting protocol: if the main component and its backups do not give the same result, the flawed output is ignored.

Research into the kinds of tolerance needed for critical systems involves a large amount of interdisciplinary work. The more complex the system, the more carefully all possible interactions have to be considered and prepared for. Given the importance of high-value systems in transport, public utilities and the military, the range of relevant research topics is very wide: it runs from obvious subjects such as software modeling and reliability or hardware design to more arcane ones such as stochastic models, graph theory, formal or exclusionary logic, parallel processing, remote data transmission, and more.

No system can work, however, without constant collaboration and a steady supply of data and instructions. An uninterrupted supply of information plays the most vital role in successful project management systems. In a typical project, several devices are connected together, such as servers, data storage facilities, client machines and networking devices, all supported by different software. These combinations of systems and software work around the clock, 24 hours a day, 365 days a year [3],[5],[6].

Fault tolerance is a technique that allows a system to perform its function correctly even in the presence of internal faults. The purpose of fault tolerance is to increase the dependability of a system.
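The voting protocol used for fault masking, described earlier in this section, can be illustrated with a minimal sketch. It assumes three redundant components that compute the same function and a strict-majority rule; the names `majority_vote`, `run_masked`, `primary`, `backup_a` and `faulty_backup` are hypothetical and chosen only for illustration, and real systems may use other voting rules.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value agreed on by a strict majority, or None if there is none."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def run_masked(components: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every redundant component on the same input and vote on the results."""
    outputs = [component(x) for component in components]
    return majority_vote(outputs)


# Hypothetical redundant components (assumption: all should compute x squared).
def primary(x: int) -> int:
    return x * x


def backup_a(x: int) -> int:
    return x * x


def faulty_backup(x: int) -> int:
    return x * x + 1  # deliberately injected fault


masked_result = run_masked([primary, backup_a, faulty_backup], 3)
print(masked_result)  # the flawed output of faulty_backup is outvoted
```

In this sketch the single disagreeing output is ignored because the other two components agree. If no strict majority exists, the voter returns `None`, leaving the caller to decide how to handle the unmasked fault.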
A failure occurs when a system deviates from its specified behavior. The incorrect system state that leads to a failure is called an error, and the underlying cause of an error is a fault. Fault tolerance techniques tolerate faults by means of redundancy [4],[7]. Software faults are commonly called “bugs”. Software fault tolerance techniques are designed to allow a system to tolerate software faults that remain in the system after its development. They provide protection against errors in translating the requirements and algorithms into a programming language, but do not provide explicit protection against errors in specifying the requirements. These techniques have been used in the aerospace, nuclear power, healthcare, telecommunications and ground transportation industries, among others [4],[11].

II. DESIGNING APPROACHES FOR SOFTWARE BASED FAULT TOLERANCE

Software-based fault tolerance approaches generally rely either on design diversity (multiple versions) or on a single design. The following sections discuss them in more detail [4].