A MATHEMATICAL TOOL FOR SUPPORT OF FAULT-TOLERANT EMBEDDED SYSTEMS DESIGN

S. Frenkel, A. Pechinkin, V. Chaplygin, I. Levin 1

Institute of Informatics Problems RAS, 119333, Vavilova 44, kor. 2, Moscow, Russia
Tel-Aviv University, Ramat-Aviv, 69978, Israel
E-mails: slf-ipiran@mtu-net.ru, Apechinkin@ipiran.ru, ilia1@post.tau.ac.il

Abstract

Designers of fault-tolerant computer systems need a methodological and software framework that supports their efforts in analyzing and optimizing new design solutions based on new and forthcoming hardware and software technologies, embedded systems in particular. These new and advanced technologies (high-performance and self-reconfigurable systems, nanotechnologies) lead to unprecedented challenges. For example, reconfigurations in FPGA-based high-performance systems often become unsafe as a result of transient faults. Therefore, designers have to make decisions concerning the systems' reliability at the various design levels, as well as their performance and safety. Some timing characteristics of possible failure detection and recovery may be very important in the decision-making process. In this paper, the principal components of such a framework are considered. Namely, we consider possible tools that match the current state of the art in mathematical modeling of the self-recovery features of fault-tolerant smart systems.

KEYWORDS

embedded system modeling, fault-tolerant computing, self-checking, fault latency, finite-state machines, Markov chains

1. Introduction

Computing systems for many applications must be fault-tolerant, that is, able to continue operating despite limited failures of portions of their hardware or software [4].
Fault tolerance is provided by various types of redundancy (time, hardware, information), which may, in particular, achieve it through self-healing, self-stabilization, and self-reconfiguration [1,2,4]. Self-healing refers to the system’s ability to detect failures in any of its components or interaction protocols and to correct them so that work is not interrupted. The self-healing mechanism enables the system to continue operating properly in the event of the failure of some of its components, to detect the errors, and to recover from them. For example, the concept of a partially monotonic Finite State Machine (FSM), in which the transitions are computed by partially monotonic Boolean functions, is used to provide self-healing properties. In particular, in a self-checking digital circuit design, various properties of the logic functions may provide the self-healing properties of the circuit [1]. The notion of self-healing can be of great interest for reliability modeling of embedded systems in the presence of transient faults. The architecture that supports the self-healing property of the FSM is the well-known self-checking architecture [1], which uses a self-checking checker at the output.

Along with self-healing, self-stabilization is another important aspect of fault-tolerant computing. A system is said to be self-stabilizing if two conditions hold. First, starting from any state, the system is guaranteed to eventually reach a correct state (convergence). Second, once the system is in a correct state, it is guaranteed to stay in a correct state, provided that no further fault occurs (closure). For example, a distributed algorithm A is self-stabilizing if, whatever initial configuration it starts from, it reaches within a finite time a set L of “legal” configurations, i.e., configurations satisfying a desired property [5].
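The two conditions in the definition of self-stabilization, convergence and closure, can be checked mechanically for a small system. The following is a minimal sketch, assuming a deterministic system with a finite state set and a fault-free transition function; the names `is_closed`, `converges`, `is_self_stabilizing`, and the toy transition functions `halve` and `rotate` are hypothetical and do not come from the cited works.

```python
from typing import Callable, Hashable, Iterable, Set

State = Hashable


def is_closed(legal: Set[State], step: Callable[[State], State]) -> bool:
    """Closure: a fault-free step taken from a legal state stays legal."""
    return all(step(s) in legal for s in legal)


def converges(states: Iterable[State], legal: Set[State],
              step: Callable[[State], State]) -> bool:
    """Convergence: every start state reaches the legal set in finitely many steps."""
    for start in states:
        seen = set()
        s = start
        while s not in legal:
            if s in seen:  # revisited an illegal state: trapped in a cycle
                return False
            seen.add(s)
            s = step(s)
    return True


def is_self_stabilizing(states: Iterable[State], legal: Set[State],
                        step: Callable[[State], State]) -> bool:
    """A system is self-stabilizing when both closure and convergence hold."""
    return is_closed(legal, step) and converges(list(states), legal, step)


# Toy system (an assumption for illustration): states 0..7, legal set L = {0}.
STATES = range(8)
LEGAL = {0}


def halve(s: int) -> int:
    """Reaches 0 from any state and then stays at 0."""
    return s // 2


def rotate(s: int) -> int:
    """Passes through 0 from any state but does not stay there."""
    return (s + 1) % 8


print(is_self_stabilizing(STATES, LEGAL, halve))   # True
print(is_self_stabilizing(STATES, LEGAL, rotate))  # False: closure is violated
```

In this toy setting, `rotate` satisfies convergence but not closure, which shows why the definition needs both conditions. A probabilistic or distributed system would require a richer model than a single deterministic transition function.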
The most important numerical characteristic of both phenomena is the time needed to recover from arbitrary state disruptions. For example, for a self-stabilizing algorithm A, it may be the expected time to reach the set L of legal configurations [5], which usually form a subset of states or processes that are proper from some viewpoint of the target system design [6]. For the FSM, it can be the average number of