IEICE TRANS. INF. & SYST., VOL.E86–D, NO.12 DECEMBER 2003 2623

PAPER Special Issue on Dependable Computing

Analyzing the Impact of Data Errors in Safety-Critical Control Systems*

Örjan ASKERDAL†, Magnus GÄFVERT††, Martin HILLER†††, and Neeraj SURI††††, Nonmembers

SUMMARY Computers are increasingly used for implementing control algorithms in safety-critical embedded applications, such as engine control, braking control and flight surface control. Consequently, computer errors can have severe impact on the safety of such systems. Addressing the coupling of control performance with computer related errors, this paper develops a methodology for analyzing the impacts data errors have on control system dependability. The impact of a data error is measured as the resulting control error. We use maximum bounds on this measure as the criterion for control system failure (i.e., if the control error exceeds a certain threshold, the system has failed). In this paper we a) develop suitable models of computer faults for analysis of control level effects and related analysis methods, and b) apply traditional control theory analysis methods for understanding the impacts of data errors on system dependability. An automobile slip-control brake-system is used as an example showing the viability of our approach.

key words: safety-critical systems, control systems, error modeling, error analysis

1. Introduction

With the increasing use of computers for implementing control algorithms, control systems become more vulnerable to computer level failures. Thus, this paper focuses on understanding the impact of computer level data errors on system dependability (Errors caused by faults in the computer nodes are often classified as either data errors, or timing errors. A data error occurs when the computer node delivers data that is incorrect, and a timing error occurs when the computer delivers data at an incorrect point in time.
In this paper, we focus the analysis on the impact of data errors on system dependability. Methods for analyzing the effects of timing errors on control systems were presented in e.g., [5], [10].).

Manuscript received March 31, 2003.
Manuscript revised June 27, 2003.
† The author is with the Department of Computer Engineering, Chalmers University of Technology, Sweden.
†† The author is with the Department of Automatic Control, Lund Institute of Technology, Sweden.
††† The author is with the Volvo Technology Corporation, Sweden and Technische Universität Darmstadt, Germany.
†††† The author is with the Department of Computer Science, Technische Universität Darmstadt, Germany.
* This paper is based on the work presented at IEEE 2002 Pacific Rim International Symposium on Dependable Computing. The research was funded by NUTEK (DICOSMOS P11762-2) and EC NextTTA (IST-2001-32111).

Data errors and their effects on computer functionality are an intensively researched area, e.g., [8]. However, recent results [1], [9] show that many data errors will have a limited impact on control performance, i.e., control systems often have an inherent resiliency or inertia to data errors. These results were obtained experimentally using fault-injection, e.g., [8], for system validation. However, this technique requires a prototype of the system (or at least a detailed model), which generally is not available in the early design phases. Thus, the specific scope of this paper is to develop a systematic design stage analytical basis for estimating the impacts of different data errors on the control applications. We envision our approach to be used in early design phases as a design level guide to adapt fault tolerance techniques to enhance control system dependability as needed.

Error effect analysis is an extensively developed area, e.g., [8]. Analysis of effects on system stability of data errors caused by EMI bursts was investigated in [6].
However, as catastrophic failures in safety-critical systems may occur before the system reaches instability, we base our definition of system failure on thresholds of the magnitude and duration of the control error, i.e., the difference between the reference (desired) value of a controlled physical process property and the actual value of this property (e.g., in a system controlling cabin temperature in a car, the control error would be the difference between the temperature the driver has requested and the actual temperature measured by the sensors). The analysis of this paper is focused on finding errors that pose a threat to the system, i.e., data errors that result in large control errors.

As an example of error impacts, Fig. 1 shows the output signal of a control system. At start, the output value follows the reference value (desired output), i.e., the control error is 0. Then, at time s, a transient fault occurs affecting the output value. Depending on which bit (or bits) is corrupted, the magnitude of the error will differ, as indicated by the solid lines. The maximum acceptable control error, defined by the system designers, taking into account noise levels and other effects, is plotted as dotted lines in Fig. 1. The plot to the left describes a data error in bit(s) with low significance which never exceeds this level, and thus, does not need to be handled, whereas in the right plot,