A Survey on Fault Detection, Isolation and Recovery (FDIR) Module in Satellite Onboard Software

Fatemeh SalarKaleji
Satellite Research Institute, Tehran, Iran
f.salar@sri.ac.ir, fatemeh.salaail.com

Abstract- The complexity of the avionic systems in satellites is rising as space missions become increasingly sophisticated. This complexity emphasizes the need for more dependable systems with minimal anomalies. As satellite manufacturers seek to convert many hardware-implemented functionalities into software, the On-Board Software (OBSW) is becoming a major component in every satellite. Notably, more Fault Detection, Isolation and Recovery (FDIR) tasks are being implemented in software, which creates the need for a well-defined software architecture that supports a cost-effective implementation of the FDIR functions. FDIR is a key functionality of the OBSW. Not all failures are subject to onboard identification, and not all failures are subject to onboard recovery. The FDIR concept worked out for the spacecraft during the engineering phase follows some basic requirements and principles, implements a certain failure hierarchy that also specifies the level on which each failure is to be fixed, and implements a consistent approach for transferring the spacecraft to Safe Mode and recovering from it. Since an FDIR concept usually follows a hierarchical approach, this paper presents an example FDIR and safeguarding hierarchy. Within this structure we indicate the levels of failures that are handled by the unit internally, by the subsystem software, by the satellite system software, by the onboard computer hardware reconfiguration unit, and by ground. We also explain the FDIR hierarchy in the Safe Mode implementation in more detail. This paper considers FDIR technologies in the on-board software of a satellite.
Today, several proposed methodologies and frameworks try to solve this problem. We analyze the functionalities of the FDIR module implemented in an OBSW framework. We also survey the FDIR hierarchies and their relationship to the Packet Utilization Standard (PUS) services.

Keywords- Fault Detection, Isolation and Recovery (FDIR), Software FDIR, On-Board Software (OBSW), Satellites, Frameworks, Packet Utilization Standard (PUS), On-Board architectures.

Aboulfazl Dayyani
Satellite Research Institute, Tehran, Iran
Day aniail.com

978-1-4673-6396-9/13/$31.00 ©2013 IEEE

I. INTRODUCTION

"Failure Detection, Isolation and Recovery" (FDIR) is a key functionality of the On-Board Software (OBSW). Not all failures are subject to onboard identification, and not all failures are subject to onboard recovery. The FDIR concept worked out for the spacecraft during the engineering phase follows some basic requirements and principles, implements a certain failure hierarchy that also specifies the level on which each failure is to be fixed, and implements a consistent approach for transferring the spacecraft to Safe Mode and recovering from it. A properly defined Safe Mode with full satellite observability is essential for FDIR operations. The Safe Mode must also ensure a proper balance between the resources the satellite produces and consumes (mainly power), since diagnosing failures and recovering from them will in most cases not be possible within one ground contact (in particular not for polar-orbiting Earth observation satellites). The following sections present a survey of FDIR.

II. FDIR REQUIREMENTS

Typical requirements for FDIR design at the beginning of the satellite system engineering phase request that:
A clear hierarchy is to be defined, specifying which type of failure is to be identified and managed on which FDIR level.
The satellite must be able to reach its Safe Mode autonomously.
The Safe Mode, if triggered, shall not limit ground in any way with respect to spacecraft observability and commandability.
Ground may also be allowed to submit commands which are blocked for the OBSW or are not allowed in that sequence for the OBSW.
Ground must be able to perform a detailed status analysis and failure event history analysis for unique failure identification.
Ground may alter operational limits to avoid future Safe Modes, e.g. in cases of failures triggered by equipment degradation.
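
To make three of these requirements concrete (autonomous entry into Safe Mode, a failure event history available to ground, and ground-alterable operational limits), the following is a minimal Python sketch. It is not taken from any OBSW framework discussed in this paper: the names `FdirMonitor`, `Limit`, `check`, `ground_set_limit` and `ground_recover`, as well as the parameter name and limit values used below, are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Mode(Enum):
    NOMINAL = "nominal"
    SAFE = "safe"


@dataclass
class Limit:
    low: float
    high: float


@dataclass
class FdirMonitor:
    """Hypothetical onboard limit monitor; every name here is illustrative."""

    limits: Dict[str, Limit]
    mode: Mode = Mode.NOMINAL
    # Failure event history as (time in seconds, parameter, value) tuples.
    history: List[Tuple[int, str, float]] = field(default_factory=list)

    def check(self, time_s: int, parameter: str, value: float) -> bool:
        """Return True if the sample is within its limits.

        On a violation the event is logged and the monitor switches to
        Safe Mode on its own, without waiting for a ground command.
        """
        limit = self.limits[parameter]
        if limit.low <= value <= limit.high:
            return True
        self.history.append((time_s, parameter, value))
        self.mode = Mode.SAFE
        return False

    def ground_set_limit(self, parameter: str, low: float, high: float) -> None:
        """Ground command: alter an operational limit (allowed in any mode)."""
        if low > high:
            raise ValueError("low limit must not exceed high limit")
        self.limits[parameter] = Limit(low, high)

    def ground_recover(self) -> None:
        """Ground command: leave Safe Mode; the history is kept for analysis."""
        self.mode = Mode.NOMINAL
```

As a usage illustration with made-up numbers: a monitor created with `FdirMonitor(limits={"battery_voltage": Limit(24.0, 33.0)})` accepts a sample of 28.1, but a sample of 23.5 is logged and moves the monitor to Safe Mode. If ground concludes that the battery has degraded, it can call `ground_set_limit("battery_voltage", 23.0, 33.0)` followed by `ground_recover()`, after which the same reading no longer triggers Safe Mode. A real implementation would additionally distribute such checks across the FDIR hierarchy levels rather than reacting to every violation with Safe Mode.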