Microelectronics Journal 93 (2019) 104620
Journal homepage: www.elsevier.com/locate/mejo

Self-healing hardware systems: A review

Kasem Khalil a,*, Omar Eldash a, Ashok Kumar a, Magdy Bayoumi a,b

a The Center for Advanced Computer Studies, University of Louisiana at Lafayette, LA, USA
b Department of Electrical and Computer Engineering, University of Louisiana at Lafayette, LA, USA

ARTICLE INFO

Index terms: Self-healing; Embryonic hardware; Evolvable hardware; Fault tolerance; DMR; TMR; Intelligent hardware system

ABSTRACT

Self-healing is becoming a promising approach to designing reliable digital systems. It refers to the ability of a system to detect faults or failures and fix them through healing or repairing. Digital systems with a self-healing architecture are expected to compensate for faults. However, several research challenges must be overcome before self-healing becomes a mainstream approach. For example, current self-healing techniques face challenges in scalability, reliability, area overhead, and mapping. This paper explains the self-healing concept and investigates the self-healing approaches related to digital design in the literature. It gives a general overview of the topic and explains the levels of abstraction at which self-healing can be used: hardware level, application level, and system level. The paper presents multiple related works at each level of abstraction. For the faulted phase, different types of faults and fault detection methods are described. The paper also presents the parameters that can be used to evaluate a self-healing method: redundancy rate, maximum ratio of repair, self-healing time consumption, reliability, and area overhead. Previous self-healing works are then compared in terms of these evaluation criteria.
Implementations of self-healing on Embryonic Hardware (EmHW) and Network-on-Chip (NoC), using VHDL and Xilinx ISE targeting Virtex-5, are also presented. The simulation results show that self-healing improves system reliability and mean time to failure.

* Corresponding author.
E-mail addresses: kmk8148@louisiana.edu (K. Khalil), oke1206@louisiana.edu (O. Eldash), axk1769@louisiana.edu (A. Kumar), mab0778@louisiana.edu (M. Bayoumi).

1. Introduction

Hardware systems have become increasingly complex: they use powerful processors, are implemented on ever more complex architectures, and employ massive numbers of transistors [1]. Such systems may fail in any part, which may reduce or hamper their performance. Hardware failure can happen while the system is running real-life tasks, and it may be caused by aging of the hardware as well as by the surrounding environment (e.g., radiation or temperature) [2–4]. To handle such failures, self-healing has been envisioned as a solution that promises to heal or repair the system, ideally without impacting its performance. In self-healing, hardware components are expected to heal any damage to the system from the inside, without requiring external intervention or halting the system.

In nature, for example, mammalian skin is capable of self-healing and of recovering from serious injury. Many plants can likewise heal themselves when damaged. Most organisms and natural systems are able to self-heal, which contributes to their robustness and resilience. As another example, a broken fingernail gradually grows back by itself and sheds the damaged area without any conscious effort from the brain. Thus, the failing components are repaired naturally and in a decentralized way. Not all damage can be repaired, however, so self-healing mechanisms may not be able to heal every possible type of damage.
It is important to understand the difference between self-healing and self-repairing. Self-healing is the ability to maintain recovered cells or components and re-integrate them into the system, whereas self-repairing is the replacement of damaged or faulty cells or components by functioning cells in the neighborhood [5–8]. The two are closely related. Self-healing is a bottom-up approach that recovers a faulty component at the micro scale, whereas self-repairing is a top-down approach that replaces a faulty component at the macro scale. A designer can choose either approach, bottom-up or top-down, when designing such systems. For this reason of extreme closeness, the term self-

https://doi.org/10.1016/j.mejo.2019.104620
Received 9 April 2019; Received in revised form 1 August 2019; Accepted 17 September 2019; Available online 24 September 2019
0026-2692/© 2019 Elsevier Ltd. All rights reserved.