Microelectronics Journal 93 (2019) 104620
Self-healing hardware systems: A review
Kasem Khalil a,∗, Omar Eldash a, Ashok Kumar a, Magdy Bayoumi a,b
a The Center for Advanced Computer Studies, University of Louisiana at Lafayette, LA, USA
b Department of Electrical and Computer Engineering, University of Louisiana at Lafayette, LA, USA
ARTICLE INFO
Index terms:
Self-healing
Embryonic hardware
Evolvable hardware
Fault tolerance
DMR
TMR
Intelligent hardware system
ABSTRACT
Self-healing is becoming an increasingly promising approach to designing reliable digital systems. It refers to
the ability of a system to detect faults or failures and fix them through healing or repair. Digital systems with
a self-healing architecture are expected to compensate for faults. However, a few research challenges
must be overcome before self-healing becomes a mainstream approach. For example, current self-healing
techniques face challenges such as scalability, reliability, area overhead, and mapping. This paper explains the
self-healing concept and surveys the self-healing approaches to digital design in the literature. It gives
a general overview of the topic and explains the levels of abstraction at which self-healing can be used: hardware
level, application level, and system level. The paper presents multiple related works at each level of abstraction.
In the faulted phase, different types of faults and fault detection methods are described. The paper also presents
the parameters that can be used to evaluate a self-healing technique: redundancy rate, maximum ratio of repair,
self-healing time consumption, reliability, and area overhead. It further compares previous self-healing works
in terms of evaluation techniques. Implementations using VHDL and Xilinx ISE (Virtex-5) for self-healing on
Embryonic Hardware (EmHW) and Network-on-Chip (NoC) are also presented. The simulation results show the
contribution of self-healing to improving system reliability and mean time to failure.
1. Introduction
Hardware systems have become increasingly complex, using powerful
processors and systems implemented on ever more complex architectures
that employ a massive number of transistors [1]. Such hardware systems
may fail in any part, which may reduce or hamper their performance.
Hardware failures can happen while the system is running real-life tasks,
and they may be caused by aging of the hardware as well as by the
surrounding environment (e.g., radiation, temperature) [2–4]. To
handle such failures, self-healing has been envisioned as a solution that
promises to heal or repair the system, ideally without impacting
its performance. In self-healing, hardware components are expected to
heal any damage to the system from inside, without requiring external
intervention or halting.
In nature, for example, mammalian skin is capable of self-healing
and recovery from serious injury. Many plants are also capable of
self-healing when damaged. Most organisms and natural systems
are able to self-heal, which contributes to their robustness and
resilience. As another example, a broken fingernail gradually grows
back by itself and sheds the damaged area of the nail without any
conscious effort from the brain. Thus, the failing components are
repaired naturally and in a decentralized way. However, not all damage
can be repaired, so self-healing mechanisms may not be able to heal
every type of damage.
∗ Corresponding author.
E-mail addresses: kmk8148@louisiana.edu (K. Khalil), oke1206@louisiana.edu (O. Eldash), axk1769@louisiana.edu (A. Kumar), mab0778@louisiana.edu (M. Bayoumi).
It is important to understand the difference between self-healing
and self-repairing. Self-healing is the ability to maintain recovered
cells or components and reintegrate them into the system, whereas
self-repairing is the replacement of damaged or faulty
cells or components by functioning cells in the neighborhood [5–8].
Self-healing and self-repairing are closely related. Self-healing
is a bottom-up approach that recovers a faulty component
at the micro scale, whereas self-repairing is a top-down approach
that replaces a faulty component at the macro scale. A designer can
choose either approach, bottom-up or top-down, when designing
such systems. Because of this extreme closeness, the term self-
https://doi.org/10.1016/j.mejo.2019.104620
Received 9 April 2019; Received in revised form 1 August 2019; Accepted 17 September 2019
Available online 24 September 2019
0026-2692/© 2019 Elsevier Ltd. All rights reserved.