This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON RELIABILITY

CMV: Clustered Majority Voting Reliability-Aware Task Scheduling for Multicore Real-Time Systems

Alireza Namazi, Saeed Safari, and Siamak Mohammadi

Abstract—This paper proposes a novel reliability-aware hard real-time task scheduling method for multicore systems, along with a quantitative reliability model. The proposed method uses heuristic clustered replication to meet the desired reliability threshold while minimizing both the replication overhead and the latency increase. It also minimizes the intercore communication overhead of tasks. Both single and multiple soft errors are considered. Simulation results show that the efficiency of the proposed approach improves with larger network-on-chip sizes, higher reliability thresholds, and a higher number of tolerated errors. The proposed method achieves near-optimal replica overhead (at most 7.3% above the optimal replica overhead) with up to a 2500% improvement in time complexity compared with exhaustive exploration. Experimental results also show that the feasibility of the proposed method is up to 9.3% higher than that of the conventional replication method. All experiments are performed on both synthetic random task graphs and PARSEC real application benchmarks. The resulting task mapping solutions, with reduced communication volume and near-optimal replica overhead, impose a negligible latency increase (up to 6.3%) compared with the space exploration approach.

Index Terms—Network-on-chip (NoC), replication, reliability, real time, task scheduling.

I. INTRODUCTION

The recent increase in design complexity, combined with the broad application domains of embedded systems, has highlighted the importance of multicore platforms for digital system designers [1].
Multicore platforms benefit from intrinsic characteristics such as concurrency, which are desirable for system designers [2]. These platforms have evolved over recent years. The first generations used a bus topology as their interconnection network [3] and suffered from both bandwidth and scalability issues. Modern platforms with a higher number of cores use network-on-chip (NoC)-based architectures as their interconnection networks to overcome these obstacles [1], [4]. Given the wide range of embedded applications, designers must consider multiple targets in their system designs. Both real-time operation and reliability are among the major design targets in today’s digital systems [5], [6].

Manuscript received November 20, 2017; revised March 12, 2018 and June 19, 2018; accepted September 1, 2018. Associate Editor: Y. Deng. (Corresponding author: Siamak Mohammadi.)
The authors are with the School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran 1417466191, Iran (e-mail: a.namazi@ut.ac.ir; saeed@ut.ac.ir; smohamadi@ut.ac.ir).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TR.2018.2869786

Reliability is one of the most important design goals and plays a key role in many embedded applications, especially safety-critical ones [7] such as automotive applications. Systems such as automotive electronics, avionics, telecommunications, and space systems fall under a category of digital embedded systems that must satisfy hard real-time constraints [8]. These systems aim to achieve both temporal and logical correctness of their computations.

The susceptibility of digital systems to errors has increased due to continuous transistor scaling, which notably increases the importance of reliability as a goal for system designers.
Depending on the application in which an embedded system is used, different levels of criticality [9], [10] are needed. Many international safety standards exist, such as IEC 61508 [11], defined by the International Electrotechnical Commission, and ISO 26262 [9], specialized for automotive electronics. These standards show that digital systems must reach the recommended levels of reliability to be deemed reliable.

Task mapping, the process that specifies the core on which each task must be executed [12], is among the most crucial issues in multicore platforms affecting the efficiency of digital systems [13]. It is also an NP-hard problem [8]. In the multicore era, designers try to meet their defined needs and criteria by specifying a suitable task mapping strategy [14]. Many digital systems must be designed to fulfill multiple application requirements simultaneously, such as reliability and real-time operation.

This paper proposes a novel reliability-aware task scheduling method on NoC-based platforms for hard real-time applications. It uses modified clustered replication with majority voting to maintain the desired levels of reliability. The main contributions of this paper are as follows.

1) Probabilistic modeling of task mapping to maintain the desired levels of reliability based on a predefined reliability threshold specified for the system. Both single and multiple soft errors are considered during reliability evaluation.
2) A multistep heuristic algorithm that drastically reduces the time needed to find a feasible task mapping solution for hard real-time applications.
3) Minimum overhead in the number of replicas needed to maintain the reliability threshold.
4) Use of a clustered replication method to minimize both intercore communications and the execution latency of task graphs (TGs).

0018-9529 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
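
The introduction states that replication with majority voting is used to hold a predefined reliability threshold at minimum replica overhead. As a rough illustration of that trade-off, the sketch below uses the generic textbook N-modular-redundancy formula, not the quantitative model this paper develops. It assumes independent replica failures, a perfect voter, and a single per-replica reliability r; the function names and the max_n search limit are hypothetical.

```python
from math import comb


def majority_vote_reliability(r: float, n: int) -> float:
    """Probability that a strict majority of n independent replicas is correct.

    Assumes every replica succeeds with the same probability r and that
    the voter itself never fails (illustrative assumptions only).
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    need = n // 2 + 1  # smallest number of correct replicas forming a majority
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(need, n + 1))


def min_replicas(r: float, threshold: float, max_n: int = 15):
    """Smallest odd replica count meeting the threshold, or None if none does."""
    for n in range(1, max_n + 1, 2):
        if majority_vote_reliability(r, n) >= threshold:
            return n
    return None


if __name__ == "__main__":
    print(majority_vote_reliability(0.9, 3))
    print(min_replicas(0.9, 0.99))
```

Under these assumptions, three replicas of a task with r = 0.9 give a voted reliability of about 0.972, and a threshold of 0.99 needs five replicas. When r is below 0.5, adding replicas lowers the voted reliability, so the search returns None.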