ORIGINAL ARTICLE

Parallel architecture coding: link failure–recovery mechanism (PAC: LF–RM)

Nitin Rakesh • Vipin Tyagi

Received: 29 September 2011 / Revised: 10 July 2012
© The Society for Reliability Engineering, Quality and Operations Management (SREQOM), India and The Division of Operation and Maintenance, Lulea University of Technology, Sweden 2012

Abstract Parallel network coding is a new communication paradigm that exploits the broadcast characteristics of network coding in parallel architectures. Network coding has recently emerged as an innovative paradigm for optimizing large-scale communication among the nodes of these architectures. Previous work proposed a decentralized approach that optimizes parallel communication using network coding. The present paper evaluates the likelihood of communication failure and proposes an efficient solution for such situations (PAC: LF–RM). It is shown that, because communication failure can occur and cannot be fully avoided, buffering at an alternate degree of network nodes helps overcome such failures and reduces the resulting data loss. This paper explores the combination of network coding and node buffering for handling communication failures in the 2D mesh network, and presents results for various cases of communication failure.

Keywords Network coding · Communication failure · Parallel architectures · Buffering

1 Introduction

Network coding opens up promising opportunities for energy-efficient broadcasting in parallel networks (Ahlswede et al. 2000; Li et al. 2003; Yang and Yang 2009; Ning 2009; Chou and Wu 2007; Yeung et al. 2006; Lin et al. 2008; Chou et al. 2003; Widmer et al. 2005; Fragouli et al. 2006; Li and Li 2004; Kramer and Savari 2004; Agarwal and Charikar 2004; Sanders et al. 2003; Koetter and Medard 2003).
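As a rough illustration of why network coding pairs naturally with node buffering for failure recovery, the sketch below shows the classic XOR (GF(2)) coding of two packets. It is a generic textbook example, not the PAC: LF–RM mechanism itself; the function names `xor_bytes` and `recover`, the packet contents, and the failure scenario are hypothetical. A node that has buffered one packet can rebuild a second, lost packet from a single coded transmission.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets (a linear combination over GF(2))."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def recover(coded: bytes, buffered: bytes) -> bytes:
    """Hypothetical helper: rebuild the missing packet from a coded packet and a buffered one.

    Since coded = p1 XOR p2, XOR-ing it with either packet yields the other.
    """
    return xor_bytes(coded, buffered)


# Hypothetical scenario: a relay broadcasts one coded packet (p1 XOR p2)
# instead of transmitting p1 and p2 separately.
p1 = b"DATA-A"
p2 = b"DATA-B"
coded = xor_bytes(p1, p2)

# A receiver that still holds p1 in its buffer, but whose direct link
# for p2 has failed, recovers p2 from the single coded broadcast.
print(recover(coded, p1))  # prints b'DATA-B'
```

The same coded packet serves a receiver that buffered `p2` and lost `p1`, which is the property that lets one broadcast help several nodes at once.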
In (Rakesh and Tyagi 2011a), it is shown that communication in parallel architectures (i.e., parallel communication) can be improved by using network coding for data communication. The same work shows that employing network coding reduces the storage requirement at each node, thereby addressing the problem of the large data volumes involved in parallel communication. The issues of efficient broadcasting in these architectures are addressed in (Rakesh and Tyagi 2011b). The energy required to broadcast on such a network is high, because the computation and communication time at each node increases exponentially. These problems are resolved by providing energy-efficient broadcasting using the network coding approach (Rakesh and Nitin 2009, 2011; Rakesh and Tyagi 2011b, c). Together, the issues of efficient communication, faulty nodes and data size give rise to another important problem: data loss when communication fails. Communication can fail for several reasons, so it is imperative to develop a mechanism that handles such failures and provides robust and secure communication in parallel architectures. This paper proposes the parallel architecture coding: link failure–recovery mechanism (PAC: LF–RM) approach for recovering data lost due to any communication failure in the network.

N. Rakesh (✉)
Department of Computer Science & Engineering, Jaypee University of Information Technology, Waknaghat, Solan 173215, Himachal Pradesh, India
e-mail: nitin.rakesh@gmail.com

V. Tyagi
Department of Computer Science & Engineering, Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India
e-mail: dr.vipin.tyagi@gmail.com

Int J Syst Assur Eng Manag
DOI 10.1007/s13198-012-0119-4