FAULT TOLERANCE OF MULTIHOP WIRELESS NETWORKS

N. Eva Wu †, Xiaohua Li †, and Timothy Busch ‡
† ECE Dept., Binghamton Univ., Binghamton, NY 13902, USA
‡ AFRL/IFSB, 525 Brooks Road, Rome, NY 13441, USA

Abstract: This paper analyzes fault tolerance over the entire design life of a class of multiple-hop wireless networks subject to both node failure and random channel fading. It also examines the benefit and cost of feedback in network operations. The node lifetime distribution is modeled with an increasing failure rate, and the node's power consumption level enters the parameters of the distribution. A method is developed for assessing both link and network reliabilities projected at the network's design life. The link reliability is then used to allocate active nodes to clusters by dynamic programming so as to maximize the network's fault tolerance, and to establish a re-transmission control policy that minimizes an expected cost involving power expenditure, bandwidth expenditure, and packet loss.

Keywords: reliability, fault-tolerance, wireless networks, dynamic programming, Markov decision problem.

1. INTRODUCTION

The class of wireless networks under consideration is the class of multiple-hop, distributed networks consisting of a large number of nodes. Each node has a limited energy supply that cannot be replenished, and is capable of packet transmission, reception, and processing that involves detection, fusion, coding, and decoding. Our goal is to maximize the network reliability at its design life $T_D$.¹ Our main challenge is to develop a power-covariate network reliability model.² With such a model, the network reliability becomes the overarching measure that encompasses aspects of symbol error rate, energy efficiency, bandwidth efficiency, the effect of clustering, and the effect of feedback.

Many algorithms have been developed for the computation of node-pair reliability of networks,
which is the probability that at least one route exists between a source node and a terminal node (Torrieri, 1994). Unlike in other networks, however, each route in our network itself forms a sub-network with an additional structure bound by the cooperative transmission scheme used. Therefore, we confine ourselves to the sub-network of a K-cluster route through which packets hop from cluster 1 to cluster K. The restriction to the single-route problem is entirely due to our intention to capitalize on some new physical layer transmission schemes (Li and Wu 2003; Li 2003, 2004). Our interest is not in devising routing protocols (Ordonez et al., 2004) that enhance the network connectivity evaluated using knowledge of the spatial distribution of the wireless nodes (Xue and Kumar, 2004), or that prolong the network lifetime assessed using deterministic knowledge of the energy expenditure at each node (Bhardwaj et al., 2002). Instead, we seek to understand and to optimize the temporal evolution of network reliability, and to use this information in the network operation with little supervisory activity.

* The first author acknowledges the support by the US Air Force Research Laboratory (F30602-020-C-0225), and by NASA (NCC-1-02009).
¹ A design life is defined as the maximum time by which a prescribed network reliability $R_D$ is maintained, i.e., $F_{\mathrm{net}}(t)\big|_{t=T_D} = 1 - R_D$, where $F_{\mathrm{net}}(t)$ is the cumulative distribution function of the network time to failure.
² The network reliability is given by $R_{\mathrm{net}}(t) = 1 - F_{\mathrm{net}}(t)$, which is defined as the probability that the network successfully performs its required function over a period of $t$ time units under the stated operating conditions.

Existing schemes for enhancing the network fault tolerance all carry significant overhead in terms of energy consumption. Examples of such schemes include multiple-path routing (Ganesan et al., 2002), packet replication (De et al., 2003), or