Modeling Pairwise Key Establishment for Random Key Predistribution in Large-scale Sensor Networks

Dijiang Huang, Member, IEEE, Manish Mehta, Member, IEEE, Appie van de Liefvoort, Member, IEEE, Deep Medhi, Senior Member, IEEE

Abstract— Sensor networks are composed of a large number of low-power sensor devices. For secure communication among sensors, secret keys must be established between them. Considering the storage limitations of sensors and the lack of post-deployment configuration information, Random Key Predistribution schemes have been proposed. Because each sensor holds only a limited number of keys, it can share keys with only a subset of its neighboring sensors. Sensors then use these neighbors to establish pairwise keys with the remaining neighbors. To study the communication overhead incurred by pairwise key establishment, we derive probability models to design and analyze pairwise key establishment schemes for large-scale sensor networks. Our model applies the binomial distribution and a modified binomial distribution and analyzes the key path length in a hop-by-hop fashion. We also validate our models through a systematic validation procedure. We then show the robustness of our results and illustrate how our models can be used to address sensor network design problems.

I. INTRODUCTION

Large-scale sensor networks are composed of a large number of low-powered sensor devices. According to [1], the number of sensor nodes deployed to study a phenomenon may be on the order of hundreds or thousands; depending on the application, the number may reach an extreme value of millions. Typically, these networks are installed to collect sensed data from sensors deployed over a large area. Within a network, sensors communicate among themselves to exchange data and routing information.
Because of the wireless nature of the communication among sensors, these networks are vulnerable to various active and passive attacks on the communication protocols and devices. This demands secure communication among sensors. Due to inherent storage constraints, it is infeasible for a sensor device to store a shared key value for every other sensor in the system. Moreover, because sensors lack post-deployment geographic configuration information, keys cannot be selectively stored in sensor devices. Although a naïve solution to the storage constraints would be to use a common key between every pair of sensors, it offers weak security.

Manuscript received March 2005; revised February 2006, April 2006. D. Huang is with the Department of Computer Science & Engineering, Arizona State University, Tempe, AZ, USA (e-mail: dijiang@asu.edu). M. Mehta is with Tumbleweed Communications (e-mail: manish.mehta@tumbleweed.com). A. van de Liefvoort and D. Medhi are with the Department of Computer Science and Electrical Engineering, University of Missouri–Kansas City, USA (e-mail: appie@umkc.edu, dmedhi@umkc.edu).

Random Key Predistribution (RKP) schemes ([10], [6], [15] and [8]) have been proposed to give designers of sensor networks the flexibility to tailor the network deployment to the available storage and the security requirements. The RKP schemes propose to randomly select a small number of keys from a fixed key pool for each sensor. Two sensors then share keys with a probability that increases with the number of keys stored in each sensor. Since the RKP schemes require only a limited number of keys to be preinstalled in sensors, a sensor may not share keys with all of its neighbor nodes. In this case, a Pairwise Key Establishment (PKE) scheme is required to set up shared keys with the required fraction of neighbor nodes.
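The random selection step described above can be illustrated with a minimal Python sketch. It assumes the basic RKP setting in which every sensor draws its key ring uniformly, without replacement, from one shared pool. The function names (`share_probability`, `draw_key_ring`, `estimate_share_probability`) and the pool and ring sizes are hypothetical and chosen only for illustration. The closed form used here is the standard expression for two independently drawn rings overlapping in at least one key; it is not taken from this paper's own equations, which are not reproduced in this section.

```python
import math
import random


def share_probability(pool_size: int, ring_size: int) -> float:
    """Closed-form chance that two independently drawn key rings overlap.

    Assumes both rings hold ring_size distinct keys drawn uniformly from
    the same pool of pool_size keys (basic RKP setting, an assumption).
    """
    if not 0 <= ring_size <= pool_size:
        raise ValueError("ring_size must be between 0 and pool_size")
    # The rings are disjoint only if the second ring is drawn entirely
    # from the pool_size - ring_size keys missing from the first ring.
    no_overlap = math.comb(pool_size - ring_size, ring_size) / math.comb(
        pool_size, ring_size
    )
    return 1.0 - no_overlap


def draw_key_ring(pool_size: int, ring_size: int, rng: random.Random) -> frozenset:
    """Pick ring_size distinct key identifiers uniformly from the pool."""
    return frozenset(rng.sample(range(pool_size), ring_size))


def estimate_share_probability(
    pool_size: int, ring_size: int, trials: int, seed: int = 0
) -> float:
    """Monte Carlo estimate of the same probability, for cross-checking."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ring_a = draw_key_ring(pool_size, ring_size, rng)
        ring_b = draw_key_ring(pool_size, ring_size, rng)
        if ring_a & ring_b:  # non-empty intersection: at least one shared key
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Illustrative sizes only; they are not parameters from the paper.
    for pool, ring in [(1000, 20), (10000, 50), (100000, 50)]:
        exact = share_probability(pool, ring)
        approx = estimate_share_probability(pool, ring, trials=5000)
        print(f"pool={pool:6d} ring={ring:3d} exact={exact:.4f} simulated={approx:.4f}")
```

Running the script for a fixed ring size and a growing pool shows the share probability falling, which is the regime in which a sensor needs help from intermediate nodes to reach its remaining neighbors.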
The PKE schemes require sensors to set up pairwise keys via nodes that share keys with either or both of the sensors. This PKE phase involves communication overhead for finding the shortest path to a neighbor node and for setting up the pairwise key through that path. The fewer keys preinstalled in each sensor, the lower the probability that a sensor shares a key with a given neighbor node. Consequently, the sensor incurs more overhead in the PKE phase with the remaining neighbor nodes. Studies in [5] show that the energy consumed by communication in sensors is several orders of magnitude higher than that consumed by computation. Constraints such as scarce battery power and limited storage necessitate a reference model to study the tradeoff between storage and the communication overhead of the PKE phase in RKP schemes.

It may be noted that the memory limitation of sensors restricts the number of keys that can be preinstalled in each sensor to a small number. For example, the capabilities of sensor nodes for large-scale sensor networks can be as limited as those of Smart Dust sensors [12], [11], which have only 8 Kb of program memory and 512 bytes of data memory. Moreover, studies in [6] and [8] show that a small key pool size increases security vulnerabilities. Thus, for large-scale sensor networks, a small number of keys preinstalled in each sensor and a large key pool size result in a small value of the probability (p_1) that two sensors share keys (see (1) in Section II-B.1). Our studies show that the smaller the value of p_1, the higher the number of hops required to set up pairwise keys (a detailed analysis is given in Section V).

©2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE.

Analyses presented in [6] and [8] provide the communication overhead in the PKE phase for up to 3 hops. Due to the restrictions mentioned above, a general mathematical model to study the communication overhead for the PKE phase is required. In this paper, we propose a probability model to analyze