A Note on the Buffer Overlap Among Nodes Performing Random Network Coding in Wireless Ad Hoc Networks

Riccardo Masiero, Daniele Munaretto, Michele Rossi, Joerg Widmer and Michele Zorzi

DEI, University of Padova, via Gradenigo 6/B – 35131, Padova, Italy
DoCoMo Euro-Labs, Landsberger Strasse 312 – 80687 Munich, Germany

Abstract: Network coding is a technique that is particularly suitable for the dissemination of data in distributed ad hoc networks. The definition of a mathematical model that describes the interactions among nodes and, in particular, their relationship in terms of buffer subspaces is still an open and challenging problem. The contribution of this paper is an analysis of the relationship between the network topology and the subspace overlap among nodes. This analysis can be used to establish criteria for the design of packet combination policies in diverse networking scenarios. Unlike previous studies, we explicitly take the overlap among subspaces into account through a framework comprising networks with fixed as well as mobile nodes.

I. INTRODUCTION

Efficient data delivery is of very high importance for distributed wireless networking. Random Linear Network Coding (RLNC) [1] is a technique that is particularly suitable for the dissemination of data, as it can be used in a distributed and completely unsynchronized manner. Furthermore, network coding algorithms can exploit the broadcast nature of the wireless medium to boost performance. The random mixing of different data flows makes data dissemination robust, which is particularly important in, e.g., mobile ad hoc and sensor networks where node failures are common.
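As an illustrative aside, the random mixing described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the scheme analyzed in this paper: it works over the binary field GF(2) rather than a general finite field, it stores each coding vector as a Python integer bitmask, and all function names are hypothetical.

```python
import random


def reduce_against(basis, vec):
    """Reduce vec against an XOR basis {leading_bit: vector}; return the remainder."""
    while vec:
        lead = vec.bit_length() - 1
        if lead not in basis:
            return vec
        vec ^= basis[lead]  # cancels the leading bit, so the loop terminates
    return 0


def insert_vector(basis, vec):
    """Add vec to the basis if it is innovative; return True when the rank grows."""
    remainder = reduce_against(basis, vec)
    if remainder == 0:
        return False  # vec already lies in the span of the basis
    basis[remainder.bit_length() - 1] = remainder
    return True


def random_combination(buffer, rng):
    """RLNC over GF(2) (assumption): XOR a uniformly random subset of the buffer."""
    out = 0
    for vec in buffer:
        if rng.getrandbits(1):
            out ^= vec
    return out


def transmissions_until_full_rank(source_vectors, rng, max_tx=10_000):
    """Count the coded packets sent before the receiver reaches full rank.

    Assumes the source vectors are linearly independent over GF(2).
    """
    receiver_basis = {}
    target_rank = len(source_vectors)
    for sent in range(1, max_tx + 1):
        packet = random_combination(source_vectors, rng)
        insert_vector(receiver_basis, packet)
        if len(receiver_basis) == target_rank:
            return sent
    raise RuntimeError("receiver did not reach full rank")


if __name__ == "__main__":
    rng = random.Random(1)
    sources = [0b001, 0b010, 0b100]  # three source packets (unit coding vectors)
    print(transmissions_until_full_rank(sources, rng))
```

A received packet is useful only if it increases the rank of the receiver's buffer, which is what `insert_vector` reports. The sender needs no knowledge of the receiver's state, which matches the unsynchronized operation noted above.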
Even though previous work dealt with both theoretical and practical schemes for RLNC, it is still unclear how packets should be combined in order to obtain the highest benefits in terms of throughput, delay, energy efficiency, and data persistence [2] (the amount of information that can be decoded at the data gathering point(s) at any given time). A mathematical model that describes the interactions among nodes performing network coding is a powerful tool from a protocol-design perspective. However, defining such a model is a challenging problem.

With RLNC, a network node sends out random vectors from the information space spanned by the packets received thus far. However, it is also possible to create packets from only a subspace of the whole information space available at that node. The dimension of this subspace has an impact on the encoding and decoding complexity as well as on the efficiency of the data dissemination process. Our aim is to describe the RLNC transmission dynamics so as to predict, with sufficient accuracy, the evolution of the dissemination process. To do this, we model the data delivery using combinatorial tools that allow us to track the overlap between the subspaces spanned by the buffers of different users.

The paper is organized as follows. Section II reviews related work. Section III presents two mathematical models that describe the buffer overlap among nodes performing RLNC and establishes a relation between the buffer overlap and the probability that the transmission of a new packet, coded using RLNC, provides innovative information at the receiver. Sections IV and V present the simulation scenarios and the results, respectively. Section VI concludes the paper.

II. RELATED WORK

Several studies have been carried out to understand the dynamics of RLNC in distributed networks.
As an example, an analysis related to our own can be found in [3] and [4], where the authors exploit the properties of the subspaces spanned by the collected information vectors to identify the topological structure of the underlying network graph. In [4] the subspace observation is further used for topology management, to avoid bottlenecks and clustering in network-coded peer-to-peer systems.

In [5] it has been shown that coding generally provides benefits in terms of data persistence. Nevertheless, always combining the entire available information when sending a new data packet might leave coded packets undecodable (e.g., after a network failure), thus reducing the data recovery performance. References [2], [6] and [7] address this issue by proposing code degree distributions that maximize data persistence. Reference [2] presents a new class of codes called Growth Codes, which were generalized in [6]. These papers demonstrate that optimal combination policies exist, although their analysis assumes that the information subspaces available at the nodes are uncorrelated. This assumption holds for information exchange at random encounters among nodes or for very high mobility. However, it does not capture protocol behavior well in realistic networks (especially static networks or networks with moderate mobility), as shown in [7]. In this last paper, the authors show that networks with a connectivity graph that changes significantly between subsequent transmissions are representative of only a small class of realistic networks. Such models do not reflect the dynamics of the underlying connectivity structure of many networks, and this may seriously impact the performance of the techniques of [2]. The coding rules proposed in [7] are based on heuristics; our aim in this paper is to go beyond this by analyzing buffer dynamics through a mathematical model.

III. THEORETICAL MODELS

Consider a network with $N$ nodes performing data dissemination with network coding over a finite field $\mathbb{F}_q$. At node $i$, the incoming packets containing the information vectors received up to time $t$ form a matrix $\mathbf{Y}_i^t$. Let $\mathcal{Y}_i^t$ denote the subspace spanned by the rows of $\mathbf{Y}_i^t$. With RLNC, at transmission time $t$, the node sends an encoded packet containing a linear combination $\mathbf{y}_i^t = \mathbf{m}\mathbf{Y}_i^t$, where $\mathbf{m}$ is a local encoding vector of random coefficients in $\mathbb{F}_q$ [8]. (For ease of notation, we will omit the index $t$ in the remainder of the paper.) Alternatively, the node may send a random vector from a subspace $\Gamma_i \subseteq \mathcal{Y}_i$, where $\Gamma_i$ is the space spanned by a random subset of the rows of $\mathbf{Y}_i$. We call the dimension of this subspace, $\dim(\Gamma_i)$, the transmission degree $d$, i.e., $d$ is the number of rows of the matrix $\mathbf{Y}_i$ that must be combined to form the outgoing packet. Clearly, the larger $d$, the higher the probability that this packet is not contained in the information