On the Performance and Robustness of Managing Reliable Transport Connections

Gonca Gursun, Ibrahim Matta, Karim Mattar
{goncag, matta, kmattar}@cs.bu.edu
Computer Science, Boston University, MA
Technical Report BUCS-TR-2009-014
April 17, 2009

Abstract: We revisit the problem of connection management for reliable transport. At one extreme, a pure soft-state (SS) approach (as in Delta-t [9]) safely removes the state of a connection at the sender and receiver once the state timers expire, without the need for explicit removal messages, and new connections are established without an explicit handshaking phase. On the other hand, a hybrid hard-state/soft-state (HS+SS) approach (as in TCP) uses both explicit handshaking and timer-based management of the connection’s state. In this paper, we consider the worst-case scenario of reliable single-message communication, and develop a common analytical model that can be instantiated to capture either the SS approach or the HS+SS approach. We compare the two approaches in terms of goodput, message overhead, and state overhead. We also use simulations to compare against other approaches, and evaluate them in terms of correctness (with respect to data loss and duplication) and robustness to bad network conditions (high message loss rate and variable channel delays). Our results show that the SS approach is more robust and has lower message overhead. On the other hand, SS requires more memory to keep connection states, which reduces goodput. Given that memories are getting bigger and cheaper, SS presents the best choice over bandwidth-constrained, error-prone networks.

I. INTRODUCTION

Reliable end-to-end transport communication has been studied since the 1970s, and various mechanisms have made their way into TCP [6], the reliable transport protocol widely used on the Internet today. Many of these mechanisms provided incremental patches to solve the fundamental problems of data loss and duplication.
Richard Watson in the 1980s [9] provided a fundamental theory of reliable transport, whereby connection management requires only timers bounded by a small factor of the Maximum Packet Lifetime (MPL). Based on this theory, Watson et al. developed the Delta-t protocol [2], which we classify as a pure soft-state (SS) protocol; i.e., the state of a connection at the sender and receiver can be safely removed once the connection-state timers expire, without the need for explicit removal messages, and new connections are established without an explicit handshaking phase. On the other hand, TCP uses both explicit handshaking and timer-based management of the connection’s state. Thus, TCP’s approach can be viewed as a hybrid hard-state/soft-state (HS+SS) protocol.

Given the recent interest in clean-slate network architectures, it is incumbent on us to question the design of every aspect of the current Internet architecture. In this paper, we question a specific design aspect of TCP, that of connection management: Despite Watson’s theory, why does a popular transport protocol like TCP manage its connections using both a state timer at the sender and explicit connection-management messages for opening and closing connections?

Although much pioneering work in the area of reliable transport was done over a decade ago (see [8], [1], [2], [9], [7] for examples), this body of work has focused on the correctness aspects of reliable delivery but not on performance. From the correctness point of view, Watson’s theory states that one can achieve reliability using an SS approach, as long as one can bound exactly three timers: (1) the maximum time that a sender spends retransmitting a data packet (G), (2) the maximum time that an acknowledgment is delayed by the receiver (UAT), and (3) the maximum time that a packet is allowed to live inside the network (MPL). Watson argues that all these times are naturally bounded in actual implementations.
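To make the soft-state idea concrete, the following is a minimal Python sketch of a receiver that creates connection state on the first packet (no handshake) and discards it purely on timer expiry (no removal messages). It is an illustration under stated assumptions, not the Delta-t implementation: the class name `SoftStateReceiver`, the single `state_lifetime` parameter (standing in for a bound that is some small multiple of MPL), and the choice to refresh the timer on every arrival are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SoftStateReceiver:
    """Hypothetical receiver whose per-connection state lives only while a timer runs."""

    # Assumption: one lifetime value, chosen as a small multiple of MPL (seconds).
    state_lifetime: float
    _expiry: Dict[str, float] = field(default_factory=dict)
    _seen: Dict[str, Set[int]] = field(default_factory=dict)

    def _purge(self, now: float) -> None:
        # Drop every connection whose timer has expired; no messages are exchanged.
        for conn in [c for c, t in self._expiry.items() if t <= now]:
            del self._expiry[conn]
            del self._seen[conn]

    def receive(self, conn: str, seq: int, now: float) -> bool:
        """Return True if the packet is delivered, False if it is a duplicate."""
        self._purge(now)
        # No handshake: state is created implicitly by the first packet.
        seen = self._seen.setdefault(conn, set())
        # Simplification of this sketch: every arrival refreshes the timer.
        self._expiry[conn] = now + self.state_lifetime
        if seq in seen:
            return False
        seen.add(seq)
        return True

    def active_connections(self, now: float) -> List[str]:
        self._purge(now)
        return sorted(self._expiry)


if __name__ == "__main__":
    rx = SoftStateReceiver(state_lifetime=10.0)
    print(rx.receive("a", 0, now=0.0))   # True: first packet creates the state
    print(rx.receive("a", 0, now=5.0))   # False: duplicate caught while state is alive
    print(rx.active_connections(15.0))   # []: state removed by timer expiry alone
```

The sketch also shows why the timer bound matters: once the state is gone, a late copy of an old packet would be accepted as new, so the state must be kept for at least as long as such a copy could still arrive.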
And since G and UAT are typically much smaller than MPL, connection-state timers (at both sender and receiver) can be bounded by a small factor of MPL. Note that TCP itself, despite its use of explicit connection-management messages, uses a connection-state timer (at the sender), and TCP has to use such a state timer in order to operate correctly¹. Thus, from a correctness point of view, there is no way around the need for state timers; TCP merely relies on fewer of them.

¹ Obviously, this foolproof correctness assumes that the MPL guarantee from the underlying network is not violated. Otherwise, one can only show correctness with high probability.

Our Contribution: From a performance point of view, to the best of our knowledge, there is no work that compares the hybrid HS+SS approach of TCP against the arguably simpler SS approach of Delta-t. In this paper, we provide a first performance comparison study. We consider the worst-case scenario of reliable single-message communication, and develop a common analytical model that can be instantiated to capture either the SS approach or the HS+SS (five-packet exchange) approach. This analytical model specializes the general model of Ji et al. [3] for signaling protocols to connection management for reliable transport. We compare the two approaches in terms of goodput, message overhead, and state overhead. We also use simulations