TIMELY: RTT-based Congestion Control for the Datacenter

Radhika Mittal* (UC Berkeley), Vinh The Lam, Nandita Dukkipati, Emily Blem, Hassan Wassel, Monia Ghobadi* (Microsoft), Amin Vahdat, Yaogong Wang, David Wetherall, David Zats
Google, Inc.

ABSTRACT

Datacenter transports aim to deliver low latency messaging together with high throughput. We show that simple packet delay, measured as round-trip times at hosts, is an effective congestion signal without the need for switch feedback. First, we show that advances in NIC hardware have made RTT measurement possible with microsecond accuracy, and that these RTTs are sufficient to estimate switch queueing. Then we describe how TIMELY can adjust transmission rates using RTT gradients to keep packet latency low while delivering high bandwidth. We implement our design in host software running over NICs with OS-bypass capabilities. We show using experiments with up to hundreds of machines on a Clos network topology that it provides excellent performance: turning on TIMELY for OS-bypass messaging over a fabric with PFC lowers 99th percentile tail latency by 9X while maintaining near line-rate throughput. Our system also outperforms DCTCP running in an optimized kernel, reducing tail latency by 13X. To the best of our knowledge, TIMELY is the first delay-based congestion control protocol for use in the datacenter, and it achieves its results despite having an order of magnitude fewer RTT signals (due to NIC offload) than earlier delay-based schemes such as Vegas.

CCS Concepts

• Networks → Transport protocols;

Keywords

datacenter transport; delay-based congestion control; OS-bypass; RDMA

* Work done while at Google

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

SIGCOMM ’15, August 17–21, 2015, London, United Kingdom
© 2015 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-3542-3/15/08.
DOI: http://dx.doi.org/10.1145/2785956.2787510

1. INTRODUCTION

Datacenter networks run tightly-coupled computing tasks that must be responsive to users, e.g., thousands of back-end computers may exchange information to serve a user request, and all of the transfers must complete quickly enough for the complete response to be satisfied within 100 ms [24]. To meet these requirements, datacenter transports must simultaneously deliver high bandwidth (Gbps) and utilization at low latency (msec), even though these aspects of performance are at odds. Consistently low latency matters because even a small fraction of late operations can cause a ripple effect that degrades application performance [21]. As a result, datacenter transports must strictly bound latency and packet loss.

Since traditional loss-based transports do not meet these strict requirements, new datacenter transports [10, 18, 30, 35, 37, 47] take advantage of network support to signal the onset of congestion (e.g., DCTCP [35] and its successors use ECN), introduce flow abstractions to minimize completion latency, cede scheduling to a central controller, and more. However, in this work we take a step back in search of a simpler, immediately deployable design.

The crux of our search is the congestion signal. An ideal signal would have several properties. It would be fine-grained and timely to quickly inform senders about the extent of congestion. It would be discriminative enough to work in complex environments with multiple traffic classes. And, it would be easy to deploy. Surprisingly, we find that a well-known signal, properly adapted, can meet all of our goals: delay in the form of RTT measurements.
RTT is a fine-grained measure of congestion that comes with every acknowledgment. It effectively supports multiple traffic classes by providing an inflated measure for lower-priority transfers that wait behind higher-priority ones. Further, it requires no support from network switches.

Delay has been explored in the wide-area Internet since at least TCP Vegas [16], and some modern TCP variants use delay estimates [44, 46]. But this use of delay has not been without problems. Delay-based schemes tend to compete poorly with more aggressive, loss-based schemes, and delay
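To make the gradient-based rate adaptation mentioned in the abstract concrete, the Python sketch below updates a sender's rate from per-ack RTT samples: the smoothed RTT difference is normalized by a minimum RTT to form a gradient, a negative gradient triggers additive increase, and a positive gradient triggers a proportional multiplicative decrease, with low and high RTT thresholds overriding the gradient. All parameter names and values here (t_low, t_high, alpha, beta, delta) are illustrative assumptions rather than tuned settings, and refinements such as hyper-active increase are omitted.

```python
class RttGradientRateController:
    """Sketch of an RTT-gradient rate controller; parameters are hypothetical."""

    def __init__(self, min_rtt_us=20.0, t_low_us=50.0, t_high_us=500.0,
                 alpha=0.8, beta=0.5, delta_mbps=10.0, init_rate_mbps=1000.0):
        self.min_rtt = min_rtt_us      # normalization base for the gradient
        self.t_low = t_low_us          # below this RTT, always additively increase
        self.t_high = t_high_us        # above this RTT, always multiplicatively decrease
        self.alpha = alpha             # EWMA weight for the RTT difference
        self.beta = beta               # multiplicative decrease factor
        self.delta = delta_mbps        # additive increase step
        self.rate = init_rate_mbps
        self.prev_rtt = min_rtt_us
        self.rtt_diff = 0.0            # smoothed RTT difference

    def on_ack(self, rtt_us):
        """Process one RTT sample and return the updated sending rate (Mbps)."""
        new_diff = rtt_us - self.prev_rtt
        self.prev_rtt = rtt_us
        # Exponentially weighted moving average of consecutive RTT differences.
        self.rtt_diff = (1 - self.alpha) * self.rtt_diff + self.alpha * new_diff
        gradient = self.rtt_diff / self.min_rtt
        if rtt_us < self.t_low:
            self.rate += self.delta                  # queues are short: probe up
        elif rtt_us > self.t_high:
            # Decrease in proportion to how far RTT exceeds the high threshold.
            self.rate *= 1 - self.beta * (1 - self.t_high / rtt_us)
        elif gradient <= 0:
            self.rate += self.delta                  # RTT falling or flat: increase
        else:
            self.rate *= 1 - self.beta * gradient    # RTT rising: back off
        return self.rate
```

A usage example: starting from 1000 Mbps, a 30 us sample (below t_low) raises the rate, a 600 us sample (above t_high) cuts it, and a subsequent 100 us sample in the gradient region with a falling RTT raises it again.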