RecFlow: SDN-based receiver-driven flow scheduling in datacenters

Aadil Zia Khan 1 · Ihsan Ayyub Qazi 1,2

Received: 3 November 2017 / Revised: 4 May 2018 / Accepted: 9 March 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
Datacenter applications (e.g., web search, recommendation systems, and social networking) are designed with high fanout to achieve scalable performance. A corollary of this design is frequent fabric congestion (e.g., due to incast or imperfect hashing), even when overall network utilization is low. Such fabric congestion exhibits both temporal and spatial (intra-rack and inter-rack) variation. Two basic design paradigms are used to address this issue, and current solutions lie somewhere between the two. At one end are arbiter-based approaches, in which senders poll a centralized arbiter and collectively obey global scheduling decisions. At the other end of the spectrum are self-adjusting endpoint-based approaches, in which senders independently adjust their transmission rates based on observed network congestion. The former incurs greater overhead, while the latter trades optimality for simplicity. Our work seeks a middle ground: the optimality of arbiter-based approaches with the simplicity of self-adjusting endpoint-based approaches. Our key design principle is that, since the receiver has complete information about the flows destined for it, the receiver itself can orchestrate those flows, rather than having a centralized arbiter schedule flows or the senders make independent scheduling decisions. Since multiple receivers may share a bottleneck link, datapath visibility should be used to ensure fair sharing of the bottleneck capacity between receivers with minimum overhead. We propose RecFlow, a receiver-based proactive congestion control scheme.
RecFlow employs OpenFlow-provided path visibility to track changing bottlenecks on the fly. It spaces TCP acknowledgements to prevent traffic bursts and to ensure that no receiver exceeds its fair share of the bottleneck capacity. The goal is to reduce buffer overflows while maintaining fairness among flows and high link utilization. Using extensive simulation results and a real testbed evaluation, we show that, compared to the state of the art, RecFlow achieves up to a 6× improvement in the inter-rack scenario and 1.5× in the intra-rack scenario, while sharing the link capacity fairly between all flows.

Keywords Incast · Flow scheduling · Software defined networks · Datacenters

1 Introduction

Datacenter (DC) networks have seen wide-scale adoption in recent years. They are a critical constituent of the Internet infrastructure. By enabling cloud computing and large-scale web services (e.g., Internet search, e-commerce, social networking, advertising, and recommendation systems) [2], they support, either directly or indirectly, many businesses and a massive financial ecosystem. The scales involved have brought challenges neither encountered previously nor considered critical in TCP-based communication over wide area networks (WANs). The enormity of the scales involved in a DC's operations can be seen from the fact that Facebook and Google DCs have upwards of a hundred thousand servers, serving over one billion users and catering to billions of user queries per day [13, 21, 26, 41]. Even though TCP was designed to cater to a wide range of bandwidths, the intra-datacenter environment of very high bandwidths, with 10 Gbps and 40 Gbps links being the norm, together with microsecond round trip times (RTTs) in the range of 80–300 μs, is a relatively new phenomenon.
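The ACK spacing idea described in the abstract can be sketched as a simple rate computation: if each ACK releases roughly one segment of new data at the sender, then spacing ACKs at a fixed interval caps the sender's rate at the receiver's fair share of the bottleneck. The following is a minimal illustration under an equal-share assumption; the function names, parameters, and 1500-byte segment size are our own assumptions, not RecFlow's actual implementation.

```python
# Illustrative sketch of receiver-side ACK pacing (not RecFlow's code).

def fair_share_bps(bottleneck_bps: float, num_receivers: int) -> float:
    """Each receiver's fair share of the bottleneck link capacity,
    assuming the bottleneck is shared equally among active receivers."""
    return bottleneck_bps / num_receivers

def ack_interval_s(bottleneck_bps: float, num_receivers: int,
                   bytes_per_ack: int = 1500) -> float:
    """Spacing between TCP ACKs so that the ACK clock limits the
    sender's data rate to the receiver's fair share.  Each ACK is
    assumed to release roughly bytes_per_ack of new data."""
    share = fair_share_bps(bottleneck_bps, num_receivers)
    return (bytes_per_ack * 8) / share

# Example: a 10 Gbps bottleneck shared by 5 receivers gives each a
# 2 Gbps share; one 1500-byte segment per ACK then means one ACK
# every 6 microseconds.
print(ack_interval_s(10e9, 5))  # → 6e-06
```

In this sketch the receiver would need the bottleneck capacity and the number of receivers sharing it; this is exactly the information that RecFlow obtains through OpenFlow-provided datapath visibility rather than through end-to-end probing.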
One flexibility afforded by DC networks, unlike WANs, is that since they lie under a single

Aadil Zia Khan (corresponding author)
aadilzia.khan@lums.edu.pk
Ihsan Ayyub Qazi
ihsan.qazi@lums.edu.pk
1 Lahore University of Management Sciences, Lahore, Pakistan
2 UC Berkeley, Berkeley, USA

Cluster Computing
https://doi.org/10.1007/s10586-019-02922-4