End-to-End Congestion Control for High Performance Data Transfer

Yunhong Gu, Student Member, IEEE, and Robert L. Grossman, Member, IEEE

Manuscript submitted to IEEE/ACM Transactions on Networking

Abstract—A long-standing problem in high performance computing is the lack of a transport protocol that can transfer bulk data quickly over computational grids. TCP, the de facto transport protocol of the Internet, fails to utilize abundant optical bandwidth efficiently. It also exhibits RTT bias, an unfairness problem that can degrade the performance of distributed applications. This paper presents an end-to-end approach to address these efficiency and fairness problems, especially over high bandwidth-delay product networks. The congestion control algorithm combines rate-based and window-based control mechanisms. It employs selective acknowledgment, a constant synchronization interval, bandwidth estimation, and packet delay monitoring to support the rate and window control. Both theoretical analysis and simulation/implementation results show that the algorithm satisfies the objectives of high performance, intra-protocol fairness, TCP-friendliness, and stability.

Index Terms—Congestion control, end-to-end approach, high bandwidth-delay product networks, transport protocol.

I. INTRODUCTION

PHOTONIC technology has pushed network bandwidth to 10 Gbps and enabled a wide range of new, powerful applications, including high resolution streaming media, remote access to scientific instruments, and specialized virtual reality applications such as tele-immersion. The pattern of data flow over these high speed wide area networks, or high bandwidth-delay product (BDP) networks, is quite different from that on the commercial Internet. On the Internet, TCP, the de facto standard transport protocol, utilizes backbone bandwidth by multiplexing large numbers of concurrent flows, most of which are short-lived bursts [35]. In computational grids, however, it is often the case that a small number of bulk sources share the abundant optical bandwidth [18].
The window-based AIMD (additive increase multiplicative decrease) control algorithm of TCP suffers increasingly from random loss as the BDP grows [19, 22]. Its congestion avoidance mechanism takes too long to fully probe the available bandwidth. Continuous packet loss is another serious problem for TCP in high BDP networks: it can shrink the congestion window to a very small value, after which recovery takes substantial time [19]. These drawbacks have been confirmed in real network experiments [22]. Moreover, TCP suffers from a fairness problem known as RTT (round trip time) bias: when TCP flows with different RTTs share the same bottleneck link, flows with shorter RTTs occupy more bandwidth than flows with longer RTTs [6, 19, 21, 22]. This problem is particularly critical in many high performance applications. For example, in a streaming join, performance is limited by the slowest data stream [29].

Network researchers have proposed a series of solutions, including enhancements to TCP [2, 4, 5, 17, 26, 27] and new transport layer protocols [6, 24, 25]. Due to the time and financial cost of standardization and system upgrades (especially for open-loop control mechanisms), most of them have not been widely deployed yet and are unlikely to be in the near future. Although some applications have used parallel TCP [28] for high performance data transfer, it does not solve all of TCP's problems and is too aggressive toward regular TCP flows [28]. A new end-to-end approach is needed to support these high performance, data intensive distributed applications.

Manuscript received August 1, 2003. This work was supported by the National Science Foundation under grant ANI-9977868. The authors are with the Laboratory for Advanced Computing, University of Illinois at Chicago, Chicago, IL 60607, USA (phone: 312-996-0305; fax: 312-355-0373; e-mail: gu@lac.uic.edu, grossman@uic.edu). Robert L. Grossman is also with the Two Cultures Group, Chicago, IL 60607, USA.
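To see why the additive increase described above probes bandwidth too slowly at high BDP, a back-of-the-envelope calculation is helpful. The link parameters below are illustrative examples chosen for this sketch, not measurements from the paper:

```python
# Illustrative calculation: time for TCP's additive increase (one
# packet per RTT) to regain the pipe after a multiplicative decrease
# on a high bandwidth-delay product link. Parameters are hypothetical.

bandwidth_bps = 1e9          # 1 Gbps link capacity
rtt_s = 0.1                  # 100 ms round trip time
packet_bits = 1500 * 8       # 1500-byte packets

# Window size (in packets) needed to keep the pipe full.
full_window = bandwidth_bps * rtt_s / packet_bits

# After one loss the window is halved; additive increase then adds
# one packet per RTT, so recovering the lost half of the window
# takes full_window / 2 round trips.
recovery_s = (full_window / 2) * rtt_s

print(f"full window: {full_window:.0f} packets")
print(f"recovery time after a single loss: {recovery_s:.0f} s")
```

For these parameters the full window is over 8,000 packets, and a single loss costs several minutes of ramp-up, which illustrates why the slow probing and loss-recovery behavior cited above dominates at high BDP.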
An end-to-end congestion control algorithm can be implemented at the application layer (e.g., using UDP as the underlying transport), so that it can be deployed without any modification to the operating system or network infrastructure. Moreover, the approach can be used in peer-to-peer or overlay networks as a high performance data transfer service layer. This paper presents such a congestion control algorithm, which has been implemented in a high performance data transport protocol named UDT (UDP-based Data Transfer protocol), a reliable application layer protocol built on top of UDP. In the rest of the paper we refer to this congestion control algorithm as the UDT algorithm. The algorithm uses rate-based control to set the inter-packet time and window-based control to limit the number of unacknowledged packets. The rate control is AIMD, where the increase parameter depends on the estimated available bandwidth and the decrease factor is a fixed constant. UDT can utilize almost the full bandwidth even on a link with 1 Gbps capacity and 100 ms RTT (between Chicago and Amsterdam []). Meanwhile, it remains fair to concurrent UDT flows and friendly to any concurrent TCP flows.
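The interplay of the rate and window controls described above can be sketched as follows. This is a minimal illustrative sketch under assumed parameters, not the UDT implementation; the class name, the constants, and the bandwidth-estimation input are all hypothetical:

```python
class RateWindowControl:
    """Illustrative hybrid rate/window AIMD controller: rate control
    sets the inter-packet sending interval, while a flow window caps
    the number of unacknowledged packets. All constants are
    hypothetical, not UDT's actual values."""

    def __init__(self, packet_size=1500, sync_interval=0.01):
        self.packet_size = packet_size      # bytes per packet
        self.sync_interval = sync_interval  # constant rate-control period (s)
        self.send_rate = 1e6                # current sending rate (bits/s)
        self.flow_window = 16               # max unacknowledged packets

    def on_sync(self, est_bandwidth_bps):
        # Additive increase, triggered at each constant synchronization
        # interval: the step size depends on the estimated available
        # bandwidth, so an underutilized high-BDP link is probed quickly.
        headroom = max(est_bandwidth_bps - self.send_rate, 0)
        self.send_rate += 0.1 * headroom

    def on_loss(self):
        # Multiplicative decrease by a fixed constant factor
        # (the factor here is illustrative).
        self.send_rate *= 8.0 / 9.0

    def inter_packet_interval(self):
        # Rate control is realized by spacing packets in time.
        return self.packet_size * 8 / self.send_rate

    def can_send(self, unacked_packets):
        # Window control: pause sending when too many packets are in flight.
        return unacked_packets < self.flow_window
```

A sender using such a controller would call `can_send` before each transmission, sleep `inter_packet_interval()` between packets, and invoke `on_sync`/`on_loss` from its timer and acknowledgment paths; the combination lets the rate control govern throughput while the window bounds how far the sender can outrun the receiver.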