Journal of Computer Science and Technology, 2021, Vol. 36, Issue (5): 1071-1086. doi: 10.1007/s11390-021-1243-x

Special Issue: Computer Architecture and Systems; Computer Networks and Distributed Computing

• Special Section of APPT 2021 (Part 1) •

Harmonia: Explicit Congestion Notification and Credit-Reservation Transport Converged Congestion Control in Datacenters

Ding-Huang Hu, De-Zun Dong*, Yang Bai, Shan Huang, Ze-Jia Zhou, Zi-Hao Wei, and Xiang-Ke Liao, Fellow, CCF        

  1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2021-01-01 Revised: 2021-08-25 Online: 2021-09-30 Published: 2021-09-30
  • About author: Ding-Huang Hu received his B.S. degree in computer science and technology from the Beijing Institute of Technology (BIT), Beijing, in 2019. He is a postgraduate student in the College of Computer Science and Technology, National University of Defense Technology (NUDT), Changsha. His research interests include high-performance networks and architecture.
  • Supported by:
    This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB0204300, the National Postdoctoral Program for Innovative Talents under Grant No. BX20190091, and the Excellent Youth Foundation of Hunan Province (De-Zun Dong).

Bursty traffic and thousands of concurrent flows inevitably cause congestion in datacenter networks (DCNs), which degrades overall performance. Various transport protocols, both reactive and proactive, have been developed to mitigate this congestion. Reactive schemes use congestion signals such as explicit congestion notification (ECN) and round-trip time (RTT) to handle congestion after it arises. However, as datacenter scale and link speeds grow, reactive schemes respond to congestion too slowly. In contrast, proactive protocols (e.g., credit-reservation protocols) are designed to avoid congestion before it occurs, and they offer zero data loss, fast convergence, and low buffer occupancy. Nevertheless, credit-reservation protocols have not been widely deployed in current DCNs (e.g., those of Microsoft and Amazon), which mainly deploy ECN-based protocols such as Data Center TCP (DCTCP) and data center quantized congestion notification (DCQCN). Moreover, in an actual deployment it is hard to guarantee that one protocol is deployed on every server at the same time. When a credit-reservation protocol is deployed in a DCN incrementally, the network enters a multi-protocol state and faces the following fundamental challenges: 1) unfairness, 2) high buffer occupancy, and 3) heavy tail latency. Therefore, we propose Harmonia, which aims to make ECN-based and credit-reservation protocols converge to fairness with minimal modification. To the best of our knowledge, Harmonia is the first to address the problem of harmonizing proactive and reactive congestion control. Targeting the common ECN-based protocols DCTCP and DCQCN, Harmonia leverages forward ECN and RTT to deliver real-time congestion information and redefines feedback control. Evaluation results show that Harmonia effectively resolves unfair link allocation, eliminates timeouts, and addresses buffer overflow.
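
For readers unfamiliar with the reactive, ECN-based control that the abstract contrasts with credit reservation, the following minimal sketch shows the window update rule published for DCTCP: the sender keeps a running estimate of the fraction of ECN-marked packets and reduces its window in proportion to that estimate. This is an illustration of DCTCP only, not of Harmonia's feedback control, which the abstract does not specify. The function name, the per-window calling convention, and the one-packet additive increase are simplifying assumptions made here.

```python
def dctcp_update(cwnd, alpha, marked, total, g=1.0 / 16):
    """Apply one per-window DCTCP-style update (illustrative sketch).

    cwnd   -- congestion window in packets
    alpha  -- running estimate of the fraction of ECN-marked packets
    marked -- ECN-marked packets acknowledged in the last window
    total  -- all packets acknowledged in the last window
    g      -- estimation gain (assumed value; 1/16 is a commonly cited default)
    Returns the pair (new_cwnd, new_alpha).
    """
    frac = marked / total if total else 0.0
    # Exponentially weighted moving average of the marked fraction.
    alpha = (1 - g) * alpha + g * frac
    if marked > 0:
        # Reactive step: shrink the window in proportion to congestion extent.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    else:
        # No marks seen: additive increase of one packet per window.
        cwnd += 1.0
    return cwnd, alpha
```

Because the window shrinks only after marks have travelled back to the sender, the reaction lags the onset of congestion by at least one round trip, which is the slow-response problem the abstract attributes to reactive schemes.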

Key words: datacenter; credit-reservation protocol; ECN-based (explicit congestion notification based) protocol; multiprotocol converging


ISSN 1000-9000(Print)

CN 11-2296/TP
