Sa Wang, Yan-Hai Zhu, Shan-Pei Chen, Tian-Ze Wu, Wen-Jie Li, Xu-Sheng Zhan, Hai-Yang Ding, Wei-Song Shi, Yun-Gang Bao. A Case for Adaptive Resource Management in Alibaba Datacenter Using Neural Networks[J]. Journal of Computer Science and Technology, 2020, 35(1): 209-220. DOI: 10.1007/s11390-020-9732-x

A Case for Adaptive Resource Management in Alibaba Datacenter Using Neural Networks

  • Resource efficiency and application QoS have both long been major concerns of datacenter operators, yet the two goals remain irreconcilable. High resource utilization increases the risk of resource contention between co-located workloads, which causes latency-critical (LC) applications to suffer unpredictable, and even unacceptable, performance. Much prior work has been devoted to devising effective mechanisms that protect the QoS of LC applications while improving resource efficiency. In this paper, we propose MAGI, a resource management runtime that leverages neural networks to monitor performance interference and pinpoint its root cause, and then adjusts the resource shares of the corresponding applications to ensure the QoS of LC applications. MAGI is a practical effort in the Alibaba datacenter to provide on-demand resource adjustment for applications using neural networks. The experimental results show that MAGI can reduce the performance degradation of an LC application by up to 87.3% when it is co-located with antagonist applications.
