Journal of Computer Science and Technology, 2020, Vol. 35, Issue (1): 209-220. doi: 10.1007/s11390-020-9732-x

A Case for Adaptive Resource Management in Alibaba Datacenter Using Neural Networks

Sa Wang1,2,3, Member, CCF, ACM, Yan-Hai Zhu4,*, Shan-Pei Chen4, Tian-Ze Wu1,2, Member, CCF, IEEE, Wen-Jie Li1,2, Xu-Sheng Zhan1,2, Hai-Yang Ding4, Wei-Song Shi5, Fellow, IEEE, Yun-Gang Bao1,2,3, Senior Member, CCF, Member, ACM, IEEE   

  1 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
  2 University of Chinese Academy of Sciences, Beijing 100049, China;
  3 Peng Cheng Laboratory, Shenzhen 518055, China;
  4 Alibaba Inc., Hangzhou 311121, China;
  5 Department of Computer Science, Wayne State University, Michigan, MI 48202, U.S.A.
  • Received: 2019-05-22 Revised: 2019-10-14 Online: 2020-01-05 Published: 2020-01-14
  • Contact: Yan-Hai Zhu E-mail: gaoyang.zyh@taobao.com
  • About author: Sa Wang received his B.S. degree from University of Science and Technology of China, Hefei, in 2009 and his Ph.D. degree in computer science from the Chinese Academy of Sciences (CAS), Beijing, in 2016. He is an associate professor in ICT (Institute of Computing Technology), CAS. His current research interests include operating systems, system performance evaluation and optimization, and distributed systems. He is a member of CCF and ACM.
  • Supported by:
    This work is supported in part by the National Key Research and Development Program of China under Grant No. 2016YFB1000201, the National Natural Science Foundation of China under Grant Nos. 61420106013 and 61702480, and the Youth Innovation Promotion Association of Chinese Academy of Sciences and Alibaba Innovative Research (AIR) Program.

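Objective: Resource utilization and quality of service (QoS) are two key concerns of datacenter resource management that are hard to reconcile. On the one hand, a datacenter hosts a large number of latency-sensitive online applications. They place very strict requirements on service tail latency yet use few resources, which keeps overall datacenter utilization low and wastes a great deal of capacity. On the other hand, when latency-sensitive applications are co-located with compute-oriented applications to raise overall utilization, extensive research and practice have found that the latency-sensitive applications suffer performance fluctuations of varying degrees, which seriously harm QoS. This problem is common across today's mainstream cloud platforms, such as Google, Amazon, and Alibaba, and it has not yet been fully solved: service providers can only keep resource utilization low in order to protect the QoS of online applications first. This paper studies a resource management framework, MAGI, that raises resource utilization through co-location while still guaranteeing the QoS of online applications.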
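Method: The performance fluctuation caused by co-location ultimately comes from co-located applications sharing the underlying hardware resources in an uncoordinated way. The most intuitive remedy is therefore to use fine-grained resource isolation mechanisms (such as dynamic CPU adjustment, shared cache partitioning, and memory bandwidth partitioning) to guarantee the online application's demand on each shared resource. However, the resource demand of an online application at different moments remains hard to quantify precisely, so feedback-based adjustment driven by the application's real-time performance fluctuation has become an effective approach. Online feedback adjustment faces two major challenges: 1) When to adjust? How can a normal performance fluctuation be distinguished from performance interference? 2) How to adjust? How can we determine on which resource the application is experiencing the contention that causes the interference? Because the causes of interference across multiple levels of shared resources are complex, we introduce a multilayer perceptron model that relates factors such as the real-time usage of each resource, the running state of co-located applications, and the overall state of the system environment to the performance of the monitored online application. When an application's performance fluctuates, we use the model to trace the factors behind the fluctuation. If the cause traces back to the application itself, the fluctuation is regarded as normal. If it traces back to other factors in the system environment, the corresponding resource is identified as the key bottleneck causing the interference, and the application's share of that resource is expanded and isolated to protect its QoS.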
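
The tracing step described in the method can be illustrated with a small sketch. The source does not say how MAGI attributes a fluctuation to a factor, so the code below assumes a simple one-at-a-time perturbation: each input is moved from its baseline value to its current value in turn, and the factor that raises the predicted latency most is taken as the cause. The factor names, the toy weights, and the helper names (`mlp_predict`, `attribute`, `diagnose`) are all hypothetical.

```python
# Hypothetical names for shared resources that can be contended.
# Any other factor is treated as belonging to the LC application itself.
RESOURCE_FACTORS = {"cpu", "llc", "mem_bw"}


def mlp_predict(features, model):
    """Forward pass of a one-hidden-layer perceptron with ReLU hidden units."""
    hidden = []
    for weights, bias in zip(model["w_hidden"], model["b_hidden"]):
        z = bias + sum(w * features[name] for name, w in weights.items())
        hidden.append(max(0.0, z))
    return model["b_out"] + sum(w * h for w, h in zip(model["w_out"], hidden))


def attribute(baseline, current, model):
    """Score each factor by the change in predicted latency when only that
    factor moves from its baseline value to its current value."""
    base_pred = mlp_predict(baseline, model)
    scores = {}
    for name in current:
        probe = dict(baseline)
        probe[name] = current[name]
        scores[name] = mlp_predict(probe, model) - base_pred
    return scores


def diagnose(scores):
    """Return ("interference", resource) if a shared resource explains the
    largest latency increase, otherwise ("normal", None)."""
    top = max(scores, key=scores.get)
    if scores[top] > 0 and top in RESOURCE_FACTORS:
        return ("interference", top)
    return ("normal", None)


# Hand-picked toy weights standing in for an offline-trained model (assumption).
TOY_MODEL = {
    "w_hidden": [
        {"lc_request_rate": 1.0},
        {"cpu": 0.5, "llc": 1.0, "mem_bw": 2.0},
    ],
    "b_hidden": [0.0, 0.0],
    "w_out": [1.0, 1.0],
    "b_out": 1.0,
}
BASELINE = {"lc_request_rate": 0.2, "cpu": 0.2, "llc": 0.2, "mem_bw": 0.2}
```

With these toy values, raising only `mem_bw` makes `diagnose` report interference on memory bandwidth, while raising only `lc_request_rate` is classified as a normal fluctuation. A real deployment would need a trained model and a more careful attribution method.
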
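Results: The experimental results show that, as the number of co-located applications grows, MAGI keeps both the average latency and the tail latency of the online application Xapian stable. Compared with the uncoordinated contention scenario, tail latency is reduced by a factor of 2-4, and the tail latency fluctuation drops from (1.53%~130.53%) to (3.15%~9.51%), while relatively high resource utilization is maintained. However, resource utilization under MAGI is still about 43% lower than under uncoordinated contention, which shows that the resource limits MAGI places on interfering applications also restrain, to some extent, the gain in resource utilization.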
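
For readers unfamiliar with these metrics, the sketch below shows one common way to compute a tail latency and its fluctuation. The source does not define either metric precisely, so this assumes a nearest-rank percentile and measures fluctuation as the relative deviation from a solo-run tail latency; the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fluctuation(tail_latency, solo_tail_latency):
    """Relative deviation, in percent, of a measured tail latency from the
    tail latency observed when the application runs alone."""
    return abs(tail_latency - solo_tail_latency) * 100 / solo_tail_latency
```

Under this assumed definition, a co-located tail latency of 12 ms against a solo tail latency of 10 ms is a 20% fluctuation.
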
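Conclusions: This paper proposes a dynamic datacenter resource management framework based on a multilayer perceptron model. It effectively protects the tail latency of online applications in a co-located environment while improving overall resource utilization. Experiments show that the multilayer perceptron model can capture the relationship between the complex factors of a co-located environment and application performance, and can assist dynamic resource adjustment. However, because it requires additional offline training and has limited generalization ability, the multilayer perceptron is still some distance from use in real production environments and needs further research and optimization.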

Key words: resource management, neural network, resource efficiency, tail latency

Abstract: Resource efficiency and application QoS have long been major concerns of datacenter operators, yet they remain hard to reconcile. High resource utilization increases the risk of resource contention between co-located workloads, which makes latency-critical (LC) applications suffer unpredictable, and even unacceptable, performance. Much prior work has been devoted to effective mechanisms that protect the QoS of LC applications while improving resource efficiency. In this paper, we propose MAGI, a resource management runtime that leverages neural networks to monitor performance interference and pinpoint its root cause, and then adjusts the resource shares of the corresponding applications to ensure the QoS of LC applications. MAGI is a practice in the Alibaba datacenter of providing on-demand resource adjustment for applications using neural networks. The experimental results show that MAGI can reduce the performance degradation of an LC application by up to 87.3% when it is co-located with antagonist applications.
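
The share adjustment mentioned in the abstract can be sketched as follows. This is a minimal illustration, not MAGI's actual policy: it assumes shares are whole units of a partitionable resource (for example cache ways or CPU cores), and it moves a fixed step from the largest antagonist allocations to the LC application while leaving each antagonist a minimum floor. The function name, the `step` and `floor` parameters, and the application names are hypothetical.

```python
def shift_share(shares, lc_app, resource, step=1, floor=1):
    """Return a new allocation in which up to `step` units of `resource`
    move from the other applications to `lc_app`. No donor is reduced
    below `floor` units, and the input mapping is left unchanged."""
    new = {app: dict(alloc) for app, alloc in shares.items()}
    remaining = step
    # Take from the largest antagonist allocations first.
    donors = sorted(
        (app for app in new if app != lc_app),
        key=lambda app: new[app][resource],
        reverse=True,
    )
    for app in donors:
        give = min(remaining, max(0, new[app][resource] - floor))
        new[app][resource] -= give
        new[lc_app][resource] += give
        remaining -= give
        if remaining == 0:
            break
    return new
```

For example, with an LC application holding 8 cache ways and two batch jobs holding 10 and 2, a step of 4 takes all four ways from the larger batch job. The total is conserved, so the gain for the LC application comes directly at the cost of the antagonists, which is consistent with the utilization trade-off reported in the results.
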
