Journal of Computer Science and Technology, 2021, Vol. 36, Issue (4): 741-761. doi: 10.1007/s11390-021-1350-8

Special Topic: Data Management and Data Mining

WATuning: A Workload-Aware Tuning System with Attention-Based Deep Reinforcement Learning

Jia-Ke Ge1,2, Yan-Feng Chai2,3, and Yun-Peng Chai1,2,*, Member, CCF        

    1 Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China;
    2 School of Information, Renmin University of China, Beijing 100872, China;
    3 College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030027, China
  • Received: 2021-02-01 Revised: 2021-06-24 Online: 2021-07-05 Published: 2021-07-30
  • Contact: Yun-Peng Chai, E-mail: ypchai@ruc.edu.cn
  • About the author: Jia-Ke Ge received his B.E. degree in software engineering from Shanxi University, Taiyuan, in 2017, and his M.S. degree in software engineering from Beijing University of Technology, Beijing, in 2020. He is currently a Ph.D. candidate at Renmin University of China, Beijing. His research interests lie at the intersection of key-value storage systems and machine learning.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China under Grant No. 2019YFE0198600 and the National Natural Science Foundation of China under Grant Nos. 61972402, 61972275, and 61732014.

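Configuration tuning is essential for optimizing the performance of storage systems (e.g., databases, key-value stores). High performance usually means high throughput and low latency. At present, most system tuning is done manually (e.g., by DBAs). However, across the many types of storage systems and complex environments, it is hard even for experts to achieve high performance through tuning. In recent years, automatic tuning has been studied for traditional database systems, but these methods all have limitations. First, rule-based methods can hardly guarantee stable tuning performance. Second, traditional machine learning methods either depend on high-quality past tuning experience and large numbers of training samples, or struggle to find a better configuration within limited time. Finally, current reinforcement learning methods cannot exploit workload characteristics during tuning, which makes it hard to guarantee high-performance configurations in real deployments.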
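To address these problems, this paper proposes WATuning, a tuning system based on attention-based deep reinforcement learning that adapts to changes in workload characteristics and optimizes system performance efficiently and effectively in practice. First, we design WATuning's core algorithm, a deep reinforcement learning algorithm with an attention mechanism, to carry out the tuning task. The algorithm uses workload features to generate a weight matrix and applies it to the system's internal metrics; the deep reinforcement learning network then uses the weighted internal metrics to select a suitable configuration. Second, WATuning can generate multiple instance models that provide targeted recommendations for different types of workloads. Finally, in practical use, WATuning can dynamically fine-tune itself as the workload changes, so that it learns the current environment and recommends configurations suited to it.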
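The idea of keeping one instance model per workload type can be pictured with a small routing sketch. The summary above does not say how WATuning matches a workload to a model, so the nearest-centroid rule, the class name `InstanceModelRegistry`, and the two-feature workload vectors below are all assumptions made purely for illustration.

```python
import math


class InstanceModelRegistry:
    """Hypothetical registry mapping workload types to tuned instance models.

    Each entry pairs a representative workload feature vector (a centroid)
    with the name of the model trained for that kind of workload.
    """

    def __init__(self):
        self._entries = []  # list of (centroid, model_name) pairs

    def register(self, centroid, model_name):
        """Record a model together with the workload it was trained on."""
        self._entries.append((list(centroid), model_name))

    def select(self, workload):
        """Return the model whose centroid is closest to the given workload.

        Nearest-centroid matching is an assumption of this sketch, not
        something stated in the source.
        """
        if not self._entries:
            raise LookupError("no instance models registered")
        centroid, model_name = min(
            self._entries, key=lambda entry: math.dist(entry[0], workload)
        )
        return model_name


# Illustrative feature vectors: (read ratio, write ratio).
registry = InstanceModelRegistry()
registry.register([0.9, 0.1], "read_heavy_model")
registry.register([0.1, 0.9], "write_heavy_model")
chosen = registry.select([0.8, 0.2])
```

In a real system the selected model would then be fine-tuned on the live workload rather than used as-is, which is the third point in the paragraph above.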
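Experiments confirm that using workload characteristics to recommend high-performance knobs is sound. In terms of throughput, the knobs recommended by WATuning outperform the default knobs by 200%, those chosen by a DBA by 80%, and those of CDBTune, currently the best tuning system, by 52%. Experiments along many other dimensions likewise show that WATuning provides excellent configurations for systems under a variety of workloads, greatly improving throughput and reducing latency. WATuning outperforms today's state-of-the-art tuning methods. The results also show that the role of the workload in tuning cannot be ignored: whenever a system's performance is to be improved by changing its configuration, workload characteristics play an important part.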

Abstract: Configuration tuning is essential to optimizing the performance of systems (e.g., databases, key-value stores). High performance usually means high throughput and low latency. At present, most system tuning is performed manually (e.g., by database administrators), but it is hard for them to achieve high performance across the many types of systems and environments. In recent years, there have been some studies on tuning traditional database systems, but all of these methods have limitations. In this article, we put forward WATuning, a tuning system based on attention-based deep reinforcement learning that adapts to changes in workload characteristics and optimizes system performance efficiently and effectively. First, we design ATT-Tune, the core algorithm of WATuning, to carry out the tuning task. The algorithm uses workload characteristics to generate a weight matrix that is applied to the internal metrics of the system; ATT-Tune then uses the weighted internal metrics to select an appropriate configuration. Second, WATuning can generate multiple instance models as the workload changes, so that it can provide targeted recommendations for different types of workloads. Finally, WATuning can dynamically fine-tune itself as the workload changes in practical applications, so that its recommendations better fit the actual environment. The experimental results show that, compared with CDBTune, an existing state-of-the-art tuning method, WATuning improves throughput by 52.6% and reduces latency by 31%.
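The weighting step described in the abstract can be sketched in a few lines. This is only a minimal illustration, not ATT-Tune itself: the abstract does not specify the scoring function, so a linear score followed by a softmax is assumed here, and the matrix `W`, the feature layout, and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(workload, W):
    """Map a workload feature vector to one weight per internal metric.

    W has one row per internal metric and one column per workload feature.
    In a trained system W would be learned; here the caller supplies it.
    """
    scores = [sum(w * x for w, x in zip(row, workload)) for row in W]
    return softmax(scores)


def weighted_state(metrics, weights):
    """Scale each internal metric by its weight to form the agent's state."""
    return [m * a for m, a in zip(metrics, weights)]


# Hypothetical setup: 2 workload features (read ratio, write ratio)
# and 3 internal metrics, all normalized to the same scale.
workload = [0.9, 0.1]
W = [[2.0, 0.0],   # metric 0 matters for read-heavy workloads
     [0.0, 2.0],   # metric 1 matters for write-heavy workloads
     [1.0, 1.0]]   # metric 2 matters equally for both
metrics = [0.5, 0.5, 0.5]

weights = attention_weights(workload, W)
state = weighted_state(metrics, weights)
```

With a read-heavy workload the first metric receives the largest weight, so a downstream reinforcement learning agent that consumes `state` would see the read-related metric emphasized when it chooses a configuration.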

Key words: attention mechanism, auto-tuning system, reinforcement learning (RL), workload-aware
