Journal of Computer Science and Technology, 2020, Vol. 35, Issue (1): 221-230. doi: 10.1007/s11390-019-1951-7
Yu-Qi Li, Li-Quan Xiao, Jing-Hua Feng, Bin Xu, Jian Zhang
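An analysis of real operational data from the Tianhe-1A (TH-1A) supercomputer system shows that chilled water data reflect not only the operating status of the chilled water system but also changes in the supercomputer's load. This study proposes Aquasee, a method that uses chilled water pressure and temperature data to predict supercomputer load and cooling system faults. All data used in the method are real operational data collected from the TH-1A supercomputer system deployed at the National Supercomputer Center in Tianjin. We first select a suitable set of hyperparameters by grid search, then choose a suitable dataset by building prediction models on datasets of different composition, and choose a suitable prediction sequence length by testing the prediction performance of different lengths. Experimental results show that building the model on combined pressure and temperature data is more effective than using pressure data or temperature data alone, and we find that the best prediction sequence length is two minutes beyond the time window. In addition, the method uses chilled water data to build an anomaly monitoring system that helps engineers detect anomalies in the chilled water system.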
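To make the idea of predicting "beyond the time window" concrete, here is a minimal sketch of how combined pressure and temperature readings might be cut into input windows, each paired with a load value a fixed number of samples later. It is an illustration only, not the authors' implementation: the function name `make_windows`, the `window` and `horizon` parameters, and the data layout are all assumptions, and the abstract does not state the sampling interval or the model used.

```python
from typing import List, Sequence, Tuple


def make_windows(
    pressure: Sequence[float],
    temperature: Sequence[float],
    load: Sequence[float],
    window: int,
    horizon: int,
) -> Tuple[List[List[Tuple[float, float]]], List[float]]:
    """Pair each window of (pressure, temperature) samples with the load
    observed `horizon` samples after the window ends.

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    if not (len(pressure) == len(temperature) == len(load)):
        raise ValueError("all series must have the same length")
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")

    inputs: List[List[Tuple[float, float]]] = []
    targets: List[float] = []
    # Last start index that still leaves room for the window plus the horizon.
    last_start = len(load) - window - horizon
    for start in range(last_start + 1):
        end = start + window  # index one past the last sample in the window
        # Combine both sensor channels into one feature pair per time step.
        inputs.append(list(zip(pressure[start:end], temperature[start:end])))
        # horizon=1 is the sample immediately after the window.
        targets.append(load[end + horizon - 1])
    return inputs, targets
```

Under an assumed sampling interval of 10 seconds, for example, a two-minute horizon would correspond to `horizon=12`; the actual interval would have to be taken from the paper. Comparing a model trained on these combined pairs against one trained on a single channel mirrors the dataset comparison described in the abstract.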