Yu-Qi Li, Li-Quan Xiao, Jing-Hua Feng, Bin Xu, Jian Zhang. AquaSee: Predict Load and Cooling System Faults of Supercomputers Using Chilled Water Data[J]. Journal of Computer Science and Technology, 2020, 35(1): 221-230. DOI: 10.1007/s11390-019-1951-7

AquaSee: Predict Load and Cooling System Faults of Supercomputers Using Chilled Water Data

  • An analysis of real-world operational data from the Tianhe-1A (TH-1A) supercomputer system shows that chilled water data not only reflect the status of the chiller system but are also related to the supercomputer load. This study proposes AquaSee, a method that predicts the load and cooling system faults of supercomputers from chilled water pressure and temperature data. The method is validated on real-world operational data from the TH-1A supercomputer system at the National Supercomputer Center in Tianjin. Prediction models are built from datasets with various compositions and with different prediction sequence lengths. Experimental results show that a model using a combination of pressure and temperature data performs better than models using only pressure or only temperature data. The best inference sequence length is two points. Furthermore, an anomaly monitoring system based on chilled water data is set up to help engineers detect chiller system anomalies.
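The abstract mentions combining pressure and temperature readings over a short inference sequence (two points). As a rough illustration only, the sketch below shows one way such inputs could be arranged into samples. The function name `make_windows`, the feature ordering, and the choice of the next time step's load as the target are all assumptions made for this example; the abstract does not specify how AquaSee constructs its inputs or which model it uses.

```python
from typing import List, Sequence, Tuple


def make_windows(
    pressure: Sequence[float],
    temperature: Sequence[float],
    load: Sequence[float],
    seq_len: int = 2,
) -> List[Tuple[List[float], float]]:
    """Build (features, target) pairs from three aligned time series.

    Hypothetical layout (an assumption, not taken from the paper):
    each feature vector holds the previous `seq_len` pressure readings
    followed by the previous `seq_len` temperature readings, and the
    target is the load at the current time step.
    """
    if not (len(pressure) == len(temperature) == len(load)):
        raise ValueError("all series must have the same length")
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")

    samples = []
    for t in range(seq_len, len(load)):
        # Concatenate the pressure window and the temperature window.
        features = list(pressure[t - seq_len:t]) + list(temperature[t - seq_len:t])
        samples.append((features, load[t]))
    return samples
```

Samples in this form could then be passed to any regression model; dropping either half of the feature vector would give the pressure-only or temperature-only variants that the abstract compares against.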
