Journal of Computer Science and Technology, 2023, Vol. 38, Issue (1): 87-102. DOI: 10.1007/s11390-023-2885-7

Special Issue: Surveys; Computer Architecture and Systems

• Special Issue in Honor of Professor Kai Hwang’s 80th Birthday •

The Paradigm of Power Bounded High-Performance Computing

Rong Ge1, Xizhou Feng2, Pengfei Zou3, and Tyler Allen4        

  1. School of Computing, Clemson University, Clemson, SC 29634, U.S.A.
  2. Meta Platform, Inc., Menlo Park, CA 94025, U.S.A.
  3. Amazon, Inc., Seattle, WA 98170, U.S.A.
  4. University of North Carolina at Charlotte, NC 27599, U.S.A.
  • Received: 2022-10-04; Revised: 2022-10-26; Accepted: 2023-01-02; Online: 2023-02-28; Published: 2023-02-28
  • Contact: Rong Ge, E-mail: rge@clemson.edu
  • About author: Rong Ge received her B.S. and M.S. degrees in engineering mechanics from Tsinghua University, Beijing, in 1995 and 1998, respectively, and her Ph.D. degree in computer science from Virginia Tech, Blacksburg, in 2007. She is the director of the Scalable Computing and Analytics Laboratory in the School of Computing at Clemson University, Clemson. Her research interests include parallel and distributed systems, machine learning and big data, heterogeneous computing, and performance evaluation and optimization.
  • Supported by:
    This work is supported in part by the U.S. National Science Foundation under Grant Nos. CCF-1551511 and CNS-1551262.

Modern computer systems are increasingly bounded by the available or permissible power at multiple layers, from individual components to data centers. To cope with this reality, it is necessary to understand how power bounds impact performance, especially for systems built from high-end nodes, each consisting of multiple power-hungry components. Because placing an inappropriate power bound on a node or a component can lead to severe performance loss, coordinating power allocation among nodes and components is essential to achieving the desired performance under a given total power budget. In this article, we describe the paradigm of power bounded high-performance computing, which treats coordinated power bound assignment as a key factor in computer system performance analysis and optimization. We apply this paradigm to the problem of power coordination across multiple layers for both CPU and GPU computing. Using several case studies, we demonstrate how the principles of balanced power coordination can be applied and adapted to the interplay of workloads, hardware technology, and the available total power to improve performance.

Key words: power bounded computing; cross-component power coordination; hierarchical power allocation


ISSN 1000-9000 (Print), 1860-4749 (Online); CN 11-2296/TP
