Journal of Computer Science and Technology, 2023, Vol. 38, Issue (1): 80-86. doi: 10.1007/s11390-022-2950-7

Special topics: Survey; Computer Architecture and Systems



  


Adventures Beyond Amdahl's Law: How Power-Performance Measurement and Modeling at Scale Drive Server and Supercomputer Design

Kirk W. Cameron, Fellow, IEEE, Distinguished Member, ACM        

  1. Computer Science, Virginia Tech, Washington D.C., VA 24061, U.S.A.
  • Received: 2022-11-04; Revised: 2022-12-30; Accepted: 2022-12-30; Online: 2023-02-28; Published: 2023-02-28
  • Contact: Kirk W. Cameron; E-mail: cameron@cs.vt.edu
  • About author: Kirk W. Cameron, Ph.D., is a professor of Computer Science at Virginia Tech and an IEEE Fellow. After 17 years at Virginia Tech in Blacksburg, he became the inaugural faculty lead (i.e., department head) at Virginia Tech’s Innovation Campus in Alexandria, VA, in August 2021. From 2012 to 2022, he was the director of the stack@cs Center for Computer Systems (ranked #26 by US News in March 2022). As a researcher, he pioneered Green Computing, and his power measurement and management techniques have had a profound influence on the design of computers, supercomputers, and datacenters through commercialization and contributions to the design of the EnergyStar program for servers. His software has been downloaded by more than 500,000 people in more than 160 countries, and his work has appeared in The New York Times, The Guardian, Time, Newsweek, and elsewhere. His research artifacts have been exhibited nationally and internationally, including at the Consumer Electronics Show in Las Vegas, Nevada, South by Southwest in Austin, TX, and the Smithsonian National Museum of American History in Washington D.C. Prof. Cameron has served as an associate editor for JPDC (Journal of Parallel and Distributed Computing) since 2015 and as an associate editor for IEEE TPDS (IEEE Transactions on Parallel and Distributed Systems) since 2018. He was also the inaugural Green IT columnist and associate editor for energy efficiency for IEEE Computer Magazine from 2010 until 2017.



Abstract: Amdahl’s Law painted a bleak picture for large-scale computing. The implication was that parallelism was limited and therefore so was potential speedup. While Amdahl's contribution was seminal and important, it drove others vested in parallel processing to define more clearly why large-scale systems are critical to our future and how they fundamentally provide opportunities for speedup beyond Amdahl’s predictions. In the early 2000s, much like Amdahl, we predicted dire consequences for large-scale systems due to power limits. While our early work was often dismissed, the implications were clear to some: power would ultimately limit performance. In this retrospective, we discuss how power-performance measurement and modeling at scale led to contributions that have driven server and supercomputer design for more than a decade. While the influence of these techniques is now indisputable, we discuss their connections, limits and additional research directions necessary to continue the performance gains our industry is accustomed to.

Key words: Amdahl’s Law, speedup, power-aware computing, power modeling, performance modeling, performance prediction, power measurement
