Journal of Computer Science and Technology ›› 2023, Vol. 38 ›› Issue (1): 80-86.doi: 10.1007/s11390-022-2950-7

Special Issue: Surveys; Computer Architecture and Systems

Special Issue in Honor of Professor Kai Hwang’s 80th Birthday

Adventures Beyond Amdahl's Law: How Power-Performance Measurement and Modeling at Scale Drive Server and Supercomputer Design

Kirk W. Cameron, Fellow, IEEE, Distinguished Member, ACM        

  Computer Science, Virginia Tech, Washington D.C., VA 24061, U.S.A.
  • Received: 2022-11-04; Revised: 2022-12-30; Accepted: 2022-12-30; Online: 2023-02-28; Published: 2023-02-28
  • Contact: Kirk W. Cameron, E-mail: cameron@cs.vt.edu
  • About the author: Kirk W. Cameron, Ph.D., is a professor of Computer Science at Virginia Tech and an IEEE Fellow. After 17 years at Virginia Tech in Blacksburg, he became the inaugural faculty lead (i.e., department head) at Virginia Tech’s Innovation Campus in Alexandria, VA, in August 2021. From 2012 to 2022, he was the director of the stack@cs Center for Computer Systems (ranked #26 by US News in March 2022). As a researcher, he pioneered Green Computing, and his power measurement and management techniques have profoundly influenced the design of computers, supercomputers, and datacenters through commercialization and through contributions to the design of the EnergyStar program for servers. His software has been downloaded by more than 500,000 people in more than 160 countries, and his work has appeared in The New York Times, The Guardian, Time, Newsweek, and elsewhere. His research artifacts have been exhibited nationally and internationally, including at the Consumer Electronics Show in Las Vegas, Nevada, South by Southwest in Austin, TX, and the Smithsonian National Museum of American History in Washington, D.C. Prof. Cameron has served as an associate editor for JPDC (Journal of Parallel and Distributed Computing) since 2015 and for IEEE TPDS (IEEE Transactions on Parallel and Distributed Systems) since 2018. He was also the inaugural Green IT columnist and associate editor for energy efficiency for IEEE Computer Magazine from 2010 to 2017.

Amdahl’s Law painted a bleak picture for large-scale computing: its implication was that parallelism was limited, and therefore so was potential speedup. While Amdahl’s contribution was seminal, it drove others invested in parallel processing to define more clearly why large-scale systems are critical to our future and how they fundamentally provide opportunities for speedup beyond Amdahl’s predictions. In the early 2000s, much like Amdahl, we predicted dire consequences for large-scale systems, in our case due to power limits. Although our early work was often dismissed, the implication was clear to some: power would ultimately limit performance. In this retrospective, we discuss how power-performance measurement and modeling at scale led to contributions that have driven server and supercomputer design for more than a decade. While the influence of these techniques is now indisputable, we also discuss their connections, their limits, and the additional research directions needed to sustain the performance gains our industry is accustomed to.
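To make the "bleak picture" concrete: under Amdahl’s Law, if a fraction p of a fixed workload parallelizes perfectly over N processors, speedup is 1/((1 - p) + p/N), which can never exceed 1/(1 - p). The sketch below is a minimal illustration, not the article’s own model; it contrasts that bound with Gustafson’s scaled-speedup formulation, (1 - p) + pN, one well-known argument for speedup beyond Amdahl’s predictions. The function names and the value p = 0.95 are hypothetical, and both laws are idealized: they ignore communication overhead and, notably, the power limits this article is concerned with.

```python
"""Idealized speedup laws (illustrative sketch, not the article's model)."""


def amdahl_speedup(p: float, n: int) -> float:
    """Fixed-size speedup: a fraction p of the serial run time parallelizes
    perfectly over n processors; the remaining (1 - p) stays serial."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p: float) -> float:
    """Upper bound on Amdahl speedup as n grows without limit: 1 / (1 - p)."""
    serial = 1.0 - p
    return float("inf") if serial == 0.0 else 1.0 / serial


def gustafson_speedup(p: float, n: int) -> float:
    """Scaled speedup: p is the parallel fraction of run time measured on the
    n-processor system, and the problem size grows with n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1.0 - p) + p * n


if __name__ == "__main__":
    p = 0.95  # hypothetical: 95% of the work is parallelizable
    print(f"Amdahl ceiling for p={p}: {amdahl_limit(p):.1f}x")
    for n in (16, 256, 4096):
        print(f"n={n:5d}  Amdahl={amdahl_speedup(p, n):7.2f}x  "
              f"Gustafson={gustafson_speedup(p, n):8.2f}x")
```

With p = 0.95, fixed-size speedup saturates just below 20x no matter how many processors are added, while the scaled formulation keeps growing with N. Neither formula accounts for power, which is the additional constraint the article argues eventually caps performance.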

Key words: Amdahl’s Law; speedup; power-aware computing; power modeling; performance modeling; performance prediction; power measurement


ISSN 1000-9000 (Print)
ISSN 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved