Journal of Computer Science and Technology ›› 2023, Vol. 38 ›› Issue (1): 128-145. DOI: 10.1007/s11390-023-2907-5
Special Section: Computer Architecture and Systems
Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Member, ACM, IEEE, and Dhabaleswar K. Panda, Fellow, IEEE, Member, ACM
The Slingshot interconnect, designed by HPE/Cray, is becoming increasingly relevant in high-performance computing with its deployment on upcoming exascale systems. Notably, it underpins Frontier, the first exascale supercomputer to rank highest in the world. It provides features such as adaptive routing, congestion control, and workload isolation. The deployment of a new interconnect raises questions about its performance, scalability, and any potential bottlenecks, as these are key factors in scaling across nodes on such systems. This paper delves into the challenges the Slingshot interconnect poses to state-of-the-art MPI (Message Passing Interface) libraries, in particular the scalability of communication over Slingshot across nodes. We present a comprehensive performance evaluation on both CPUs and GPUs with various MPI and communication libraries, including Cray MPICH, Open MPI + UCX, RCCL, and MVAPICH2, on Spock, a cluster deployed with Slingshot-10, AMD MI100 GPUs, and AMD EPYC Rome CPUs that emulates the Frontier system. We also present an early evaluation of MPI library support for the Slingshot-11 interconnect in a CPU-based environment.
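To make the kind of inter-node measurement discussed above concrete, the sketch below shows a minimal MPI ping-pong bandwidth test in C. This is only an illustrative sketch, not the benchmark used in the evaluation (which relies on established micro-benchmark suites); the message size, iteration count, acknowledgement pattern, and host-resident buffer are assumptions chosen for brevity. A GPU-resident variant would instead allocate the buffer in device memory (e.g., via hipMalloc on AMD GPUs) and depend on a GPU-aware MPI library.

/*
 * Minimal host-memory ping-pong bandwidth sketch (illustrative only).
 * Rank 0 sends a fixed-size buffer to rank 1 and waits for a one-byte
 * acknowledgement; bandwidth is reported as bytes sent per unit time.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int msg_size = 1 << 20;   /* 1 MiB message; adjust as needed   */
    const int iters    = 100;       /* number of timed iterations        */
    int rank, size;
    char ack = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "Run with at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    char *buf = malloc(msg_size);   /* host buffer; GPU runs would use device memory */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&ack, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&ack, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double mbytes = (double)msg_size * iters / 1.0e6;
        printf("Message size %d B: %.2f MB/s\n", msg_size, mbytes / (t1 - t0));
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Compiled with an MPI wrapper compiler (e.g., mpicc) and launched with two ranks placed on different nodes, a test of this shape exercises the inter-node path over the interconnect; sweeping the message size from a few bytes to several megabytes is what exposes latency- versus bandwidth-bound behavior across MPI libraries.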