Journal of Computer Science and Technology ›› 2023, Vol. 38 ›› Issue (1): 128-145. DOI: 10.1007/s11390-023-2907-5

Special Issue: Computer Architecture and Systems

• Special Issue in Honor of Professor Kai Hwang’s 80th Birthday •

High Performance MPI over the Slingshot Interconnect

Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Member, ACM, IEEE, and Dhabaleswar K. Panda, Fellow, IEEE, Member, ACM        

  1. Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, U.S.A.
  • Received: 2022-10-16 Revised: 2022-10-29 Accepted: 2023-01-05 Online: 2023-02-28 Published: 2023-02-28
  • Contact: Kawthar Shafie Khorassani E-mail: shafiekhorassani.1@osu.edu
  • About author: Kawthar Shafie Khorassani is a Ph.D. student in the Department of Computer Science and Engineering at The Ohio State University, Columbus. She received her Bachelor’s degree in mathematics and computer science from Wayne State University in Detroit, MI. She currently works in the Network Based Computing Laboratory on the MVAPICH2-GDR project. Her research interests lie in high-performance computing (HPC) and in GPU communication and computation.
  • Supported by:
    This work is supported in part by the U.S. National Science Foundation under Grant Nos. 1818253, 1854828, 1931537, and 2007991, and by XRAC under Grant No. NCR-130002. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

The Slingshot interconnect designed by HPE/Cray is becoming increasingly relevant in high-performance computing with its deployment on upcoming exascale systems. In particular, it is the interconnect powering Frontier, the first exascale and highest-ranked supercomputer in the world. It offers features such as adaptive routing, congestion control, and workload isolation. The deployment of newer interconnects sparks interest in their performance, scalability, and potential bottlenecks, since the interconnect is a critical element in scaling applications across the nodes of these systems. In this paper, we delve into the challenges the Slingshot interconnect poses for current state-of-the-art MPI (message passing interface) libraries. In particular, we examine scalability when communicating over Slingshot across nodes. We present a comprehensive evaluation of various MPI and communication libraries, including Cray MPICH, OpenMPI + UCX, RCCL, and MVAPICH2, on the CPUs and GPUs of the Spock system, an early-access cluster deployed with Slingshot-10, AMD MI100 GPUs, and AMD EPYC Rome CPUs to emulate the Frontier system. We also evaluate preliminary CPU-based support in MPI libraries on the Slingshot-11 interconnect.

Key words: AMD GPU; interconnect technology; MPI (message passing interface); Slingshot
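
The scalability results described in the abstract rest on standard point-to-point measurements between nodes. The sketch below is a minimal inter-node ping-pong latency test in C/MPI in the spirit of micro-benchmark suites such as the OSU micro-benchmarks; it is not the benchmark code used in the paper, and the message-size range, iteration counts, and launch command are illustrative assumptions. A GPU-aware (ROCm-aware) run would allocate the buffers in device memory (e.g., with hipMalloc) rather than with malloc.

    /*
     * Minimal inter-node ping-pong latency sketch (illustrative only):
     * rank 0 and rank 1, placed on different nodes, exchange messages of
     * increasing size and report the averaged one-way latency.
     *
     * Illustrative build/run (launcher and flags vary by MPI library and system):
     *   mpicc -O2 pingpong.c -o pingpong
     *   srun -N 2 -n 2 --ntasks-per-node=1 ./pingpong
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0) fprintf(stderr, "Run with at least 2 ranks.\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        const int warmup = 100, iters = 1000;   /* assumed iteration counts */
        for (size_t bytes = 1; bytes <= (1 << 22); bytes *= 2) {
            char *buf = malloc(bytes);
            double start = 0.0;

            for (int i = 0; i < warmup + iters; i++) {
                if (i == warmup) {              /* discard warm-up iterations */
                    MPI_Barrier(MPI_COMM_WORLD);
                    start = MPI_Wtime();
                }
                if (rank == 0) {
                    MPI_Send(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else if (rank == 1) {
                    MPI_Recv(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }

            if (rank == 0) {
                /* one-way latency = half of the averaged round-trip time */
                double usec = (MPI_Wtime() - start) * 1e6 / iters / 2.0;
                printf("%10zu bytes  %10.2f us\n", bytes, usec);
            }
            free(buf);
        }

        MPI_Finalize();
        return 0;
    }

Because the program uses only standard MPI calls, the same source can be compiled and launched with each of the evaluated stacks (Cray MPICH, OpenMPI + UCX, MVAPICH2) by switching the compiler wrapper and job launcher, which is what makes such micro-benchmarks convenient for comparing libraries over the same Slingshot fabric.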
