Citation: Khorassani KS, Chen CC, Ramesh B et al. High performance MPI over the Slingshot interconnect. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 38(1): 128−145 Jan. 2023. DOI: 10.1007/s11390-023-2907-5.

High Performance MPI over the Slingshot Interconnect

The Slingshot interconnect designed by HPE/Cray is becoming increasingly relevant in high-performance computing with its deployment on upcoming exascale systems. In particular, it is the interconnect powering Frontier, the first exascale and highest-ranked supercomputer in the world. It offers features such as adaptive routing, congestion control, and isolated workloads. The deployment of a new interconnect raises questions about performance, scalability, and potential bottlenecks, as these are critical to scaling applications across nodes on such systems. In this paper, we examine the challenges the Slingshot interconnect poses for current state-of-the-art MPI (Message Passing Interface) libraries, focusing in particular on scalability across nodes. We present a comprehensive evaluation of various MPI and communication libraries, including Cray MPICH, OpenMPI + UCX, RCCL, and MVAPICH2, on the CPUs and GPUs of the Spock system, an early-access cluster equipped with Slingshot-10, AMD MI100 GPUs, and AMD EPYC Rome CPUs that emulates the Frontier system. We also evaluate preliminary CPU-based support in MPI libraries on the Slingshot-11 interconnect.
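The node-to-node evaluation described above is typically carried out with point-to-point microbenchmarks in the style of the OSU suite. The sketch below is a minimal MPI ping-pong latency test in C, assuming two ranks placed on different nodes; the message-size sweep, warm-up count, and iteration count are illustrative choices, not the exact benchmark used in the paper.

/* Minimal MPI ping-pong latency sketch (illustrative, not the paper's benchmark). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WARMUP 100
#define ITERS  1000

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != 2) {
        if (rank == 0) fprintf(stderr, "Run with exactly 2 ranks, one per node.\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Sweep message sizes from 1 B to 4 MiB, doubling each step. */
    for (size_t bytes = 1; bytes <= (4 << 20); bytes <<= 1) {
        char *buf = malloc(bytes);

        /* Warm-up exchanges so connection setup is excluded from timing. */
        for (int i = 0; i < WARMUP + ITERS; i++) {
            if (i == WARMUP) {
                MPI_Barrier(MPI_COMM_WORLD);
                if (rank == 0) t0 = MPI_Wtime();
            }
            if (rank == 0) {
                MPI_Send(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        /* Report average one-way latency for this message size. */
        if (rank == 0)
            printf("%10zu bytes: one-way latency %.2f us\n",
                   bytes, (MPI_Wtime() - t0) * 1e6 / (2.0 * ITERS));
        free(buf);
    }

    MPI_Finalize();
    return 0;
}

The timer variable t0 is declared as a double at the top of main (omitted above for brevity). Launched with one rank per node, for example srun -N 2 --ntasks-per-node=1 ./pingpong under Slurm, the program reports average one-way latency per message size, which is the kind of cross-node measurement the evaluation is built on.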
