Wu RH, Zhu XY, Chen JS et al. SwFormer: Enabling faster foundation models on new Sunway Supercomputer via holistic kernel tiling and scheduling. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 40(6): 1512−1529, Nov. 2025. DOI: 10.1007/s11390-025-4761-0

SwFormer: Enabling Faster Foundation Models on New Sunway Supercomputer via Holistic Kernel Tiling and Scheduling

  • The continuous evolution of deep learning has driven the creation of increasingly large foundation models, such as GPT-3, which require optimized performance on large-scale computing platforms. The new Sunway Supercomputer, equipped with numerous SW26010pro processors, supports AI workloads in both all-shared and single-CG (core group) modes. However, existing optimizations primarily target AI operators such as General Matrix Multiplication (GEMM) in the single-CG mode, leaving open the challenge of scaling performance across all six CGs in the all-shared mode. This paper introduces SwFormer, a framework designed to accelerate foundation models via intra-op tiling and inter-op scheduling. The intra-op tiling method breaks operators down into fine-grained tiled kernels and uses offline profiling to determine the optimal tiling strategy. The inter-op scheduling method uses heuristic graph traversal algorithms to automatically reorder the computation of these tiled kernels, thereby maximizing hardware utilization. Compared with operator libraries for the all-shared mode such as SWDNNv2 and SWattention, SwFormer's intra-op tiling method accelerates end-to-end training of the GPT-3 6.7B and 13B models by up to 1.27x. Evaluated on GPT-style models, the inter-op scheduling method further outperforms the intra-op tiling method by up to 1.32x.
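The idea of choosing a tiling strategy by offline profiling can be illustrated with a small sketch. This is not SwFormer's implementation: it is a plain-Python toy in which a matrix multiplication is split into tile-sized block updates, each candidate tile size is timed ahead of use, and the fastest one is kept. The names `matmul_tiled` and `profile_tile_sizes`, the candidate list, and the matrix size are all assumptions made for illustration.

```python
import time


def matmul_tiled(a, b, tile):
    """Multiply a (n x k) by b (k x m), one tile x tile block at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # One "tiled kernel": update a block of c from a block of a and b.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


def profile_tile_sizes(a, b, candidates, repeats=3):
    """Time each candidate tile size and return (fastest_tile, timings)."""
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            matmul_tiled(a, b, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings


# Hypothetical workload: small integer-valued matrices stored as floats.
n = 24
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
best_tile, timings = profile_tile_sizes(a, b, candidates=[4, 8, 12, 24])
```

Which tile size wins depends on the machine, so the sketch reports measured timings rather than a fixed answer. On real hardware the profiled kernels would be the platform's own tiled operators rather than Python loops.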
