Guan L, Li DS, Liang JY et al. Advances of pipeline model parallelism for deep learning training: An overview. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(3): 567−584 May 2024. DOI: 10.1007/s11390-024-3872-3.

Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview

Deep learning has become a cornerstone of artificial intelligence, playing an increasingly important role in production and daily life. However, as the problems being solved grow more complex, deep learning models become increasingly intricate, leading to a proliferation of large language models with astonishing numbers of parameters. Pipeline model parallelism (PMP) has emerged as one of the mainstream approaches to the significant challenge of training such “big models”. This paper presents a comprehensive review of PMP. It covers the basic concepts and main challenges of PMP, comprehensively compares synchronous and asynchronous pipeline schedules for PMP approaches, and discusses the main techniques for achieving load balance in both intra-node and inter-node training. Furthermore, the main techniques for optimizing computation, storage, and communication are presented, and potential research directions are discussed.
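To make the scheduling idea concrete, the following is a minimal sketch (not taken from the paper) of how a synchronous, GPipe-style pipeline schedule can be modeled: a model is split into stages, a batch is split into micro-batches, and stage `s` can process micro-batch `m` only after stage `s-1` has finished it and after its own previous micro-batch. The function names and the unit-time-per-step assumption are illustrative choices, not part of the survey.

```python
# Toy model of a synchronous (GPipe-style) pipeline forward schedule.
# Assumption: every stage takes one unit of time per micro-batch.

def gpipe_forward_schedule(num_stages, num_microbatches):
    """Return start[s][m]: the time step at which stage s begins micro-batch m."""
    start = [[0] * num_microbatches for _ in range(num_stages)]
    for s in range(num_stages):
        for m in range(num_microbatches):
            # Dependency on the previous stage finishing this micro-batch.
            after_prev_stage = start[s - 1][m] + 1 if s > 0 else 0
            # Dependency on this stage finishing its previous micro-batch.
            after_prev_batch = start[s][m - 1] + 1 if m > 0 else 0
            start[s][m] = max(after_prev_stage, after_prev_batch)
    return start

def bubble_ratio(num_stages, num_microbatches):
    """Fraction of idle time slots (the 'pipeline bubble') in the timeline."""
    start = gpipe_forward_schedule(num_stages, num_microbatches)
    makespan = start[-1][-1] + 1           # when the last stage finishes
    busy_slots = num_stages * num_microbatches
    return 1 - busy_slots / (makespan * num_stages)
```

For example, with 4 stages and 8 micro-batches this toy model gives a bubble ratio of 3/11, matching the well-known estimate (S−1)/(M+S−1) for S stages and M micro-batches; increasing the number of micro-batches shrinks the bubble, which is the core motivation behind synchronous pipeline schedules.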
