Guan L, Li DS, Liang JY et al. Advances of pipeline model parallelism for deep learning training: An overview. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(3): 567−584 May 2024. DOI: 10.1007/s11390-024-3872-3.

Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview

Deep learning has become a cornerstone of artificial intelligence, playing an increasingly important role in production and daily life. However, as the problems being solved grow more complex, deep learning models become increasingly intricate, leading to a proliferation of large language models with astonishing numbers of parameters. Pipeline model parallelism (PMP) has emerged as one of the mainstream approaches to the significant challenge of training such “big models”. This paper presents a comprehensive review of PMP. It covers the basic concepts and main challenges of PMP, comprehensively compares synchronous and asynchronous pipeline schedules for PMP approaches, and discusses the main techniques for achieving load balance in both intra-node and inter-node training. Furthermore, the main techniques for optimizing computation, storage, and communication are presented, and potential research directions are discussed.
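To make the scheduling idea concrete, the following is a minimal sketch (not taken from the paper) of how a synchronous, GPipe-style pipeline schedule can be modeled: a model is split into stages, a batch is split into micro-batches, and stage `s` can process micro-batch `m` only after stage `s-1` has finished it and after its own previous micro-batch. The function names and the unit-time-per-step assumption are illustrative choices, not part of the survey.

```python
# Toy model of a synchronous (GPipe-style) pipeline forward schedule.
# Assumption: every stage takes one unit of time per micro-batch.

def gpipe_forward_schedule(num_stages, num_microbatches):
    """Return start[s][m]: the time step at which stage s begins micro-batch m."""
    start = [[0] * num_microbatches for _ in range(num_stages)]
    for s in range(num_stages):
        for m in range(num_microbatches):
            # Dependency on the previous stage finishing this micro-batch.
            after_prev_stage = start[s - 1][m] + 1 if s > 0 else 0
            # Dependency on this stage finishing its previous micro-batch.
            after_prev_batch = start[s][m - 1] + 1 if m > 0 else 0
            start[s][m] = max(after_prev_stage, after_prev_batch)
    return start

def bubble_ratio(num_stages, num_microbatches):
    """Fraction of idle time slots (the 'pipeline bubble') in the timeline."""
    start = gpipe_forward_schedule(num_stages, num_microbatches)
    makespan = start[-1][-1] + 1           # when the last stage finishes
    busy_slots = num_stages * num_microbatches
    return 1 - busy_slots / (makespan * num_stages)
```

For example, with 4 stages and 8 micro-batches this toy model gives a bubble ratio of 3/11, matching the well-known estimate (S−1)/(M+S−1) for S stages and M micro-batches; increasing the number of micro-batches shrinks the bubble, which is the core motivation behind synchronous pipeline schedules.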
