VastPipe: A High-Throughput Inference System using Adaptive Space-Division Multiplexing DNN Accelerators
Abstract
The escalating demand for batched deep learning inference calls for deploying multiple deep neural network (DNN) models concurrently on a shared accelerator, using spatial multiplexing to raise resource utilization.
Co-locating multiple model services on the same accelerator, however, complicates cluster-level scheduling: the choice of model co-location combinations must be optimized jointly with per-model resource allocation, creating an extensive configuration space.
In this paper, we present VastPipe, a high-throughput inference system that schedules batched, heterogeneous requests on clusters of spatial-multiplexing-enabled accelerators. VastPipe determines optimal scheduling configurations by jointly optimizing model co-location and resource allocation, using reinforcement learning to solve the resulting combinatorial optimization problem.
Experimental results on a large-scale cluster of 250 machine nodes with 1,000 neural processing units (NPUs) show that VastPipe achieves average performance improvements of $2.2\times$, $1.3\times$, and $1.2\times$ over the three baseline systems, respectively.
Furthermore, VastPipe is optimized for and evaluated on mainstream GPUs, achieving average throughput improvements of $2.7\times$ on the NVIDIA A100 GPU and $1.9\times$ on the AMD MI100 GPU.
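To make the combinatorial structure of this joint optimization concrete, the following is a minimal, self-contained Python sketch. It is hypothetical, not VastPipe's actual implementation: it enumerates model co-location pairs and space-division resource splits, and uses a simple epsilon-greedy value update as a stand-in for the paper's reinforcement-learning agent to find the highest-throughput configuration. All model names and profiled throughput numbers are invented for illustration.

import itertools
import random

# Hypothetical profiled throughput (requests/s) of each model when granted a
# given fraction of one accelerator's compute resources.
PROFILE = {
    ("resnet50", 0.25): 300, ("resnet50", 0.5): 520, ("resnet50", 0.75): 640,
    ("bert",     0.25): 120, ("bert",     0.5): 230, ("bert",     0.75): 300,
    ("gpt2",     0.25):  60, ("gpt2",     0.5): 110, ("gpt2",     0.75): 150,
}

MODELS = ["resnet50", "bert", "gpt2"]
SPLITS = [(0.25, 0.75), (0.5, 0.5), (0.75, 0.25)]  # space-division partitions

def reward(pair, split):
    # Aggregate throughput of two co-located models under a resource split.
    (a, b), (ra, rb) = pair, split
    return PROFILE[(a, ra)] + PROFILE[(b, rb)]

def search(episodes=200, eps=0.2):
    # Epsilon-greedy exploration of the (co-location pair, resource split)
    # configuration space, with an incremental value estimate per config.
    configs = [(p, s) for p in itertools.combinations(MODELS, 2) for s in SPLITS]
    q = {c: 0.0 for c in configs}
    for _ in range(episodes):
        c = random.choice(configs) if random.random() < eps else max(q, key=q.get)
        q[c] += 0.1 * (reward(*c) - q[c])  # move estimate toward observed reward
    return max(q, key=q.get)

if __name__ == "__main__":
    best = search()
    print("best co-location:", best, "throughput:", reward(*best))

Even in this toy setting with 3 models and 3 splits there are 9 configurations per device; at cluster scale, with many models, devices, and finer partitions, the space grows combinatorially, which is what motivates a learned search policy rather than exhaustive enumeration.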