Li-Xian Ma, Le-Ping Wang, En Shao, Rong-Yu Cao, Guang-Ming Tan. VastPipe: A High-Throughput Inference System using Adaptive Space-Division Multiplexing DNN Accelerators[J]. Journal of Computer Science and Technology. DOI: 10.1007/s11390-024-3773-5

VastPipe: A High-Throughput Inference System using Adaptive Space-Division Multiplexing DNN Accelerators

  • The escalating demand for batched deep learning inference requires concurrent deployment of multiple deep neural network (DNN) models on a shared accelerator, thereby enabling spatial multiplexing to enhance resource utilization. Spatial multiplexing for co-locating multiple model services on the same accelerator increases the complexity of scheduling within a cluster. The meticulous collaborative optimization of model co-location combinations and resource allocation in a cluster creates an extensive configuration space for scheduling. In this paper, we present VastPipe, a high-throughput inference system that schedules batch-oriented and heterogeneous requests on spatial multiplexing-enabled computing clusters. VastPipe determines optimal scheduling configurations by jointly optimizing model co-location and resource allocation using reinforcement learning to solve this combinatorial optimization problem. The experimental results demonstrate that on a large-scale cluster comprising 250 machine nodes with 1,000 neural processing units (NPUs), VastPipe achieves average performance improvements of 2.2×, 1.3×, and 1.2× compared to the baseline systems. Furthermore, VastPipe is optimized and evaluated on mainstream GPUs. The results demonstrate that VastPipe achieves average throughput improvements of 2.7× on the NVIDIA A100 GPU and 1.9× on the AMD MI100 GPU.