Duba: Cost-Efficient Serverless Cloud-Edge Collaborative ML Serving with Dual-Batching
-
Abstract
The integration of edge and serverless cloud computing, which combines the low-latency advantages of edge processing with the cost efficiency and scalability of serverless cloud architectures, provides an ideal foundation for serving machine learning (ML) applications. While batching has demonstrated significant improvements in resource utilization through parallel execution, current approaches optimize batching for edge and serverless cloud environments independently and overlook their synergistic potential, leading to suboptimal end-to-end performance. To bridge this gap, we present Duba, a serverless cloud-edge collaborative system designed for cost-efficient ML serving. At its core, Duba introduces a novel dual-batching mechanism that harmonizes batching strategies across edge and serverless cloud environments. To realize this design, Duba combines lightweight configuration optimization with an adaptive scheduling policy, delivering substantial improvements in both cost efficiency and performance. Experimental results demonstrate that Duba consistently outperforms existing systems, reducing serving costs by up to 74.1% and improving SLO compliance by over 6.9%.