Efficient Vision Transformer Inference via UDP for Edge-Cloud Collaboration: An Adaptive Loss Detection Approach
Abstract
Vision Transformers (ViTs) deliver exceptional performance on computer vision tasks but pose significant computational challenges for edge devices. We present the Efficient Vision Transformer Inference Framework (EViTIF), an edge-cloud collaborative system that uses the User Datagram Protocol (UDP) to achieve low-latency communication by strategically partitioning ViT models between edge and cloud environments. To mitigate UDP's inherent unreliability, we introduce the Packet Error Rate Adaptive Loss Detection Network (PALDN), which dynamically recovers lost data without requiring extensive model retraining. Our experiments, conducted on an NVIDIA Jetson Xavier NX edge device and an A100 GPU-equipped cloud server, show that EViTIF reduces inference latency by up to 57× compared with traditional TCP-based methods. Even under packet loss rates of up to 60%, PALDN keeps accuracy degradation below 2%, outperforming existing super-resolution-based recovery approaches. Moreover, EViTIF generalizes across different ViT variants and scales effectively to larger datasets such as ImageNet. By balancing computational efficiency with robustness to network imperfections, the framework enables real-time, high-performance vision applications in edge computing.
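
To make the transport step concrete, the following is a minimal sketch (not the authors' released code) of what the abstract implies on the edge side: the ViT is split at some block, and the intermediate activation tensor is serialized into sequence-numbered UDP datagrams so the cloud side can detect which packets were lost and hand the gaps to a recovery module such as PALDN. All identifiers (CLOUD_ADDR, CHUNK_BYTES, send_activation), the split point, and the float16 quantization are illustrative assumptions, not details taken from the paper.

import socket
import struct

import numpy as np

CLOUD_ADDR = ("203.0.113.10", 9000)  # hypothetical cloud endpoint
CHUNK_BYTES = 1024                   # payload size per datagram (assumed)

def send_activation(tensor: np.ndarray, sock: socket.socket) -> None:
    """Send an intermediate ViT activation over UDP as numbered chunks."""
    payload = tensor.astype(np.float16).tobytes()  # halve bandwidth (assumed)
    total = (len(payload) + CHUNK_BYTES - 1) // CHUNK_BYTES
    for seq in range(total):
        chunk = payload[seq * CHUNK_BYTES : (seq + 1) * CHUNK_BYTES]
        # 8-byte header: sequence number + total chunk count, so the
        # receiver can reassemble and mark missing sequence numbers as lost.
        header = struct.pack("!II", seq, total)
        sock.sendto(header + chunk, CLOUD_ADDR)

if __name__ == "__main__":
    # e.g. a ViT-B/16 activation for a 224x224 image: 197 tokens x 768 dims
    activation = np.random.randn(197, 768).astype(np.float32)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        send_activation(activation, sock)

Unlike TCP, no acknowledgments or retransmissions block the pipeline here; lost datagrams simply surface as missing sequence numbers at the receiver, which is what allows a learned recovery stage to trade a bounded accuracy drop for the latency savings the abstract reports.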