DPU for Cybersecurity: Enabling Inline Defense and Self-Protection

Xiaowei Li; Yunkun Liao; Guihai Yan

doi:10.1007/s11390-026-6034-y

Structured abstract:

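Graphical abstract figures:
Figure 1. DPU security: inline defense and self-protection
Figure 2. Architecture of the DPU-driven host-side RCSCA detector
Figure 3. SNO: a TEE for heterogeneous FPGA SmartNICs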

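Background: As cloud network bandwidth moves to 100 Gbps and beyond, and as new AI-driven attack techniques emerge, conventional CPU-based security architectures face serious challenges. For example, widely used intrusion detection systems can consume hundreds of CPU cores to process high-speed traffic, creating a severe performance bottleneck. In addition, microarchitectural side-channel attacks against RDMA NICs, such as the RDMA NIC Cache Side-Channel Attack (RCSCA), are becoming increasingly serious, and existing switch-based defenses (e.g., Bedrock) suffer from high detection latency and poor scalability. Meanwhile, the security of the DPU itself, an emerging pillar of computing, has not been adequately addressed: in particular, its FPGA portion lacks confidential computing support, which makes it difficult to protect tenants' hardware logic in untrusted cloud environments.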

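Objective: This study explores the dual core value of DPUs in modern cybersecurity: using the DPU as a high-performance inline defense platform that relieves the CPU security bottleneck, and building self-protection mechanisms into the DPU itself so that it can be relied on as a root of trust. Through case studies, the paper validates this dual perspective on DPU security, showing that the DPU can both accelerate network attack detection and protect tenant applications running on it through a trusted execution environment (TEE).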

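Methods: This paper synthesizes and extends two of the authors' previous works and proposes a dual-perspective, DPU-based security architecture. 1) Inline defense: we design the first DPU-driven host-side RCSCA detector. Following a "smart edge" principle, the architecture uses the DPU's FPGA resources for hardware-accelerated traffic collection, side-channel analysis (an FPGA-accelerated SCADET algorithm), and attack blocking, detecting potential RCSCAs without degrading the performance of normal RDMA communication. 2) Self-protection: we propose SNO, the first complete TEE framework for heterogeneous FPGA SmartNICs. SNO provides a hardware root of trust and a secure boot mechanism, runs the SNO Manager inside the SoC-side TEE to perform remote attestation, and adds the SNO Guard hardware module, which applies streaming AES-GCM authenticated encryption to every I/O path, including network packets, DMA, and memory accesses.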
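The inline-defense path hinges on recognizing the Prime+Probe pattern behind cache side-channel attacks: an attacker repeatedly sweeps enough addresses to fill one cache set, then touches them again to probe. The Python sketch below is a deliberately simplified, software-only illustration of that idea and not the paper's FPGA-accelerated SCADET algorithm. It assumes a hypothetical set-associative NIC cache indexed by page number (`NUM_SETS`, `ASSOC`, `WINDOW`, and `cache_set` are assumptions), keeps a sliding window of one-sided RDMA reads per requester, and flags a requester that completes several sweeps over at least `ASSOC` distinct pages mapping to the same set.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

# Hypothetical parameters chosen for illustration; not taken from the paper.
NUM_SETS = 64    # assumed number of sets in the NIC cache
ASSOC = 4        # assumed associativity (ways per set)
WINDOW = 256     # number of recent RDMA reads kept per requester


@dataclass(frozen=True)
class RdmaRead:
    src: str     # hypothetical identifier of the remote requester
    page: int    # target page number of the one-sided RDMA read


def cache_set(page: int) -> int:
    # Assumed modulo set-index function; real NIC caches may hash differently.
    return page % NUM_SETS


def count_sweeps(pages: list[int]) -> int:
    """Count sweeps that touch at least ASSOC distinct pages of one set.

    A sweep ends when a page from the current sweep is touched again,
    which is what repeated prime/probe rounds over an eviction set look like.
    """
    sweeps, seen = 0, set()
    for p in pages:
        if p in seen:
            if len(seen) >= ASSOC:
                sweeps += 1
            seen = {p}          # the revisited page starts the next sweep
        else:
            seen.add(p)
    if len(seen) >= ASSOC:      # count a trailing sweep that is already full
        sweeps += 1
    return sweeps


class PrimeProbeDetector:
    def __init__(self, sweep_threshold: int = 3):
        self.sweep_threshold = sweep_threshold
        self.windows = defaultdict(lambda: deque(maxlen=WINDOW))

    def observe(self, req: RdmaRead) -> bool:
        """Record one RDMA read; return True if its source looks like Prime+Probe."""
        win = self.windows[req.src]
        win.append(req.page)
        by_set = defaultdict(list)          # group the window by cache set, in order
        for page in win:
            by_set[cache_set(page)].append(page)
        return any(count_sweeps(p) >= self.sweep_threshold for p in by_set.values())
```

Rescanning the whole window on every request keeps the sketch short; a hardware pipeline would more likely update per-set counters incrementally. A practical detector would also need tuned thresholds, because a benign workload that cycles over many pages of one set would trip this simple rule.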

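Results: Experimental results show that, compared with the state of the art, the DPU-driven detector reduces RDMA cache side-channel detection latency by up to 98.7%, while SNO enables a TEE on FPGA-based DPUs with sub-100 ns overhead and less than 4% FPGA resource consumption.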

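Conclusions: This paper shows that the DPU is a key path to breaking the bottleneck of high-performance network security. A distributed architecture that pushes attack detection down to the network edge (the DPU) offers significant performance and scalability advantages over a centralized, switch-based architecture when handling high-frequency attacks. The SNO framework further shows that a lightweight confidential computing environment covering all I/O paths can be built on heterogeneous SmartNICs. Together, these results provide a theoretical and practical foundation for an end-to-end chain of trust that extends from the host to the DPU and on to future transport protocols with native security, such as Ultra Ethernet.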
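SNO's coverage of all I/O paths rests on SNO Guard applying streaming AES-GCM to network packets, DMA, and memory accesses. Because AES-GCM loses its security guarantees if a nonce is ever reused under the same key, a design of this kind has to sequence nonces per stream. The Python sketch below is a minimal illustration under stated assumptions, not SNO's implementation: it builds 96-bit nonces from a 32-bit fixed field (a hypothetical per-channel ID) and a 64-bit counter, in the spirit of the deterministic IV construction in NIST SP 800-38D, and adds a receiver-side check that rejects replayed nonces. `Channel`, `NonceSequencer`, and `ReplayGuard` are hypothetical names; the encryption itself is left to an AES-GCM implementation.

```python
import struct
from enum import IntEnum


class Channel(IntEnum):
    # Hypothetical fixed-field values, one per I/O path named in the text.
    NETWORK = 1
    DMA = 2
    MEMORY = 3


COUNTER_MAX = 2**64 - 1   # largest value of the 64-bit invocation counter


class NonceSequencer:
    """Sender side: nonce = 32-bit channel ID || 64-bit counter (12 bytes).

    Distinct channel IDs keep nonces unique across channels that share a key;
    the counter keeps them unique within a channel.
    """

    def __init__(self, channel: Channel):
        self.channel = channel
        self.counter = 0

    def next_nonce(self) -> bytes:
        if self.counter > COUNTER_MAX:
            raise RuntimeError("counter exhausted: rekey before any reuse")
        nonce = struct.pack(">IQ", int(self.channel), self.counter)
        self.counter += 1
        return nonce


class ReplayGuard:
    """Receiver side: accept only strictly increasing counters per channel."""

    def __init__(self):
        self.last: dict[int, int] = {}

    def accept(self, nonce: bytes) -> bool:
        if len(nonce) != 12:
            return False
        channel, counter = struct.unpack(">IQ", nonce)
        if counter <= self.last.get(channel, -1):
            return False   # replayed or reordered nonce
        self.last[channel] = counter
        return True
```

With the `cryptography` package, for example, each nonce would be passed as the first argument of `AESGCM(key).encrypt(nonce, data, associated_data)`. The strictly increasing check assumes that each channel delivers in order; a path that can reorder traffic would need a sliding replay window instead.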

Abstract: As conventional CPU-based security architectures struggle to scale with ever-growing network bandwidths and increasingly sophisticated cyberattacks, the data processing unit (DPU), a specialized processor for datacenter infrastructure, has emerged as a transformative foundation for secure and high-performance computing. Unlike prior fragmented studies, this work proposes a comprehensive security framework for DPUs by systematically investigating the DPU's dual role in cybersecurity: it serves both as an active security enforcer and as a critical component that must itself be protected. First, the framework offloads security policies onto the DPU to enable line-rate packet inspection and hardware-accelerated security processing. Second, the framework re-architects the DPU itself to defend against physical and architectural attacks, acknowledging that the DPU also introduces a new attack surface. We validate these two design directions through two representative case studies, demonstrating the effectiveness and practicality of the proposed DPU security framework. Experimental results show that the proposed framework reduces remote direct memory access (RDMA) cache side-channel detection latency by up to 98.7% compared with the state of the art, while enabling a trusted execution environment on field-programmable gate array (FPGA)-based DPUs with sub-100 ns overhead and less than 4% FPGA resource consumption.
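As a worked check on the headline figure, write the baseline detection latency as L_base and the DPU-driven detector's latency as L_new (notation introduced here for illustration only). A reduction of up to 98.7% means that, in the best case, the new detector needs 1.3% of the baseline latency, which corresponds to a speedup of roughly 77 times:

```latex
L_{\mathrm{new}} = (1 - 0.987)\,L_{\mathrm{base}} = 0.013\,L_{\mathrm{base}},
\qquad
\frac{L_{\mathrm{base}}}{L_{\mathrm{new}}} = \frac{1}{0.013} \approx 76.9
```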