AllGuard: A Multimodal Large Language Model for Edge-Deployed Content Security Assessment

屈薇; 陆炜; 陈聪; 李涛; 陈浩东; 杜瑞麒; 凌心雨

doi:10.1007/s11390-025-5508-7

Structured Abstract:

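Background: With the rapid development of AIGC (artificial intelligence generated content) technology and the continued expansion of the digital economy, heterogeneous content creation has grown explosively, bringing unprecedented complexity to content safety moderation. Large language models (LLMs) have shown strong comprehension in content moderation tasks, particularly in semantically complex scenarios, and have made notable progress. However, most current frameworks remain limited to single-modality input (e.g., text) and struggle to identify concealed or subtle risk factors in multimodal content. In addition, the heavy dependence of LLMs on computing resources significantly limits their practical deployment on resource-constrained edge devices.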

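Objective: This paper studies a general LLM-based framework for content safety moderation that can effectively moderate multimodal data (e.g., text and images), can be deployed on resource-constrained end devices, and runs efficiently while preserving moderation performance.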

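Methods: This paper proposes AllGuard, a general framework for multimodal content safety moderation (see Figure 1), designed to address the dual challenges of semantic heterogeneity and deployment efficiency. AllGuard uses a structured processing pipeline that covers four main data modalities: text, image, audio, and video. It integrates modular encoders, a prompt-driven LLM reasoning mechanism, and optimization strategies for edge deployment to achieve efficient and comprehensive moderation. To support robust training and evaluation, we constructed a high-quality, human-annotated dataset labeled according to a systematic safety risk taxonomy covering violence, nudity, privacy violation, and other ethically sensitive categories.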
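The pipeline described above (modular encoders feeding a prompt-driven LLM) can be pictured with a minimal sketch. It is not the authors' implementation: the names `Item`, `ENCODERS`, `build_prompt`, and `moderate`, the prompt wording, and the placeholder encoders are all hypothetical, and the category list contains only the categories named in this abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical category list; the abstract names only these categories.
CATEGORIES = ["violence", "nudity", "privacy_violation", "other_sensitive"]


@dataclass
class Item:
    modality: str  # "text", "image", "audio", or "video"
    payload: str   # stand-in for raw content or a file path


# Placeholder encoders (assumption): a real system would emit
# modality-specific tokens or embeddings instead of tagged strings.
ENCODERS: Dict[str, Callable[[str], str]] = {
    "text": lambda p: p,
    "image": lambda p: f"<image:{p}>",
    "audio": lambda p: f"<audio:{p}>",
    "video": lambda p: f"<video:{p}>",
}


def build_prompt(item: Item) -> str:
    """Encode one item and wrap it in a safety-reasoning prompt."""
    if item.modality not in ENCODERS:
        raise ValueError(f"unsupported modality: {item.modality}")
    encoded = ENCODERS[item.modality](item.payload)
    return (
        "You are a content safety moderator.\n"
        f"Risk categories: {', '.join(CATEGORIES)}.\n"
        f"Modality: {item.modality}\n"
        f"Content: {encoded}\n"
        "Answer 'safe' or 'unsafe: <category>'."
    )


def moderate(item: Item, llm: Callable[[str], str]) -> str:
    """Run the prompt through any callable LLM and normalize its verdict."""
    return llm(build_prompt(item)).strip().lower()
```

Passing the LLM as a plain callable keeps the sketch independent of any particular model runtime, so the same routing logic could sit in front of a server-side or an on-device model.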

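Results: Extensive experiments show that AllGuard significantly outperforms other LLM-based methods of the same scale on content moderation tasks, reaching an accuracy of 91.58%. Evaluation on two resource-constrained end devices (NVIDIA Jetson Orin and Orange Pi 5 Plus) further verifies its robustness: AllGuard achieves an average accuracy of 90.66%, a precision of 91.92%, a recall of 90.75%, and an F1 score of 90.87%.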
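For readers unfamiliar with the four reported metrics, the sketch below computes them for a binary safe/unsafe task. It is illustrative only: the function name `binary_metrics` and the label strings are assumptions, and the paper may average over several risk categories, in which case its F1 need not equal the harmonic mean of the reported precision and recall.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "unsafe") -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```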

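Conclusions: AllGuard is a complete, general framework for content safety moderation, built on an LLM architecture and designed for deployment on end devices. To support its design, this paper proposes a new safety risk taxonomy for assessing the safety of multimodal data (e.g., images and text) and constructs a human-annotated safety dataset to facilitate related research and evaluation. AllGuard is fine-tuned on this new dataset and incorporates diverse safety policies. The authors validate AllGuard on an independent test set, and the results show that the framework achieves satisfactory performance even when deployed on resource-constrained devices. This work offers a practical, robust, and deployable solution for content safety moderation in real applications and may help AIGC technology develop and be applied in a safer and more responsible direction.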

Abstract: The rapid advancement of artificial intelligence generated content (AIGC) technologies and the growth of the digital economy have led to an explosion in heterogeneous content creation, posing increasingly complex challenges for content moderation (CM). While large language models (LLMs) have made significant progress in CM, most existing frameworks are limited to single-modality inputs (e.g., text) and are therefore inadequate for detecting nuanced or latent risks in multimodal data. Moreover, the computational demands of LLMs hinder their deployment on edge devices with limited resources. To address these challenges, we propose AllGuard, a lightweight, scalable multimodal CM framework optimized for edge deployment. AllGuard integrates a modular pipeline capable of handling text, image, audio, and video inputs through multimodal tokenization and prompt-based safety reasoning. We construct and annotate a high-quality multimodal dataset based on a comprehensive safety taxonomy, and fine-tune the LLM using LoRA (Low-Rank Adaptation) for efficient model specialization. Empirical results show that AllGuard achieves state-of-the-art performance across multiple modalities, with an overall accuracy of 91.58% and strong generalization on edge hardware, maintaining an average accuracy of 90.66% across the Jetson Orin and Orange Pi 5 Plus platforms. Our work provides a practical, robust, and deployable solution for real-world CM, promoting the safe and responsible use of AIGC applications.
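As background on the LoRA step mentioned in the abstract, the sketch below shows the standard low-rank update in plain Python: a frozen weight matrix W is adapted as W + (alpha / r) * B A, where only the small matrices A (r x d_in) and B (d_out x r) are trained. The helper names `matmul`, `lora_merge`, and `lora_param_count` are hypothetical, and nothing here reflects the rank, scaling, or target layers the authors actually used.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B A, where r is the number of rows of A."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # shape d_out x d_in, same as W
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters in A and B, versus d_out * d_in for full tuning."""
    return r * (d_in + d_out)
```

With B initialized to zeros, as is conventional for LoRA, the merged weights equal W at the start of training, so fine-tuning begins from the pretrained model's behavior. The parameter count is what makes the method attractive when compute is limited: the adapter grows linearly in the layer dimensions rather than quadratically.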

AllGuard: A Multimodal Large Language Model for Edge-Deployed Content Security Assessment