General Framework of AI Agents

Hang Li (李航)

doi:10.1007/s11390-025-5951-5

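Summary:

The agent framework proposed in this article consists of the following components: a multimodal large language model (MLLM), tools, memory, a multimodal encoder, a multimodal decoder, and an action decoder.

Background: AI agents are one of the most prominent frontiers in artificial intelligence. They include software agents that run on PCs and mobile devices, as well as hardware agents (robots) that act in the physical world. Research on AI agents has made significant progress in foundation models, training methods, system design, and practical applications, yet research on agent frameworks remains relatively limited.

Objective: This article, as a perspective paper, summarizes existing work and proposes a general information processing framework for AI agents.

Methods: The key features of the framework are that agents are task-oriented, take text and multimodal data as inputs and outputs, use large language models for reasoning, are constructed through reinforcement learning, and can call on tools and memory.

Results: Representative agents and agent frameworks developed to date, as well as several agents recently developed by the ByteDance Seed team, all conform to this general framework. The article uses these agents as examples. It also discusses the relationship between the proposed general agent framework and the information processing mechanisms of the human brain, analyzes the main characteristics of agent technologies, and proposes important directions for future research on AI agents.

Conclusion: This article proposes a general information processing framework for AI agents and argues that the frameworks of both software agents and hardware agents (robots) have evolved to the point where they can be described in a unified way. AI agents use advanced large-model technology and have strong intelligent information processing capabilities, analogous to the intelligent processing mechanisms of the human brain, and they will become an important cornerstone of future artificial intelligence.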

Abstract: AI agents represent one of the most prominent frontiers in artificial intelligence. They include software agents that operate on PCs and mobile devices, as well as hardware agents (robots) that function in the physical world. Currently, research and development on AI agents are making significant progress in foundation models, training methods, system design, and practical applications. However, research on agent frameworks remains limited. This article, as a perspective paper, summarizes existing work and proposes a general framework for AI agent information processing. The key features of this framework are that agents are task-oriented, take text and multimodal data as inputs and outputs, use large language models for reasoning, are constructed through reinforcement learning, and leverage tools and memory. Representative agents and agent frameworks developed to date, as well as several agents recently developed at ByteDance Seed, all conform to this general framework. This article introduces these agents as examples. In addition, the article discusses the relationship between the proposed general agent framework and the information processing mechanisms of the human brain, analyzes the main characteristics of agent technologies, and proposes important directions for future research on AI agents.
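
As a rough illustration of how the named components (MLLM, tools, memory, multimodal encoder, multimodal decoder, action decoder) might fit together at inference time, the following is a minimal sketch and not the article's implementation. Every name here (`Memory`, `toy_mllm`, `run_agent`, and the rule-based logic standing in for the MLLM) is a hypothetical placeholder, and reinforcement learning, which the article names as the way agents are constructed, is not covered.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Memory:
    """Hypothetical memory module: an append-only log of past steps."""
    records: List[str] = field(default_factory=list)

    def read(self, k: int = 3) -> List[str]:
        return self.records[-k:]  # most recent k entries

    def write(self, item: str) -> None:
        self.records.append(item)


def multimodal_encoder(text: str, image: bytes = b"") -> List[str]:
    """Stand-in encoder: maps raw inputs to tokens for the model."""
    tokens = text.split()
    if image:
        tokens.append(f"<image:{len(image)}B>")
    return tokens


def toy_mllm(tokens: List[str], recalled: List[str],
             tool_results: List[Any]) -> Tuple[str, Any]:
    """Rule-based placeholder for MLLM reasoning (an assumption, not a model).

    Returns (kind, payload), where kind is "tool", "text", or "action".
    """
    if "add" in tokens and not tool_results:
        numbers = [int(t) for t in tokens if t.isdigit()]
        return "tool", ("add", numbers)
    if tool_results:
        return "text", f"The result is {tool_results[-1]}."
    return "action", "noop"


def multimodal_decoder(content: str) -> Dict[str, str]:
    """Stand-in decoder: wraps model output as a text response."""
    return {"type": "text", "content": content}


def action_decoder(command: str) -> Dict[str, str]:
    """Stand-in action decoder: wraps model output as an executable action."""
    return {"type": "action", "command": command}


def run_agent(task: str, tools: Dict[str, Callable[[Any], Any]],
              memory: Memory, max_steps: int = 5) -> Dict[str, str]:
    """Task-oriented loop: encode, reason, call tools, then decode an output."""
    tokens = multimodal_encoder(task)
    tool_results: List[Any] = []
    for _ in range(max_steps):
        kind, payload = toy_mllm(tokens, memory.read(), tool_results)
        if kind == "tool":
            name, args = payload
            result = tools[name](args)
            tool_results.append(result)
            memory.write(f"{name}({args}) -> {result}")
            continue  # reason again with the new tool result
        if kind == "action":
            return action_decoder(payload)
        return multimodal_decoder(payload)
    return multimodal_decoder("Step limit reached.")
```

For example, `run_agent("add 2 and 3", {"add": sum}, Memory())` calls the `add` tool once, records the call in memory, and returns a text response, while a task the toy model cannot handle with a tool falls through to the action decoder.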
