StableME: Stable Model Editing via Knowledge Augmentation for Large Language Models
Abstract
Efficient model editing of large language models is essential for updating outdated information and incorporating specialized knowledge. Existing methods, however, often rely on two implicit assumptions: that knowledge is localized in a small subset of parameters and that different facts are largely independent of one another and of the model's broader capabilities. These assumptions can cause unstable edits, especially at large scales. To address this, we introduce StableME, a model-editing method based on knowledge augmentation rather than knowledge localization. StableME employs two automatic augmentation strategies during instruction fine-tuning: Semantic Paraphrase Enhancement (SPE), which diversifies factual expressions to strengthen the learning of edited knowledge, and Contextual Description Enrichment (CDE), which expands surrounding descriptions to mitigate forgetting of related knowledge. Experiments on our benchmark show that StableME surpasses prior editing methods in edited-knowledge stability and multi-hop knowledge stability while preserving unrelated knowledge and overall model capabilities. Furthermore, StableME is effective in a small-scale fine-tuning experiment on a closed-source ChatGPT model.