
A Chinese Spelling Check Method Based on Reverse Contrastive Learning

  • Abstract (structured):
    Background Chinese spelling check is an important task in natural language processing. Existing research focuses mainly on optimizing text representations and exploiting multi-source information to improve models' detection and correction capabilities, but pays little attention to improving their ability to distinguish easily confused characters.
    Objective To improve the ability of Chinese spelling check models to distinguish phonetically and visually similar characters, this paper proposes RCL-CSC, a new framework based on reverse contrastive learning. By constructing easily confused negative samples, the framework aims to strengthen the model's ability to detect and correct spelling errors.
    Methods The proposed RCL-CSC framework consists of a language representation module, a spelling check module, and a reverse contrastive learning module. A pre-trained model encodes the input sentence, and the spelling check module detects and corrects spelling errors; meanwhile, the reverse contrastive learning module learns phonetic and visual similarity information to construct easily confused negative samples. By training the model to distinguish phonetically and visually similar Chinese characters, RCL-CSC enhances the model's sensitivity to spelling errors and its correction ability.
    Results Experimental results show that the proposed RCL-CSC framework significantly improves Chinese spelling correction performance. On multiple Chinese spelling correction datasets, the framework achieves clear gains in both detection and correction F1 over existing methods. The improvement appears not only in overall performance but is especially pronounced on easily confused characters, demonstrating the effectiveness of reverse contrastive learning in making the model sensitive to subtle distinctions.
    Conclusion The proposed reverse contrastive learning strategy effectively improves the performance of Chinese spelling check models, especially on phonetically and visually confusable characters. Future work will explore integrating confusable characters along different dimensions to further strengthen the RCL-CSC framework.


    Abstract: Chinese spelling check is the task of detecting and correcting spelling mistakes in Chinese texts. Existing research aims to enhance text representation and exploit multi-source information to improve the detection and correction capabilities of models, with little attention paid to improving the ability to distinguish confusable characters. Contrastive learning, which aims to minimize the distance in the representation space between similar sample pairs, has recently become a dominant technique in natural language processing. Inspired by contrastive learning, we present a novel method for Chinese spelling checking, RCL-CSC, which consists of three modules: language representation, spelling check, and reverse contrastive learning. Specifically, we propose a reverse contrastive learning method that explicitly forces the model to minimize the agreement between similar examples, namely, phonetically and visually confusable characters. Experimental results show that our method is model-agnostic and can thus be combined with existing Chinese spelling check models to achieve state-of-the-art performance.
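The core idea of the reverse contrastive term can be sketched in a few lines: rather than pulling similar pairs together as in standard contrastive learning, it penalizes high representation-space similarity between a character and its phonetically or visually confusable counterparts, pushing them apart. The sketch below is illustrative only and assumes a simple log-sum-exp penalty over cosine similarities; the function names, the temperature parameter, and the exact loss form are our own assumptions, not the paper's published formulation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors (plain lists of floats).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def reverse_contrastive_loss(anchor, confusables, tau=0.1):
    """Hypothetical reverse contrastive penalty: grows when the anchor
    character's embedding is close to its confusable characters, so
    minimizing it pushes confusable pairs apart in representation space."""
    sims = [cosine(anchor, c) for c in confusables]
    return math.log(1.0 + sum(math.exp(s / tau) for s in sims))

# Toy check: an anchor near its confusable set incurs a larger penalty
# than one far from it, which is the gradient direction the method needs.
anchor = [1.0, 0.0]
close_confusable = [[0.99, 0.1]]   # nearly identical embedding
far_confusable = [[0.0, 1.0]]      # orthogonal embedding
loss_close = reverse_contrastive_loss(anchor, close_confusable)
loss_far = reverse_contrastive_loss(anchor, far_confusable)
```

In a full training setup this term would be added to the usual spelling-check cross-entropy loss, with confusable sets drawn from phonetic (pinyin) and visual (glyph) confusion dictionaries.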

