基于强化语义学习网络的篇章级事件事实性识别

钱忠; 李培峰; 朱巧明; 周国栋

doi:10.1007/s11390-024-2655-1

基于强化语义学习网络的篇章级事件事实性识别

Document-Level Event Factuality Identification via Reinforced Semantic Learning Network

摘要

摘要:
研究背景 篇章级事件事实性识别要求从篇章角度判断事件的真实程度，即判断事件是一定发生，一定不发生，亦或是可能发生等。一个篇章中包含事件实例的句子所具有的事实性有可能各不相同，但篇章级事实性是唯一的。作为自然语言处理领域中信息可信度方向的重要基础性任务，篇章级事件事实性可以位文本挖掘、篇章理解和信息抽取等提供更可靠的语义信息，并推动相关理论和应用的发展。
目的现有事件事实性工作大多局限于句子级任务，无法从篇章全局层面判断事件事实性。而少数篇章级任务则依赖于标注的事件触发词以及不确定和否定词，不利于直接应用到真实世界的数据。其中，需要解决的问题主要包括，如何构建端到端模型，如何构建语义融合网络，以及如何进行文本选择并去除噪音等。为解决上述问题，本文专注于端到端的篇章级事件事实性识别任务，仅考虑事件及其篇章作为输入，这需要构建更有效的语义理解网络。这项任务更具有挑战性，同时更具现实意义。
方法本文设计了一种新的强化语义学习网络模型，具备多粒度多层次等特点。该模型在编码时融合了事件、句内和篇章主题等信息，利用策略网络抽取事件相关的句子和词语，并学习了句子级和篇章级两层语义，获得事件的表示信息，最终识别其事实性。
结果本文在新标注的ExDLEF语料上验证所提出的模型，并选择了RLSTM、BERT-base、RMHAN等作为基准系统。实验表明，本文模型优于基准系统，在中文和英文子语料上的宏平均/微平均分别达到75.74/78.29和67.42/77.48。消融实验证明了模型中各子模块的有效性，其中以事件和主题编码器最为重要，同时也证明了相比词语，句子选择机制对性能的影响更大。实验的局限性主要在于没有精确考查抽取句子的性能模型的深层可解释性。
结论针对篇章级事件事实性中的端到端建模、综合性语义编码、文本选择等问题，本文提出了一种强化语义学习神经网络模型，融合了多层编码、文本选择等机制。实验证明了该模型在端到端篇章级事件事实性识别方面的有效性，且可以应用于其它端到端的以目标为导向的文本分类和信息抽取任务。今后可以进一步研究多事件篇章级、证据性篇章级、跨篇章事件事实性等任务。

Abstract: This paper focuses on document-level event factuality identification (DEFI), which predicts the factual nature of an event from the view of a document. As the document-level sub-task of event factuality identification (EFI), DEFI is a challenging and fundamental task in natural language processing (NLP). Currently, most existing studies focus on sentence-level event factuality identification (SEFI). However, DEFI is still in the early stage and related studies are quite limited. Previous work is heavily dependent on various NLP tools and annotated information, e.g., dependency trees, event triggers, speculative and negative cues, and does not consider filtering irrelevant and noisy texts that can lead to wrong results. To address these issues, this paper proposes a reinforced multi-granularity hierarchical network model: Reinforced Semantic Learning Network (RSLN), which means it can learn semantics from sentences and tokens at various levels of granularity and hierarchy. Since integrated with hierarchical reinforcement learning (HRL), the RSLN model is able to select relevant and meaningful sentences and tokens. Then, RSLN encodes the event and document according to these selected texts. To evaluate our model, based on the DLEF (Document-Level Event Factuality) corpus, we annotate the ExDLEF corpus as the benchmark dataset. Experimental results show that the RSLN model outperforms several state-of-the-arts.

HTML全文

参考文献()

施引文献

资源附件()