
Mix-Lingual Relation Extraction: Dataset and a Training Approach

Ling-Xing Kong, You-Gang Chu, Zheng Ma, Jian-Bing Zhang, Jia-Jun Chen

Kong LX, Chu YG, Ma Z et al. Mix-lingual relation extraction: Dataset and a training approach. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 40(1): 42−59, Jan. 2025. DOI: 10.1007/s11390-024-4314-y
Kong LX, Chu YG, Ma Z et al. Mix-lingual relation extraction: Dataset and a training approach. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 40(1): 42−59, Jan. 2025. CSTR: 32374.14.s11390-024-4314-y


Mix-Lingual Relation Extraction: Dataset and a Training Approach

Note: A preliminary version of this paper was published in the Proceedings of LREC-COLING 2024 [14].
    Author Bio:

    Ling-Xing Kong received his B.S. degree in communication engineering from Beijing University of Chemical Technology, Beijing, in 2013. He is currently a Ph.D. candidate in the State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing. His research interests include relation extraction, multimodal learning, and large language models.

    You-Gang Chu received his B.S. and M.S. degrees in computer science from Nanjing University, Nanjing, in 2021 and 2024, respectively. He is currently a software developer at Alibaba Group, Hangzhou, where he works on software observability.

    Zheng Ma is currently a Ph.D. candidate in the State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing. His research interests include image captioning, visual-language pre-training models, and large visual language models.

    Jian-Bing Zhang received his Ph.D. degree in computer science from Nanjing University, Nanjing, in 2018. He is currently an associate professor and Ph.D. supervisor at the School of Artificial Intelligence, Nanjing University. His research interests are Artificial Intelligence (AI) for science and multimodal learning.

    Jia-Jun Chen received his Ph.D. degree in computer science from Nanjing University, Nanjing, in 1998. He is a professor at the State Key Laboratory for Novel Software Technology, Nanjing University, and the director of the Natural Language Processing Lab at Nanjing University. His research interests include machine translation, text categorization, and information extraction.

    Corresponding author:

    Jia-Jun Chen: chenjj@nju.edu.cn

  • Extended Abstract:
    Research Background

    Relation extraction is a key task in natural language processing with a wide range of practical applications. Current relation extraction research focuses mainly on monolingual or cross-lingual scenarios, while the mix-lingual (or code-switching) scenario, common in everyday life, has received comparatively little attention. In mix-lingual scenarios, people mix several languages within a sentence to convey information or express ideas, producing mix-lingual content. As globalization accelerates and more people master multiple languages, research on mix-lingual scenarios becomes increasingly important. In practice, studying relation extraction in mix-lingual scenarios can advance downstream applications such as building finer-grained knowledge graphs or recommendation systems. However, owing to the lack of dedicated datasets and related studies, the effectiveness of existing relation extraction models in mix-lingual scenarios has not been verified.

    Objective

    This paper proposes the mix-lingual relation extraction task and explores two questions: Are existing relation extraction models effective in mix-lingual environments? How can the capabilities of large language models be adapted to relation extraction in mix-lingual scenarios?

    Methods

    We construct MixRED (Mix-lingual Relation Extraction Dataset), the first human-annotated dataset for mix-lingual relation extraction. To ensure data diversity, we adopt a systematic construction framework that fuses documents in different languages at varying degrees and levels (an illustrative construction sketch is given after this extended abstract). We further propose a multi-level training approach that adapts large language models to the mix-lingual relation extraction task. The approach exploits the similarity among different language combinations and trains the model level by level, deepening its understanding of cross-language similarity at each level and progressively strengthening its ability to capture relational dependencies. In addition, we explore the effectiveness of mix-lingual content as a transfer learning tool for carrying relational dependencies learned by a model from one language to another. Finally, we analyze the factors that influence the performance of both supervised models and LLMs on the mix-lingual relation extraction task.

    Results

    We evaluate a range of supervised models and LLMs on the mix-lingual and monolingual subsets of MixRED. The results show considerable performance differences across these subsets. Notably, the version of ChatGLM2 retrained with our proposed MixTrain approach (ChatGLM2-MixTrain) outperforms the original ChatGLM2 by 10.7 to 14.5 F1 points. ChatGLM2-MixTrain achieves F1 scores of 16.8, 18.1, 15.8, and 14.5 on the EN-ZH, EN-DE, EN-JP, and DE-JP mix-lingual subsets, respectively, the highest among all evaluated LLMs. We also test the effectiveness of using mix-lingual content as a tool for transferring learned relational dependencies across languages; in most of the tested settings, model performance improves after transfer learning with mix-lingual content.

    Conclusion

    This paper proposes the mix-lingual relation extraction task and constructs MixRED, a human-annotated dataset for it. To adapt large language models to the new task, we propose a multi-level training approach named MixTrain. Our experiments reveal performance differences among existing models on MixRED, indicating that their ability to adapt to mix-lingual environments varies. Notably, the version of ChatGLM2 retrained with MixTrain shows a significant performance gain on MixRED, which validates the effectiveness of progressively deepening an LLM's understanding of language similarity and relational dependencies at different levels. We also find that mix-lingual content helps transfer relational dependencies learned by a model across languages. By studying factors such as mix level, language concentration, and in-context learning strategies, we obtain a more nuanced picture of model behavior in mix-lingual scenarios. In future work, we plan to consider a broader range of language combinations and to explore more natural language processing tasks in mix-lingual settings.
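    The construction process described above converts a chosen proportion (30%, 50%, or 70%) of content from one language into another and mixes documents at several levels (Figure 3). The sketch below illustrates only the simplest, sentence-level variant at a given concentration; the `translate` stub, the random sentence selection, and all function names are illustrative assumptions rather than the actual MixRED pipeline.

```python
import random

def translate(sentence: str, target_lang: str) -> str:
    """Placeholder for a machine translation call (hypothetical stub)."""
    return f"<{target_lang}>{sentence}</{target_lang}>"

def mix_document(sentences: list[str], target_lang: str,
                 concentration: float, seed: int = 0) -> list[str]:
    """Convert roughly `concentration` of the sentences of a Language-1
    document into Language 2, yielding a sentence-level mix-lingual sample.

    Simplified illustration: sentences are picked at random, whereas the
    actual MixRED framework mixes content hierarchically (inter-sentence,
    intra-sentence, and entity level) and controls diversity systematically.
    """
    rng = random.Random(seed)
    k = round(len(sentences) * concentration)
    chosen = set(rng.sample(range(len(sentences)), k))
    return [translate(s, target_lang) if i in chosen else s
            for i, s in enumerate(sentences)]

if __name__ == "__main__":
    doc = ["Sentence one.", "Sentence two.", "Sentence three.",
           "Sentence four.", "Sentence five."]
    for conc in (0.3, 0.5, 0.7):  # the 30%/50%/70% concentrations used in MixRED
        print(conc, mix_document(doc, "ZH", conc))
```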

    Abstract:

    Relation extraction is a pivotal task within the field of natural language processing, boasting numerous real-world applications. Existing research predominantly centers on monolingual relation extraction or cross-lingual enhancement for relation extraction. However, there exists a notable gap in understanding relation extraction within mix-lingual (or code-switching) scenarios. In these scenarios, individuals blend content from different languages within sentences, generating mix-lingual content. The effectiveness of existing relation extraction models in such scenarios remains largely unexplored due to the absence of dedicated datasets. To address this gap, we introduce the Mix-Lingual Relation Extraction (MixRE) task and construct a human-annotated dataset MixRED to support this task. Additionally, we propose a hierarchical training approach for the mix-lingual scenario named Mix-Lingual Training (MixTrain), designed to enhance the performance of large language models (LLMs) when capturing relational dependencies from mix-lingual content spanning different semantic levels. Our experiments involve evaluating state-of-the-art supervised models and LLMs on the constructed dataset, with results indicating that MixTrain notably improves model performance. Moreover, we investigate the effectiveness of using mix-lingual content as a tool to transfer learned relational dependencies across different languages. Additionally, we delve into factors influencing model performance for both supervised models and LLMs in the novel MixRE task.
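    The hierarchical MixTrain approach fine-tunes an LLM in stages over mix-lingual units of increasing size, matching the word-level, phrasal-level, and sentence-level tasks ablated in Table 4. Below is a minimal sketch of that staged curriculum; the `build_instruction` wording, the `fine_tune_stage` stub, and the word-to-phrase-to-sentence ordering are illustrative assumptions standing in for the actual templates in Figures 6 and 7 and for a LoRA-style fine-tuning call, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    document: str   # mix-lingual document text
    subject: str    # subject entity mention
    object: str     # object entity mention
    relation: str   # gold relation label

def build_instruction(sample: Sample, level: str) -> dict:
    """Format one training example; the wording is an illustrative stand-in
    for the instruction template shown in Figure 6(a)."""
    prompt = (
        f"[{level}-level task] Given the mix-lingual document below, identify "
        f"the relation between '{sample.subject}' and '{sample.object}'.\n"
        f"Document: {sample.document}"
    )
    return {"prompt": prompt, "response": sample.relation}

def fine_tune_stage(model, examples: list[dict]):
    """Hypothetical stub for one fine-tuning pass (e.g., LoRA updates [19])."""
    print(f"fine-tuning on {len(examples)} examples ...")
    return model

def mixtrain(model, data_by_level: dict[str, list[Sample]]):
    """Run the curriculum from the smallest to the largest mixing unit, so the
    model first aligns words across languages, then phrases, then sentences."""
    for level in ("word", "phrase", "sentence"):
        examples = [build_instruction(s, level) for s in data_by_level[level]]
        model = fine_tune_stage(model, examples)
    return model
```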

  • Figure  1.   Real-world mix-lingual RE instance in different language versions. Terms in the same color represent mentions of a specific entity. (a) Primarily in Chinese. (b) Primarily in English. (c) Primarily in Japanese. (d) Primarily in German.

    Figure  2.   Monolingual data extension.

    Figure  3.   Construction process of MixRED. The proportions of 30%, 50%, and 70% represent different concentrations of content converted from Language 1 to Language 2 when constructing mix-lingual samples. (a) Overview of the MixRED construction process. (b) Hierarchical mix module.

    Figure  4.   Distribution of samples in the EN-ZH subset of MixRED.

    Figure  5.   Distribution of relational triples for the top eight relation types in MixRED. Per.: person, Obj.: object.

    Figure  6.   Proposed MixTrain approach. Sub.: subject entity, Obj.: object entity, Sent.: sentence, Rel.: relation. (a) Our instruction template employed during the training process. (b) Overview of the MixTrain approach.

    Figure  7.   Prompts employed for the MixTrain approach.

    Figure  8.   Sample from our MixRED dataset. Entities within a specific relational triple are denoted by the same color, with the corresponding relation highlighted in purple.

    Figure  9.   Development of mix-lingual exemplars and CoT for enhancing the performance of LLMs[14].

    Figure  10.   Comparative evaluation of the performance of LLMs with diverse exemplar and CoT combinations on EN-ZH. “Mono” represents monolingual, “Mix” signifies mix-lingual, and “EXPL” stands for exemplar[14].
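    Figures 7, 9, and 10 concern prompting LLMs with monolingual ("Mono") or mix-lingual ("Mix") exemplars, optionally combined with chain-of-thought (CoT) rationales. The following is a minimal sketch of assembling such a few-shot prompt; the prompt wording, the sample exemplar, and its rationale are invented for illustration and do not reproduce the prompts in Figure 7.

```python
def build_icl_prompt(test_doc: str, subject: str, obj: str,
                     exemplars: list[dict], use_cot: bool = False) -> str:
    """Assemble a few-shot relation extraction prompt.

    Each exemplar is a dict with keys 'document', 'subject', 'object',
    'relation', and optionally 'rationale' (used when `use_cot` is True).
    """
    parts = []
    for ex in exemplars:
        block = (f"Document: {ex['document']}\n"
                 f"Question: What is the relation between '{ex['subject']}' "
                 f"and '{ex['object']}'?\n")
        if use_cot and "rationale" in ex:
            block += f"Reasoning: {ex['rationale']}\n"
        block += f"Answer: {ex['relation']}\n"
        parts.append(block)
    parts.append(f"Document: {test_doc}\n"
                 f"Question: What is the relation between '{subject}' "
                 f"and '{obj}'?\nAnswer:")
    return "\n".join(parts)

# A mix-lingual ("Mix") exemplar, unlike a monolingual ("Mono") one, itself
# blends languages, e.g. an EN-ZH document (the content below is made up).
mix_exemplar = {
    "document": "Apple 的创始人 Steve Jobs 出生于 San Francisco。",
    "subject": "Steve Jobs", "object": "Apple",
    "relation": "founder of",
    "rationale": "The document states that Steve Jobs is the founder (创始人) of Apple.",
}
```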

    Table  1   Comparison of MixRED with Existing RE Datasets

    | Dataset | #Doc | #Sent. | #Word (×10³) | #Ent. | #Mention | #Rel. | Avg. Mention |
    |---|---|---|---|---|---|---|---|
    | SemEval2010 Task8 [29] | – | 10717 | 205 | 21434 | 21434 | 9 | 1.0 |
    | ACE 2003–2004 [30] | – | 12783 | 297 | 46108 | 46108 | 24 | 1.0 |
    | TACRED [1] | – | 53791 | 1823 | 152527 | 152527 | 41 | 1.0 |
    | FewRel [16] | – | 56109 | 1397 | 72124 | 72124 | 100 | 1.0 |
    | DocRED [4] | 5053 | 40276 | 1002 | 98560 | 128128 | 96 | 1.3 |
    | MixRED (ours) | 30300 | 880384 | 13683 | 163116 | 337250 | 21 | 2.1 |
    Note: #Doc: the number of documents, #Sent.: the number of sentences, #Word: the number of words, #Ent.: the number of entities, #Mention: the number of mentions, #Rel.: the number of relations, Avg. Mention: the average number of mentions per entity. "–": not reported.

    Table  2   Comparison of the F1 Scores Achieved by Various Supervised Models on DocRED, the Monolingual Subsets of MixRED, and the Mix-Lingual Subsets of MixRED

    | Model | DocRED | English | Chinese | German | Japanese | EN-ZH | EN-DE | ZH-DE | ZH-JP | EN-JP | DE-JP |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | LSR [32] | 59.0 | 32.9 | 31.6 | 30.2 | 27.6 | 31.0 | 29.4 | 27.5 | 27.5 | 29.8 | 28.8 |
    | BERT-E [33] | 56.3 | 35.6 | 36.9 | 32.0 | 34.8 | 32.4 | 31.7 | 28.0 | 32.0 | 27.4 | 29.3 |
    | ATLOP [33] | 61.5 | 35.4 | 39.5 | 33.0 | 35.3 | 31.3 | 32.6 | 36.0 | 37.1 | 33.0 | 30.8 |
    | XLM-R [34] | – | 34.3 | 31.2 | 31.7 | 0.1 | 36.2 | 34.2 | 23.9 | 22.0 | 28.6 | 29.2 |
    | BERT-E-mix (ours) | – | 37.6 | 34.0 | 33.9 | 17.3 | 37.1 | 35.4 | 28.4 | 26.0 | 33.8 | 21.8 |
    | ATLOP-mix (ours) | – | 38.0 | 37.5 | 34.0 | 30.9 | 37.4 | 37.5 | 37.4 | 34.2 | 37.3 | 34.6 |
    Note: The English, Chinese, German, and Japanese columns are the monolingual subsets of MixRED; the EN-ZH to DE-JP columns are its mix-lingual subsets. "–": not reported.

    Table  3   Comparison of the F1 Scores Achieved by Various LLMs on Both the Monolingual Subsets and Mix-Lingual Subsets of MixRED

    | Model | English | Chinese | German | Japanese | EN-ZH | EN-DE | ZH-DE | ZH-JP | EN-JP | DE-JP |
    |---|---|---|---|---|---|---|---|---|---|---|
    | GPT-3.5 [35] | 11.4 | 17.8 | 8.5 | 15.1 | 12.2 | 11.6 | 18.6 | 18.7 | 13.5 | 8.7 |
    | Qwen [36] | 4.0 | 5.7 | 3.2 | 5.2 | 4.2 | 3.5 | 5.3 | 5.2 | 4.8 | 3.3 |
    | LLaMA2-7B [39] | 7.1 | 2.5 | 2.8 | 0.0 | 4.9 | 9.9 | 6.3 | 1.2 | 2.7 | 0.0 |
    | LLaMA2-13B [39] | 7.8 | 9.0 | 5.1 | 0.0 | 8.2 | 11.7 | 9.6 | 2.0 | 5.2 | 0.0 |
    | ChatGLM2 [15] | 7.3 | 5.9 | 2.2 | 0.0 | 5.7 | 7.4 | 5.1 | 1.1 | 2.2 | 0.0 |
    | ChatGLM2-LoRA (ours) | – | – | – | – | 6.1 | 13.2 | 8.6 | 8.6 | 9.7 | 7.0 |
    | ChatGLM2-MixTrain (ours) | – | – | – | – | 16.8 | 18.1 | 17.2 | 14.9 | 15.8 | 14.5 |
    Note: The English, Chinese, German, and Japanese columns are the monolingual subsets of MixRED; the EN-ZH to DE-JP columns are its mix-lingual subsets. "–": not reported.

    Table  4   Ablation Study (F1 Score) of the Proposed MixTrain Approach

    | Task | EN-ZH | EN-DE | ZH-DE | ZH-JP | EN-JP | DE-JP |
    |---|---|---|---|---|---|---|
    | Word-level task | 1.7 | 1.2 | 0.4 | 1.7 | 0.4 | 1.0 |
    | Phrasal-level task | 7.0 | 5.6 | 5.9 | 4.3 | 5.2 | 6.1 |
    | Sentence-level task | 3.1 | 2.4 | 1.2 | 1.9 | 1.7 | 2.9 |

    Table  5   XLM-R Performance on Monolingual Subsets After Transferring Knowledge Learned from Mix-Lingual Subsets

    |    | EN-* | ZH-* | DE-* | JP-* |
    |---|---|---|---|---|
    | EN | 34.3 | 34.9 | 34.6 | 33.6 |
    | ZH | 39.8 | 31.2 | 33.4 | 38.7 |
    | DE | 36.8 | 27.1 | 31.7 | 33.6 |
    | JP | 30.9 | 27.2 | 29.6 | 0.1 |
    Note: The column headers represent the mix-lingual subsets, where "*" denotes the language in the corresponding row label; each row gives the results of testing on the monolingual subset of that row's language. The diagonal entries indicate XLM-R's original performance on the monolingual subsets without transfer learning. The order of languages in a mix-lingual subset does not affect its representation; therefore, EN-ZH and ZH-EN denote the same subset.

    Table  6   Model Performance Across Different Mix Levels on EN-ZH[14]

    | Model | Inter-Sentence | Intra-Sentence | Entity |
    |---|---|---|---|
    | LSR [32] | 33.3 | 33.8 | 27.5 |
    | BERT-E [33] | 34.3 | 34.7 | 29.3 |
    | ATLOP [33] | 34.5 | 34.1 | 30.8 |
    | GPT-3.5 [35] | 11.3 | 11.0 | 11.8 |
    | LLaMA2-7B [39] | 7.1 | 5.7 | 4.8 |
    | LLaMA2-13B [39] | 8.0 | 7.4 | 6.7 |

    Table  7   Model Performance (F1 Score) Across Different Language Concentrations

    | Model | EN-ZH-30% | EN-ZH-50% | EN-ZH-70% | EN-DE-30% | EN-DE-50% | EN-DE-70% | ZH-JP-30% | ZH-JP-50% | ZH-JP-70% |
    |---|---|---|---|---|---|---|---|---|---|
    | LSR [32] | 33.7 | 32.0 | 31.9 | 31.2 | 29.4 | 31.4 | 30.3 | 27.5 | 29.2 |
    | BERT-E [33] | 36.8 | 36.5 | 36.1 | 33.0 | 31.7 | 33.8 | 38.1 | 32.0 | 31.4 |
    | ATLOP [33] | 36.5 | 35.8 | 35.0 | 31.1 | 32.6 | 31.9 | 37.3 | 37.1 | 37.3 |
    | GPT-3.5 [35] | 12.1 | 13.3 | 12.2 | 13.2 | 11.6 | 14.2 | 12.8 | 18.7 | 12.1 |
    | LLaMA2-7B [39] | 6.3 | 5.1 | 5.2 | 9.9 | 9.9 | 11.3 | 6.5 | 1.2 | 3.8 |
    | LLaMA2-13B [39] | 7.2 | 7.9 | 8.1 | 10.5 | 11.7 | 9.6 | 7.9 | 2.0 | 7.2 |
    Note: The percentages 30%, 50%, and 70% denote varying concentrations of content converted from Language 1 to Language 2 when creating mix-lingual samples.
  • [1] Zhang Y, Zhong V, Chen D, Angeli G, Manning C D. Position-aware attention and supervised data improve slot filling. In Proc. the 2017 Conference on Empirical Methods in Natural Language Processing, Sept. 2017, pp.35–45. DOI: 10.18653/v1/D17-1004.
    [2] Zeng X, Zeng D, He S, Liu K, Zhao J. Extracting relational facts by an end-to-end neural model with copy mechanism. In Proc. the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul. 2018, pp.506–514. DOI: 10.18653/v1/P18-1047.
    [3] Gardent C, Shimorina A, Narayan S, Perez-Beltrachini L. Creating training corpora for NLG micro-planners. In Proc. the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul. 2017, pp.179–188. DOI: 10.18653/v1/P17-1017.
    [4] Yao Y, Ye D, Li P, Han X, Lin Y, Liu Z, Liu Z, Huang L, Zhou J, Sun M. DocRED: A large-scale document-level relation extraction dataset. In Proc. the 57th Annual Meeting of the Association for Computational Linguistics, Jul. 2019, pp.764–777. DOI: 10.18653/v1/P19-1074.
    [5] Luan Y, He L, Ostendorf M, Hajishirzi H. Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In Proc. the 2018 Conference on Empirical Methods in Natural Language Processing, Oct. 31–Nov. 4, 2018, pp.3219–3232. DOI: 10.18653/v1/D18-1360.
    [6] Cheng Q, Liu J, Qu X, Zhao J, Liang J, Wang Z, Huai B, Yuan N J, Xiao Y. HacRED: A large-scale relation extraction dataset toward hard cases in practical applications. In Proc. the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug. 2021, pp.2819–2831. DOI: 10.18653/v1/2021.findings-acl.249.
    [7] Zheng S, Wang F, Bao H, Hao Y, Zhou P, Xu B. Joint extraction of entities and relations based on a novel tagging scheme. In Proc. the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul. 2017, pp.1227–1236. DOI: 10.18653/v1/P17-1113.
    [8] Wei Z, Su J, Wang Y, Tian Y, Chang Y. A novel cascade binary tagging framework for relational triple extraction. In Proc. the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020, pp.1476–1488. DOI: 10.18653/v1/2020.acl-main.136.
    [9] Zhong Z, Chen D. A frustratingly easy approach for entity and relation extraction. In Proc. the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun. 2021, pp.50–61. DOI: 10.18653/v1/2021.naacl-main.5.
    [10] Min B, Jiang Z, Freedman M, Weischedel R. Learning transferable representation for bilingual relation extraction via convolutional neural networks. In Proc. the 8th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Nov. 2017, pp.674–684.
    [11] Ni J, Florian R. Neural cross-lingual relation extraction based on bilingual word embedding mapping. In Proc. the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Nov. 2019, pp.399–409. DOI: 10.18653/v1/D19-1038.
    [12] Winata G, Aji A F, Yong Z X, Solorio T. The decades progress on code-switching research in NLP: A systematic survey on trends and challenges. In Proc. the Findings of the Association for Computational Linguistics: ACL 2023, Jul. 2023, pp.2936–2978. DOI: 10.18653/v1/2023.findings-acl.185.
    [13] Winata G I, Madotto A, Wu C S, Fung P. Code-switched language models using neural based synthetic data from parallel sentences. In Proc. the 23rd Conference on Computational Natural Language Learning (CoNLL), Nov. 2019, pp.271–280. DOI: 10.18653/v1/K19-1026.
    [14] Kong L, Chu Y, Ma Z, Zhang J, He L, Chen J. MixRED: A mix-lingual relation extraction dataset. In Proc. the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, pp.11361–11370.
    [15] Zeng A, Xu B, Wang B et al. ChatGLM: A family of large language models from GLM-130B to GLM-4 all tools. arXiv: 2406.12793, 2024. https://arxiv.org/abs/2406.12793, Sept. 2024.
    [16] Han X, Zhu H, Yu P, Wang Z, Yao Y, Liu Z, Sun M. FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In Proc. the 2018 Conference on Empirical Methods in Natural Language Processing, Oct. 31–Nov. 4, 2018, pp.4803–4809. DOI: 10.18653/v1/D18-1514.
    [17] Yang S, Choi M, Cho Y, Choo J. HistRED: A historical document-level relation extraction dataset. In Proc. the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1), Jul. 2023, pp.3207–3224. DOI: 10.18653/v1/2023.acl-long.180.
    [18] Li X L, Liang P. Prefix-tuning: Optimizing continuous prompts for generation. In Proc. the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1), Aug. 2021, pp.4582–4597. DOI: 10.18653/v1/2021.acl-long.353.
    [19] Hu J E, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W. LoRA: Low-rank adaptation of large language models. In Proc. the 10th International Conference on Learning Representations, Apr. 2022.
    [20] Wan Z, Cheng F, Mao Z, Liu Q, Song H, Li J, Kurohashi S. GPT-RE: In-context learning for relation extraction using large language models. In Proc. the 2023 Conference on Empirical Methods in Natural Language Processing, Dec. 2023, pp.3534–3547. DOI: 10.18653/v1/2023.emnlp-main.214.
    [21] Li B, Fang G, Yang Y, Wang Q, Ye W, Zhao W, Zhang S. Evaluating ChatGPT's information extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness. arXiv: 2304.11633, 2023. https://arxiv.org/abs/2304.11633, Sept. 2024.
    [22] Li X, Polat F, Groth P. Do instruction-tuned large language models help with relation extraction? In Proc. the 1st Workshop on Knowledge Base Construction from Pre-Trained Language Models (KBC-LM) and the 2nd Challenge on Language Models for Knowledge Base Construction (LM-KBC) Co-Located with the 22nd International Semantic Web Conference, Nov. 2023.
    [23] Poplack S. Sometimes I'll start a sentence in Spanish Y TERMINO EN ESPAÑOL: Toward a typology of code-switching. Linguistics, 1980, 18(7/8): 581–618. DOI: 10.1515/ling.1980.18.7-8.581.
    [24] Mihalcea R, Tarau P. TextRank: Bringing order into text. In Proc. the 2004 Conference on Empirical Methods in Natural Language Processing, Jul. 2004, pp.404–411.
    [25] Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 1998, 30(1–7): 107–117. DOI: 10.1016/S0169-7552(98)00110-X.
    [26] Reimers N, Gurevych I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proc. the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Nov. 2019, pp.3982–3992. DOI: 10.18653/v1/D19-1410.
    [27] Wang Y, Chen M, Zhou W, Cai Y, Liang Y, Liu D, Yang B, Liu J, Hooi B. Should we rely on entity mentions for relation extraction? Debiasing relation extraction with counterfactual analysis. In Proc. the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jul. 2022, pp.3071–3081. DOI: 10.18653/v1/2022.naacl-main.224.
    [28] Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Blog, 2019, 1(8): 9.
    [29] Hendrickx I, Kim S N, Kozareva Z, Nakov P, Séaghdha D Ó, Padó S, Pennacchiotti M, Romano L, Szpakowicz S. SemEval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In Proc. the 5th International Workshop on Semantic Evaluation, Jul. 2010, pp.33–38.
    [30] Doddington G, Mitchell A, Przybocki M, Ramshaw L, Strassel S, Weischedel R. The automatic content extraction (ACE) program – Tasks, data, and evaluation. In Proc. the 4th International Conference on Language Resources and Evaluation, May 2004.
    [31] Liu L, Li X, He R, Bing L, Joty S, Si L. Enhancing multilingual language model with massive multilingual knowledge triples. In Proc. the 2022 Conference on Empirical Methods in Natural Language Processing, Dec. 2022, pp.6878–6890. DOI: 10.18653/v1/2022.emnlp-main.462.
    [32] Nan G, Guo Z, Sekulic I, Lu W. Reasoning with latent structure refinement for document-level relation extraction. In Proc. the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020, pp.1546–1557. DOI: 10.18653/v1/2020.acl-main.141.
    [33] Zhou W, Huang K, Ma T, Huang J. Document-level relation extraction with adaptive thresholding and localized context pooling. In Proc. the 35th AAAI Conference on Artificial Intelligence, May 2021, pp.14612–14620. DOI: 10.1609/aaai.v35i16.17717.
    [34] Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, Grave E, Ott M, Zettlemoyer L, Stoyanov V. Unsupervised cross-lingual representation learning at scale. In Proc. the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020, pp.8440–8451. DOI: 10.18653/v1/2020.acl-main.747.
    [35] Brown T B, Mann B, Ryder N et al. Language models are few-shot learners. In Proc. the 34th International Conference on Neural Information Processing Systems, Dec. 2020.
    [36] Bai J, Bai S, Chu Y et al. Qwen technical report. arXiv: 2309.16609, 2023. https://arxiv.org/abs/2309.16609, Sept. 2024.
    [37] Muennighoff N, Wang T, Sutawika L et al. Crosslingual generalization through multitask finetuning. In Proc. the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul. 2023, pp.15991–16111. DOI: 10.18653/v1/2023.acl-long.891.
    [38] Yang A, Xiao B, Wang B et al. Baichuan 2: Open large-scale language models. arXiv: 2309.10305, 2023. https://arxiv.org/abs/2309.10305, Sept. 2024.
    [39] Touvron H, Martin L, Stone K et al. Llama 2: Open foundation and fine-tuned chat models. arXiv: 2307.09288, 2023. https://arxiv.org/abs/2307.09288, Sept. 2024.
    [40] Devlin J, Chang M W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Jun. 2019, pp.4171–4186. DOI: 10.18653/v1/N19-1423.
    [41] Cui Y, Che W, Liu T, Qin B, Yang Z. Pre-training with whole word masking for Chinese BERT. IEEE/ACM Trans. Audio, Speech, and Language Processing, 2021, 29: 3504–3514. DOI: 10.1109/taslp.2021.3124365.
    [42] Wadhwa S, Amir S, Wallace B. Revisiting relation extraction in the era of large language models. In Proc. the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul. 2023, pp.15566–15589. DOI: 10.18653/v1/2023.acl-long.868.

Publication History
  • Received: 2024-03-26
  • Accepted: 2024-08-29
  • Published online: 2024-08-30
  • Issue date: 2025-02-22
