Mix-lingual Relation Extraction: A Dataset and Training Approach
Abstract
Relation extraction is a pivotal task in natural language processing with numerous real-world applications. Existing research predominantly centers on monolingual relation extraction or on cross-lingual enhancement for relation extraction. However, little is understood about relation extraction in mix-lingual (or code-switching) scenarios, in which individuals blend content from different languages within a sentence. The effectiveness of existing relation extraction models in such scenarios remains largely unexplored due to the absence of dedicated datasets. To address this gap, we introduce the Mix-lingual Relation Extraction (MixRE) task and construct MixRED, a human-annotated dataset supporting it. We further propose Mix-lingual Training (MixTrain), a hierarchical training approach for the mix-lingual scenario designed to improve the ability of large language models (LLMs) to capture relational dependencies from mix-lingual content across different semantic levels. We evaluate state-of-the-art supervised models and LLMs on the constructed dataset, and the results indicate that MixTrain notably improves model performance. Moreover, we investigate the effectiveness of mix-lingual content as a vehicle for transferring learned relational dependencies across languages, and we analyze the factors that influence the performance of both supervised models and LLMs on the novel MixRE task.