Hong-Tao Zhang, Min-Lie Huang, Xiao-Yan Zhu. A Unified Active Learning Framework for Biomedical Relation Extraction[J]. Journal of Computer Science and Technology, 2012, 27(6): 1302-1313. DOI: 10.1007/s11390-012-1306-0

A Unified Active Learning Framework for Biomedical Relation Extraction

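  • Summary: Extraction methods based on supervised machine learning have proven effective for relation extraction, but they require large annotated datasets. One feasible remedy is to incorporate active learning into the extraction method. However, applying the classical active learning framework directly can run into problems such as "incomplete feature description". To address this, this paper proposes a unified active learning framework. The framework comprises a sample selection module, a diverse sample selection module, an active feature acquisition module, and a relevant feature selection module, and it systematically addresses the key problems that arise during active learning. Experimental results show that the proposed framework achieves good extraction performance with a relatively small annotated dataset.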
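The abstract names a sample selection module and a diverse sample selection module but does not say how they work. Purely as an illustration, the sketch below uses a common generic pairing: margin-based uncertainty sampling followed by a cosine-similarity diversity filter. It is not the authors' algorithm. The function names, the `sim_threshold` default of 0.9, and the input format (one predicted probability that a candidate pair holds a relation, plus one feature vector per candidate) are all assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_batch(
    probs: Sequence[float],
    features: Sequence[Sequence[float]],
    batch_size: int,
    sim_threshold: float = 0.9,  # hypothetical default, not from the paper
) -> List[int]:
    """Hypothetical helper: pick uncertain, mutually dissimilar candidates.

    probs[i] is the classifier's P(relation) for unlabeled candidate i;
    features[i] is that candidate's feature vector.
    """
    # Uncertainty: the margin |P(yes) - P(no)| = |2p - 1| is small
    # when the classifier is unsure, so sort by ascending margin.
    margins = [abs(2.0 * p - 1.0) for p in probs]
    order = sorted(range(len(probs)), key=lambda i: margins[i])

    chosen: List[int] = []
    for i in order:
        if len(chosen) == batch_size:
            break
        # Diversity: skip a candidate that is too similar to any
        # candidate already selected for this labeling batch.
        if all(cosine(features[i], features[j]) < sim_threshold for j in chosen):
            chosen.append(i)
    return chosen
```

For example, with `probs = [0.5, 0.1, 0.45, 0.48]` and `features = [[1, 0], [0, 1], [1, 0.01], [0, 1]]`, `select_batch(probs, features, 3)` returns `[0, 3]`: candidate 2 is fairly uncertain but nearly identical to candidate 0, so the diversity filter skips it, and candidate 1 is both confident and a duplicate of candidate 3.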

     

    Abstract: Supervised machine learning methods have been employed with great success in biomedical relation extraction. However, existing methods are not practical enough, since manually constructing a large training set is very expensive. Active learning is therefore urgently needed to build practical relation extraction methods with little human effort. In this paper, we describe a unified active learning framework. In particular, our framework systematically addresses several practical issues that arise during active learning, including a strategy for selecting informative data, a data diversity selection algorithm, an active feature acquisition method, and an informative feature selection algorithm, in order to meet the challenges posed by the immense amount of complex and diverse biomedical text. The framework is evaluated on protein-protein interaction (PPI) extraction and is shown to achieve promising results with a significant reduction in editorial effort and labeling time.
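The abstract mentions an informative feature selection algorithm without describing it. As a hedged stand-in, the sketch below ranks binary (present/absent) features by information gain with respect to a binary relation label, a standard generic criterion; the paper's own algorithm may differ. The function names and the example representation (each example is a set of feature strings paired with a 0/1 label) are assumptions, and the list of examples is assumed to be non-empty.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

Example = Tuple[Set[str], int]  # (features present, 0/1 relation label)


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (bits) of a label distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(examples: List[Example], feature: str) -> float:
    """Hypothetical helper: H(Y) - H(Y | feature present/absent)."""
    n = len(examples)
    base = entropy(Counter(y for _, y in examples).values())
    with_f = Counter(y for x, y in examples if feature in x)
    without_f = Counter(y for x, y in examples if feature not in x)
    n_with = sum(with_f.values())
    conditional = (n_with / n) * entropy(with_f.values()) + (
        (n - n_with) / n
    ) * entropy(without_f.values())
    return base - conditional


def top_k_features(examples: List[Example], k: int) -> List[str]:
    """Hypothetical helper: the k features with the highest gain.

    Ties keep alphabetical order because Python's sort is stable.
    """
    vocab = sorted(set().union(*(x for x, _ in examples)))
    return sorted(vocab, key=lambda f: information_gain(examples, f), reverse=True)[:k]
```

On the toy data `[({"interacts", "binds"}, 1), ({"interacts"}, 1), ({"and"}, 0), ({"and", "binds"}, 0)]`, the feature `"binds"` occurs equally often in both classes and gets a gain of 0, while `"and"` and `"interacts"` each separate the labels perfectly (gain 1 bit), so `top_k_features(examples, 2)` returns `["and", "interacts"]`.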

     
