2012, Vol. 27, Issue (6): 1302-1313. doi: 10.1007/s11390-012-1306-0

Special Issue: Artificial Intelligence and Pattern Recognition

• Machine Learning and Data Mining •

A Unified Active Learning Framework for Biomedical Relation Extraction

Hong-Tao Zhang (张宏涛), Min-Lie Huang (黄民烈), and Xiao-Yan Zhu (朱小燕), Member CCF   

  1. State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 2011-10-17 Revised: 2012-05-25 Online: 2012-11-05 Published: 2012-11-05
  • Supported by:

    This work was supported by the National Natural Science Foundation of China under Grant No. 60973104 and the National Basic Research 973 Program of China under Grant No. 2012CB316301.

Supervised machine learning methods have been employed with great success in biomedical relation extraction. However, existing methods are not practical enough, because manually constructing large training sets is very expensive. Active learning is therefore needed to design practical relation extraction methods that require little human effort. In this paper, we describe a unified active learning framework. In particular, the framework systematically addresses several practical issues that arise during the active learning process, including a strategy for selecting informative data, a data diversity selection algorithm, an active feature acquisition method, and an informative feature selection algorithm, in order to meet the challenges posed by the immense amount of complex and diverse biomedical text. The framework is evaluated on protein-protein interaction (PPI) extraction and is shown to achieve promising results with a significant reduction in editorial effort and labeling time.
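The abstract names informative-data selection and diversity selection without giving their formulas. As a rough illustration only, and not the authors' algorithm, the following sketch shows one common way to combine the two ideas in pool-based active learning: score each unlabeled candidate by how close its predicted probability is to 0.5, then penalize candidates that resemble ones already chosen. The function names, the cosine similarity measure, and the weight `lam` are all assumptions made for this example.

```python
import math

def uncertainty(p):
    """Map a positive-class probability to [0, 1]; 1.0 means p == 0.5."""
    return 1.0 - abs(p - 0.5) * 2.0

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_batch(probs, feats, k, lam=0.5):
    """Greedily pick up to k pool indices to send to the annotator.

    probs: classifier probability for each unlabeled candidate (assumed input)
    feats: feature vector for each candidate (assumed input)
    lam:   hypothetical trade-off; 1.0 ignores diversity entirely
    """
    chosen = []
    remaining = set(range(len(probs)))
    while remaining and len(chosen) < k:
        def score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((cosine(feats[i], feats[j]) for j in chosen),
                             default=0.0)
            return lam * uncertainty(probs[i]) - (1.0 - lam) * redundancy
        # Iterate in sorted order so ties resolve to the lowest index.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical pool: candidates 0 and 1 are near-duplicates, as are 2 and 3.
probs = [0.5, 0.52, 0.9, 0.45]
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
batch = select_batch(probs, feats, k=2)
```

With the diversity penalty active, the second pick skips the near-duplicate of the first candidate in favor of a less similar one; setting `lam=1.0` reduces the procedure to plain uncertainty sampling.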

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved