2013, Vol. 28, Issue (6): 1106-1116. doi: 10.1007/s11390-013-1401-x

Topic: Artificial Intelligence and Pattern Recognition

• Special Section on Selected Paper from NPC 2011

  • About the author: Guo-Dong Zhou received the Ph.D. degree in computer science from the National University of Singapore in 1999. He joined the Institute for Infocomm Research, Singapore, in 1999, and served there as an associate scientist, scientist and associate lead scientist until August 2006. Currently, he is a distinguished professor at the School of Computer Science and Technology, Soochow University, Suzhou. His research interests include natural language processing, information extraction and machine learning. He has been a senior member of CCF since 2008 and a member of ACM and IEEE since 1999.

Improving Syntactic Parsing of Chinese with Empty Element Recovery

Guo-Dong Zhou (周国栋), Senior Member, CCF, Member, ACM, IEEE, and Pei-Feng Li (李培峰), Member, CCF   

  1. Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  • Received: 2012-06-20 Revised: 2013-09-18 Online: 2013-11-05 Published: 2013-11-05
  • Supported by the National Natural Science Foundation of China under Grant Nos. 61273320, 61331011, and 61070123, and the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011102.

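This paper raises and examines, from the perspective of syntactic parsing, the problem of empty element recovery in Chinese, which has been widely overlooked. First, through statistical analysis and preliminary experiments, we verify the major role that empty elements play in Chinese syntactic parsing and the ways in which parsing benefits from them. We then propose two empty element recovery methods: joint constituent parsing and chunk-based dependency parsing. Experiments on the Chinese Treebank (CTB) 5.1 show that integrating empty element recovery into the Charniak parser substantially improves the parser's performance, by 1.29 in F1. To the best of our knowledge, this is the first comprehensive and in-depth exploration of empty elements in Chinese syntactic parsing, and the problem deserves more attention and further study in future research.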

Abstract: This paper puts forward and explores the problem of empty element (EE) recovery in Chinese from the syntactic parsing perspective, which has been largely ignored in the literature. First, we demonstrate why EEs play a critical role in syntactic parsing of Chinese and how EEs can better benefit syntactic parsing of Chinese via re-categorization from the syntactic perspective. Then, we propose two ways to automatically recover EEs: a joint constituent parsing approach and a chunk-based dependency parsing approach. Evaluation on the Chinese TreeBank (CTB) 5.1 corpus shows that integrating EE recovery into the Charniak parser achieves a significant performance improvement of 1.29 in F1-measure. To the best of our knowledge, this is the first close examination of EEs in syntactic parsing of Chinese, which deserves more attention in the future with regard to its specific importance.
