Journal of Computer Science and Technology
Journal of Computer Science and Technology 2017, Vol. 32 Issue (4) :758-767    DOI: 10.1007/s11390-017-1757-4
Special Issue on Deep Learning
Exploiting Unlabeled Data for Neural Grammatical Error Detection
Zhuo-Ran Liu1, Yang Liu2,3,4,5*, Member, CCF, ACM, IEEE
1 School of Software, Beihang University, Beijing 100191, China;
2 State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China;
3 Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China;
4 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
5 Jiangsu Collaborative Innovation Center for Language Competence, Xuzhou 221009, China

Abstract: Identifying and correcting grammatical errors in text written by non-native writers has received increasing attention in recent years. Although a number of annotated corpora have been established to facilitate data-driven grammatical error detection and correction approaches, they are still limited in quantity and domain coverage because human annotation is labor-intensive, time-consuming, and expensive. In this work, we propose to utilize unlabeled data to train neural-network-based grammatical error detection models. The basic idea is to cast error detection as a binary classification problem and derive positive and negative training examples from unlabeled data. We introduce an attention-based neural network to capture long-distance dependencies that influence the word being detected. Experiments show that the proposed approach significantly outperforms SVM and convolutional network models that use a fixed-size context window.
Keywords: unlabeled data; grammatical error detection; neural network
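The abstract describes deriving positive and negative training examples from unlabeled text but does not spell out the procedure. One plausible reading, sketched below, treats each word of a well-formed sentence as a positive example and creates negative examples by swapping that word for a confusable alternative. The confusion sets, the function names `build_lookup` and `make_examples`, and the label convention (1 = correct, 0 = erroneous) are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical confusion sets: groups of words that writers often mix up.
# These particular sets are an assumption for illustration only.
CONFUSION_SETS: List[Set[str]] = [
    {"a", "an", "the"},
    {"in", "on", "at"},
]

# One training example: (tokens, position of the word being checked, label).
# Assumed label convention: 1 = word is correct, 0 = word is an error.
Example = Tuple[List[str], int, int]


def build_lookup(sets: List[Set[str]]) -> Dict[str, Set[str]]:
    """Map every word to the confusion set that contains it."""
    lookup: Dict[str, Set[str]] = {}
    for confusion_set in sets:
        for word in confusion_set:
            lookup[word] = confusion_set
    return lookup


def make_examples(tokens: List[str], lookup: Dict[str, Set[str]]) -> List[Example]:
    """Derive labeled examples from one unlabeled sentence.

    The sentence is assumed to be grammatical, so each original word that
    belongs to a confusion set yields a positive example. Replacing it with
    every other member of the set yields negative examples.
    """
    examples: List[Example] = []
    for i, word in enumerate(tokens):
        alternatives = lookup.get(word.lower())
        if alternatives is None:
            continue  # word is not in any confusion set; skip it
        examples.append((list(tokens), i, 1))
        for alt in sorted(alternatives - {word.lower()}):
            corrupted = list(tokens)
            corrupted[i] = alt
            examples.append((corrupted, i, 0))
    return examples
```

Calling `make_examples("She lives in Beijing".split(), build_lookup(CONFUSION_SETS))` produces one positive example for "in" and one negative example for each of "at" and "on". A binary classifier, such as the attention-based network the abstract mentions, could then be trained on these (sentence, position, label) triples.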
Received 2016-12-20;
Funding:

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61522204 and 61331013 and the National High Technology Research and Development 863 Program of China under Grant No. 2015AA015407. This research is also supported by the National Research Foundation of Singapore under its International Research Centre@Singapore Funding Initiative and administered by the IDM (Interactive Digital Media) Programme.

Corresponding author: Yang Liu     Email: liuyang2011@tsinghua.edu.cn
About the author: Zhuo-Ran Liu is an undergraduate student in the School of Software at Beihang University, Beijing. This work was done while he was visiting the State Key Laboratory of Intelligent Technology and Systems at Tsinghua University, Beijing.
Cite this article:
Zhuo-Ran Liu, Yang Liu. Exploiting Unlabeled Data for Neural Grammatical Error Detection[J]. Journal of Computer Science and Technology, 2017, 32(4): 758-767
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1757-4