Journal of Computer Science and Technology, 2020, Vol. 35, Issue (5): 1115-1126. DOI: 10.1007/s11390-020-9576-4

Special Issue: Artificial Intelligence and Pattern Recognition


Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node

Nuo Qun1,2,#, Hang Yan1,2,#, Xi-Peng Qiu1,2,*, Member, CCF, and Xuan-Jing Huang1,2, Member, CCF        

  1. School of Computer Science, Fudan University, Shanghai 200433, China;
    2 Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China
  • Received: 2019-03-23; Revised: 2019-07-04; Online: 2020-09-20; Published: 2020-09-29
  • Contact: Xi-Peng Qiu, E-mail: xpqiu@fudan.edu.cn
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant Nos. 61751201 and 61672162, the Shanghai Municipal Science and Technology Major Project under Grant No. 2018SHZDZX01, and ZJLab.

Semi-Markov conditional random fields (Semi-CRFs) have been successfully applied to many segmentation problems, including Chinese word segmentation (CWS). The advantage of the Semi-CRF lies in its inherent ability to exploit properties of whole segments rather than of individual elements of a sequence. Despite this theoretical advantage, the Semi-CRF is still not the first choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework that helps the Semi-CRF achieve performance comparable with CRF-based models at a similar computational cost. Specifically, we first apply a character-level bi-directional long short-term memory (BiLSTM) network to model context information, and then use a simple but effective fusion layer to represent segment information. In addition, to model arbitrarily long segments within linear time complexity, we propose a new model named Semi-CRF-Relay. Modeling segments directly makes it easy to incorporate word features, and CWS performance can be improved merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets demonstrate the effectiveness of the proposed methods. The source code and pre-trained embeddings are available at https://github.com/fastnlp/fastNLP/.
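To make the pipeline described above concrete, the following is a minimal sketch (not the authors' implementation) of a character-level BiLSTM encoder, a simple fusion layer that pools the character states inside a candidate segment, and semi-CRF Viterbi decoding over all segmentations of a sentence. The mean-pooling fusion rule, the scoring layer, the hyper-parameters, and the maximum segment length cap are illustrative assumptions; setting the cap equal to the sentence length recovers the quadratic-time behavior of the standard Semi-CRF, and the paper's Semi-CRF-Relay (not reproduced here) is what removes this cap while keeping linear time complexity.

```python
import torch
import torch.nn as nn


class BiLSTMSemiCRFSketch(nn.Module):
    """Character-level BiLSTM encoder + segment fusion + semi-CRF Viterbi decoding."""

    def __init__(self, vocab_size, emb_dim=100, hidden_dim=100, max_seg_len=5):
        super().__init__()
        self.max_seg_len = max_seg_len  # segment-length cap L (assumption; Semi-CRF-Relay removes it)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Fusion layer: maps a pooled segment representation to a scalar segment score.
        self.seg_scorer = nn.Linear(2 * hidden_dim, 1)

    def segment_score(self, h, i, j):
        # Fuse the BiLSTM states of characters i..j (inclusive) by mean pooling,
        # then score the resulting segment representation.
        fused = h[i:j + 1].mean(dim=0)
        return self.seg_scorer(fused).squeeze(-1)

    def decode(self, char_ids):
        """Viterbi search over all segmentations of one sentence, O(n * L) time."""
        h, _ = self.bilstm(self.embed(char_ids).unsqueeze(0))  # (1, n, 2 * hidden_dim)
        h = h.squeeze(0)
        n = h.size(0)
        best = [0.0] + [float("-inf")] * n  # best[j]: best score of any segmentation of chars 0..j-1
        back = [0] * (n + 1)                # back[j]: start index of the last segment in that segmentation
        for j in range(1, n + 1):
            for i in range(max(0, j - self.max_seg_len), j):
                score = best[i] + self.segment_score(h, i, j - 1).item()
                if score > best[j]:
                    best[j], back[j] = score, i
        # Follow back-pointers to recover the segments as (start, end) character spans.
        segments, j = [], n
        while j > 0:
            segments.append((back[j], j - 1))
            j = back[j]
        return list(reversed(segments))


if __name__ == "__main__":
    torch.manual_seed(0)
    model = BiLSTMSemiCRFSketch(vocab_size=50)
    sentence = torch.randint(0, 50, (8,))  # eight characters with made-up ids
    print(model.decode(sentence))          # e.g., [(0, 2), (3, 3), (4, 7)]
```

With an untrained model the predicted spans are arbitrary; the sketch only illustrates how segment-level scores, rather than per-character tag scores, drive the dynamic program.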

Key words: Semi-Markov conditional random field (Semi-CRF); Chinese word segmentation; bi-directional long short-term memory; deep learning
