Citation: Nuo Qun, Hang Yan, Xi-Peng Qiu, Xuan-Jing Huang. Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node[J]. Journal of Computer Science and Technology, 2020, 35(5): 1115-1126. DOI: 10.1007/s11390-020-9576-4

Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node

Funds: This work is supported by the National Natural Science Foundation of China under Grant Nos. 61751201 and 61672162, and the Shanghai Municipal Science and Technology Major Project under Grant Nos. 2018SHZDZX01 and ZJLab.
More Information
  • Corresponding author: Xi-Peng Qiu, E-mail: xpqiu@fudan.edu.cn
  • Received Date: March 22, 2019
  • Revised Date: July 03, 2019
  • Published Date: September 19, 2020
  • Abstract: Semi-Markov conditional random fields (Semi-CRFs) have been successfully applied to many segmentation problems, including Chinese word segmentation (CWS). The advantage of the Semi-CRF lies in its inherent ability to exploit properties of whole segments rather than of individual elements of a sequence. Despite this theoretical advantage, the Semi-CRF is still not the first choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework that helps the Semi-CRF achieve performance comparable with CRF-based models at similar computational cost. Specifically, we first apply a character-level bi-directional long short-term memory network (BiLSTM) to model context information, and then use a simple but effective fusion layer to represent segment information. In addition, to model arbitrarily long segments within linear time complexity, we propose a new model named Semi-CRF-Relay. Because segments are modeled directly, word features are easy to incorporate, and CWS performance can be improved merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of the proposed methods. The source code and pre-trained embeddings are available at https://github.com/fastnlp/fastNLP/.
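
To make the pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of a character-level BiLSTM encoder plus a simple segment-fusion layer whose per-segment scores a semi-Markov model could consume. This is not the authors' fastNLP implementation: the class name CharBiLSTMSegmentScorer, the layer sizes, the concatenate-and-project fusion rule, and the fixed maximum segment length max_seg_len are illustrative assumptions, and the relay-node mechanism that removes the segment-length cap in the paper is not reproduced here.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact architecture):
# a character-level BiLSTM whose hidden states are fused into segment
# representations for a semi-Markov scorer.
import torch
import torch.nn as nn


class CharBiLSTMSegmentScorer(nn.Module):
    def __init__(self, vocab_size, char_dim=100, hidden_dim=200, max_seg_len=5, num_labels=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, char_dim)
        # Character-level BiLSTM models the left and right context of every character.
        self.bilstm = nn.LSTM(char_dim, hidden_dim // 2, batch_first=True, bidirectional=True)
        # Simple fusion layer: a segment (i, j) is represented by concatenating the
        # BiLSTM states of its first and last characters and projecting them.
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, num_labels)
        self.max_seg_len = max_seg_len

    def forward(self, char_ids):
        # char_ids: (batch, seq_len) character indices of a batch of sentences
        h, _ = self.bilstm(self.embed(char_ids))          # (batch, seq_len, hidden_dim)
        batch, seq_len, _ = h.size()
        seg_scores = {}
        # Enumerate candidate segments up to max_seg_len characters; a full Semi-CRF
        # would combine these scores with transition scores in its dynamic program.
        for i in range(seq_len):
            for j in range(i, min(i + self.max_seg_len, seq_len)):
                seg_repr = torch.tanh(self.fuse(torch.cat([h[:, i], h[:, j]], dim=-1)))
                seg_scores[(i, j)] = self.score(seg_repr).squeeze(-1)   # (batch,)
        return seg_scores


if __name__ == "__main__":
    model = CharBiLSTMSegmentScorer(vocab_size=5000)
    chars = torch.randint(0, 5000, (2, 8))   # two sentences of 8 characters each
    print(len(model(chars)))                 # number of scored candidate segments
```

In a full Semi-CRF these per-segment scores would be combined with label-transition scores inside the forward-backward dynamic program; the relay node proposed in the paper is what allows segments longer than the fixed cap to be handled without quadratic enumeration.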