Journal of Computer Science and Technology, 2022, Vol. 37, Issue 2: 295-308. DOI: 10.1007/s11390-021-0286-3

Special Issue: Artificial Intelligence and Pattern Recognition


Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context

Xin Tan (谭新), Long-Yin Zhang (张龙印), Member, CCF, and Guo-Dong Zhou* (周国栋), Distinguished Member, CCF, Member, ACM, IEEE        

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  • Received: 2020-03-09 Revised: 2020-10-10 Accepted: 2021-01-11 Online: 2022-03-31 Published: 2022-03-31
  • Contact: Guo-Dong Zhou E-mail: gdzhou@suda.edu.cn
  • About author: Guo-Dong Zhou received his Ph.D. degree in computer science from the National University of Singapore, Singapore, in 1999. He is a distinguished professor at Soochow University, Suzhou. His research interests include natural language processing, information extraction, and machine learning.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant Nos. 61751206, 61673290, and 61876118, the Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant No. KYCX20_2669, and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

Document-level machine translation (MT) remains challenging because it is difficult to use document-level global context efficiently for translation. In this paper, we propose a hierarchical model to learn the global context for document-level neural machine translation (NMT). This is done through a sentence encoder that captures intra-sentence dependencies and a document encoder that models document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feed the extracted document-level global context back to each word in a top-down fashion so as to distinguish different translations of a word according to its specific surrounding context. Notably, we explore the effect of three popular attention functions during the backward information-distribution phase to take a deep look into how our model distributes global context information. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we use a two-step training strategy that takes advantage of a large-scale corpus of out-of-domain parallel sentence pairs and a small-scale corpus of in-domain parallel document pairs to achieve domain adaptability. On Chinese-English and English-German corpora, our model significantly improves over the Transformer baseline by 4.5 BLEU points on average, which demonstrates the effectiveness of the proposed hierarchical model in document-level NMT.
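As a rough illustration of the hierarchy described above, the pipeline of a sentence encoder (pooling words into sentence representations), a document encoder (pooling sentences into a global-context vector), and top-down feedback of that context to every word can be sketched as below. This is a minimal NumPy sketch under our own assumptions, not the authors' implementation: the function names, the single-query attention pooling, and the concatenation-based feedback are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Scaled dot-product attention, one of several attention
    # functions one could use in the distribution phase.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def hierarchical_context(doc, d=8):
    """doc: list of sentences, each an array of word vectors (n_words, d).
    Returns per-word vectors augmented with a document-level context vector."""
    rng = np.random.default_rng(0)
    q_sent = rng.standard_normal(d)  # learned query vectors in a real model,
    q_doc = rng.standard_normal(d)   # random here for illustration only
    # 1. Sentence encoder: pool the words of each sentence into one vector.
    sent_reprs = np.stack([attend(q_sent[None, :], s, s)[0] for s in doc])
    # 2. Document encoder: pool sentence vectors into one global-context vector.
    g = attend(q_doc[None, :], sent_reprs, sent_reprs)[0]
    # 3. Top-down feedback: attach the global context to every word
    #    (concatenation stands in for the model's learned integration).
    return [np.concatenate([s, np.repeat(g[None, :], len(s), axis=0)], axis=-1)
            for s in doc]
```

In the actual model, step 3 would be a learned attention/gating mechanism rather than plain concatenation, and the queries would be trained parameters; the sketch only shows how information flows bottom-up and then back top-down.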


Key words: neural machine translation; document-level translation; global context; hierarchical model

