Journal of Computer Science and Technology, 2022, Vol. 37, Issue (2): 295-308. DOI: 10.1007/s11390-021-0286-3

Special Section: Artificial Intelligence and Pattern Recognition


Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context

Xin Tan (谭新), Long-Yin Zhang (张龙印), Member, CCF, and Guo-Dong Zhou* (周国栋), Distinguished Member, CCF, Member, ACM, IEEE        

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  • Received:2020-03-09 Revised:2020-10-10 Accepted:2021-01-11 Online:2022-03-31 Published:2022-03-31
  • Contact: Guo-Dong Zhou, E-mail: gdzhou@suda.edu.cn
  • About author: Guo-Dong Zhou received his Ph.D. degree in computer science from the National University of Singapore, Singapore, in 1999. He is a distinguished professor at Soochow University, Suzhou. His research interests include natural language processing, information extraction, and machine learning.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant Nos. 61751206, 61673290 and 61876118, the Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant No. KYCX20_2669, and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

Research Background: Machine translation, the task of automatically translating a source language into a target language with computer algorithms, is a subfield of natural language processing with substantial research and practical value. In recent years, sentence-level neural machine translation systems have greatly narrowed the gap between human and machine output on sentence-level tasks thanks to markedly improved fluency and accuracy. For document-level translation, however, neural machine translation models still struggle to produce satisfactory output, as they lack an understanding of the document and ignore the connections between its sentences. This paper therefore studies document-level neural machine translation, aiming to improve translation quality by modeling document context.
Objective: Since the cohesion and coherence carried by document context are crucial to understanding a document, we argue that modeling this context effectively can improve the quality of document-level neural machine translation. Furthermore, to avoid both the translation errors caused by inaccurate local context and the semantic drift caused by inaccurate context being propagated ever further backward, we propose hierarchical modeling of the global document context to improve document-level translation quality.
Method: We model the global document context with a hierarchical encoder: a sentence-level encoder first captures intra-sentence dependencies, and a document-level encoder then captures inter-sentence coherence and cohesion. We further propose a method for effectively distributing the extracted context, equipping each word with global context in a top-down fashion. Because this distribution is completed in a single pass, it effectively mitigates the translation errors caused by propagating context step by step; equipping each word with its own document context also makes its translation in a specific environment more robust. Notably, we use a two-step training strategy that exploits large-scale parallel sentence pairs to compensate for the limited size of single-domain document corpora, which helps improve document-level translation quality.
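As an illustration only, the following PyTorch sketch shows the general shape of such a hierarchical encoder. The mean-pooled sentence summaries, the sigmoid fusion gate, and all names and dimensions are our own assumptions; they stand in for the paper's attention-based backward distribution rather than reproducing the authors' exact implementation.

```python
# Illustrative sketch of hierarchical global-context encoding (not the
# authors' implementation; module names and dimensions are assumptions).
import torch
import torch.nn as nn

d_model, n_heads = 512, 8

# Sentence-level encoder: captures intra-sentence dependencies.
sent_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
sent_encoder = nn.TransformerEncoder(sent_layer, num_layers=6)

# Document-level encoder: runs over sentence summaries to capture
# inter-sentence coherence and cohesion.
doc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=2)

# Gate fusing each word state with its sentence's global context vector
# (a sigmoid gate is assumed here in place of the paper's attention functions).
gate = nn.Linear(2 * d_model, d_model)

def encode_document(sent_embs):
    """sent_embs: (n_sents, max_len, d_model) word embeddings of one document."""
    word_states = sent_encoder(sent_embs)            # intra-sentence modeling
    sent_summaries = word_states.mean(dim=1)         # one vector per sentence
    global_ctx = doc_encoder(sent_summaries.unsqueeze(0)).squeeze(0)
    # Top-down, one-shot distribution: every word receives its sentence's
    # global context, so no step-by-step propagation error accumulates.
    ctx = global_ctx.unsqueeze(1).expand_as(word_states)
    g = torch.sigmoid(gate(torch.cat([word_states, ctx], dim=-1)))
    return g * word_states + (1 - g) * ctx
```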
Results: We conduct experiments on Chinese-English and English-German translation with data from three domains: TED, News, and Europarl. Extensive results show that the proposed model clearly outperforms four state-of-the-art document-level neural machine translation models and significantly outperforms the RNNSearch and Transformer baselines. In particular, it markedly improves the translation of pronouns and nouns, further demonstrating its effectiveness at extracting cohesion and coherence information from documents.
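As a side note on evaluation, BLEU scores like those reported above are commonly computed with a tool such as sacreBLEU; the snippet below is a generic usage example under that assumption, not the paper's actual scoring setup.

```python
# Generic BLEU evaluation with sacreBLEU (pip install sacrebleu);
# illustrative only -- the paper's exact scoring script is not stated here.
import sacrebleu

hypotheses = ["the two-step strategy improves document translation quality"]
references = [["the two-step strategy improves document-level translation quality"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```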

Conclusions: We carry out extensive experimental analyses of the number of document-encoder layers, whether parameters are shared across encoders, whether the two-step training strategy is used, and which backward-distribution scheme is adopted. The results show that global document context plays a crucial role in improving document-level translation quality. Moreover, our analysis of pronoun translation shows that it has a major impact on document-level translation performance. In future work, we will take coreference resolution as an entry point for deeper research on document-level neural machine translation.



Abstract:

Document-level machine translation (MT) remains challenging due to the difficulty of efficiently exploiting document-level global context in translation. In this paper, we propose a hierarchical model to learn the global context for document-level neural machine translation (NMT). This is done through a sentence encoder that captures intra-sentence dependencies and a document encoder that models document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feed the extracted document-level global context back to each word in a top-down fashion to distinguish different translations of a word according to its specific surrounding context. Notably, we explore the effect of three popular attention functions during the information backward-distribution phase to take a deep look into how our model distributes global context information. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we use a two-step training strategy that exploits a large-scale corpus of out-of-domain parallel sentence pairs and a small-scale corpus of in-domain parallel document pairs to achieve domain adaptability. On Chinese-English and English-German corpora, our model significantly improves over the Transformer baseline by 4.5 BLEU points on average, which demonstrates the effectiveness of the proposed hierarchical model in document-level NMT.
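The abstract does not name the three attention functions it compares; a common trio in the NMT literature is dot-product, general (bilinear), and additive (MLP) scoring, which we assume here purely for illustration:

```python
# Three common attention scoring functions (an assumption: the specific trio
# used in the paper is not restated in this abstract).
import torch
import torch.nn as nn

d = 512
W_general = nn.Linear(d, d, bias=False)   # general (bilinear) attention
W_add = nn.Linear(2 * d, d)               # additive (MLP) attention
v_add = nn.Linear(d, 1, bias=False)

def attend(query, keys, kind="dot"):
    """query: (d,), keys: (n, d) -> attention weights over the n keys."""
    if kind == "dot":                      # dot-product: q . k
        scores = keys @ query
    elif kind == "general":                # bilinear: q . (W k)
        scores = W_general(keys) @ query
    else:                                  # additive: v . tanh(W [q; k])
        q = query.expand_as(keys)
        scores = v_add(torch.tanh(W_add(torch.cat([q, keys], dim=-1)))).squeeze(-1)
    return torch.softmax(scores, dim=-1)
```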


Key words: neural machine translation, document-level translation, global context, hierarchical model
