Journal of Computer Science and Technology ›› 2022, Vol. 37 ›› Issue (2): 295-308. DOI: 10.1007/s11390-021-0286-3
Special Section: Artificial Intelligence and Pattern Recognition
Xin Tan (谭新), Long-Yin Zhang (张龙印), Member, CCF, and Guo-Dong Zhou* (周国栋), Distinguished Member, CCF, Member, ACM, IEEE
Background: Machine translation, the task of automatically translating a source language into a target language with computer algorithms, is a branch of natural language processing with significant research and practical value. In recent years, sentence-level neural machine translation (NMT) systems have greatly narrowed the gap between human and machine output on sentence-level tasks thanks to markedly improved fluency and accuracy. For document-level translation, however, NMT models still struggle to produce satisfactory output because they lack an understanding of the document and ignore the connections between sentences. This paper therefore studies document-level NMT, aiming to improve translation quality by modeling document context.
Objective: Since the cohesion and coherence carried in document context are crucial to understanding a document, we argue that effectively modeling this context can improve the quality of document-level NMT. In addition, to avoid translation errors caused by inaccurate local context, as well as the semantic drift that arises when inaccurate context keeps being propagated backward, we propose to improve document-level NMT by modeling the global document context hierarchically.
Method: We model the global document context with a hierarchical encoder: sentence-level encoder layers first capture the dependencies within each sentence, and document-level encoder layers then capture the coherence and cohesion across sentences. Finally, we propose a way to allocate the extracted context effectively, equipping every word with global context in a top-down fashion. This allocation is completed in a single pass, which alleviates the translation errors caused by propagating context errors; equipping each word with its own document context also makes its translation in a given environment more robust. Notably, we adopt a two-step training strategy that exploits large-scale parallel sentence pairs to compensate for the limited size of single-domain document corpora, which further improves translation quality.
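To make the two-stage encoding and the one-shot, top-down context allocation concrete, here is a minimal PyTorch sketch. The module name HierContextEncoder, the mean-pooled sentence summaries, and the sigmoid fusion gate are illustrative assumptions made for this summary, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class HierContextEncoder(nn.Module):
    """Two-stage document encoding: sentence-level self-attention captures
    intra-sentence dependencies; document-level self-attention over sentence
    summaries captures inter-sentence cohesion and coherence."""

    def __init__(self, d_model=512, nhead=8, num_sent_layers=6, num_doc_layers=2):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(make_layer(), num_sent_layers)
        self.doc_encoder = nn.TransformerEncoder(make_layer(), num_doc_layers)
        # Gate that fuses the global context into each word representation.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, doc_embeds):
        # doc_embeds: (num_sents, max_len, d_model), word embeddings of one document.
        word_states = self.sent_encoder(doc_embeds)   # intra-sentence dependencies
        sent_summaries = word_states.mean(dim=1)      # assumed pooling per sentence
        doc_states = self.doc_encoder(sent_summaries.unsqueeze(0)).squeeze(0)
        # One-shot, top-down allocation: every word of sentence i receives the
        # same global context vector, so no inaccurate context is propagated
        # step by step from word to word.
        global_ctx = doc_states.unsqueeze(1).expand_as(word_states)
        g = torch.sigmoid(self.gate(torch.cat([word_states, global_ctx], dim=-1)))
        return g * word_states + (1 - g) * global_ctx


encoder = HierContextEncoder()
out = encoder(torch.randn(6, 30, 512))  # a 6-sentence document, 30 tokens per sentence
```

The gated sum is one plausible way to equip each word with its document context; the property the sketch aims to show is that the allocation happens once, top-down, rather than being threaded incrementally through decoding.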
Results: We conduct experiments on Chinese-English and English-German translation with data from three domains: TED, News, and Europarl. Extensive results show that the proposed model improves considerably over four state-of-the-art document-level NMT models and significantly outperforms the RNNSearch and Transformer baselines. In particular, it clearly improves the translation of pronouns and nouns, further confirming its effectiveness at capturing document cohesion and coherence.
Conclusions: We extensively analyze the number of document-level encoder layers, whether parameters are shared between encoders, whether the two-step training strategy is used, and which backward allocation scheme is adopted. The results show that global document context plays a vital role in improving document translation quality. Our analysis of pronoun translation further shows that pronoun translation strongly affects document-level NMT performance. In future work, we will take coreference resolution as an entry point for deeper study of document-level NMT.
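As a companion sketch, the two-step training schedule examined above (pre-train on abundant parallel sentence pairs, then fine-tune the document-level components on the smaller in-domain document corpus) can be summarized as follows; the placeholder modules, loss, and synthetic batches are assumptions purely for illustrating the schedule.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
d = 16
sent_model = nn.Linear(d, d)   # stands in for the sentence-level NMT model
doc_context = nn.Linear(d, d)  # stands in for the document-level context layers


def train(params, batches, use_context):
    opt = optim.Adam(params, lr=1e-3)
    for src, tgt in batches:
        out = sent_model(src)
        if use_context:
            out = out + doc_context(src)  # schematic fusion of global context
        loss = nn.functional.mse_loss(out, tgt)
        opt.zero_grad()
        loss.backward()
        opt.step()


sent_pairs = [(torch.randn(32, d), torch.randn(32, d)) for _ in range(100)]
doc_batches = [(torch.randn(8, d), torch.randn(8, d)) for _ in range(10)]

# Step 1: exploit large-scale parallel sentence pairs, context disabled.
train(sent_model.parameters(), sent_pairs, use_context=False)
# Step 2: fine-tune all parameters on the scarce document-level corpus.
train(list(sent_model.parameters()) + list(doc_context.parameters()),
      doc_batches, use_context=True)
```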