[1] Kalchbrenner N, Blunsom P. Recurrent continuous translation models. In Proc. the Conference on Empirical Methods in Natural Language Processing, Oct. 2013, pp.1700-1709.
[2] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Information Processing Systems, Dec. 2014, pp.3104-3112.
[3] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In Proc. ICLR, May 2015.
[4] Och F J. Minimum error rate training in statistical machine translation. In Proc. the 41st Annual Meeting of the Association for Computational Linguistics, July 2003, pp.160-167.
[5] Chiang D. A hierarchical phrase-based model for statistical machine translation. In Proc. the 43rd Annual Meeting of the Association for Computational Linguistics, June 2005, pp.263-270.
[6] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8):1735-1780.
[7] Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555, 2014. https://arxiv.org/abs/1412.3555, May 2017.
[8] Ranzato M, Chopra S, Auli M, Zaremba W. Sequence level training with recurrent neural networks. In Proc. ICLR, May 2016.
[9] Shen S, Cheng Y, He Z, He W, Wu H, Sun M, Liu Y. Minimum risk training for neural machine translation. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug. 2016, pp.1683-1692.
[10] Williams R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992, 8(3/4):229-256.
[11] Smith D A, Eisner J. Minimum risk annealing for training log-linear models. In Proc. the COLING/ACL on Main Conference Poster Sessions, July 2006, pp.787-794.
[12] He X, Deng L. Maximum expected BLEU training of phrase and lexicon translation models. In Proc. the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), July 2012, pp.292-301.
[13] Gao J, He X, Yih W, Deng L. Learning continuous phrase representations for translation modeling. In Proc. the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), June 2014, pp.699-709.
[14] Papineni K, Roukos S, Ward T, Zhu W J. BLEU: A method for automatic evaluation of machine translation. In Proc. the 40th Annual Meeting of the Association for Computational Linguistics, July 2002, pp.311-318.
[15] Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J. A study of translation edit rate with targeted human annotation. In Proc. the 7th Conference of the Association for Machine Translation in the Americas, Aug. 2006, pp.223-231.
[16] Watanabe T, Suzuki J, Tsukada H, Isozaki H. Online large-margin training for statistical machine translation. In Proc. the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), June 2007, pp.764-773.
[17] Chiang D, Marton Y, Resnik P. Online large-margin training of syntactic and structural translation features. In Proc. the Conference on Empirical Methods in Natural Language Processing, Oct. 2008, pp.224-233.
[18] Chiang D. Hope and fear for discriminative training of statistical translation models. The Journal of Machine Learning Research, 2012, 13(1):1159-1187.
[19] Neubig G, Watanabe T. Optimization for statistical machine translation: A survey. Computational Linguistics, 2016, 42(2):1-54.
[20] Kar P, Narasimhan H, Jain P. Online and stochastic gradient methods for non-decomposable loss functions. In Proc. the 27th Advances in Neural Information Processing Systems, Dec. 2014, pp.694-702.
[21] Narasimhan H, Vaish R, Agarwal S. On the statistical consistency of plug-in classifiers for non-decomposable performance measures. In Proc. the 27th Advances in Neural Information Processing Systems, Dec. 2014, pp.1493-1501.
[22] Jean S, Cho K, Memisevic R, Bengio Y. On using very large target vocabulary for neural machine translation. In Proc. the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), July 2015, pp.1-10.
[23] Luong M T, Sutskever I, Le Q V, Vinyals O, Zaremba W. Addressing the rare word problem in neural machine translation. In Proc. the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), July 2015, pp.11-19.
[24] He D, Xia Y, Qin T, Wang L, Yu N, Liu T, Ma W Y. Dual learning for machine translation. In Proc. the 30th Advances in Neural Information Processing Systems, Dec. 2016, pp.820-828.
[25] Koehn P. Statistical significance tests for machine translation evaluation. In Proc. the Conference on Empirical Methods in Natural Language Processing, July 2004, pp.388-395.