Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (6): 1407-1419. DOI: 10.1007/s11390-020-0338-0

Special Topic: Artificial Intelligence and Pattern Recognition


  • About the first author: Qing-Bin Liu received his B.S. degree in automation from Shandong University, Jinan, in 2015. He is now a Ph.D. candidate at the Institute of Automation, Chinese Academy of Sciences, Beijing, majoring in computer application technology. His research interests include natural language processing and task-oriented dialogue systems.

A Unified Shared-Private Network with Denoising for Dialogue State Tracking

Qing-Bin Liu1,2, Shi-Zhu He1,2,*, Member, CCF, Kang Liu1,2, Member, CCF, Sheng-Ping Liu3 and Jun Zhao1,2, Member, CCF        

  1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
    2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China;
    3 Beijing Unisound Information Technology Co., Ltd, Beijing 100096, China
  • Received: 2020-01-27 Revised: 2021-01-21 Online: 2021-11-30 Published: 2021-12-01
  • Contact: Shi-Zhu He E-mail: shizhu.he@nlpr.ia.ac.cn
  • Supported by:
    The work is supported by the National Natural Science Foundation of China under Grant Nos. 61533018, U1936207, 61976211, and 61702512, the Independent Research Project of National Laboratory of Pattern Recognition under Grant No. Z-2018013, the National Key Research and Development Program of China under Grant No. 2020AAA0106400, and the Youth Innovation Promotion Association of Chinese Academy of Sciences under Grant No. 201912.

Highlight
Context
Dialogue state tracking (DST) leverages dialogue history to predict dialogue states, which are typically represented as slot-value pairs. However, previous work has limited ability to predict values efficiently due to the lack of a powerful strategy for generating values from both the dialogue history and the predefined values. By predicting values only from the predefined value set, previous discriminative DST methods have difficulty handling unknown values. Previous generative DST methods determine values based on mentions in the dialogue history, which makes it difficult for them to handle uncovered and non-pointable mentions. Besides, existing generative DST methods usually ignore unlabeled instances and suffer from the label noise problem, which limits the generation of mentions and eventually hurts performance.
Objective
The goal of our research is to help DST models deal with unknown values, uncovered mentions, and non-pointable mentions by developing a unified strategy that generates values from both the dialogue history and the predefined value set. In addition, we aim to design a denoising method that handles the label noise problem through the semantic relations between conversational words and predefined slot-value pairs.
Method
We propose a unified shared-private network. Specifically, three shared-private encoders first transform conversational words and predefined values into slot-specific vectors, which constitute the generative space of each slot; this space contains all possible values of the slot. Then, a shared-private decoder generates values from the space via a copy mechanism. Finally, we use the semantic relations between the conversational words and the predefined slot-value pairs as indirect supervision, and alleviate the label noise problem through reinforcement learning.
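The unified strategy above can be pictured as a single copy distribution over a slot's generative space, i.e., the union of dialogue-history words and the slot's predefined values. Below is a minimal illustrative sketch, not the paper's implementation; the toy scoring function and vocabulary are assumptions standing in for the learned slot-specific encoders and attention.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def copy_distribution(history_words, predefined_values, score):
    """One softmax over the unified generative space: history words plus
    the slot's predefined values. Probability mass of a candidate that
    appears in both sources is summed, as in standard copy mechanisms."""
    space = history_words + predefined_values
    probs = softmax([score(c) for c in space])
    dist = {}
    for cand, p in zip(space, probs):
        dist[cand] = dist.get(cand, 0.0) + p
    return dist

# Toy scores standing in for a learned slot-specific scorer (assumption).
weights = {"cheap": 2.0, "moderate": 0.5}
dist = copy_distribution(
    history_words=["i", "want", "cheap", "food"],
    predefined_values=["cheap", "moderate", "expensive"],
    score=lambda w: weights.get(w, 0.0),
)
best = max(dist, key=dist.get)  # value generated for the slot, here "cheap"
```

Because the space always contains every predefined value alongside the dialogue words, the decoder can emit an unknown value mentioned only in the history as well as a predefined value that is uncovered or non-pointable in the history.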
Results
Our method achieves state-of-the-art performance on three datasets. In addition, it achieves significant improvements in the zero-shot and unsupervised DST tasks. Finally, the effectiveness of each module of our model is verified by ablation experiments.
Conclusions
In the DST task, we propose a unified strategy to efficiently predict dialogue states from both the dialogue history and the predefined value set. The shared-private architecture improves performance by extracting slot-specific features as well as features shared between slots. As a result, our model can handle unknown values as well as uncovered and non-pointable mentions through the unified strategy. In addition, we propose a reinforcement learning algorithm that effectively utilizes semantic relations to handle the label noise problem. Therefore, our method can accurately track dialogue states for task-oriented dialogue systems. Experiments on three datasets show that our model significantly outperforms the baselines. In the future, we will apply our method to other slot-filling tasks and explore automatic inference of predefined values in the unsupervised DST task.


Abstract: Dialogue state tracking (DST) leverages dialogue information to predict dialogue states, which are generally represented as slot-value pairs. However, previous work has limited ability to predict values efficiently due to the lack of a powerful strategy for generating values from both the dialogue history and the predefined values. By predicting values only from the predefined value set, previous discriminative DST methods have difficulty handling unknown values. Previous generative DST methods determine values based on mentions in the dialogue history, which makes it difficult for them to handle uncovered and non-pointable mentions. Besides, existing generative DST methods usually ignore unlabeled instances and suffer from the label noise problem, which limits the generation of mentions and eventually hurts performance. In this paper, we propose a unified shared-private network (USPN) to generate values from both the dialogue history and the predefined values through a unified strategy. Specifically, USPN uses an encoder to construct a complete generative space for each slot and to discern shared information between slots through a shared-private architecture. Then, our model predicts values from the generative space through a shared-private decoder. We further utilize reinforcement learning to alleviate the label noise problem by learning indirect supervision from the semantic relations between conversational words and predefined slot-value pairs. Experimental results on three public datasets show the effectiveness of USPN, which outperforms state-of-the-art baselines in both supervised and unsupervised DST tasks.
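The indirect supervision described above can be sketched as a reward signal: a generated word is reinforced only when it is semantically close to some predefined slot-value embedding. This is a hedged toy sketch under assumed embeddings and threshold, not the paper's reward design; `denoising_reward` and `reinforce_loss` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def denoising_reward(word_vec, value_vecs, threshold=0.5):
    """Positive reward when the generated word's embedding is close to at
    least one predefined slot-value embedding; negative otherwise, so
    noisily labeled mentions that match nothing in the ontology are
    discouraged (threshold is an assumed hyperparameter)."""
    best = max(cosine(word_vec, v) for v in value_vecs)
    return 1.0 if best >= threshold else -1.0

def reinforce_loss(log_prob, reward, baseline=0.0):
    """REINFORCE surrogate loss: minimizing it scales the policy gradient
    of the generation log-probability by (reward - baseline)."""
    return -(reward - baseline) * log_prob

# A word aligned with a predefined value is reinforced; an unrelated one is not.
r_good = denoising_reward([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
r_bad = denoising_reward([1.0, 0.0], [[0.0, 1.0]])
```

In training, such rewards would weight the decoder's log-probabilities in place of the noisy hard labels, which is one plausible way to read "learning indirect supervision from semantic relations".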

Key words: dialogue state tracking, unified strategy, shared-private network, reinforcement learning
