Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (6): 1407-1419. DOI: 10.1007/s11390-020-0338-0

Special Issue: Artificial Intelligence and Pattern Recognition

• Regular Paper •

A Unified Shared-Private Network with Denoising for Dialogue State Tracking

Qing-Bin Liu1,2, Shi-Zhu He1,2,*, Member, CCF, Kang Liu1,2, Member, CCF, Sheng-Ping Liu3 and Jun Zhao1,2, Member, CCF        

    1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
    2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China;
    3 Beijing Unisound Information Technology Co., Ltd, Beijing 100096, China
  • Received:2020-01-27 Revised:2021-01-21 Online:2021-11-30 Published:2021-12-01
  • Contact: Shi-Zhu He E-mail:shizhu.he@nlpr.ia.ac.cn
  • Supported by:
    The work is supported by the National Natural Science Foundation of China under Grant Nos. 61533018, U1936207, 61976211, and 61702512, the Independent Research Project of National Laboratory of Pattern Recognition under Grant No. Z-2018013, the National Key Research and Development Program of China under Grant No. 2020AAA0106400, and the Youth Innovation Promotion Association of Chinese Academy of Sciences under Grant No. 201912.

Dialogue state tracking (DST) leverages dialogue information to predict dialogue states, which are generally represented as slot-value pairs. However, previous work typically predicts values inefficiently because it lacks a strategy for generating them from both the dialogue history and the predefined values. Because they predict values only from a predefined value set, previous discriminative DST methods have difficulty handling unknown values. Previous generative DST methods determine values from mentions in the dialogue history, which makes it difficult for them to handle uncovered and non-pointable mentions. Moreover, existing generative DST methods usually ignore unlabeled instances and suffer from label noise, which limits mention generation and ultimately hurts performance. In this paper, we propose a unified shared-private network (USPN) that generates values from both the dialogue history and the predefined values through a unified strategy. Specifically, USPN uses an encoder to construct a complete generative space for each slot and to discern shared information between slots through a shared-private architecture. Our model then predicts values from this generative space through a shared-private decoder. We further utilize reinforcement learning to alleviate the label noise problem by learning indirect supervision from the semantic relations between conversational words and predefined slot-value pairs. Experimental results on three public datasets show the effectiveness of USPN, which outperforms state-of-the-art baselines on both supervised and unsupervised DST tasks.
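The core idea of the "complete generative space" can be illustrated with a minimal sketch (not the authors' code; the function name and n-gram approximation are hypothetical): a slot's candidate values are the union of its predefined ontology values, which cover non-pointable cases such as "dontcare", and mention spans drawn from the dialogue history, which cover values unknown to the ontology.

```python
def build_generative_space(dialogue_history, predefined_values, max_ngram=3):
    """Union of predefined slot values and n-gram mentions from the history.

    Illustrative only: USPN builds this space with a neural shared-private
    encoder; here we approximate it with surface n-grams.
    """
    tokens = dialogue_history.lower().split()
    mentions = set()
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            mentions.add(" ".join(tokens[i:i + n]))
    # Predefined values handle non-pointable cases (e.g. "dontcare");
    # history mentions handle values absent from the ontology.
    return mentions | {v.lower() for v in predefined_values}

history = "I want a cheap restaurant serving modern eclectic food"
space = build_generative_space(history, {"cheap", "moderate", "expensive", "dontcare"})

assert "modern eclectic" in space  # unknown value, recovered from the history
assert "dontcare" in space         # non-pointable value, from the ontology
```

A purely discriminative tracker would miss "modern eclectic"; a purely generative (pointer-based) tracker would miss "dontcare". Predicting from the union makes both reachable with one decoding strategy.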

Key words: dialogue state tracking; unified strategy; shared-private network; reinforcement learning
