Abstract:
Background Dialogue state tracking (DST) leverages dialogue history to predict dialogue states, which are typically represented as slot-value pairs. However, previous work is limited in predicting dialogue states because it lacks an effective strategy that exploits both the dialogue history and the predefined values. By predicting values from a predefined value set, earlier discriminative DST methods have difficulty handling unknown values. Earlier generative DST methods determine values from mentions in the dialogue history, which makes it hard for them to handle uncovered and non-pointable mentions. Moreover, existing generative DST methods usually ignore unlabeled instances and suffer from the label noise problem, which limits the generation of mentions and ultimately hurts performance.
Objective Our goal is to help DST models handle unknown values, uncovered mentions, and non-pointable mentions by developing a unified strategy that exploits both the dialogue history and the predefined value set. In addition, we exploit the semantic relations between conversational words and predefined slot-value pairs to develop a denoising method that helps the model handle the label noise problem.
Method We propose a unified shared-private network. Specifically, three shared-private encoders first encode conversational words and predefined values as vectors in a slot-specific space that contains all possible values of each slot; a shared-private decoder then generates values from this space via a copy mechanism; finally, we treat the semantic relations between conversational words and predefined slot-value pairs as indirect supervision and exploit them through reinforcement learning to mitigate the effect of label noise on the model.
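The shared-private encoding and copy step can be pictured with a minimal PyTorch-style sketch; it is an illustrative assumption rather than the paper's implementation, and the module names, dimensions, and simple additive fusion are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPrivateEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid_dim=128, num_slots=30):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One encoder shared by all slots plus one private encoder per slot.
        self.shared = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.private = nn.ModuleList(
            [nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
             for _ in range(num_slots)]
        )

    def forward(self, token_ids, slot_idx):
        emb = self.embed(token_ids)                    # (B, T, emb_dim)
        shared_out, _ = self.shared(emb)               # (B, T, 2*hid_dim)
        private_out, _ = self.private[slot_idx](emb)   # (B, T, 2*hid_dim)
        # Fuse shared and slot-specific features into the slot-specific space.
        return shared_out + private_out


def copy_distribution(query, candidates):
    """Pointer-style copy distribution over the slot-specific space,
    i.e. the encoded dialogue words concatenated with the encoded
    predefined values of the slot."""
    # query: (B, 2*hid_dim), candidates: (B, N, 2*hid_dim)
    scores = torch.bmm(candidates, query.unsqueeze(-1)).squeeze(-1)  # (B, N)
    return F.softmax(scores, dim=-1)

In this sketch a value is read off by pointing into the concatenation of the encoded dialogue words and the encoded predefined values of a slot, so one mechanism covers both copied mentions and predefined values.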
Results Our method achieves state-of-the-art performance on three datasets. It also yields significant improvements on the zero-shot and unsupervised DST tasks. Finally, ablation experiments verify the effectiveness of each component of the model.
Conclusions For the DST task, we propose a unified strategy that effectively predicts dialogue states from both the dialogue history and the predefined values, and the shared-private architecture improves performance by extracting slot-specific as well as slot-shared features. With this unified strategy, the model can handle unknown values as well as uncovered and non-pointable mentions. In addition, we propose a reinforcement learning algorithm based on semantic relations that effectively handles the label noise problem. Our method can therefore track dialogue states accurately for task-oriented dialogue systems. Experiments on three datasets show that our model significantly outperforms the baseline models. In future work, we will apply this method to more slot-filling tasks.
Highlights:
Context Dialogue state tracking (DST) leverages dialogue history to predict dialogue states, which are typically represented as slot-value pairs. However, previous work is often limited in predicting values efficiently because it lacks a powerful strategy for generating values from both the dialogue history and the predefined values. By predicting values from the predefined value set, previous discriminative DST methods have difficulty handling unknown values. Previous generative DST methods determine values based on mentions in the dialogue history, which makes it difficult for them to handle uncovered and non-pointable mentions. Besides, existing generative DST methods usually ignore unlabeled instances and suffer from the label noise problem, which limits the generation of mentions and eventually hurts performance.
Objective The goal of our research is to help DST models deal with unknown values, uncovered mentions, and non-pointable mentions by developing a unified strategy that generates values from both the dialogue history and the predefined value set. In addition, we aim to design a denoising method that handles the label noise problem through semantic relations between conversational words and predefined slot-value pairs.
Method We propose a unified shared-private network. Specifically, three shared-private encoders first transform conversational words and predefined values into slot-specific vectors, which constitute the generative space of each slot; this space contains all possible values of the slot. Then, a shared-private decoder generates values from this space via a copy mechanism. Finally, we use the semantic relations between the conversational words and the predefined slot-value pairs as indirect supervision to handle the label noise problem through reinforcement learning.
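As a rough illustration of how such semantic relations can serve as indirect supervision, the REINFORCE-style sketch below (an assumption, not the paper's exact algorithm) rewards a sampled value by its embedding similarity to the conversational words, so training does not have to trust noisy labels directly; all function names and shapes are hypothetical.

import torch
import torch.nn.functional as F

def reinforce_step(value_logits, value_embeddings, word_embeddings):
    """value_logits:     (B, V) scores over candidate values for one slot
    value_embeddings: (V, D) embeddings of the predefined values
    word_embeddings:  (B, T, D) embeddings of the conversational words"""
    probs = F.softmax(value_logits, dim=-1)
    dist = torch.distributions.Categorical(probs)
    sampled = dist.sample()                           # (B,)
    chosen = value_embeddings[sampled]                # (B, D)
    # Indirect supervision: the reward is the best cosine similarity between
    # the sampled value and any word in the dialogue history.
    sims = F.cosine_similarity(word_embeddings, chosen.unsqueeze(1), dim=-1)  # (B, T)
    reward = sims.max(dim=1).values                   # (B,)
    baseline = reward.mean().detach()                 # simple variance reduction
    loss = -((reward - baseline).detach() * dist.log_prob(sampled)).mean()
    return loss

The policy-gradient loss only needs a scalar reward, which is what lets unlabeled or noisily labeled instances still contribute a training signal.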
Results Our method achieves state-of-the-art performance on three datasets. In addition, our method achieves significant improvements on the zero-shot and unsupervised DST tasks. Finally, the effectiveness of each module of our model is verified by ablation experiments.
Conclusions In the DST task, we propose a unified strategy to efficiently predict dialogue states from both the dialogue history and the predefined value set. The shared-private architecture improves performance by extracting slot-specific features as well as shared features. As a result, our model can handle unknown values as well as uncovered and non-pointable mentions through the unified strategy. In addition, we propose a reinforcement learning algorithm that can effectively utilize semantic relations to handle the label noise problem. Therefore, our method can accurately track dialogue states for task-oriented dialogue systems. Experiments on three datasets show that our model significantly outperforms the baselines. In the future, we will apply our method to many other slot-filling tasks and explore automatic inference of predefined values in the unsupervised DST task.