A Novel Similarity Measure to Induce Semantic Classes and Its Application for Language Model Adaptation in a Dialogue System
-
Abstract
In this paper, we propose a novel similarity measure, based on co-occurrence probabilities, for inducing semantic classes. Clustering with the new similarity measure outperforms the widely used distance based on Kullback-Leibler divergence in precision, recall, and F1. In our experiments, we induced semantic classes from an unannotated in-domain corpus and then used the induced classes and structures to generate a large in-domain corpus, which was in turn used for language model adaptation. The character recognition rate improved from 85.2% to 91%. We thus offer a new approach to the lack of domain data for a dialogue system: induction first, then generation.