Ya-Li Li, Wei-Qun Xu, Yong-Hong Yan. A Novel Similarity Measure to Induce Semantic Classes and Its Application for Language Model Adaptation in a Dialogue System[J]. Journal of Computer Science and Technology, 2012, (2): 443-450. DOI: 10.1007/s11390-012-1233-0
A Novel Similarity Measure to Induce Semantic Classes and Its Application for Language Model Adaptation in a Dialogue System

Funds: This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319.
  • Received Date: December 08, 2010
  • Revised Date: September 13, 2011
  • Published Date: March 04, 2012
  • In this paper, we propose a novel similarity measure, based on co-occurrence probabilities, for inducing semantic classes. Clustering with the new similarity measure outperforms the widely used distance based on Kullback-Leibler divergence in precision, recall and F1 evaluation. In our experiments, we induced semantic classes from an unannotated in-domain corpus and then used the induced classes and structures to generate a large in-domain corpus, which was then used for language model adaptation. The character recognition rate improved from 85.2% to 91%. We thus offer a new way to address the lack of domain data for a dialogue system: first induction, then generation.
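
The abstract names a distance based on Kullback-Leibler divergence as the widely used baseline for clustering words into semantic classes. The sketch below illustrates that baseline only; the paper's new measure is not defined in the abstract and is not reproduced here. Everything in the sketch is an assumption made for illustration: the toy corpus, the use of right-hand neighbours as the only context, the add-alpha smoothing, and the function names `context_distribution`, `kl_divergence` and `symmetric_kl`.

```python
import math
from collections import Counter


def context_distribution(word, sentences, vocab, alpha=1.0):
    """Add-alpha smoothed distribution over words that immediately follow `word`.

    Assumption: only the right-hand neighbour is used as context.
    """
    counts = Counter()
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            if left == word:
                counts[right] += 1
    total = sum(counts.values()) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def kl_divergence(p, q):
    """KL divergence D(p || q) for two distributions over the same vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def symmetric_kl(p, q):
    """Symmetrized KL distance: D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical toy corpus; not taken from the paper.
sentences = [
    "i want to fly to boston".split(),
    "i want to fly to denver".split(),
    "show flights from boston please".split(),
    "show flights from denver please".split(),
    "i want a ticket".split(),
]
vocab = sorted({w for s in sentences for w in s})

p_boston = context_distribution("boston", sentences, vocab)
p_denver = context_distribution("denver", sentences, vocab)
p_want = context_distribution("want", sentences, vocab)

d_same = symmetric_kl(p_boston, p_denver)  # words sharing contexts
d_diff = symmetric_kl(p_boston, p_want)    # words with different contexts
print(d_same < d_diff)
```

In this toy setting the two city names have identical right contexts, so their distance is smaller than the distance between a city name and "want". A clustering step would merge the closest pairs first, which is how such a distance can group words into candidate semantic classes.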