2012, Issue (2): 358-375. doi: 10.1007/s11390-012-1228-x

• Machine Learning and Data Mining

Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection

Dong Wang (1,3), Member, IEEE; Javier Tejedor (1,2); Simon King (1), Senior Member, IEEE; and Joe Frankel (1), Member, IEEE

  1. Centre for Speech Technology Research, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9LW, U.K.;
  2. Human Computer Technology Laboratory (HCTLab), School of Computer Engineering and Telecommunication, University Autonomous of Madrid, Avenue Francisco Tomás y Valiente 11, 28049, Madrid, Spain;
  3. Nuance Communications, 1 Wayside Road, Burlington, MA 01803, U.S.A.
  • Received: 2011-03-23  Revised: 2011-12-01  Online: 2012-03-05  Published: 2012-03-05

An important component of a spoken term detection (STD) system is the estimation of confidence measures for hypothesised detections. A potential problem with the widely used lattice-based confidence estimation, however, is that the confidence scores are treated uniformly for all search terms, regardless of how much the terms differ in their phonetic or linguistic properties. This problem is particularly evident for out-of-vocabulary (OOV) terms, which tend to exhibit high intra-term diversity. To address the impact of term diversity on confidence measures, we propose a term-dependent normalisation technique that compensates for term diversity in confidence estimation. We first derive an evaluation-metric-oriented normalisation that optimises the evaluation metric by compensating for the diverse occurrence rates among terms. We then propose a linear bias compensation and a discriminative compensation to deal with the bias problem that is inherent in lattice-based confidence measurement and from which the Term Specific Threshold (TST) approach suffers. We tested the proposed technique on speech data from the multi-party meeting domain with two state-of-the-art STD systems, one based on phonemes and one based on words. The experimental results demonstrate that the confidence normalisation approach leads to a significant performance improvement in STD, particularly for OOV terms with phoneme-based systems.
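
The abstract gives no formulas, so the following is only an illustrative sketch of how a term-specific threshold can compensate for differing occurrence rates among terms. It assumes a term-weighted-value style metric in which a miss costs 1/N_true and a false alarm costs beta/(T - N_true), where T is the speech duration in seconds; it also assumes that the unknown N_true is estimated by summing a term's confidence scores. The function names, the example terms, and the choice beta = 999.9 (the value used for ATWV in the NIST STD 2006 evaluation) are assumptions made for illustration and are not taken from the paper, whose own formulation may differ.

```python
from typing import Dict, List

# Assumed false-alarm weight; 999.9 is the value used for ATWV in NIST STD 2006.
BETA = 999.9


def term_specific_threshold(confidences: List[float],
                            speech_seconds: float,
                            beta: float = BETA) -> float:
    """Confidence above which accepting a detection has positive expected value.

    A detection with confidence p gains p / N_true if correct and loses
    beta * (1 - p) / (T - N_true) if wrong. Setting the expected gain to zero
    gives p = beta * N_true / (T + (beta - 1) * N_true).
    """
    # Assumption: the true occurrence count is estimated by the sum of confidences.
    n_true_est = sum(confidences)
    return beta * n_true_est / (speech_seconds + (beta - 1.0) * n_true_est)


def decide(detections: Dict[str, List[float]],
           speech_seconds: float) -> Dict[str, List[bool]]:
    """Apply a separate threshold to each term's hypothesised detections."""
    decisions = {}
    for term, confs in detections.items():
        threshold = term_specific_threshold(confs, speech_seconds)
        decisions[term] = [c > threshold for c in confs]
    return decisions


if __name__ == "__main__":
    # Hypothetical data: 10 hours of speech, one rare term and one frequent term.
    hypothesised = {"rare_term": [0.3], "frequent_term": [0.5] * 100}
    for term, accepted in decide(hypothesised, speech_seconds=36000.0).items():
        print(term, sum(accepted), "of", len(accepted), "detections accepted")
```

Under these assumptions a rare term receives a much lower threshold than a frequent one, so a single global threshold on raw confidence would treat the two inconsistently with respect to the metric. This sketch takes the confidences at face value; the bias in lattice-based confidences that the abstract mentions is not modelled here.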

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology