Mandarin Pronunciation Modeling Based on CASS Corpus
-
Abstract
Pronunciation variability is an important issue that must be faced when developing practical automatic spontaneous speech recognition systems. In this paper, the factors that may affect recognition performance are analyzed, including those specific to the Chinese language. By studying the INITIAL/FINAL (IF) characteristics of the Chinese language and developing the Bayesian equation, the concepts of generalized INITIAL/FINAL (GIF) and generalized syllable (GS), GIF modeling and IF-GIF modeling, as well as context-dependent pronunciation weighting, are proposed based on a well phonetically transcribed seed database. Using these methods, the Chinese syllable error rate (SER) is reduced by 6.3% and 4.2% compared with GIF modeling and IF modeling, respectively, when no language model, such as a syllable or word N-gram, is used. The effectiveness of these methods is also demonstrated when more data without phonetic transcription are used to refine the acoustic model using the proposed iterative forced-alignment based transcribing (IFABT) method, achieving a 5.7% SER reduction.
-