
方言背景普通话语音识别框架

A Dialectal Chinese Speech Recognition Framework

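  • Abstract (translated from the Chinese): Speech recognition for standard Putonghua has reached a level that is basically usable in practice. In real use, however, most speakers' Putonghua is not fully standard because it is influenced by their dialect background, and this considerably degrades recognition performance. One solution is to collect enough speech data for every dialect and build a dedicated recognizer for each, but Chinese dialects are numerous and differ widely, so the time and cost are high. To address this problem, this paper proposes and studies a framework in which a standard Putonghua recognizer can be converted into a recognizer for dialect-accented Putonghua using only a small amount of dialect-accented Putonghua data together with dialect-related knowledge. There are two sources of dialect-related knowledge: experts, and a small-scale database of dialect-accented Putonghua. The knowledge can be extracted and applied at four levels: phonetic, lexicon, language, and decoding. As an experimental case, this paper takes Wu-accented Putonghua as the target language. Based on the Initial-Final structure of Chinese and on the way accented speakers actually speak, the paper proposes knowledge built on the Initial-Final structure, including context-independent standard Initial-Final mapping rules, context-independent Wu dialect Initial-Final mapping rules, and syllable-dependent Wu dialect Initial-Final mapping rules. This knowledge is obtained from data and from expert knowledge, and is combined with an MLLR acoustic adaptation method guided by base-forms and surface-forms. In addition, to shorten the multi-pronunciation lexicon, reduce confusability, and speed up decoding, and thereby improve system performance, the paper proposes a strategy for generating the multi-pronunciation lexicon through multi-pronunciation expansion based on accumulated uni-gram probability. Some Wu dialect words that are commonly used in Putonghua are also selected and added to the pronunciation lexicon. Compared with the original standard Putonghua recognizer, the resulting Wu-accented Putonghua recognizer lowers the absolute character error rate by 10% to 18% when recognizing Wu-accented Putonghua, while the absolute character error rate on standard Putonghua rises by only 0.62%. Although this study is based on the Wu dialect, the framework is general and can readily be applied to Putonghua accented by other dialects.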

     

    Abstract: A framework for dialectal Chinese speech recognition is proposed and studied, in which a relatively small dialectal Chinese speech corpus (that is, a corpus of Chinese influenced by the speakers' native dialect) and dialect-related knowledge are used to transform a standard Chinese (Putonghua, abbreviated as PTH) speech recognizer into a dialectal Chinese speech recognizer. Two kinds of knowledge sources are explored: expert knowledge and a small dialectal Chinese corpus. These knowledge sources provide information at four levels: the phonetic level, the lexicon level, the language level, and the acoustic decoder level. This paper takes Wu dialectal Chinese (WDC) as an example target language. The goal is to establish a WDC speech recognizer from an existing PTH speech recognizer, based on the Initial-Final structure of the Chinese language and a study of how dialectal Chinese speakers speak Putonghua. The authors propose to use context-independent PTH-IF mappings (where IF means either a Chinese Initial or a Chinese Final), context-independent WDC-IF mappings, and syllable-dependent WDC-IF mappings (obtained from either experts or data), and to combine them with the supervised maximum likelihood linear regression (MLLR) acoustic model adaptation method. The IF mappings introduce a multi-pronunciation lexicon, whose growth may increase lexicon confusion and hence degrade performance; to reduce its size, a Multi-Pronunciation Expansion (MPE) method based on the accumulated uni-gram probability (AUP) is proposed. In addition, some commonly used WDC words are selected and added to the lexicon. Compared with the original PTH speech recognizer, the resulting WDC speech recognizer achieves a 10% to 18% absolute Character Error Rate (CER) reduction when recognizing WDC, with only a 0.62% absolute CER increase when recognizing PTH.
The proposed framework and methods are expected to work not only for Wu dialectal Chinese but also for other dialectal Chinese languages and even other languages.
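
The abstract does not spell out how the AUP-based MPE step works, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that words are ranked by uni-gram probability, that only the most frequent words (those needed to reach a chosen accumulated probability) receive dialectal pronunciation variants, and that all remaining words keep their single PTH base-form. The function names, the threshold value, the toy lexicon, and the example IF mappings (`zh` to `z`, `sh` to `s`, `ing` to `in`) are all hypothetical and chosen for illustration.

```python
from itertools import product


def select_words_by_aup(unigram, threshold):
    """Pick the most frequent words until their accumulated
    uni-gram probability reaches `threshold` (assumed criterion)."""
    selected = []
    accumulated = 0.0
    ranked = sorted(unigram.items(), key=lambda kv: kv[1], reverse=True)
    for word, prob in ranked:
        if accumulated >= threshold:
            break
        selected.append(word)
        accumulated += prob
    return selected


def expand_pronunciations(base_form, if_mappings):
    """Generate surface-forms by letting each Initial/Final unit either
    stay as it is or be replaced by one of its mapped variants."""
    options = [[unit] + if_mappings.get(unit, []) for unit in base_form]
    return [tuple(variant) for variant in product(*options)]


def build_lexicon(base_lexicon, unigram, if_mappings, threshold):
    """Expand pronunciations only for the AUP-selected words."""
    to_expand = set(select_words_by_aup(unigram, threshold))
    lexicon = {}
    for word, base_form in base_lexicon.items():
        if word in to_expand:
            lexicon[word] = expand_pronunciations(base_form, if_mappings)
        else:
            lexicon[word] = [tuple(base_form)]
    return lexicon


# Hypothetical toy data, not taken from the paper.
if_mappings = {"zh": ["z"], "sh": ["s"], "ing": ["in"]}
base_lexicon = {
    "是": ["sh", "i"],
    "中": ["zh", "ong"],
    "星": ["x", "ing"],
}
unigram = {"是": 0.6, "中": 0.3, "星": 0.1}

lexicon = build_lexicon(base_lexicon, unigram, if_mappings, threshold=0.8)
for word, prons in lexicon.items():
    print(word, prons)
```

With these toy numbers, the two most frequent words gain a second pronunciation while the rarest word keeps only its base-form, which illustrates how a frequency-based cutoff could keep the multi-pronunciation lexicon small.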

     
