Merge-Weighted Dynamic Time Warping Algorithm and Its Application in Speech Recognition

张湘莉兰; 骆志刚; 李明

doi:10.1007/s11390-014-1491-0

Merge-Weighted Dynamic Time Warping for Speech Recognition

Abstract: Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult because of the time, storage, and cost involved. When personal privacy is also taken into account, language-independent (LI), lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a convenient way to address the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art and most widely used algorithm for small-footprint SD ASR in real-time applications with limited storage and small vocabularies. These applications include voice dialing on mobile devices, menu-driven recognition, and voice control of vehicles and robots. However, traditional DTW has several limitations, such as high computational complexity, constraint-induced coarse approximation, and inaccuracy. In this paper, we introduce the merge-weighted dynamic time warping (MWDTW) algorithm. The method defines a template confidence index for measuring the similarity between merged training data and testing data, while following the core DTW process. MWDTW is simple, efficient, and easy to implement. In extensive experiments on three representative SD speech recognition datasets, we demonstrate that our method significantly outperforms DTW, DTW on merged speech data, and the hidden Markov model (HMM) in accuracy, and is six times faster than DTW overall.
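For readers unfamiliar with the baseline, the following is a minimal sketch of textbook DTW template matching, the process the abstract says MWDTW follows at its core. It is not the MWDTW algorithm: the template confidence index and the merging step are not specified in the abstract and are not reproduced here. The function names `dtw_distance` and `recognize`, the use of 1-D numeric sequences in place of acoustic feature vectors, and the absolute-difference local cost are all assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D sequences (illustrative only).

    Assumption: the local cost is the absolute difference |a[i] - b[j]|;
    real ASR systems would compare feature vectors instead.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def recognize(test, templates):
    """Return the label of the stored template with the lowest DTW cost.

    `templates` is a hypothetical dict mapping a word label to one
    reference sequence recorded by the speaker.
    """
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))
```

Each comparison fills an n-by-m table, so the cost grows with the product of the two sequence lengths and with the number of stored templates, which is the kind of computational burden the abstract attributes to traditional DTW.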
