Audio Enhancement for Computer Audition—An Iterative Training Paradigm Using Sample Importance

Abstract: Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination in real-life applications. To improve audio quality, an independently developed enhancement module can be placed explicitly at the front end of the target audio application. In this paper, we present an end-to-end learning solution that jointly optimises the models for audio enhancement (AE) and the subsequent application. To guide the optimisation of the AE module towards a target application, and especially to handle difficult samples, we use a sample-wise performance measure as an indicator of sample importance. In experiments, we evaluate our training paradigm on four representative applications: ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These applications cover speech and non-speech tasks involving semantic and non-semantic features as well as transient and global information. The experimental results indicate that our proposed approach can considerably boost the noise robustness of the models, especially at low signal-to-noise ratios, for a wide range of computer audition tasks in everyday-life noisy environments.
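The abstract does not spell out how sample importance enters the joint optimisation. The sketch below is illustrative only and is not the authors' method. It assumes the following: each sample has a performance score in [0, 1] from the downstream task (for example, per-sample accuracy for SCR or 1 - WER for ASR); harder samples (lower scores) should receive larger weights; and the AE loss and application loss are mixed per sample with a hypothetical factor `alpha`. The function names `importance_weights`, `weighted_mean` and `joint_loss`, together with the weighting rule itself, are all assumptions.

```python
from typing import List, Sequence


def importance_weights(scores: Sequence[float], eps: float = 1e-3) -> List[float]:
    """Turn per-sample performance scores (1.0 = perfect) into loss weights.

    Assumed rule (hypothetical): weight is proportional to (1 - score) + eps,
    so samples the downstream model handles poorly count more. The eps term
    keeps easy samples from being ignored entirely. Weights are rescaled to
    average 1.0 so the overall loss magnitude is preserved.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    clipped = [min(max(s, 0.0), 1.0) for s in scores]  # guard against out-of-range scores
    raw = [(1.0 - s) + eps for s in clipped]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Importance-weighted mean of per-sample losses (weights average to 1)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / len(values)


def joint_loss(
    ae_losses: Sequence[float],
    app_losses: Sequence[float],
    scores: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical joint objective for an AE front end plus an application model.

    Each sample's enhancement loss and application loss are mixed with the
    assumed factor alpha, then averaged with importance weights derived from
    the sample-wise performance scores.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(ae_losses) != len(app_losses):
        raise ValueError("ae_losses and app_losses must have the same length")
    weights = importance_weights(scores)
    per_sample = [alpha * a + (1.0 - alpha) * b for a, b in zip(ae_losses, app_losses)]
    return weighted_mean(per_sample, weights)


if __name__ == "__main__":
    # Three samples rated easy (0.9), medium (0.5) and hard (0.1) by the task metric.
    w = importance_weights([0.9, 0.5, 0.1])
    print([round(x, 3) for x in w])
    print(joint_loss([0.2, 0.4, 0.8], [0.1, 0.6, 1.2], [0.9, 0.5, 0.1]))
```

Rescaling the weights to a mean of 1.0 keeps the weighted loss on the same scale as an unweighted mean. The step size of the optimiser therefore does not drift as the score distribution changes. In an iterative scheme, the scores would presumably be recomputed from the current application model between training rounds. This is also an assumption, because the abstract does not state it.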

     
