Communication Between Speech Production and Perception Within the Brain: Observation and Simulation

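Summary: The human body, viewed as a system, is perhaps the most sophisticated of all systems known to date. If we want to build an intelligent system, we should first consider how to endow it with human functions. The starting point of this study is to explore and learn from human language and speech functions and their cognitive mechanisms, and to try to apply them to intelligent human-machine interfaces. Research on speech synthesis and speech recognition aims to realize human speech production and perception on computers. However, the principles and methods used there differ greatly from human mechanisms, so this research has run into many difficulties that are hard to overcome. In research on human speech production and perception, the interaction between production and perception during speaking was noticed as early as the beginning of the last century, in the well-known Lombard effect. Since then, methods such as delayed auditory feedback and transformed auditory feedback have been used to explore the principles of speech production and speech perception and the links between them. Although remarkable results have been achieved, a thorough account of the human mechanisms of speech production and cognition remains far off. This paper focuses on the mechanisms of speech production and perception and their realization in intelligent systems, in particular on how information for speech production and perception is transmitted and processed in the brain. In a physiological study using electromyography, Honda proposed a hypothesis, based on the topological similarity between the motor and perceptual representations of speech, that the transmission and processing of speech information in the human brain may be realized through an efficient topological mapping between speech production and speech perception. To obtain solid evidence, we first use model simulation to explore how the topology of the vowel system is represented in the motor command space, the kinematic (articulatory) space, and the acoustic space, and then use a transformed auditory feedback experiment to examine the link between vowel production and perception. The model simulation shows that, in a coordinate system composed of equilibrium points associated with muscle activation, there is a fixed mapping between muscle activation (motor command space) and articulation (kinematic space), and that the mapping from the motor command space to the kinematic space is unique. The deduction from the model simulation demonstrates that the topologies of the vowels are mutually compatible across the motor command, articulatory, and acoustic spaces. The results of the transformed auditory feedback experiment confirm that the vowel production system makes compensatory movements in response to perturbations in the feedback sound. This result indicates that people make use of perceptual monitoring when controlling the vowel production system. This study shows that, during conversation, people perceive and process both others' speech and their own while producing speech. Carrying out such a large amount of computation would be unimaginable with existing speech parameters and signal processing methods. This suggests that humans may use more compact parameters and more effective matching methods in speech production and perception. Based on the results of physiological experiments, psychological experiments, and computational model simulations, this paper proposes a parametric description of speech in the brain's speech production and speech perception systems, and attempts to show that the perception of speech (at least of vowels) is a simple topological mapping. However, many quantitative experiments are still needed to confirm the account of the human speech production and perception systems given here. The authors hope that this study will serve as a starting point that draws the attention and interest of more researchers.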

Abstract: Realizing an intelligent human-machine interface requires us to investigate human mechanisms and learn from them. This study focuses on communication between speech production and perception within the human brain and on realizing it in an artificial system. A physiological study based on electromyographic signals (Honda, 1996) suggested that speech communication in the human brain might be based on a topological mapping between speech production and perception, given the analogous topology of the motor and sensory representations. Following this hypothesis, this study first investigated the topologies of the vowel system across the motor, kinematic, and acoustic spaces by means of a model simulation, and then examined the linkage between vowel production and perception by means of a transformed auditory feedback (TAF) experiment. The model simulation indicated that there exists an invariant mapping from muscle activations (motor space) to articulations (kinematic space) via a coordinate system consisting of force-dependent equilibrium positions, and that the mapping from the motor space to the kinematic space is unique. The motor-kinematic-acoustic deduction in the model simulation showed that the topologies were compatible from one space to another. In the TAF experiment, vowel production exhibited a compensatory response to a perturbation in the feedback sound. This implies that vowel production is controlled with reference to perceptual monitoring.


Communication Between Speech Production and Perception Within the Brain: Observation and Simulation