Grading the Severity of Mispronunciations in CAPT Based on Statistical Analysis and Computational Speech Perception

Jia Jia; Wai-Kim Leung; Yu-Hao Wu; Xiu-Long Zhang; Hao Wang; Lian-Hong Cai; Helen M. Meng

doi:10.1007/s11390-014-1465-2

Volume 29 Issue 5

September 2014

Journal of Computer Science and Technology > 2014 > 29(5): 751-761. > DOI: 10.1007/s11390-014-1465-2 CSTR: 32374.14.s11390-014-1465-2

Citation: Jia Jia, Wai-Kim Leung, Yu-Hao Wu, Xiu-Long Zhang, Hao Wang, Lian-Hong Cai, Helen M. Meng. Grading the Severity of Mispronunciations in CAPT Based on Statistical Analysis and Computational Speech Perception[J]. Journal of Computer Science and Technology, 2014, 29(5): 751-761. DOI: 10.1007/s11390-014-1465-2

Jia Jia^1,2,3 (贾珈), Member, CCF, ACM, IEEE,
Wai-Kim Leung^1,4,5 (梁伟俭),
Yu-Hao Wu^1,2,3 (吴育昊),
Xiu-Long Zhang^1,2,3 (张秀龙),
Hao Wang^4,5 (王昊),
Lian-Hong Cai^1,2,3 (蔡莲红),
Helen M. Meng^4,5 (蒙美玲), Fellow, IEEE

1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
2. Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China;
3. Key Laboratory of Pervasive Computing, Ministry of Education, Beijing 100084, China;
4. Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong, China;
5. Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Shenzhen 518055, China

Funds: This work is supported by the National Basic Research 973 Program of China under Grant No. 2013CB329304, the National Natural Science Foundation of China under Grant No. 61370023, and the Major Project of the National Social Science Foundation of China under Grant No. 13&ZD189. This work is also partially supported by the General Research Fund of the Hong Kong SAR Government under Project No. 415511 and the CUHK Teaching Development Grant.

Author Bio:
Jia Jia is an associate professor in the Department of Computer Science and Technology, Tsinghua University, Beijing. She received her B.S. and Ph.D. degrees, both in computer science and technology, from Tsinghua University in 2003 and 2008, respectively. Her main research interests are human-computer speech interaction and social affective computing.
Received Date: March 15, 2014
Revised Date: July 13, 2014
Published Date: September 04, 2014

Abstract

Computer-aided pronunciation training (CAPT) technologies enable the use of automatic speech recognition to detect mispronunciations in second language (L2) learners' speech. To further facilitate learning, we aim to develop a principle-based method for grading the severity of mispronunciations. This paper presents an approach to gradation that is motivated by auditory perception. We have developed a computational method for generating a perceptual distance (PD) between two spoken phonemes, which is used to compute the auditory confusion of the native language (L1). PD is found to correlate well with the mispronunciations detected in a CAPT system for Chinese learners of English, i.e., L1 being Chinese (Mandarin and Cantonese) and L2 being US English. The results show that auditory confusion is indicative of pronunciation confusions in L2 learning. PD can also be used to grade the severity of errors (i.e., mispronunciations that confuse more distant phonemes are more severe) and, accordingly, to prioritize the order of corrective feedback generated for learners.
Keywords: second language learning, computer-aided pronunciation training, mispronunciation, computational speech perception
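
The last point of the abstract, ordering corrective feedback by severity, can be illustrated with a minimal sketch. It assumes that a PD value is already available for each (target phoneme, produced phoneme) pair. The phoneme labels, the `ILLUSTRATIVE_PD` table and its numbers, and the names `Mispronunciation`, `perceptual_distance` and `prioritize_feedback` are all hypothetical placeholders chosen for illustration; none of them are taken from the paper, and the paper's actual PD computation is not reproduced here.

```python
from dataclasses import dataclass

# Placeholder perceptual distances keyed by (target, produced) phoneme pair.
# These numbers are invented for illustration only; they are NOT results
# reported in the paper.
ILLUSTRATIVE_PD = {
    ("th", "s"): 0.2,
    ("th", "f"): 0.3,
    ("l", "n"): 0.5,
    ("v", "w"): 0.8,
}


@dataclass(frozen=True)
class Mispronunciation:
    """One detected error: the phoneme expected and the phoneme produced."""
    target: str
    produced: str


def perceptual_distance(target, produced, table=ILLUSTRATIVE_PD):
    """Look up the PD for a phoneme pair; identical phonemes have distance 0."""
    if target == produced:
        return 0.0
    # Raises KeyError if the pair is missing, rather than guessing a value.
    return table[(target, produced)]


def prioritize_feedback(errors, table=ILLUSTRATIVE_PD):
    """Order errors from most to least severe, i.e. by descending PD."""
    return sorted(
        errors,
        key=lambda e: perceptual_distance(e.target, e.produced, table),
        reverse=True,
    )


errors = [
    Mispronunciation("th", "s"),
    Mispronunciation("v", "w"),
    Mispronunciation("l", "n"),
]
for err in prioritize_feedback(errors):
    print(err.target, "->", err.produced)
```

Under the abstract's assumption that confusing more distant phonemes is more severe, the error with the largest PD is surfaced first. A real system would replace the lookup table with PD values produced by the computational perception model.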

