Audio Enhancement for Computer Audition—An Iterative Training Paradigm Using Sample Importance

Manuel Milling; Shuo Liu; Andreas Triantafyllopoulos; Ilhan Aslan; Björn W. Schuller

doi:10.1007/s11390-024-2934-x

Volume 39 Issue 4

September 2024


Journal of Computer Science and Technology, 2024, 39(4): 895-911. DOI: 10.1007/s11390-024-2934-x. CSTR: 32374.14.s11390-024-2934-x

Citation: Milling M, Liu S, Triantafyllopoulos A et al. Audio enhancement for computer audition—An iterative training paradigm using sample importance. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(4): 895−911 July 2024. DOI: 10.1007/s11390-024-2934-x.


Manuel Milling^{1, 2, 3},
Shuo Liu^{1} (刘硕),
Andreas Triantafyllopoulos^{1, 2, 3},
Ilhan Aslan^{4},
Björn W. Schuller^{1, 2, 3, 5, 6}, Fellow, ACM, IEEE

1. Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg 86159, Germany
2. Chair of Health Informatics, München rechts der Isar, Technical University of Munich, Munich 81675, Germany
3. Munich Center for Machine Learning, Munich 80333, Germany
4. Huawei Technologies, Munich, Munich 80992, Germany
5. Munich Data Science Institute, Garching 85748, Germany
6. Group on Language, Audio and Music, Imperial College London, London SW7 2AZ, U.K.

Funds: This research was partly supported by the Affective Computing & HCI Innovation Research Lab between Huawei Technologies and the University of Augsburg, and the EU H2020 Project under Grant No. 101135556 (INDUX-R).

Author Bio:
Manuel Milling received his Bachelor of Science in physics and in computer science from the University of Augsburg, Augsburg, in 2014 and 2015, respectively, and his Master of Science in physics from the same university in 2018. He is currently a Ph.D. candidate in computer science at the Chair of Health Informatics, Technical University of Munich, Munich. His research interests include machine learning, with a particular focus on the core understanding and applications of deep learning methodologies.

Shuo Liu received his Bachelor's degree from the Nanjing University of Posts and Telecommunications, Nanjing, in 2012, and his M.Sc. degree from the Technical University of Darmstadt, Darmstadt, in 2017. He worked as a researcher in the Sivantos group for hearing aid solutions. He is currently a Ph.D. candidate at the Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg. His research focuses on deep learning for audio processing, mobile computing, digital health, and affective computing.

Andreas Triantafyllopoulos received his diploma in ECE from the University of Patras, Patras, in 2017. He is working toward his doctoral degree with the Chair of Health Informatics, Technical University of Munich, Munich. His current focus is on deep learning methods for auditory intelligence and affective computing.

Ilhan Aslan received his diploma in 2004 from Saarland University, Saarland, and his doctoral degree in 2014 at the Center for HCI from Paris-Lodron University Salzburg in Austria. He was an akad. Rat (assistant professor) at the University of Augsburg, Augsburg, from 2016 onward before joining Huawei Technologies, Munich, in 2020 as an HCI Expert, where he leads an HCI research team and managed the Affective Computing & HCI Innovation Research Lab. His research focus is at the intersection of HCI, IxD, and affective computing, exploring the future of human-centred multimedia and multimodal interaction.

Björn W. Schuller received his diploma in 1999, his doctoral degree in 2006, and his habilitation and the title of Adjunct Teaching Professor in 2012, all in electrical engineering and information technology, from the Technical University of Munich, Munich. He is a full professor of artificial intelligence and the Head of GLAM at Imperial College London, and Chair of the Chair for Health Informatics, MRI, Technical University of Munich, Munich, amongst other professorships and affiliations. He is a Fellow of the IEEE and Golden Core Awardee of the IEEE Computer Society, Fellow of the ACM, Fellow and President-Emeritus of the AAAC, Fellow of the BCS, Fellow of ELLIS, Fellow of the ISCA, and Elected Full Member of Sigma Xi. He has (co-)authored 1400+ publications (60000+ citations, h-index = 110).
Received Date: October 25, 2022
Accepted Date: June 29, 2024

Abstract

Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination in real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front end of the target audio applications. In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. To guide the optimisation of the AE module towards a target application, and especially to overcome difficult samples, we use the sample-wise performance measure as an indication of sample importance. In our experiments, we consider four representative applications to evaluate our training paradigm: ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These applications cover speech and non-speech tasks that rely on semantic and non-semantic features as well as transient and global information. The experimental results indicate that our proposed approach can considerably boost the noise robustness of the models, especially at low signal-to-noise ratios, for a wide range of computer audition tasks in everyday-life noisy environments.
Keywords: audio enhancement, computer audition, joint optimisation, multi-task learning, voice suppression
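
The abstract does not spell out how sample importance enters the training objective. As a minimal sketch only, assuming that the per-sample loss of the target application serves as the sample-wise performance measure and that a softmax with a hypothetical `temperature` parameter turns it into weights, the enhancement loss could be re-weighted so that difficult samples contribute more. The function names `importance_weights` and `weighted_enhancement_loss` are illustrative and are not taken from the paper.

```python
import math
from typing import List


def importance_weights(task_losses: List[float], temperature: float = 1.0) -> List[float]:
    """Map per-sample task losses to weights whose mean is 1.

    Assumption: a softmax over the task losses; the paper's actual
    weighting scheme is not described in the abstract.
    """
    if not task_losses:
        raise ValueError("task_losses must be non-empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(task_losses)  # subtract the maximum for numerical stability
    exps = [math.exp((loss - peak) / temperature) for loss in task_losses]
    total = sum(exps)
    n = len(task_losses)
    # Scale by n so that uniform task losses give a weight of 1 per sample.
    return [n * e / total for e in exps]


def weighted_enhancement_loss(
    enh_losses: List[float], task_losses: List[float], temperature: float = 1.0
) -> float:
    """Average the per-sample enhancement losses, weighted by sample importance."""
    if len(enh_losses) != len(task_losses):
        raise ValueError("both loss lists must have the same length")
    weights = importance_weights(task_losses, temperature)
    return sum(w * l for w, l in zip(weights, enh_losses)) / len(enh_losses)
```

With equal task losses the weights are all 1 and the result reduces to the plain mean of the enhancement losses. A larger `temperature` flattens the weights, a smaller one concentrates them on the hardest samples.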




Supplements
- External link to attachment: https://rdcu.be/dUUAP
- PDF format: 2024-4-10-2934-Highlights (370KB)
