Self-Supervised Music Motion Synchronization Learning for Music-Driven Conducting Motion Generation

Fan Liu, De-Long Chen, Rui-Zhi Zhou, Sai Yang, Feng Xu

Citation: Fan Liu, De-Long Chen, Rui-Zhi Zhou, Sai Yang, Feng Xu. Self-Supervised Music Motion Synchronization Learning for Music-Driven Conducting Motion Generation[J]. Journal of Computer Science and Technology, 2022, 37(3): 539-558. DOI: 10.1007/s11390-022-2030-z. CSTR: 32374.14.s11390-022-2030-z


Funds: This work was partially funded by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20191298 and the National Natural Science Foundation of China under Grant No. 61902110.
More Information
    Author Bio:

    De-Long Chen received his B.S. degree in computer science from Hohai University, Nanjing, in 2021. He is currently a research assistant at Hohai University, Nanjing, and a research intern at MEGVII Technology, Beijing. His research interests include computer vision, music information retrieval, multimodal learning, unsupervised learning, and self-supervised learning.

    Corresponding author:

    De-Long Chen E-mail: chendelong@hhu.edu.cn

  • Structured Abstract: 1. Context: The intrinsic correlation between music and human body motion has long been widely studied. Recently, many researchers have successfully used deep learning models to generate dance motion or instrument-playing motion, but little attention has been paid to the motion of orchestral conductors. Conducting motion must simultaneously convey beat, articulation, musical emotion, and other information, whereas most existing methods rely on hand-crafted rules; the resulting motion is unnatural and can express only very simple semantics.
    2. Objective: This work aims to exploit the learning capability of deep neural networks for music-driven conducting motion generation, i.e., using music as the conditional control signal to generate conducting motion that is rhythmically synchronized with the music, semantically relevant, natural, and graceful.
    3. Method: A two-stage learning framework is proposed. In the first stage, a Music Motion Synchronization Network (M2S-Net), consisting of a music encoder and a motion encoder, is trained with self-supervised contrastive learning. In the second stage, the trained motion encoder compares the semantic similarity between generated and real motion to compute a synchronization loss, while a discriminator judges the realism of the generated motion to compute an adversarial loss; the two losses jointly train a Music Motion Synchronized Generative Adversarial Network (M2S-GAN). In addition, a large-scale conducting motion dataset, ConductorMotion100, is collected from online video platforms with object detection and pose estimation algorithms to provide reliable data support. (A minimal code sketch of the first training stage appears after this abstract.)
    4. Result & Findings: On ConductorMotion100, the proposed method surpasses several comparison methods in accuracy, diversity, and realism, both in quantitative comparisons over multiple evaluation metrics and in qualitative analyses based on motion visualization. The experiments also show that negative sampling strategies that perform well on audio-visual self-supervised tasks introduce many false negatives and are therefore unsuitable for music-conducting-motion data.
    5. Conclusions: This work presents the first deep-learning-based conducting motion generation algorithm and is the first to apply multimodal self-supervised contrastive learning to music-motion data. The proposed method generates accurate, diverse, and visually pleasing conducting motion. The two-stage learning process can potentially be generalized into a cross-modal conditional generation framework, and ConductorMotion100 can also be extended into a large-scale pre-training dataset for music information retrieval tasks such as beat detection.
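
To make the first stage concrete, the following is a minimal PyTorch sketch of self-supervised music-motion contrastive training. It is not the authors' implementation: a music encoder and a motion encoder map synchronized clips to a shared embedding space, and a symmetric InfoNCE-style loss pulls matching music/motion pairs together while treating the other pairs in the batch as negatives. The encoder architecture, feature dimensions, and temperature are illustrative assumptions.

# Minimal sketch of Stage 1 (M2S-Net-style contrastive pre-training); architectures,
# dimensions, and the temperature below are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClipEncoder(nn.Module):
    """Encodes a (batch, time, feature) clip into one L2-normalized embedding."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, emb_dim, kernel_size=5, padding=2),
            nn.AdaptiveAvgPool1d(1),  # average over time
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.net(x.transpose(1, 2)).squeeze(-1)  # (batch, emb_dim)
        return F.normalize(z, dim=-1)


def sync_contrastive_loss(music_z: torch.Tensor, motion_z: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE: the i-th music clip should match the i-th motion clip;
    all other clips in the batch serve as negatives."""
    logits = music_z @ motion_z.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    music_encoder = ClipEncoder(in_dim=128)   # e.g. mel-spectrogram frames (assumed input)
    motion_encoder = ClipEncoder(in_dim=26)   # e.g. 13 upper-body keypoints x 2 coords (assumed input)
    music = torch.randn(8, 300, 128)          # 8 synchronized music/motion clip pairs
    motion = torch.randn(8, 300, 26)
    loss = sync_contrastive_loss(music_encoder(music), motion_encoder(motion))
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")

The false-negative issue noted in the Result & Findings item above arises exactly in this kind of batch-wise negative sampling: clips drawn from elsewhere in the same performance may still be musically compatible with the anchor motion, so they are not safe negatives for conducting data.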
    Abstract: The correlation between music and human motion has attracted widespread research attention. Although recent studies have successfully generated motion for singers, dancers, and musicians, few have explored motion generation for orchestral conductors. The generation of music-driven conducting motion should consider not only the basic music beats, but also mid-level music structures, high-level music semantic expressions, and hints for different parts of the orchestra (strings, woodwinds, etc.). However, most existing conducting motion generation methods rely heavily on human-designed rules, which significantly limits the quality of the generated motion. Therefore, we propose a novel Music Motion Synchronized Generative Adversarial Network (M2S-GAN), which generates motion according to automatically learned music representations. More specifically, M2S-GAN is a cross-modal generative network comprising four components: 1) a music encoder that encodes the music signal; 2) a generator that generates conducting motion from the music codes; 3) a motion encoder that encodes the motion; and 4) a discriminator that differentiates real from generated motion. These four components respectively imitate four key aspects of human conductors: understanding music, interpreting music, precision, and elegance. The music and motion encoders are first jointly trained with a self-supervised contrastive loss, and thus help to enforce music-motion synchronization during the subsequent adversarial learning process. To verify the effectiveness of our method, we construct a large-scale dataset, named ConductorMotion100, which consists of an unprecedented 100 hours of conducting motion data. Extensive experiments on ConductorMotion100 demonstrate the effectiveness of M2S-GAN: our approach outperforms various comparison methods both quantitatively and qualitatively, and visualizations show that it generates plausible, diverse, and music-synchronized conducting motion.
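
The second-stage generator objective described in the abstract can likewise be sketched as follows. This is an assumed formulation, not the authors' code: a generator decodes music features into motion, a discriminator scores realism for the adversarial term, and the frozen Stage-1 motion encoder supplies a synchronization term by comparing embeddings of generated and ground-truth motion. The GRU generator, convolutional critic, and loss weight are stand-ins.

# Minimal sketch of the Stage-2 (M2S-GAN-style) generator objective; the GRU generator,
# convolutional critic, and loss weight are assumed stand-ins, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionGenerator(nn.Module):
    """Decodes per-frame music features into motion keypoints (assumed design)."""
    def __init__(self, music_dim: int = 128, motion_dim: int = 26, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(music_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, motion_dim)

    def forward(self, music: torch.Tensor) -> torch.Tensor:    # (batch, time, music_dim)
        h, _ = self.rnn(music)
        return self.head(h)                                     # (batch, time, motion_dim)


class MotionDiscriminator(nn.Module):
    """Temporal-convolution critic that scores motion realism (assumed design)."""
    def __init__(self, motion_dim: int = 26):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(motion_dim, 128, kernel_size=5, padding=2),
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, 1, kernel_size=5, padding=2),
        )

    def forward(self, motion: torch.Tensor) -> torch.Tensor:    # (batch, time, motion_dim)
        return self.net(motion.transpose(1, 2)).mean(dim=(1, 2))


def generator_step(generator, discriminator, motion_encoder, music, real_motion,
                   sync_weight: float = 1.0):
    """Adversarial term + synchronization term from the frozen Stage-1 motion encoder."""
    fake_motion = generator(music)
    adv_loss = -discriminator(fake_motion).mean()                # critic-style realism score
    with torch.no_grad():                                        # frozen target embedding
        target_emb = motion_encoder(real_motion)
    sync_loss = F.mse_loss(motion_encoder(fake_motion), target_emb)
    return adv_loss + sync_weight * sync_loss, fake_motion

Here motion_encoder would be the frozen encoder from the Stage-1 sketch above. In a full training loop the discriminator is updated alternately on real and generated motion while the generator minimizes the combined objective; the exact adversarial formulation and loss weighting used in the paper are not reproduced here.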

Publication History
  • Received: 2021-11-18
  • Revised: 2022-02-27
  • Accepted: 2022-03-09
  • Published: 2022-05-29
