Journal of Computer Science and Technology, 2021, Vol. 36, Issue (2): 434-444. DOI: 10.1007/s11390-021-9599-5

Special topics: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia



A Real-Time Multi-Stage Architecture for Pose Estimation of Zebrafish Head with Convolutional Neural Networks

Zhang-Jin Huang (1, 2, 3), Member, CCF, ACM, IEEE; Xiang-Xiang He (1); Fang-Jun Wang (1, 2); and Qing Shen (1)

  1 School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China;
  2 School of Data Science, University of Science and Technology of China, Hefei 230027, China;
  3 Anhui Province Key Laboratory of Software in Computing and Communication, Hefei 230027, China
  • Received: 2019-03-30; Revised: 2019-06-03; Online: 2021-03-05; Published: 2021-04-01
  • About the author: Zhang-Jin Huang received his B.S. and Ph.D. degrees in computational mathematics from the University of Science and Technology of China (USTC), Hefei, in 1999 and 2005, respectively. He is currently an associate professor with the School of Computer Science and Technology and the School of Data Science, USTC, Hefei. His current research interests include computer graphics, computer vision, machine learning, and deep learning.
  • Supported by:
    This work was supported in part by the National Key Research and Development Program of China under Grant No. 2018YFC1504104, the Fundamental Research Funds for the Central Universities of China under Grant No. WK6030000109, and the National Natural Science Foundation of China under Grant No. 61877056.


Abstract: To conduct optical neurophysiology experiments on a freely swimming zebrafish, the zebrafish head must be quantified so that exact lighting positions can be determined. To quantify a zebrafish head's behaviors efficiently with limited computing resources, we propose a real-time multi-stage architecture based on convolutional neural networks for pose estimation of the zebrafish head on CPUs. Each stage is implemented with a small neural network. Specifically, in the first stage, a lightweight object detector named Micro-YOLO detects a coarse region of the zebrafish head. In the second stage, a tiny bounding box refinement network, RegressNet, produces a high-quality bounding box around the zebrafish head. Finally, a small pose estimation network named tiny-hourglass detects keypoints on the zebrafish head. Experimental results show that predicting the zebrafish head region with Micro-YOLO combined with RegressNet is not only more accurate but also much faster than Faster R-CNN, a representative two-stage detector. Compared with DeepLabCut, a state-of-the-art method for estimating the poses of user-defined body parts, our multi-stage architecture achieves higher accuracy and runs 19x faster on CPUs.

Key words: convolutional neural network, pose estimation, real-time, zebrafish
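The abstract describes a coarse-to-fine flow: detect a rough head box, refine it, then locate keypoints inside it. The sketch below covers only the glue between those stages and rests on several assumptions. The three trained networks are replaced by hypothetical callables (`detect`, `refine`, `pose`). The refinement output is assumed to be R-CNN-style box offsets (dx, dy, dw, dh). Keypoints are assumed to be decoded by taking the arg-max of one heatmap per keypoint, as is common for hourglass-style pose networks. None of these names or parameterizations come from the paper, so this is a sketch of the general technique, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Heatmap = List[List[float]]  # one 2-D score map per keypoint (rows x cols)


@dataclass
class Box:
    """Axis-aligned box in image pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union, a common way to score bounding-box quality."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


def apply_box_deltas(box: Box, deltas: Sequence[float]) -> Box:
    """Stage-2 glue (assumed): shift the box center by (dx, dy) in units of
    box size and rescale width/height by exp(dw), exp(dh), as in R-CNN-style
    bounding-box regression."""
    dx, dy, dw, dh = deltas
    cx = box.x1 + 0.5 * box.width + dx * box.width
    cy = box.y1 + 0.5 * box.height + dy * box.height
    w = box.width * math.exp(dw)
    h = box.height * math.exp(dh)
    return Box(cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def decode_heatmap(heatmap: Heatmap, box: Box) -> Tuple[float, float]:
    """Stage-3 glue (assumed): take the arg-max cell of a heatmap computed on
    the cropped box and map that cell's center back to image coordinates."""
    rows, cols = len(heatmap), len(heatmap[0])
    best_r, best_c = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    x = box.x1 + (best_c + 0.5) * box.width / cols
    y = box.y1 + (best_r + 0.5) * box.height / rows
    return x, y


def estimate_head_pose(
    image: Any,
    detect: Callable[[Any], Box],                   # hypothetical stage-1 detector
    refine: Callable[[Any, Box], Sequence[float]],  # hypothetical stage-2 regressor
    pose: Callable[[Any, Box], List[Heatmap]],      # hypothetical stage-3 pose net
) -> List[Tuple[float, float]]:
    """Run the three stages in sequence and return one (x, y) per keypoint."""
    coarse = detect(image)
    refined = apply_box_deltas(coarse, refine(image, coarse))
    return [decode_heatmap(hm, refined) for hm in pose(image, refined)]


if __name__ == "__main__":
    # Toy stubs standing in for trained networks.
    def stub_pose(image: Any, box: Box) -> List[Heatmap]:
        hm = [[0.0] * 4 for _ in range(4)]
        hm[1][2] = 1.0  # peak at row 1, column 2
        return [hm]

    points = estimate_head_pose(
        None,
        detect=lambda image: Box(10, 10, 50, 50),
        refine=lambda image, box: (0.1, 0.0, 0.0, 0.0),  # move right by 10% of width
        pose=stub_pose,
    )
    print(points)
```

With these stubs, the refinement step shifts the 40 x 40 coarse box 4 px to the right, giving (14, 10, 54, 50). The single keypoint then decodes to (39.0, 25.0). Keeping each stage behind a small function boundary mirrors the abstract's point that every stage is a separate small network: any stage can be swapped without touching the others.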
