Journal of Computer Science and Technology, 2021, Vol. 36, Issue 2: 434-444. DOI: 10.1007/s11390-021-9599-5

Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Regular Paper •

A Real-Time Multi-Stage Architecture for Pose Estimation of Zebrafish Head with Convolutional Neural Networks

Zhang-Jin Huang (1,2,3), Member, CCF, ACM, IEEE, Xiang-Xiang He (1), Fang-Jun Wang (1,2), and Qing Shen (1)

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  2. School of Data Science, University of Science and Technology of China, Hefei 230027, China
  3. Anhui Province Key Laboratory of Software in Computing and Communication, Hefei 230027, China
  • Received: 2019-03-30; Revised: 2019-06-03; Online: 2021-03-05; Published: 2021-04-01
  • About the author: Zhang-Jin Huang received his B.S. and Ph.D. degrees in computational mathematics from the University of Science and Technology of China (USTC), Hefei, in 1999 and 2005, respectively. He is currently an associate professor with the School of Computer Science and Technology and the School of Data Science, USTC, Hefei. His current research interests include computer graphics, computer vision, machine learning, and deep learning.
  • Supported by:
    This work was supported in part by the National Key Research and Development Program of China under Grant No. 2018YFC1504104, the Fundamental Research Funds for the Central Universities of China under Grant No. WK6030000109, and the National Natural Science Foundation of China under Grant No. 61877056.

Optical neurophysiology experiments on a freely swimming zebrafish require quantifying the pose of the zebrafish head in order to determine exact lighting positions. To quantify the behavior of the zebrafish head efficiently with limited resources, we propose a real-time multi-stage architecture based on convolutional neural networks for pose estimation of the zebrafish head on CPUs. Each stage is implemented with a small neural network. In the first stage, a lightweight object detector named Micro-YOLO detects a coarse region of the zebrafish head. In the second stage, a tiny bounding box refinement network produces a high-quality bounding box around the zebrafish head. Finally, a small pose estimation network named tiny-hourglass detects keypoints in the zebrafish head. The experimental results show that using Micro-YOLO combined with RegressNet to predict the zebrafish head region is both more accurate and much faster than Faster R-CNN, a representative two-stage detector. Compared with DeepLabCut, a state-of-the-art method for estimating the poses of user-defined body parts, our multi-stage architecture achieves higher accuracy and runs 19x faster on CPUs.

Key words: convolutional neural network; pose estimation; real-time; zebrafish
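
The three stages described in the abstract can be pictured as a simple chain: coarse detection, box refinement, then keypoint localization inside the refined crop. The Python sketch below illustrates only that data flow and the coordinate bookkeeping between stages; it is not the authors' implementation. The three networks are passed in as plain callables (hypothetical stand-ins for Micro-YOLO, the refinement network, and tiny-hourglass), and two details are assumptions that the abstract does not state: that the refinement stage outputs box offsets as fractions of the box size, and that the keypoint stage outputs one heatmap per keypoint.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in frame pixels
Point = Tuple[float, float]
Image = List[List[float]]                # grayscale image as rows of pixel values


def refine_box(box: Box, deltas: Box) -> Box:
    """Apply (dx, dy, dw, dh) offsets given as fractions of the box size (assumed form)."""
    x, y, w, h = box
    dx, dy, dw, dh = deltas
    return (x + dx * w, y + dy * h, w * (1.0 + dw), h * (1.0 + dh))


def crop(frame: Image, box: Box) -> Tuple[Image, Tuple[int, int]]:
    """Crop the frame to the box, clamped to the frame; return the crop and its origin."""
    height, width = len(frame), len(frame[0])
    x0 = max(0, min(width - 1, int(round(box[0]))))
    y0 = max(0, min(height - 1, int(round(box[1]))))
    x1 = max(x0 + 1, min(width, int(round(box[0] + box[2]))))
    y1 = max(y0 + 1, min(height, int(round(box[1] + box[3]))))
    return [row[x0:x1] for row in frame[y0:y1]], (x0, y0)


def heatmap_peak(heatmap: Image) -> Tuple[int, int]:
    """Return (column, row) of the largest value in a heatmap."""
    best_col, best_row, best_val = 0, 0, heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_val:
                best_col, best_row, best_val = c, r, value
    return best_col, best_row


def estimate_pose(
    frame: Image,
    detect: Callable[[Image], Box],               # stage 1 stand-in: coarse head region
    regress: Callable[[Image], Box],              # stage 2 stand-in: box offsets
    keypoint_net: Callable[[Image], List[Image]]  # stage 3 stand-in: one heatmap per keypoint
) -> List[Point]:
    coarse = detect(frame)
    coarse_patch, _ = crop(frame, coarse)
    refined = refine_box(coarse, regress(coarse_patch))
    patch, (ox, oy) = crop(frame, refined)
    points = []
    for heatmap in keypoint_net(patch):
        col, row = heatmap_peak(heatmap)
        # Rescale from heatmap resolution to patch pixels, then shift to frame coordinates.
        sx = len(patch[0]) / len(heatmap[0])
        sy = len(patch) / len(heatmap)
        points.append((ox + col * sx, oy + row * sy))
    return points


# Toy run with trivial stand-in "networks": a 10x10 frame with one bright pixel.
frame = [[0.0] * 10 for _ in range(10)]
frame[4][5] = 1.0  # bright pixel at x=5, y=4
points = estimate_pose(
    frame,
    detect=lambda f: (2.0, 2.0, 4.0, 4.0),    # fixed coarse box
    regress=lambda p: (0.25, 0.25, 0.0, 0.0),  # shift the box by a quarter of its size
    keypoint_net=lambda p: [p],                # use the patch itself as the heatmap
)
print(points)  # [(5.0, 4.0)]
```

Keeping each stage behind a small callable mirrors the abstract's point that every stage is a small network operating on a progressively tighter region, so the later stages only ever see a small crop rather than the full frame.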
