Journal of Computer Science and Technology, 2018, Vol. 33, Issue (5): 1086-1100. doi: 10.1007/s11390-018-1874-8

Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia


Understanding and Generating Ultrasound Image Description

Xian-Hua Zeng, Member, CCF, ACM, Bang-Gui Liu, Meng Zhou   

  1. School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2018-03-04 Revised: 2018-07-10 Online: 2018-09-17 Published: 2018-09-17
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant No. 61672120, the Chongqing Natural Science Foundation Program of China under Grant No. cstc2015jcyjA40036, and the Postgraduate Scientific Research and Innovation Project of Chongqing of China under Grant No. CYS17224.

To make the content of ultrasound images quicker and easier to understand, we propose a coarse-to-fine ultrasound image captioning ensemble model that automatically generates annotation text, composed of relevant n-grams, describing the disease information in ultrasound images. First, a coarse classification model detects the organ shown in each ultrasound image. Second, the image is encoded by the fine-grained classification model that corresponds to the detected organ label. Finally, the encoding vector is fed to a language generation model, which automatically generates annotation text describing the disease information in the image. In our experiments, the encoding model achieves high accuracy in ultrasound image recognition, and the language generation model automatically generates high-quality annotation text. In practical applications, the coarse-to-fine ultrasound image captioning ensemble model can help patients and doctors better understand the content of ultrasound images.

Key words: ultrasound image; fine-grained classification; image captioning
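
The three stages described in the abstract (coarse organ detection, organ-specific fine-grained encoding, and language generation) can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: every name here (coarse_classify, FINE_ENCODERS, generate_caption, describe), the organ labels, and the toy rules inside each function are hypothetical stand-ins for trained models, chosen only to show how the organ label routes an image to its encoder.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[float]   # stand-in for pixel data (assumption)
Encoding = List[float]    # stand-in for an encoding vector (assumption)


def coarse_classify(image: Image) -> str:
    """Stage 1: return an organ label.

    Toy rule for illustration only; a real system would use a trained
    coarse classification model here.
    """
    mean_intensity = sum(image) / len(image)
    return "liver" if mean_intensity > 0.5 else "kidney"


def make_encoder(scale: float) -> Callable[[Image], Encoding]:
    """Build a toy organ-specific encoder (hypothetical placeholder)."""
    def encode(image: Image) -> Encoding:
        return [scale * pixel for pixel in image]
    return encode


# Stage 2: one fine-grained encoder per organ label (labels are made up).
FINE_ENCODERS: Dict[str, Callable[[Image], Encoding]] = {
    "liver": make_encoder(2.0),
    "kidney": make_encoder(0.5),
}


def generate_caption(encoding: Encoding) -> str:
    """Stage 3: placeholder for the language generation model."""
    return f"placeholder caption from a {len(encoding)}-d encoding"


def describe(image: Image) -> Tuple[str, str]:
    """Run the coarse-to-fine pipeline and return (organ, caption)."""
    organ = coarse_classify(image)       # coarse: which organ is shown?
    encoder = FINE_ENCODERS[organ]       # route to the organ's encoder
    encoding = encoder(image)            # fine: organ-specific encoding
    return organ, generate_caption(encoding)
```

Calling describe on an image returns the detected organ label together with the generated text. Keeping the encoders in a dictionary keyed by organ label means that, in this sketch, supporting another organ only requires registering one more encoder.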


ISSN 1000-9000(Print)

CN 11-2296/TP
