Journal of Computer Science and Technology, 2019, Vol. 34, Issue (3): 509-521. doi: 10.1007/s11390-019-1923-y

Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia


A Large Chinese Text Dataset in the Wild

Tai-Ling Yuan1, Zhe Zhu2, Kun Xu1, Member, CCF, IEEE, Cheng-Jun Li3, Tai-Jiang Mu1,*, Member, CCF, Shi-Min Hu1, Fellow, CCF, Senior Member, IEEE   

  1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
  2 Department of Radiology, Duke University, North Carolina 27708, U.S.A.;
  3 Tencent Technology (Beijing) Co. Ltd., Beijing 100080, China
  • Received: 2018-12-24 Revised: 2019-03-20 Online: 2019-05-05 Published: 2019-05-06
  • Contact: Tai-Jiang Mu, E-mail: mmmutj@gmail.com
  • About author: Tai-Ling Yuan is a Ph.D. candidate in the Department of Computer Science and Technology, Tsinghua University, Beijing. He received his B.S. degree in computer science and technology from the same university in 2016. His research interests include computer graphics and computer vision.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant Nos. 61822204 and 61521002, a research grant from the Beijing Higher Institution Engineering Research Center, and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.

In this paper, we introduce a very large Chinese text dataset in the wild. While optical character recognition (OCR) in document images is well studied and many commercial tools are available, detecting and recognizing text in natural images remains a challenging problem, especially for more complicated character sets such as Chinese. Lack of training data has always been a problem, especially for deep learning methods, which require massive training data. We provide details of a newly created dataset of Chinese text containing about 1 million character instances, drawn from 3 850 unique characters and annotated by experts in over 30 000 street view images. This is a challenging dataset with good diversity: it contains planar text, raised text, text under poor illumination, distant text, partially occluded text, etc. For each character, the annotation includes its underlying character, bounding box, and six attributes. The attributes indicate the character's background complexity, appearance, style, etc. Besides the dataset, we give baseline results using state-of-the-art methods for three tasks: character recognition (top-1 accuracy of 80.5%), character detection (AP of 70.9%), and text line detection (AED of 22.1). The dataset, source code, and trained models are publicly available.
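As a rough illustration of the per-character annotation and the top-1 accuracy metric mentioned in the abstract, the following Python sketch may help. It is not the dataset's actual file format or the authors' evaluation code: the class name `CharAnnotation`, the `(x, y, width, height)` bounding box layout, the boolean attribute flags, and the attribute key used in the usage note are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CharAnnotation:
    """One annotated character instance (hypothetical layout, not the dataset's format)."""
    character: str                           # underlying character, e.g. "中"
    bbox: Tuple[float, float, float, float]  # assumed (x, y, width, height) in pixels
    # The abstract mentions six attributes; boolean flags and their names are assumptions.
    attributes: Dict[str, bool] = field(default_factory=dict)


def top1_accuracy(predicted: List[str], annotations: List[CharAnnotation]) -> float:
    """Fraction of instances whose single best prediction equals the annotated character."""
    if len(predicted) != len(annotations):
        raise ValueError("predicted and annotations must have the same length")
    if not annotations:
        return 0.0
    correct = sum(p == a.character for p, a in zip(predicted, annotations))
    return correct / len(annotations)
```

For example, with two annotated characters "中" and "国" (the first carrying an invented flag such as `{"occluded": False}`) and predictions `["中", "图"]`, `top1_accuracy` returns 0.5, since only the first prediction matches its label.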

Key words: Chinese text dataset; Chinese text detection; Chinese text recognition


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP
