Lan-Fang Dong, Han-Chao Liu, Xin-Ming Zhang. Synthetic Data Generation and Shuffled Multi-Round Training Based Offline Handwritten Mathematical Expression Recognition[J]. Journal of Computer Science and Technology, 2022, 37(6): 1427-1443. DOI: 10.1007/s11390-021-0722-4

Synthetic Data Generation and Shuffled Multi-Round Training Based Offline Handwritten Mathematical Expression Recognition

  • Offline handwritten mathematical expression recognition is a challenging optical character recognition (OCR) task because handwritten symbols are often ambiguous and expressions have complicated two-dimensional structures. Recent work in this area typically builds increasingly deep neural networks trained end-to-end to improve performance. However, the more complex the network, the more computing resources and time it requires. To improve performance without increasing computational requirements, this paper focuses on the training data and the training strategy. We propose a data augmentation method that generates synthetic samples with new LaTeX notations using only the official CROHME training data. We also propose a novel training strategy, called Shuffled Multi-Round Training (SMRT), to regularize the model. With the generated data and SMRT, attention-based encoder-decoder models achieve state-of-the-art expression accuracy for offline handwritten mathematical expression recognition: 59.74% on CROHME 2014 and 61.57% on CROHME 2016.
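The expression accuracy figures above count an expression as correct only when the whole recognized expression matches its ground truth. The following is a minimal, simplified sketch of that exact-match metric, assuming predictions and references are already tokenized LaTeX sequences (here, space-separated strings split on whitespace). The function names `tokenize` and `expression_accuracy` are hypothetical and not taken from the paper; official CROHME scoring tools may normalize notation differently, so this is an approximation rather than the authors' evaluation code.

```python
from typing import List, Sequence


def tokenize(latex: str) -> List[str]:
    """Hypothetical helper: assumes LaTeX labels are space-separated tokens."""
    return latex.split()


def expression_accuracy(predictions: Sequence[Sequence[str]],
                        references: Sequence[Sequence[str]]) -> float:
    """Return the percentage of expressions whose predicted token sequence
    exactly matches the reference token sequence (no partial credit)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one expression is required")
    # An expression counts as correct only if every token matches, in order.
    correct = sum(1 for pred, ref in zip(predictions, references)
                  if list(pred) == list(ref))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    preds = [tokenize(s) for s in ["x ^ { 2 }", "\\frac { a } { b }", "y + 1"]]
    refs = [tokenize(s) for s in ["x ^ { 2 }", "\\frac { a } { c }", "y + 1"]]
    acc = expression_accuracy(preds, refs)
    print(f"{acc:.2f}%")  # prints 66.67%
```

A single wrong token (here `b` instead of `c` in the fraction) makes the whole expression count as incorrect, which is why expression-level accuracy is a much stricter measure than symbol-level accuracy.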