Citation: Liu HC, Dong LF, Zhang XM. Multimodal dependence attention and large-scale data based offline handwritten formula recognition. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(3): 654–670, May 2024. DOI: 10.1007/s11390-022-1987-y.
Offline handwritten formula recognition is a challenging task because of the variety of handwritten symbols and the two-dimensional structure of formulas. Recently, deep neural network recognizers based on the encoder-decoder framework have achieved great improvements on this task. However, existing work still performs poorly on formulas with long \LaTeX strings, and the lack of sufficient training data further limits the capability of these recognizers. In this paper, we design a multimodal dependence attention (MDA) module that helps the model learn visual and semantic dependencies among symbols in the same formula, improving recognition of formulas with long \LaTeX strings. To alleviate overfitting and further improve recognition performance, we also propose a new dataset, the Handwritten Formula Image Dataset (HFID), which contains 25,620 handwritten formula images collected from real life. Extensive experiments demonstrate the effectiveness of the proposed MDA module and the HFID dataset, and we achieve state-of-the-art expression accuracy of 63.79% on CROHME 2014 and 65.24% on CROHME 2016.
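Expression accuracy on the CROHME benchmarks is commonly taken to be the percentage of test formulas whose predicted \LaTeX string matches the ground truth exactly. The sketch below illustrates the metric under that assumption. The function name `expression_accuracy`, the whitespace tokenization, and the sample strings are hypothetical choices made for illustration and do not come from the paper.

```python
from typing import Sequence


def expression_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of formulas predicted exactly right.

    Assumption: a formula counts as correct only when its whitespace-separated
    LaTeX tokens are identical to those of the reference.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one formula is required")
    # Compare token lists so that differences in spacing are not counted as errors.
    correct = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical sample data: the second prediction has one wrong symbol.
preds = [r"x ^ { 2 } + 1", r"\frac { a } { b }", r"\sqrt { x }"]
refs = [r"x ^ { 2 } + 1", r"\frac { a } { c }", r"\sqrt  { x }"]
print(f"{expression_accuracy(preds, refs):.2f}")
```

Because a single wrong symbol makes the whole formula count as an error, long \LaTeX strings are penalized more heavily under this kind of metric than under a per-symbol score.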