Journal of Computer Science and Technology

   

Multimodal Dependence Attention and Large-Scale Data based Offline Handwritten Formula Recognition

Han-Chao Liu (刘汉超), Lan-Fang Dong (董兰芳), Member, CCF, and Xin-Ming Zhang (张信明), Senior Member, IEEE, CCF   

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230022, China
  • Contact: Xin-Ming Zhang, E-mail: xinming@ustc.edu.cn
  • About the author:
    Xin-Ming Zhang received his B.E. and M.E. degrees in electrical engineering from China University of Mining and Technology, Xuzhou, in 1985 and 1988, respectively, and his Ph.D. degree in computer science and technology from the University of Science and Technology of China, Hefei, in 2001. Since 2002, he has been on the faculty of the University of Science and Technology of China, where he is currently a professor with the School of Computer Science and Technology. From September 2005 to August 2006, he was a visiting professor with the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Daejeon, Korea. His research interests include wireless networks, big data, and smart grid. He has published more than 100 papers. He won the second prize of the Science and Technology Award of Anhui Province of China in Natural Sciences in 2017. He is a senior member of CCF and IEEE.

Offline handwritten formula recognition is a challenging task due to the variety of handwritten symbols and the two-dimensional structure of formulas. Recently, deep neural network recognizers based on the encoder-decoder framework have achieved great improvements on this task. However, existing work still performs unsatisfactorily on formulas with long LaTeX strings, and the lack of sufficient training data further limits the capability of these recognizers. In this paper, we design a multimodal dependence attention (MDA) module that helps the model learn visual and semantic dependencies among symbols in the same formula, improving recognition of formulas with long LaTeX strings. To alleviate overfitting and further improve recognition performance, we also propose a new dataset, the Handwritten Formula Image Dataset (HFID), which contains 25620 handwritten formula images collected from real life. Extensive experiments demonstrate the effectiveness of the proposed MDA module and the HFID dataset, and we achieve state-of-the-art performance: 63.79% and 65.24% expression accuracy on CROHME 2014 and 2016, respectively.


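Chinese Abstract (English translation)

1. Context
As society becomes increasingly information-driven, people rely more and more on computers for everyday work and study. Formulas, as a tool for expressing, abstracting, and defining problems, are widely used in daily study and life, yet their complex two-dimensional structure makes entering them into a computer complicated and time-consuming. Although handwriting is the most natural way for people to record information, handwritten input is hard for computers to understand. Offline handwritten formula recognition aims to convert images of handwritten formulas into a format that computers can edit and understand, such as a LaTeX string. Because handwritten symbols vary freely and formulas have a complex two-dimensional structure, the task has long been highly challenging. With the development of deep learning in recent years, attention-based encoder-decoder networks have greatly advanced the field and improved recognition performance. However, current methods perform well on relatively simple formulas but comparatively poorly on complex formulas with long LaTeX string labels, and little research has so far targeted long, complex formulas. In addition, researchers have designed increasingly sophisticated and complex model architectures to improve recognition, but the available training data are relatively scarce and often cannot support proper training of such models, so overfitting has gradually become a bottleneck for the field.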
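2. Objective
Our work first builds a large handwritten formula image dataset to augment the training data, reduce overfitting, and improve offline handwritten formula recognition. We also optimize the recognition of images of long, complex formulas, with the aim of improving the usability of the model and further improving recognition performance.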
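3. Method
We built HFID, a dataset of handwritten formula images from real-world scenarios. It covers 156 classes of commonly used formula symbols and contains 26520 handwritten formula images from mathematics, physics, and chemistry, about twice the size of CROHME (Competition on Recognition of Online Handwritten Mathematical Expressions), the most commonly used dataset in the field. We also designed a multimodal dependence attention (MDA) module based on multimodal relationships among symbols. The module extracts multimodal features to represent each symbol in a formula, feeds these features to an attention mechanism that models the dependencies among symbols, and uses those dependencies to assist symbol recognition, improving the model's recognition performance.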
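The idea of attending over per-symbol multimodal features can be illustrated with a minimal sketch. It is not the paper's MDA module: it assumes that each symbol is represented by concatenating a visual vector and a semantic vector, and that dependencies are modeled with plain scaled dot-product self-attention. The function name `dependence_attention`, the projection matrices, and all dimensions are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dependence_attention(visual, semantic, w_q, w_k, w_v):
    """Self-attention over per-symbol multimodal features (illustrative only).

    visual:   (T, Dv) array, one visual feature vector per symbol (assumed)
    semantic: (T, Ds) array, one semantic embedding per symbol (assumed)
    w_q, w_k, w_v: (Dv + Ds, D) projection matrices (hypothetical parameters)

    Returns the context vectors (T, D) and the attention weight map (T, T).
    """
    # Assumption: the two modalities are fused by simple concatenation.
    feats = np.concatenate([visual, semantic], axis=-1)
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    # Scaled dot-product scores between every pair of symbols.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Row `i` of the returned weight map shows how strongly symbol `i` attends to every other symbol, which is the kind of map that can be visualized to inspect learned dependencies. The sketch applies no causal mask; a decoder that emits symbols one at a time would typically restrict each symbol to those already generated.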
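4. Results & Findings
We ran experiments on the CROHME and HFID datasets. Compared with a model without HFID pre-training, a model pre-trained on the HFID training set and fine-tuned on the CROHME training set improved from 47.70%, 50.83%, and 51.29% to 58.62%, 60.35%, and 57.80% on CROHME 2014, CROHME 2016, and CROHME 2019, respectively. After the MDA module was added, the results on CROHME 2014, CROHME 2016, and CROHME 2019 rose further to 59.94%, 62.70%, and 59.38%, and the result on the HFID test set rose from 59.12% to 60.16%. We also visualized the weight maps generated by MDA, which confirmed that MDA does learn dependencies among symbols. In addition, we compiled recognition results for formulas in different length intervals; they show that adding the MDA module does improve recognition of long, complex formulas. Finally, with a multi-model ensemble, we reached 63.79% and 65.24% on CROHME 2014 and CROHME 2016, respectively, the best results reported on these two datasets to date.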
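The reported figures are expression accuracies, and the length-interval statistics can be approximated by grouping that metric by label length. The sketch below assumes that an expression counts as correct only when its predicted LaTeX token sequence matches the label exactly, and that length is measured in tokens. The function names and the bucket width are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def expression_accuracy(predictions, references):
    """Percentage of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


def accuracy_by_length(predictions, references, bucket_size=10):
    """Expression accuracy grouped by reference length (in tokens).

    Returns a dict mapping the lower bound of each length bucket to the
    accuracy (in percent) of the expressions that fall in that bucket.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for pred, ref in zip(predictions, references):
        bucket = (len(ref) // bucket_size) * bucket_size
        totals[bucket] += 1
        hits[bucket] += int(pred == ref)
    return {b: 100.0 * hits[b] / totals[b] for b in sorted(totals)}


# Toy data: three expressions as LaTeX token lists, one mispredicted.
refs = [["x", "^", "2"],
        ["\\frac", "{", "a", "}", "{", "b", "}"],
        ["a", "+", "b"]]
preds = [["x", "^", "2"],
         ["\\frac", "{", "a", "}", "{", "c", "}"],
         ["a", "+", "b"]]
overall = expression_accuracy(preds, refs)
per_bucket = accuracy_by_length(preds, refs, bucket_size=5)
```

On this toy data two of the three expressions match, and the single mismatch falls in the longer bucket, which is the pattern a length-wise breakdown is meant to expose.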
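5. Conclusions
The experimental results show that the HFID dataset built in this paper effectively reduces the impact of overfitting and further improves recognition performance. The symbol dependencies learned by the MDA module effectively improve recognition of long, complex formulas and further improve the model's performance on offline handwritten formula recognition. In future work, we will study how to apply the Transformer, a powerful encoder-decoder network, to offline handwritten formula recognition to further improve recognition performance.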


Key words: attention; dataset; handwritten formula recognition; multimodal; semantic; visual


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP
