A Novel Three-Stage Model for Skeletonizing Chinese Characters in Multiple Styles

田业川; 徐颂华

doi:10.1007/s11390-023-1337-8

A Novel Three-Staged Generative Model for Skeletonizing Chinese Characters with Versatile Styles

Structured abstract:
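Background: Character skeletons provide valuable information for a wide range of related tasks, such as optical character recognition, image reconstruction and segmentation, style learning and transfer, and identity authentication and verification based on handwriting analysis. However, Chinese has a very large character set, and thousands of years of development and diverse application scenarios have produced a great variety of writing styles, across which the shapes of characters differ widely. This makes automatic extraction of Chinese character skeletons highly challenging. Before the deep learning era, skeleton extraction algorithms based on traditional image analysis, chiefly thinning algorithms, struggled to handle stroke intersections, rough edges, and regions where stroke width changes rapidly. The only existing deep learning-based skeleton extraction algorithm uses a pretrained Chinese character recognition network to extract features and requires a large number of training samples, which severely limits its scalability to multiple styles.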
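Objective: To address these challenges, this paper introduces a three-stage generative model under an image-to-image framework that effectively extracts skeletons of Chinese characters in multiple styles. Compared with using a pretrained network, the image-to-image framework substantially reduces the amount of training data required.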
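Methods: The three stages of the new model consist of an improved U-net, an X-net, and a newly proposed F-net, respectively. The three networks run in sequence, each responsible for one stage, and progressively generate a high-quality character skeleton. This design uses each preliminary result as the basis for the next generation step, which effectively eases the difficulty that the single-pixel width characteristic of skeletons poses for generation.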
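The sequential design described above can be illustrated with a minimal Python sketch. It shows only the control flow in which each stage refines the previous stage's output. The names `run_stages` and `make_toy_stage` are hypothetical, the toy stages are simple thresholding stand-ins rather than the U-net, X-net, or F-net, and passing the original glyph to every stage is an assumption that the abstract does not confirm.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as nested lists, values in [0, 1]
Stage = Callable[[Image, Image], Image]  # (glyph, previous estimate) -> refined estimate


def run_stages(glyph: Image, stages: Sequence[Stage]) -> List[Image]:
    """Apply the stages in order and return every intermediate estimate."""
    estimate = glyph  # assumption: the first stage starts from the glyph itself
    outputs = []
    for stage in stages:
        # Each stage builds on the result of the stage before it.
        estimate = stage(glyph, estimate)
        outputs.append(estimate)
    return outputs


def make_toy_stage(threshold: float) -> Stage:
    """Hypothetical stand-in for a trained network: keep a pixel only if the
    previous estimate kept it and the glyph intensity reaches the threshold."""
    def stage(glyph: Image, estimate: Image) -> Image:
        return [
            [1.0 if e > 0 and g >= threshold else 0.0 for g, e in zip(g_row, e_row)]
            for g_row, e_row in zip(glyph, estimate)
        ]
    return stage


toy_glyph = [[0.2, 0.6, 0.9]]
toy_outputs = run_stages(toy_glyph, [make_toy_stage(t) for t in (0.1, 0.5, 0.8)])
```

In this toy run the estimate narrows from three pixels to one across the stages, which mirrors the idea of approaching a thin skeleton gradually instead of in a single step.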
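Results: Experiments show that the skeletons extracted by the three-stage model are clearly better than those of a single-stage model under multiple metrics; for example, the optimal F-measure on the three datasets rises from 0.726, 0.891, and 0.498 to 0.777, 0.925, and 0.529, respectively. The three-stage model also outperforms the other compared models. When the training set of the Kaiti (regular script) dataset is reduced from 7000 training pairs to 40, the proposed model still extracts visually acceptable results, whereas the other models show more pronounced quality loss. Finally, applying the extracted handwritten character skeletons to a handwriting recognition task achieves 94.6% top-1 and 99.8% top-5 recognition accuracy, close to the 96.6% top-1 and 99.8% top-5 accuracy obtained with ground-truth data and far better than the compared methods.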
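For readers unfamiliar with the F-measure reported above, the following sketch computes it as the harmonic mean of pixel-wise precision and recall between a predicted and a reference binary skeleton. The function name `f_measure` is hypothetical, and exact pixel-wise matching is an assumption: the abstract does not state the matching tolerance or the thresholding used to obtain the optimal F-measure.

```python
from typing import List

Mask = List[List[int]]  # binary image: 1 = skeleton pixel, 0 = background


def f_measure(pred: Mask, truth: Mask) -> float:
    """Pixel-wise F-measure, 2PR / (P + R), of a predicted skeleton mask."""
    tp = sum(p and t for p_row, t_row in zip(pred, truth) for p, t in zip(p_row, t_row))
    n_pred = sum(map(sum, pred))    # pixels marked as skeleton by the prediction
    n_truth = sum(map(sum, truth))  # pixels marked as skeleton in the reference
    if tp == 0 or n_pred == 0 or n_truth == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_truth
    return 2 * precision * recall / (precision + recall)


# A vertical stroke predicted with one spurious pixel at the bottom right.
pred_mask = [[0, 1, 0],
             [0, 1, 0],
             [0, 1, 1]]
truth_mask = [[0, 1, 0],
              [0, 1, 0],
              [0, 1, 0]]
score = f_measure(pred_mask, truth_mask)  # precision 3/4, recall 1, F = 6/7
```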
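Conclusions: These results show that the proposed model has clear advantages in skeleton quality and in its dependence on the amount of training data. The high-quality skeletons it extracts also improve overall performance when used in downstream tasks. Because character skeletons are widely used in tasks involving Chinese characters, the proposed model can serve as a preprocessing step for skeleton extraction, replacing existing skeleton extraction algorithms and improving downstream task performance, and it has broad application prospects.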

Abstract: Skeletons of characters provide vital information to support a variety of tasks, e.g., optical character recognition, image restoration, stroke segmentation and extraction, and style learning and transfer. However, automatically skeletonizing Chinese characters poses a steep computational challenge due to the large volume of Chinese characters and their versatile styles, for which traditional image analysis approaches are error-prone and fragile. The current deep learning-based approach requires a heavy amount of manual labeling effort, which imposes serious limitations on the precision, robustness, scalability, and generalizability of an algorithm to solve a specific problem. To tackle the above challenge, this paper introduces a novel three-staged deep generative model developed as an image-to-image translation approach, which significantly reduces the model's demand for labeled training samples. The new model is built upon an improved G-net, an enhanced X-net, and a newly proposed F-net. As demonstrated by comprehensive experimental results, the new model is able to iteratively extract high-quality skeletons of Chinese characters in versatile styles, and it noticeably outperforms two state-of-the-art peer deep learning methods and a classical thinning algorithm in terms of F-measure, Hausdorff distance, and average Hausdorff distance.
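The two distance metrics named above can be sketched as follows for skeletons represented as sets of pixel coordinates. The function names are hypothetical. The Hausdorff distance follows its standard definition. Several variants of the average Hausdorff distance exist, and the one shown here (the mean of the two directed average distances) is an assumption, since the abstract does not specify which variant the paper uses.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _nearest_distances(a: Sequence[Point], b: Sequence[Point]) -> List[float]:
    """For every point in a, the Euclidean distance to its nearest point in b."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return [min(math.dist(p, q) for q in b) for p in a]


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the worst-case nearest-neighbor distance."""
    return max(max(_nearest_distances(a, b)), max(_nearest_distances(b, a)))


def average_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Assumed variant: mean of the two directed average nearest distances."""
    d_ab = _nearest_distances(a, b)
    d_ba = _nearest_distances(b, a)
    return (sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)) / 2


# Two three-pixel skeletons that differ in their last pixel only.
skeleton_a = [(0, 0), (0, 1), (0, 2)]
skeleton_b = [(0, 0), (0, 1), (1, 2)]
hd = hausdorff(skeleton_a, skeleton_b)
ahd = average_hausdorff(skeleton_a, skeleton_b)
```

The Hausdorff distance is driven by the single worst mismatch, whereas the averaged variant reflects how far the two skeletons are apart overall, which is why the two are often reported together.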
