AWGT-Benchmark: An Air-Writing Benchmark Supporting Cross-Scenario Text Recognition
-
Abstract
We introduce the Air-Writing General Text Benchmark, termed AWGT-Benchmark (AWGT), a video-based, multilingual, and multigranularity dataset designed to advance air-writing text recognition (AWTR) across diverse scenarios. AWGT bridges the gap between conventional text recognition and AWTR scenarios, while also posing new challenges for computer vision and natural language processing. It comprises 226,648 video frames across four subsets in Chinese and English, with finger-writing trajectories captured by RGB cameras in real-world scenarios to ensure realism and diversity. Based on AWGT, we propose a two-stage recognition framework that first extracts finger motion trajectories and then converts them into trajectory images for character recognition. This design suppresses background noise and emphasizes the structural details of characters. Experimental results on AWGT show that widely used text recognition models suffer substantial performance degradation when applied to air-writing tasks, highlighting their limited ability to handle dynamic finger movements and cluttered visual contexts. AWGT provides a unified evaluation protocol and experimental baseline, reveals key limitations of existing methods, and offers insights for developing more robust and adaptable cross-scenario recognition systems. It also contributes to research in intelligent human-computer interaction. The dataset will be made publicly available to facilitate future research.
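To make the second stage of the framework concrete, the following is a minimal sketch of turning a fingertip trajectory into a trajectory image. It is an illustration under stated assumptions, not the authors' implementation: it assumes the first stage has already produced an ordered list of (x, y) fingertip positions, and the function name `rasterize_trajectory`, the image size, and the margin are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rasterize_trajectory(
    points: Sequence[Point], size: int = 32, margin: int = 2
) -> List[List[int]]:
    """Draw an ordered fingertip trajectory as a binary size x size image.

    Hypothetical helper: the abstract does not specify how trajectory
    images are rendered, so this is only one plausible approach.
    """
    image = [[0] * size for _ in range(size)]
    if not points:
        return image

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # One scale factor for both axes preserves the character's aspect ratio.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = (size - 1 - 2 * margin) / span

    def to_pixel(p: Point) -> Tuple[int, int]:
        col = int(round((p[0] - min_x) * scale)) + margin
        row = int(round((p[1] - min_y) * scale)) + margin
        return row, col

    pixels = [to_pixel(p) for p in points]
    image[pixels[0][0]][pixels[0][1]] = 1
    # Connect consecutive samples by linear interpolation so the stroke
    # stays continuous even when the finger moves quickly between frames.
    for (r0, c0), (r1, c1) in zip(pixels, pixels[1:]):
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        for i in range(steps + 1):
            r = r0 + round((r1 - r0) * i / steps)
            c = c0 + round((c1 - c0) * i / steps)
            image[r][c] = 1
    return image


# Example: an "L"-shaped stroke written from top to bottom, then to the right.
l_shape = rasterize_trajectory([(0, 0), (0, 10), (10, 10)], size=16)
```

Because only the stroke pixels are set, the resulting image contains no background content from the original video frame, which is the property the two-stage design relies on. A recognizer would then take such an image as input in place of the raw frames.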