Multi-scale Adaptive Large Kernel Graph Convolutional Network for Skeleton-based Recognition
Abstract
Graph convolutional networks (GCNs) have become a dominant approach for skeleton-based action recognition. Although GCNs have made significant progress in modeling skeletons as spatial-temporal graphs, they often require stacking multiple graph convolution layers to capture long-distance relationships among nodes. This stacking not only increases the computational burden but also raises the risk of over-smoothing, which can lead to the neglect of crucial local action features. To address this issue, we propose a novel multi-scale adaptive large kernel graph convolutional network (MSLK-GCN) that effectively aggregates local and global spatio-temporal correlations while maintaining computational efficiency. The core components of the network are a multi-scale large kernel graph convolution (LKGC) module, a multi-channel adaptive graph convolution (MAGC) module, and a multi-scale temporal self-attention convolution (MSTC) module. The LKGC module adaptively focuses on active motion regions by utilizing a large convolution kernel and a gating mechanism, effectively capturing long-distance dependencies within the skeleton sequence. Meanwhile, the MAGC module dynamically learns the relationships between different joints by adjusting the connection weights between nodes. To further enhance the model's ability to capture temporal dynamics, the MSTC module aggregates temporal information by integrating efficient channel attention (ECA) with multi-scale convolution. In addition, we use a multi-stream fusion strategy to make full use of different skeleton data modalities, including joint, bone, joint motion, and bone motion. Extensive experiments on three scale-varying datasets, i.e., NTU-60, NTU-120, and NW-UCLA, demonstrate that our MSLK-GCN achieves state-of-the-art performance with fewer parameters.
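To make the adaptive graph-convolution idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: a fixed skeleton adjacency A is augmented with a learned offset matrix B, so the effective connection weights between joints can adapt to the data before features are aggregated. All names (`matmul`, `adaptive_graph_conv`, the toy matrices) are hypothetical.

```python
# Illustrative sketch (assumed, not the paper's code): adaptive graph
# convolution in plain Python. A is the fixed skeleton adjacency (V x V),
# B is a learned offset (V x V), and X holds per-joint features (V x C).

def matmul(P, Q):
    """Naive matrix multiply for small demonstration matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def adaptive_graph_conv(X, A, B):
    """Aggregate joint features with the adapted adjacency (A + B)."""
    V = len(A)
    A_adapt = [[A[i][j] + B[i][j] for j in range(V)] for i in range(V)]
    return matmul(A_adapt, X)

# Toy example: 3 joints, 2 feature channels.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]          # fixed connectivity (self-loops included)
B = [[0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]    # learned offsets linking non-adjacent joints
X = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, 2.0]]

out = adaptive_graph_conv(X, A, B)
# Joint 0 now receives half of joint 2's features (and vice versa),
# even though they are not connected in the physical skeleton.
```

In a trained model, B would be a learnable parameter updated by backpropagation, which is how the MAGC module can discover dependencies (e.g., between two hands) that the fixed skeleton topology does not encode.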