Multi-scale Adaptive Large Kernel Graph Convolutional Network for Skeleton-based Recognition
Abstract
Graph convolutional networks (GCNs) have become a dominant approach for skeleton-based action recognition. Although GCNs have made significant progress in modeling skeletons as spatial-temporal graphs, they often require stacking multiple graph convolution layers to capture long-distance relationships among nodes. This stacking not only increases the computational burden but also raises the risk of over-smoothing, which can lead to the neglect of crucial local action features. To address this issue, we propose a novel multi-scale adaptive large kernel graph convolutional network (MSLK-GCN) that effectively aggregates local and global spatio-temporal correlations while maintaining computational efficiency. The core components of the network are a multi-scale large kernel graph convolution (LKGC) module, a multi-channel adaptive graph convolution (MAGC) module, and a multi-scale temporal self-attention convolution (MSTC) module. The LKGC module adaptively focuses on active motion regions by utilizing a large convolution kernel and a gating mechanism, effectively capturing long-distance dependencies within the skeleton sequence. Meanwhile, the MAGC module dynamically learns the relationships between different joints by adjusting the connection weights between nodes. To further enhance the model's ability to capture temporal dynamics, the MSTC module aggregates temporal information by integrating efficient channel attention (ECA) with multi-scale convolution. In addition, we use a multi-stream fusion strategy to make full use of different skeleton data modalities, including joint, bone, joint motion, and bone motion. Extensive experiments on three scale-varying datasets, i.e., NTU-60, NTU-120, and NW-UCLA, demonstrate that our MSLK-GCN achieves state-of-the-art performance with fewer parameters.
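To make the adaptive graph-convolution idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: a fixed skeleton adjacency A is augmented with a learned offset matrix B, so the effective connection weights between joints can adapt to the data before features are aggregated. All names (`matmul`, `adaptive_graph_conv`, the toy matrices) are hypothetical.

```python
# Illustrative sketch (assumed, not the paper's code): adaptive graph
# convolution in plain Python. A is the fixed skeleton adjacency (V x V),
# B is a learned offset (V x V), and X holds per-joint features (V x C).

def matmul(P, Q):
    """Naive matrix multiply for small demonstration matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def adaptive_graph_conv(X, A, B):
    """Aggregate joint features with the adapted adjacency (A + B)."""
    V = len(A)
    A_adapt = [[A[i][j] + B[i][j] for j in range(V)] for i in range(V)]
    return matmul(A_adapt, X)

# Toy example: 3 joints, 2 feature channels.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]          # fixed connectivity (self-loops included)
B = [[0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]    # learned offsets linking non-adjacent joints
X = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, 2.0]]

out = adaptive_graph_conv(X, A, B)
# Joint 0 now receives half of joint 2's features (and vice versa),
# even though they are not connected in the physical skeleton.
```

In a trained model, B would be a learnable parameter updated by backpropagation, which is how the MAGC module can discover dependencies (e.g., between two hands) that the fixed skeleton topology does not encode.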