Pushing to the Limit: An Attention-Based Dual-Prune Approach for Highly Compact CNN Filter Pruning
-
Abstract
Filter pruning is an important technique for compressing convolutional neural networks (CNNs), enabling the acquisition of lightweight, high-performance models for practical deployment. However, existing filter pruning methods often suffer a sharp performance drop when the pruning ratio is large, possibly due to the unrecoverable information loss caused by aggressive pruning. In this paper, we propose a dual-attention-based pruning approach, called DualPrune, designed to push the limits of network pruning at ultra-high compression ratios. First, it employs a graph attention network to automatically extract filter-level and layer-level features from a CNN, based on the role each filter plays in the entire computation graph. These comprehensive features are then fed into a side-attention network, which generates sparse attention weights for individual filters to guide model pruning. To avoid layer collapse, the side-attention network adopts a side-path design that properly preserves the information flow through the CNN model. This allows the CNN model to be pruned at a high compression ratio at initialization and then trained from scratch. Extensive experiments on several well-known CNN models and real-world datasets demonstrate that the proposed DualPrune method outperforms state-of-the-art methods, with particularly significant performance improvements at high pruning ratios.
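To make the core idea concrete, the sketch below shows how per-filter attention scores can be used to mask out low-scoring filters at a target keep ratio. This is only a minimal illustration, not the paper's implementation: the `prune_by_attention` helper, the layer shapes, and the random stand-in scores are all hypothetical.

```python
import numpy as np

def prune_by_attention(filter_weights, attention_scores, keep_ratio):
    """Zero out the filters with the lowest attention scores.

    filter_weights:   array of shape (num_filters, in_channels, k, k)
    attention_scores: one score per filter (higher = more important)
    keep_ratio:       fraction of filters to keep, e.g. 0.1 for 90% pruning
    """
    num_filters = filter_weights.shape[0]
    # Keep at least one filter per layer so information can still flow (no layer collapse).
    num_keep = max(1, int(round(num_filters * keep_ratio)))
    # Indices of the filters with the highest attention scores.
    keep_idx = np.argsort(attention_scores)[-num_keep:]
    mask = np.zeros(num_filters, dtype=bool)
    mask[keep_idx] = True
    # Broadcast the per-filter mask over the remaining weight dimensions.
    pruned = filter_weights * mask[:, None, None, None]
    return pruned, mask

# Hypothetical example: a conv layer with 64 filters of shape 3x3x3,
# pruned to 10% of its filters using random stand-in attention scores.
rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 3, 3, 3))
scores = rng.random(64)
pruned_weights, kept = prune_by_attention(weights, scores, keep_ratio=0.1)
print(f"kept {kept.sum()} of {kept.size} filters")
```

In the actual method, the attention scores would come from the graph-attention and side-attention networks rather than being supplied directly, and pruning is applied at initialization before training the compact model from scratch.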