

Knowledge Distillation via Hierarchical Matching for Small Object Detection

  • Abstract:
    Background In recent years, knowledge distillation has made remarkable progress as a model-compression technique in image classification, but there is still room for improvement in object detection, especially for small objects. The current challenge is that small-object features become inconspicuous during the down-sampling of convolutional neural networks (CNNs) and are easily corrupted by background noise, so they are hard to extract and refine effectively during distillation. To address this, we propose a novel Hierarchical Matching Knowledge Distillation network (HMKD), designed specifically to better manipulate and refine small-object features across the levels of the feature pyramid network (FPN).
    Objective The main goal of this work is to improve the accuracy and efficiency of small object detection through the proposed hierarchical matching knowledge distillation (HMKD). Conventional distillation methods handle small objects poorly; by combining an encoding-decoding mechanism with attention, we strengthen the learning of high-resolution small-object features in the shallow FPN levels. The method brings clear performance gains to both one-stage and two-stage detectors, and by encoding the deep semantic information of the teacher network it transfers that information to the student more effectively, improving the student's ability to detect small objects.
    Methods HMKD consists of several key steps. First, an encoder extracts deep, highly semantic information from the teacher network in the form of low-resolution query vectors. These queries are then matched against the high-resolution feature values of small objects in the student network, with an attention mechanism ensuring that highly relevant features are matched and distilled. To further improve distillation, we design a supplementary distillation module that learns the relationship between background and foreground, giving the student network a more complete understanding of the scene; see the sketch after this abstract.
    Results Experiments on COCO2017 show that our method achieves 41.7% mAP with Faster R-CNN, a 3.8% improvement over the baseline. We also tested on VisDrone, where the student model even outperforms the teacher, demonstrating the effectiveness and generality of the approach in practice. These experiments confirm HMKD's clear advantage in small object detection.
    Conclusion The proposed hierarchical matching knowledge distillation (HMKD) effectively addresses the feature-refinement problem in small object detection and significantly improves the accuracy and robustness of detectors on small-scale objects. Future work will explore further optimizing the method for broader application scenarios, reducing training-time cost, and studying knowledge transfer between different network architectures to achieve wider model adaptability.
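    As a concrete illustration of the matching step described under Methods, the following PyTorch sketch is a minimal, hypothetical rendering, not the paper's implementation: the 256-channel width, the single strided-conv encoder, the single attention head, and the MSE objective are all assumptions made for readability.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HierarchicalMatchingSketch(nn.Module):
        # Hypothetical sketch of HMKD-style query/key matching. The encoder
        # that compresses the teacher's deep FPN level into low-resolution
        # queries is reduced here to a single strided convolution.
        def __init__(self, channels: int = 256):
            super().__init__()
            self.encoder = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            self.q_proj = nn.Linear(channels, channels)
            self.k_proj = nn.Linear(channels, channels)
            self.v_proj = nn.Linear(channels, channels)

        def forward(self, teacher_deep: torch.Tensor,
                    student_shallow: torch.Tensor,
                    teacher_shallow: torch.Tensor) -> torch.Tensor:
            c = student_shallow.shape[1]
            # Low-resolution, highly semantic queries from the teacher.
            q = self.q_proj(self.encoder(teacher_deep).flatten(2).transpose(1, 2))
            # High-resolution keys/values from the student's shallow level
            # (e.g. P2/P3), where small-object detail survives down-sampling.
            kv = student_shallow.flatten(2).transpose(1, 2)
            k, v = self.k_proj(kv), self.v_proj(kv)
            # Attention measures the relevance of each query to each student
            # location, so only highly relevant features are matched.
            attn = F.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
            decoded = attn @ v
            # Distillation target: the teacher's shallow features aggregated
            # with the same attention map (an assumed choice of objective).
            with torch.no_grad():
                target = attn @ teacher_shallow.flatten(2).transpose(1, 2)
            return F.mse_loss(decoded, target)

    For example, distilling from the teacher's deepest level into the student's P2 would be invoked as loss = HierarchicalMatchingSketch()(teacher_p5, student_p2, teacher_p2), with the result added to the usual detection objective.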

     

    Abstract: Knowledge distillation is often used for model compression and has achieved great breakthroughs in image classification, but there remains room for improvement in object detection, especially in knowledge extraction for small objects. The main problem is that the features of small objects are often polluted by background noise and rendered inconspicuous by the down-sampling of convolutional neural networks (CNNs), so they are insufficiently refined during distillation. In this paper, we propose the Hierarchical Matching Knowledge Distillation Network (HMKD), which operates on pyramid levels P2 to P4 of the feature pyramid network (FPN) and aims to intervene on small-object features before they are degraded. We employ an encoder-decoder network to encapsulate low-resolution, highly semantic information, akin to eliciting insights from the deep layers of the teacher network, and then match the encapsulated information with the high-resolution feature values of small objects from the shallow layers, which serve as keys. An attention mechanism measures the relevance of each query to the feature values, and knowledge is distilled to the student during decoding. In addition, we introduce a supplementary distillation module to mitigate the effects of background noise. Experiments show that our method yields substantial improvements for both one-stage and two-stage object detectors. Specifically, applying the proposed method to Faster R-CNN achieves 41.7% mAP on COCO2017 (with a ResNet50 backbone), which is 3.8% higher than the baseline.
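    Both abstracts mention the supplementary distillation module only at a high level. Below is a minimal, hypothetical sketch of one foreground/background-aware distillation term consistent with that description; the binary mask construction and the background weight are assumptions, not the paper's formulation.

    import torch
    import torch.nn.functional as F

    def supplementary_distill_loss(student_feat: torch.Tensor,
                                   teacher_feat: torch.Tensor,
                                   fg_mask: torch.Tensor,
                                   bg_weight: float = 0.5) -> torch.Tensor:
        # fg_mask: (B, 1, H, W) binary map of ground-truth box regions
        # projected onto this FPN level (an assumed construction);
        # bg_weight down-weights the noisy background region.
        diff = (student_feat - teacher_feat.detach()) ** 2
        fg = (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1)
        bg = (diff * (1.0 - fg_mask)).sum() / (1.0 - fg_mask).sum().clamp(min=1)
        return fg + bg_weight * bg

    Down-weighting rather than discarding the background lets the student still learn the background-foreground relationship while suppressing noise, which matches the stated purpose of the module.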

     

