Enhanced Feature Representations for Low-Resolution Fine-Grained Image Recognition via Categorical Knowledge Guidance
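Abstract:
Background: The rapid development of deep learning has greatly advanced many vision tasks. In fine-grained image recognition, existing methods mainly rest on an idealized premise: both training and test samples are high-resolution. In real-world scenarios such as military reconnaissance and intelligent security, however, monitoring distance and equipment configuration make high-resolution images hard to obtain. The captured images have low resolution and lack fine-grained detail, so fine-grained recognition accuracy drops sharply. Low-resolution fine-grained image recognition must overcome these problems to produce accurate predictions, and robust feature representations are crucial for correctly predicting the fine-grained categories of low-resolution images.
Objective: Existing work does not fully exploit fine-grained category knowledge to guide and constrain the reconstruction and extraction of fine-grained features from low-resolution data, which limits a model's ability to learn the global and local fine-grained features of such data. This paper proposes a categorical knowledge-guided enhanced feature representation network to capture delicate and reliable fine-grained feature descriptions from low-resolution data.
Method: The proposed network combines a class-level distillation strategy with a part query mechanism to learn delicate and reliable fine-grained representations of low-resolution data. First, to address the limited detail in low-resolution fine-grained data, the method performs class-level knowledge distillation through a set of learnable memory banks: the high-quality features of high-resolution samples of each class, gathered over the whole dataset, are transferred into the feature descriptions of low-resolution samples of the same class. This guides the network to restore the details of low-resolution images more accurately and to extract robust features, so that the global features it learns from a low-resolution image approach the feature representations of high-resolution data of the corresponding class. Second, because fine-grained discriminative features are usually spread over several local regions of an object, a set of predefined part query vectors learns where discriminative inter-class cues are located, and these queries are then used to decode diverse, discriminative local features. Finally, the global image representation is combined with the local discriminative features to form a more comprehensive and meaningful fine-grained description of the low-resolution object, which improves recognition performance.
Results: The proposed method achieves accuracies of 90.42%, 75.18%, and 88.47% on three synthetic low-resolution fine-grained datasets (Car-S, CUB-S, and Aircraft-S) and 92.93% on a real low-resolution fine-grained dataset (RP-281), significantly outperforming existing methods. Extensive ablation experiments on the four datasets verify the effectiveness of each module of the proposed method.
Conclusion: The experimental results show that exploring and exploiting inter-class fine-grained knowledge enables the network to learn robust feature descriptions from low-resolution fine-grained images and thereby improves final recognition performance; the ablation experiments on the four datasets confirm the contribution of each module. This work focuses on the problems that low image resolution poses for fine-grained recognition; future research should give more attention to fine-grained recognition of low-quality images under complex conditions.

Abstract: Low-resolution (LR) fine-grained image recognition requires recognizing the subcategories of LR samples that contain limited fine-grained detail. Existing methods do not fully exploit category-related knowledge to guide and constrain the recovery and extraction of fine-grained features from LR data; as a result, their ability to learn the global and local fine-grained features of LR data is limited. In this paper, we propose an enhanced feature representation network (EFR-Net) based on categorical knowledge guidance to capture delicate and reliable fine-grained feature descriptions of LR data and improve recognition accuracy. First, to overcome the limited fine-grained detail in LR data, we design a classwise distillation loss.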
This loss uses a memory bank to transfer the high-quality features of class-specific high-resolution (HR) samples into the feature learning of LR samples of the same category, so that the global representation of an LR image moves closer to the high-quality features of HR images of its class. Second, because fine-grained discriminative features are often hidden in object parts, we introduce a set of part queries that learn where discriminative cues are located across all categories and then decode diverse, discriminative part features. Combining the global representation with these local discriminative features yields more comprehensive and meaningful feature descriptions of LR fine-grained objects and thus improves recognition performance. Extensive comparison experiments on four LR datasets demonstrate the effectiveness of EFR-Net.
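The abstract names three mechanisms without giving their equations: a classwise distillation loss driven by a per-class memory of HR features, part queries that decode local features, and fusion of global and part features. The pure-Python sketch below illustrates one plausible reading of these mechanics under stated assumptions; it is not the authors' EFR-Net implementation. The names `ClassMemoryBank`, `classwise_distill_loss`, `decode_parts`, and `fuse` are hypothetical. The exponential-moving-average (EMA) update of the bank, the squared-L2 loss on normalized features, the projection-free single-head cross-attention, and concatenation-based fusion are all assumptions chosen for brevity; the abstract describes the memory banks as learnable but does not specify how they are updated or how the part queries attend.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector, eps: float = 1e-12) -> Vector:
    """Scale v to unit Euclidean length; eps guards against dividing by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


class ClassMemoryBank:
    """Hypothetical memory bank holding one HR feature prototype per fine-grained class."""

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9) -> None:
        self.momentum = momentum
        self.slots: Dict[int, Vector] = {c: [0.0] * dim for c in range(num_classes)}

    def update(self, hr_feature: Vector, label: int) -> None:
        # Assumed EMA update (a stand-in for the paper's learnable banks):
        # blend the stored prototype of this class with a new, normalized HR feature.
        m = self.momentum
        new = l2_normalize(hr_feature)
        blended = [m * old + (1 - m) * x for old, x in zip(self.slots[label], new)]
        self.slots[label] = l2_normalize(blended)

    def classwise_distill_loss(self, lr_features: List[Vector], labels: List[int]) -> float:
        # Assumed loss: mean squared L2 distance between each normalized LR feature
        # and the HR prototype of its own class, pulling LR features toward HR ones.
        total = 0.0
        for feat, y in zip(lr_features, labels):
            f, proto = l2_normalize(feat), self.slots[y]
            total += sum((a - b) ** 2 for a, b in zip(f, proto))
        return total / len(lr_features)


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_parts(queries: List[Vector], tokens: List[Vector]) -> List[Vector]:
    """Single-head cross-attention without learned projections: each part query
    scores every spatial token of the LR feature map and returns the
    attention-weighted average of those tokens as its part feature."""
    dim = len(tokens[0])
    parts = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        parts.append([sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)])
    return parts


def fuse(tokens: List[Vector], parts: List[Vector]) -> Vector:
    """Concatenate a global feature (assumed: mean of all tokens) with the part features."""
    dim = len(tokens[0])
    fused = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
    for p in parts:
        fused.extend(p)
    return fused


if __name__ == "__main__":
    bank = ClassMemoryBank(num_classes=2, dim=2)
    bank.update([3.0, 4.0], label=0)  # toy HR feature of class 0
    bank.update([0.0, 2.0], label=1)  # toy HR feature of class 1
    # An LR feature pointing the same way as the class-0 prototype gives a loss near 0.
    print(bank.classwise_distill_loss([[6.0, 8.0]], [0]))
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 3-token LR feature map
    queries = [[4.0, 0.0], [0.0, 4.0]]              # two toy part queries
    parts = decode_parts(queries, tokens)
    print(len(fuse(tokens, parts)))  # 2 (global) + 2 * 2 (parts) = 6
```

In a real network the tokens would come from a backbone applied to the LR image, the HR features from HR training images of the same class, and the part queries would be trained parameters; plain lists are used here only so the mechanics can be inspected directly.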