Guang-Yu Gao, Hua-Dong Ma. Movie Scene Recognition Using Panoramic Frame and Representative Feature Patches[J]. Journal of Computer Science and Technology, 2014, 29(1): 155-164. DOI: 10.1007/s11390-013-1418-1


Movie Scene Recognition Using Panoramic Frame and Representative Feature Patches

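  • Abstract (translated from the Chinese version): In recent years, many researchers in computer vision have focused on recognizing the semantic content of images and videos, for example object detection and place recognition ("Where am I?"). The scene recognition problem studied in this paper is to assign a scene image or video a semantic label that indicates its location (e.g., bedroom, street). Most current scene recognition methods are image-based, and their results are usually unsatisfactory when they are applied to video scenes. To address the difficulties of video scene recognition, this paper proposes an efficient and accurate movie scene recognition method that constructs and uses panoramic frames and representative feature patches, combined with concepts and methods such as bag-of-words and Latent Dirichlet Allocation. Specifically, a preprocessing step first segments the input movie into shots and video scene clips. Second, we propose a new shot key-frame extraction algorithm that uses panoramic frames; on top of the panoramic frames, a series of local feature extraction and clustering algorithms produces the representative local feature patches that characterize each video scene clip. Finally, using these representative local feature patches, we build a hierarchical topic model based on Latent Dirichlet Allocation to recognize the scene category of each individual video clip. Because video scene clips recur and are related to their neighbors, we also build correlations between scene clips to improve the recognition accuracy of individual clips. We test the proposed method on a large number of complex and typical movie scenes and obtain fairly satisfactory results. The experiments show that, compared with existing methods, our method recognizes most movie scene clips fairly effectively.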


    Abstract: Recognizing scene information in images or videos, such as locating objects and answering "Where am I?", has attracted much attention in the computer vision research field. Many existing scene recognition methods focus on static images and cannot achieve satisfactory results on videos, which contain more complex scene features than images. In this paper, we propose a robust movie scene recognition approach based on panoramic frames and representative feature patches. More specifically, the movie is first efficiently segmented into video shots and scenes. Second, we introduce a novel key-frame extraction method using panoramic frames, and a local feature extraction process is applied to obtain the representative feature patches (RFPs) in each video shot. Third, a Latent Dirichlet Allocation (LDA) based recognition model is trained to recognize the scene within each individual video scene clip. The correlations between video clips are also considered to enhance recognition performance. When the proposed approach is applied to recognize scenes in realistic movies, the experimental results show that it achieves satisfactory performance.
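
The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions: representing a clip by its local feature patches, and using correlations between clips to refine per-clip recognition. It is not the authors' method. The nearest-codeword quantization, the cosine similarity, the blending weight `alpha`, and all function names are assumptions made for illustration, and the per-clip class scores are assumed to come from some upstream model (for instance an LDA-based classifier).

```python
from collections import Counter
from math import sqrt


def quantize(descriptors, codebook):
    """Map each local descriptor to the index of its nearest codeword."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(d, codebook[k]))
            for d in descriptors]


def histogram(words, vocab_size):
    """Normalized visual-word histogram for one clip."""
    counts = Counter(words)
    total = len(words) or 1
    return [counts[k] / total for k in range(vocab_size)]


def cosine(u, v):
    """Cosine similarity between two histograms (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def refine_scores(scores, hists, alpha=0.5):
    """Blend each clip's class scores with a similarity-weighted average
    of the other clips' scores (hypothetical correlation step)."""
    refined = []
    for i, s_i in enumerate(scores):
        weights = [cosine(hists[i], hists[j]) if j != i else 0.0
                   for j in range(len(scores))]
        total = sum(weights)
        if total == 0:
            # No similar clip: keep the original scores unchanged.
            refined.append(list(s_i))
            continue
        neighbor = [sum(w * scores[j][c] for j, w in enumerate(weights)) / total
                    for c in range(len(s_i))]
        refined.append([(1 - alpha) * a + alpha * b
                        for a, b in zip(s_i, neighbor)])
    return refined


# Toy data: clips 0 and 1 look alike; clip 1 is ambiguous on its own.
codebook = [(0.0, 0.0), (1.0, 1.0)]
words = quantize([(0.1, 0.0), (0.9, 1.0), (1.0, 0.8)], codebook)
hist = histogram(words, len(codebook))
scores = [[0.9, 0.1], [0.45, 0.55], [0.2, 0.8]]
hists = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
refined = refine_scores(scores, hists)
```

In this toy run the ambiguous second clip borrows evidence from the visually similar first clip, so its top-scoring class changes, while the third clip, which resembles no other clip, keeps its original scores.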


