Robust Video Text Detection with Morphological Filtering Enhanced MSER

诸葛云志; 卢湖川

doi:10.1007/s11390-015-1528-z

Abstract

Video text detection is challenging because video backgrounds are generally complex and subtitles often suffer from color bleeding, blurred boundaries, and low contrast caused by lossy compression and low resolution. In this paper, we propose a robust framework to address these problems. First, we use a gradient amplitude map (GAM) to enhance the text boundaries of the input image, which mitigates color bleeding and blurred boundaries. Second, a two-direction morphological filter suppresses part of the background interference and enhances the contrast between text and background. Third, a maximally stable extremal region (MSER) detector is applied to detect connected text regions with two extreme colors; the mean intensities of the detected regions serve as the label set for graph cuts, and the Euclidean distance over the H, S, and I channels of the HSI color space serves as the smoothness term, yielding an optimal text segmentation. Finally, we group the segmented regions into text lines using the geometric characteristics of text, and then apply corner detection, multi-frame verification, and some heuristic rules to eliminate non-text regions. We test our scheme on a set of challenging videos, and the results show that our text detection framework is more robust than previous methods.
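The first two stages named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give the gradient operator, the structuring elements, or how the two filtering directions are combined. The sketch assumes a central-difference gradient for the GAM and a grayscale opening with a line-shaped element of length `k` along rows and along columns; all function names and the parameter `k` are hypothetical.

```python
import math


def gradient_amplitude_map(img):
    """Gradient magnitude of a 2-D grayscale image given as a list of rows.

    Assumption: central differences with replicated borders; the paper's
    actual gradient operator is not stated in the abstract.
    """
    h, w = len(img), len(img[0])
    gam = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            gam[y][x] = math.hypot(gx, gy)
    return gam


def _erode_1d(row, k):
    # Minimum over a window of length k, truncated at the borders.
    r, n = k // 2, len(row)
    return [min(row[max(i - r, 0):min(i + r + 1, n)]) for i in range(n)]


def _dilate_1d(row, k):
    # Maximum over a window of length k, truncated at the borders.
    r, n = k // 2, len(row)
    return [max(row[max(i - r, 0):min(i + r + 1, n)]) for i in range(n)]


def open_rows(img, k):
    """Grayscale opening (erosion, then dilation) along each row."""
    return [_dilate_1d(_erode_1d(row, k), k) for row in img]


def open_cols(img, k):
    """Grayscale opening along each column, done by transposing twice."""
    cols = [list(c) for c in zip(*img)]
    return [list(r) for r in zip(*open_rows(cols, k))]
```

An opening of length `k` in one direction removes bright structures narrower than `k` in that direction and leaves wider ones unchanged, which is one way a directional filter can suppress thin background clutter. How the horizontal and vertical results should be combined is left open here because the abstract does not say.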