Extended Approach to Water Flow Algorithm for Text Line Segmentation

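  • Summary: 1. Innovation: The paper proposes a new water-flow-based method for text line segmentation. Unlike traditional methods, it does not require a preset angle parameter for the water flow algorithm. Establishing wetted and unwetted areas makes it possible to distinguish text lines from non-text regions.
    2. Implementation: The proposed algorithm improves on the original water flow algorithm. It divides the whole text image into small, uniform text object units. The text objects are obtained by extracting connected components together with their bounding boxes, so that each connected component is separated from the others. In the proposed algorithm, the water flow angle can be selected from a wider range. When the angle is chosen appropriately, the unwetted areas become longer, which improves text line segmentation.
    3. Conclusions and open problems: The paper presents an extended water flow algorithm. It does not need the angle parameter used in traditional methods; instead, it is obtained by modifying the spatial filter template of the water flow. In addition, combining the bounding boxes with parameters covering the full range of directions extends the unwetted areas, which effectively reduces the probability of under-segmentation and over-segmentation of text lines.
    4. Practical value and applications: The method is useful for text line segmentation and can be applied to handwritten text.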
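The implementation step in the summary obtains text objects by extracting connected components together with their bounding boxes. A minimal Python sketch of that step follows, assuming a binary image stored as a list of rows (1 = ink, 0 = background) and 8-connectivity; the function name `bounding_boxes` and these conventions are illustrative assumptions, not details taken from the paper.

```python
from collections import deque


def bounding_boxes(image):
    """Return inclusive (top, left, bottom, right) boxes of 8-connected ink components.

    `image` is a list of equal-length rows; 1 marks an ink pixel, 0 background.
    Boxes are returned in raster order of each component's first pixel.
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected component.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours (diagonals included).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 0],
    ]
    print(bounding_boxes(page))  # → [(0, 0, 1, 1), (0, 5, 1, 5), (3, 2, 3, 3)]
```

Each box isolates one component, so later stages can reason about whole text objects instead of individual pixels.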

    Abstract: This paper proposes a new approach to the water flow algorithm for text line segmentation. In the basic method, hypothetical water flows at a few specified angles, set by the water flow angle parameter. The flow is applied across the document image frame from left to right and from right to left. As a result, unwetted and wetted areas are established: the unwetted areas mark text and the wetted areas mark non-text elements in each text line. They are therefore the control areas of major importance for text line segmentation. The extended approach first extracts the connected components of the text and encloses each one in a bounding box, so that the connected components are separated from one another. On this basis, the water flow angle, which defines the unwetted areas, is determined adaptively. Choosing an appropriate water flow angle lengthens the unwetted areas, which leads to better text line segmentation. The results are encouraging, as they show improved text line segmentation, the most challenging step in document image processing.
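The abstract's central claim, that a suitable water flow angle lengthens the unwetted areas and thereby improves segmentation, can be illustrated with a simplified Python sketch. It assumes a hypothetical geometric model rather than the paper's spatial filter template: water flowing from the left and right leaves each bounding box itself unwetted and shields a triangle on each side whose apex lies (h/2)/tan α from the box edge, where h is the box height and α the water flow angle. The angle is passed in as a fixed value instead of being determined adaptively, and the names `unwetted_mask` and `count_regions` are illustrative.

```python
import math


def unwetted_mask(boxes, height, width, angle_deg):
    """Mark pixels left unwetted by water flowing from the left and the right.

    Hypothetical model: each (top, left, bottom, right) box is unwetted and
    shields a triangle on both sides; a smaller angle gives a longer triangle.
    """
    if not 0 < angle_deg < 90:
        raise ValueError("angle_deg must lie strictly between 0 and 90")
    slope = math.tan(math.radians(angle_deg))
    mask = [[False] * width for _ in range(height)]
    for top, left, bottom, right in boxes:
        half = (bottom - top + 1) / 2.0
        mid = (top + bottom) / 2.0
        for y in range(top, bottom + 1):
            # Horizontal reach of the shielded triangle in this row: largest at
            # the box's vertical centre, shrinking toward its top and bottom.
            reach = int((half - abs(y - mid)) / slope)
            x0 = max(0, left - reach)
            x1 = min(width - 1, right + reach)
            for x in range(x0, x1 + 1):
                mask[y][x] = True
    return mask


def count_regions(mask):
    """Count 4-connected groups of unwetted pixels; each approximates one text line."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = set()
    regions = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or (r, c) in seen:
                continue
            regions += 1
            seen.add((r, c))
            stack = [(r, c)]
            # Depth-first fill of the whole unwetted region.
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return regions


if __name__ == "__main__":
    # Two components on the same text line, separated by a six-column gap.
    same_line = [(0, 0, 3, 1), (0, 8, 3, 9)]
    print(count_regions(unwetted_mask(same_line, 4, 10, 80)))  # → 2
    print(count_regions(unwetted_mask(same_line, 4, 10, 10)))  # → 1
```

With a steep angle the two components stay apart and the line is over-segmented; a shallow angle lengthens the unwetted areas until they merge into a single region. Because the triangles extend only horizontally in this model, components on different lines separated by empty rows remain distinct regions.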
