Visual Similarity Based Document Layout Analysis
Abstract
This paper proposes a visual similarity based document layout analysis (DLA) scheme that uses a clustering strategy to adapt to documents in different languages, with different layout structures and skew angles. Aiming at a robust and adaptive DLA approach, the authors first identify, through a visual similarity testing process, a set of representative filters and statistics that characterize typical texture patterns in document images. Texture features are then extracted with these filters and passed to a dynamic clustering procedure, called visual similarity clustering. Finally, text content is located from the clustering results. Thanks to this scheme, the algorithm demonstrates strong robustness and adaptability across a wide variety of documents, which traditional DLA approaches do not offer.
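The pipeline outlined in the abstract (texture features, then clustering, then text location) can be illustrated with a minimal sketch. The abstract does not specify the filters, the statistics, or the dynamic clustering rule, so everything below is an assumption made for illustration: block variance and mean horizontal gradient stand in for the filter responses, plain k-means with a fixed k stands in for visual similarity clustering, and the names `block_features`, `kmeans` and `text_blocks` are hypothetical.

```python
def block_features(image, block=4):
    """Return one (variance, mean horizontal gradient) pair per image block.

    `image` is a grayscale image given as a list of equal-length rows.
    These two statistics are illustrative stand-ins, not the paper's filters.
    """
    feats = []
    h, w = len(image), len(image[0])
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            pixels = [image[r + i][c + j]
                      for i in range(block) for j in range(block)]
            mean = sum(pixels) / len(pixels)
            var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
            grad = sum(abs(image[r + i][c + j + 1] - image[r + i][c + j])
                       for i in range(block)
                       for j in range(block - 1)) / (block * (block - 1))
            feats.append((var, grad))
    return feats


def kmeans(points, k=2, iters=20):
    """Plain k-means with a deterministic start (assumed, not the paper's rule)."""
    ordered = sorted(points)
    # Initial centers: k points spread evenly through the sorted features.
    centers = [ordered[(len(ordered) - 1) * i // max(k - 1, 1)]
               for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2
                                        for a, b in zip(p, centers[j])))
                  for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(v) / len(members)
                                   for v in zip(*members))
    return labels, centers


def text_blocks(labels, centers):
    """Indices of blocks in the cluster with the highest variance center."""
    text_cluster = max(range(len(centers)), key=lambda j: centers[j][0])
    return [i for i, lab in enumerate(labels) if lab == text_cluster]


if __name__ == "__main__":
    # Synthetic page: blank left half, striped (text-like) right half.
    page = [[255] * 8 + [0 if j % 2 == 0 else 255 for j in range(8)]
            for _ in range(8)]
    labels, centers = kmeans(block_features(page, block=4), k=2)
    print(text_blocks(labels, centers))
```

Treating the highest-variance cluster as text is itself a simplification; a scheme that must handle skew and mixed layouts would need features that are less sensitive to orientation than a horizontal gradient.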