Xu-Bin Deng, Yang-Yong Zhu. L-tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises[J]. Journal of Computer Science and Technology, 2005, 20(6): 763-773.

L-tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises

  • This paper presents a new method, named L-tree match, for extracting data from complex data sources. First, based on the data extraction logic presented in this work, a new data extraction model is constructed in which model components are structurally correlated via a generalized template. Second, a database-populating mechanism is built, along with some object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Third, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from data sources with optional, unordered, nested, and/or noisy components. Lastly, the method is applied to extract accurate data from biological documents amounting to 100 GB for the first online integrated biological data warehouse of China.