YANG Jianwu, CHEN Xiaoou. A Semi-Structured Document Model for Text Mining[J]. Journal of Computer Science and Technology, 2002, 17(5).

A Semi-Structured Document Model for Text Mining

  • A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can be fully utilized. To take advantage of the structure and link information in a semi-structured document for better mining, a structured link vector model (SLVM) is presented in this paper, where a vector represents a document and the vector's elements are determined by terms, document structure, and neighboring documents. Text mining based on SLVM is described through the K-means procedure for brevity and clarity, in two steps: calculating document similarity and calculating the cluster center. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, and its F value increases from 0.65-0.73 to 0.82-0.86.
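The two K-means steps named in the abstract (document similarity and cluster center) can be illustrated with a minimal sketch. It is not the paper's SLVM formulation, which the abstract does not give. It assumes sparse term vectors stored as dicts, cosine similarity, and a hypothetical link step (`add_neighbors`, with an assumed mixing weight `alpha`) that adds a fraction of each linked document's terms. Document structure is not modeled here, and all names and the toy data are illustrative only.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_neighbors(vectors, links, alpha=0.5):
    """Hypothetical link step (an assumption, not the paper's formula):
    mix each document's terms with alpha times its linked documents' terms."""
    out = []
    for i, vec in enumerate(vectors):
        mixed = defaultdict(float, vec)
        for j in links.get(i, ()):
            for term, w in vectors[j].items():
                mixed[term] += alpha * w
        out.append(dict(mixed))
    return out


def centroid(members):
    """Cluster center: component-wise mean of the member vectors."""
    total = defaultdict(float)
    for vec in members:
        for term, w in vec.items():
            total[term] += w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, k, iterations=10):
    """Plain K-means with cosine similarity; seeds are the first k documents."""
    centers = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Step 1: assign each document to its most similar center.
        labels = [max(range(k), key=lambda c: cosine(v, centers[c]))
                  for v in vectors]
        # Step 2: recompute each center from its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


# Toy data: four documents, two topics; doc 2 links to doc 0, doc 3 to doc 1.
docs = [
    {"xml": 2.0, "schema": 1.0},
    {"cluster": 2.0, "kmeans": 1.0},
    {"xml": 1.0, "dtd": 1.0},
    {"cluster": 1.0, "center": 1.0},
]
links = {2: [0], 3: [1]}
labels = kmeans(add_neighbors(docs, links), k=2)
print(labels)
```

On this toy input, documents 0 and 2 share one cluster and documents 1 and 3 the other. Seeding with the first k documents keeps the sketch deterministic; a practical run would normally use random or smarter initialization.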
