LI Xiaoli, SHI Zhongzhi. Innovating Web Page Classification Through Reducing Noise[J]. Journal of Computer Science and Technology, 2002, 17(1).

Innovating Web Page Classification Through Reducing Noise

  • This paper presents a new method that eliminates noise in Web page classification. It first describes the representation of a Web page based on HTML tags. It then uses a novel distance formula to eliminate noise in the similarity measure. After carefully analyzing Web pages, we design an algorithm that distinguishes related hyperlinks from noisy ones. We use the non-noisy hyperlinks to improve the performance of Web page classification (the CAWN algorithm): any page can be classified through the text and categories of the neighbor pages related to it. The experimental results show that our approach improves classification accuracy.
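
The abstract does not spell out the CAWN details, so the following is only a minimal Python sketch of the general idea, built on stated assumptions rather than the paper's actual method. The tag weights in the hypothetical `TAG_WEIGHTS` table, plain cosine similarity (standing in for the paper's noise-reducing distance formula), a fixed similarity `threshold` (standing in for its hyperlink-filtering algorithm), and the linear mix `alpha` of text evidence and neighbor-category evidence are all illustrative choices, and every function name is hypothetical.

```python
from collections import Counter
import math

# Hypothetical tag weights: terms in more prominent HTML tags count more.
# Illustrative assumptions only, not the paper's parameters.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.5, "body": 1.0}

def page_vector(tagged_terms):
    """Build a tag-weighted term vector from (tag, term) pairs."""
    vec = Counter()
    for tag, term in tagged_terms:
        vec[term.lower()] += TAG_WEIGHTS.get(tag, 1.0)
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse vectors (a stand-in for the paper's distance)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def related_neighbors(page_vec, neighbors, threshold=0.2):
    """Keep linked pages similar enough to the page; treat the rest as noisy links."""
    return [(vec, cat) for vec, cat in neighbors if cosine(page_vec, vec) >= threshold]

def classify(page_vec, centroids, neighbors, alpha=0.5, threshold=0.2):
    """Score each category as alpha * text similarity + (1 - alpha) * share of related neighbors."""
    related = related_neighbors(page_vec, neighbors, threshold)
    votes = Counter(cat for _, cat in related)
    total = sum(votes.values())
    scores = {}
    for cat, centroid in centroids.items():
        text_score = cosine(page_vec, centroid)
        link_score = votes[cat] / total if total else 0.0
        scores[cat] = alpha * text_score + (1 - alpha) * link_score
    return max(scores, key=scores.get)

# Toy usage with made-up category centroids and linked pages.
centroids = {
    "sports": page_vector([("body", "football"), ("body", "match"), ("body", "score")]),
    "finance": page_vector([("body", "stock"), ("body", "market"), ("body", "price")]),
}
page = page_vector([("title", "match"), ("body", "score"), ("a", "market")])
neighbors = [
    (page_vector([("body", "football"), ("body", "match"), ("body", "score")]), "sports"),
    (page_vector([("body", "login"), ("body", "privacy")]), "finance"),  # unrelated navigation link
]
print(classify(page, centroids, neighbors))  # → sports
```

In this toy run the navigation-style link shares no terms with the page, so `related_neighbors` drops it and only the related sports neighbor contributes category evidence. Separating the link filter from the scoring step mirrors the abstract's two stages (discard noisy hyperlinks, then classify using text plus related neighbors), but the real weighting and distance in CAWN may differ.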
