Using DragPushing to Refine Concept Index for Text Categorization
-
Abstract
Concept index (CI) is a very fast and efficient feature extraction (FE) algorithm for text classification. The key idea of the CI scheme is to express each document as a function of the various concepts (centroids) present in the collection. However, the ability of the centroids to represent the categories of a corpus is often degraded by so-called model misfit, which is caused by a number of factors in the FE process, ranging from feature selection to the similarity measure. To address this issue, this work employs the ``DragPushing'' strategy to refine the centroids that are used for concept index. We present an extensive experimental evaluation of the refined concept index (RCI) on two English collections and one Chinese corpus using a state-of-the-art Support Vector Machine (SVM) classifier. The results indicate that in each case, RCI-based SVM yields much better performance than the normal CI-based SVM, and at a lower computation cost during the training and classification phases.
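To make the idea concrete, the following is a minimal sketch of concept index followed by a drag-and-push style refinement of the centroids. It is an illustration under stated assumptions, not the exact procedure evaluated in this work: the abstract does not spell out the update rule, so the rule used here (move the true-class centroid toward a misclassified training document and move the wrongly chosen centroid away from it) is assumed. The use of cosine similarity, the function names, and the `rate` and `epochs` parameters are likewise hypothetical.

```python
import numpy as np

def unit_rows(m):
    """L2-normalise each row; all-zero rows are left unchanged."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def class_centroids(X, y, n_classes):
    """One centroid (concept) per class: the mean of its document vectors."""
    return np.vstack([X[y == c].mean(axis=0) for c in range(n_classes)])

def concept_index(X, centroids):
    """Represent each document by its cosine similarity to every centroid."""
    return unit_rows(X) @ unit_rows(centroids).T

def drag_push(X, y, centroids, rate=0.5, epochs=5):
    """Assumed refinement rule (not taken from the abstract): for each
    misclassified training document, drag its true-class centroid toward
    it and push the wrongly predicted centroid away from it."""
    C = centroids.astype(float).copy()
    for _ in range(epochs):
        pred = concept_index(X, C).argmax(axis=1)
        wrong = np.flatnonzero(pred != y)
        if wrong.size == 0:
            break  # every training document is already assigned correctly
        for i in wrong:
            C[y[i]] += rate * X[i]     # drag
            C[pred[i]] -= rate * X[i]  # push
    return C

# Toy term-frequency matrix: 4 documents, 3 terms, 2 classes.
X = np.array([[2.0, 0.0, 1.0],
              [3.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 3.0, 0.0]])
y = np.array([0, 0, 1, 1])

centroids = class_centroids(X, y, n_classes=2)
refined = drag_push(X, y, centroids)
features = concept_index(X, refined)  # one column per concept
```

In this sketch, `features` has one column per class rather than one per term, and it is this reduced representation that would be passed to an SVM classifier for training and classification.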