Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification

Mitat Poyraz; Zeynep Hilal Kilimci; Murat Can Ganiz

doi:10.1007/s11390-014-1437-6

Mitat Poyraz, Zeynep Hilal Kilimci, Murat Can Ganiz. Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification. Journal of Computer Science and Technology, 2014, 29(3): 376-391. DOI: 10.1007/s11390-014-1437-6

Abstract

Latent semantic indexing (LSI) is known to take advantage of implicit higher-order (or latent) structure in the association of terms and documents; the higher-order relations in LSI capture "latent semantics". These findings inspired a previously introduced Bayesian classification framework, Higher-Order Naive Bayes (HONB), which makes explicit use of these higher-order relations. In this paper, we present a novel semantic smoothing method for the Naive Bayes algorithm named Higher-Order Smoothing (HOS). HOS is built on a graph-based data representation similar to that of HONB, which allows the semantics in higher-order paths to be exploited. HOS takes the concept one step further by also exploiting the relationships between instances of different classes. As a result, we move beyond not only instance boundaries but also class boundaries to exploit the latent information in higher-order paths. This approach improves parameter estimation when labeled data is insufficient. Results of our extensive experiments demonstrate the value of HOS on several benchmark datasets.
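The notion of a higher-order path can be illustrated with a small sketch. This is not the HOS or HONB algorithm, whose exact path definition and smoothing formula are not given in the abstract. It only assumes the common reading that two terms are second-order related when they are linked through a bridging term shared by two different documents. The toy corpus and the function names `first_order_pairs` and `second_order_pairs` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy corpus: each document is represented as a set of terms.
docs = [
    {"bank", "loan"},      # d0
    {"loan", "credit"},    # d1
    {"credit", "score"},   # d2
]

def first_order_pairs(docs):
    """Term pairs that co-occur in at least one document."""
    pairs = set()
    for doc in docs:
        for a, b in combinations(sorted(doc), 2):
            pairs.add((a, b))
    return pairs

def second_order_pairs(docs):
    """Term pairs (a, c) linked by a path a - d_i - b - d_j - c,
    where d_i and d_j are different documents and b is a term they share."""
    pairs = set()
    for i, di in enumerate(docs):
        for j, dj in enumerate(docs):
            if i == j:
                continue
            for b in di & dj:              # bridging term shared by both documents
                for a in di - {b}:
                    for c in dj - {b}:
                        if a != c:
                            pairs.add(tuple(sorted((a, c))))
    return pairs

first = first_order_pairs(docs)
second = second_order_pairs(docs)
```

In this toy corpus, "bank" and "credit" never appear in the same document, yet they are connected through "loan". Relations of this kind cross instance boundaries, and the abstract states that HOS additionally follows them across class boundaries.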
