
Text Classification Using Sentential Frequent Itemsets

Shi-Zhu Liu and He-Ping Hu   

  1. College of Computer Science, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2005-05-22 Revised: 2006-09-05 Online: 2007-03-10 Published: 2007-03-10

Text classification techniques mostly rely on single-term analysis of the document data set, although many concepts, especially specific ones, are conveyed by sets of terms. To build a more accurate text classifier, more informative features, including words that frequently co-occur in the same sentence and their weights, are particularly important. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept that comes from association rule mining. The approach views a sentence rather than a document as a transaction, and uses a variable precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments on the Reuters and newsgroup corpora validate the practicability of the proposed system.
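
To make the idea of treating a sentence as a transaction concrete, the following is a minimal sketch and not the authors' implementation. It assumes a naive punctuation-based sentence splitter, lowercase word tokens with no stop-word removal, and a brute-force count of itemsets up to a fixed size; the function names `sentence_transactions` and `frequent_itemsets`, the `max_size` parameter, and the sample documents are all hypothetical. The variable precision rough set weighting step is not shown.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_transactions(document):
    """Turn each sentence of a document into one transaction (a set of words).

    Assumption: sentences end at '.', '!' or '?'. A real system would need
    a proper sentence splitter and probably stop-word removal.
    """
    transactions = []
    for sentence in re.split(r"[.!?]+", document):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words:
            transactions.append(words)
    return transactions


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset of up to max_size words per sentence and keep
    those that occur in at least min_support sentences (brute force)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical sample documents, for illustration only.
docs = [
    "Interest rates rose. The central bank raised interest rates again.",
    "Banks cut interest rates. Markets fell.",
]
transactions = [t for d in docs for t in sentence_transactions(d)]
result = frequent_itemsets(transactions, min_support=3)
print(result)
```

With the sample data, the pair ("interest", "rates") reaches the support threshold because the two words share a sentence three times. Counting co-occurrence per document instead would not distinguish words that appear in the same sentence from words that merely appear somewhere in the same text.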


ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences