Special Issue: Data Management and Data Mining

• Articles • Previous Articles     Next Articles

A semi-random multiple decision-tree algorithm for mining data streams

Xue-Gang Hu{1, Pei-Pei Li{1, Xin-Dong Wu{1,2, and Gong-Qing Wu1   

  1. {1}School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China {2}Department of Computer Science, University of Vermont, Burlington, VT 50405, U.S.A.
  • Received:2007-01-15 Revised:2007-07-16 Online:2007-09-10 Published:2007-09-10

Mining with streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, have a relatively poor efficiency in both time and space due to the characteristics of streaming data. There are some advantages in time and space when using random decision trees. An incremental algorithm for mining data streams, SRMTDS (Semi-Random Multiple decision Trees for Data Streams), based on random decision trees is proposed in this paper. SRMTDS uses the inequality of Hoeffding bounds to choose the minimum number of split-examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naïve Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS has an improved performance in time, space, accuracy and the anti-noise capability in comparison with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams.

Key words: multiagent; RAO logic; position exchange principle(PFP);

[1] Pedro Domingos, Geoff Hulten. Mining high-speed data streams. In -\it Proc. Knowledge Discovery and Data Mining}, Boston, MA, USA, 2000, pp.71$\sim$80.

[2] Gama J, Rocha R, Medas P. Accurate decision trees for mining high-speed data streams. In -\it Proc. the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, Washington DC, USA, 2003, pp.523$\sim$528.

[3] Qiang Ding, Qin Ding, Perrizo W. Decision tree classification of spatial data streams using Peano count trees. In -\it Proc. ACM Symposium on Applied Computing $($SAC'02$)$}, Madrid, Spain, March 2002, pp.413$\sim$417.

[4] Guha S, Mishra N, Motwani R, O'Callaghan L. Clustering data streams. In -\it Proc. IEEE FOCS, 41st Annual Symposium on Foundations of Computer Science}, Redondo Beach, CA, USA, 2000, pp.359$\sim$366.

[5] Moses Charikar, Liadan O'Callaghan, Rina Panigrahy. Better streaming algorithms for clustering problems. In -\it Proc. the Thirty-Fifth Annual ACM Symposium on Theory of Computing}, San Diego, CA, USA, 2003, pp.30$\sim$39.

[6] L O'Cllaghan, Nina Mishra, Adam Meyerson. Streaming-data algorithms for high-quality clustering. In -\it Proc. ICDE'02}, San Jose, CA, USA, 2002, pp.685$\sim$694.

[7] Ordonez C. Clustering binary data streams with K-means. In -\it Proc. the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery}, San Diego, CA, USA, 2003, pp.12$\sim$19.

[8] Aggarwal C, Han J, Wang J, Yu P S. A framework for clustering evolving data streams. In -\it Proc. the 29th VLDB Conference}, Berlin, Germany, 2003, pp.81$\sim$92.

[9] Xiaoyun Zhou, Zhihui Su, Baili Zhang, Yidong Yang. An efficient discovering and maintenance algorithm of subspace clustering over high dimensional data streams. \it Journal of Computer Research and Development, \rm 2006, 43(5): 834$\sim$840.

[10] Xuli Jun, Xiekang Lin, Xu Hong. Discovering frequent itemsets over data streams. -\it Journal of Shanghai Jiaotong University}, 2006, 40(3): 502$\sim$506.

[11] Chang J H, Lee W S. Finding recent frequent itemsets adaptively over online data streams. In -\it Proc. KDD-2003}, 2003, Washington DC, USA, pp.487$\sim$492.

[12] Lijun Xu, Kanglin Xie, Hong Xu. Discovering frequent itemsets over data streams. -\it Journal of Shanghai Jiaotong University}, Washington DC, USA, 2006, 40(3): 502$\sim$506.

[13] Guojun Mao, Xindong Wu, Xingquan Zhu, Gong Chen, Chunnian Liu. Mining maximal frequent itemsets from data streams. -\it Journal of Information Science}, 2007, 33(3): 251$\sim$262.

[14] Gong Chen, Xindong Wu, Xingquan Zhu. Sequential pattern mining in multiple streams. In -\it Proc. ICDM'05}, Houston, TX, USA, 2005, pp.585$\sim$588.

[15] Utgoff P E, Berkman N C, Clouse J A. Decision tree induction based on efficient tree restructuring. -\it Machine Learning}, 1997, 29(1): 5$\sim$44.

[16] Kalles D, Morris T. Efficient incremental induction of decision trees. -\it Machine Learning}, 1996, 24(3): 231$\sim$242.

[17] Fan W, Wang H, Yu P S, Ma S. Is random model better On its accuracy and efficiency. In -\it Proc. ICDM'03}, Melbourne, FL, USA, 2003, pp.51$\sim$58.

[18] Hoeffding W. Probability inequalities for sums of bounded random variables. -\it Journal of the American Statistical Association}, 1963, 58: 13$\sim$30.

[19] Maron O, Moore A. Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation. Advances in Neural Information Processing Systems, Cowan J D, Tesauro G, Alspector J (eds.), San Mato: Morgan Kaufmann, CA, 1994.

[20] Amit Y, Geman D. Shape quantization and recognition with randomized trees. -\it Neural Computation}, 1997, 9(7): 1545$\sim$1588.

[21] Tin Kam Ho. The random subspace method for constructing decision forests. -\it IEEE Trans. Pattern Analysis and Machine Intelligence}, Aug. 1998, 20(8): 832$\sim$844.

[22] Dietterich T G. Ensemble Methods in Machine Learning. First International Workshop on Multiple Classifier Systems, Kittler J, Roli F (eds.), New York: Springer Verlag, 2000, pp.1$\sim$15.

[23] Dietterich T G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. -\it Machine Learning}, 2000, 40(2): 139$\sim$157.

[24] Leo Breiman. Random forests. -\it Machine Learning}, 2001, 45(1): 5$\sim$32.

[25] Liu T F, Ting K M, Fan W. Maximizing tree diversity by building complete-random decision trees. In -\it Proc. the Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining}, Hanoi, Vietnam, May 2005, pp.605$\sim$610.

[26] http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[27] Blake C, Keogh E, Merz C. UCI repository of machine learning databases, 1998, http://www.ics.uci.edu /$\sim$mlearn/MLRepository.html.
[1] Yan Zheng, Jian-Ye Hao, Zong-Zhang Zhang, Zhao-Peng Meng, Xiao-Tian Hao. Efficient Multiagent Policy Optimization Based on Weighted Estimators in Stochastic Cooperative Environments [J]. Journal of Computer Science and Technology, 2020, 35(2): 268-280.
[2] SHI Zhongzhi; TIAN Qijia; LI Yunfeng;. RAO Logic for Multiagent Framework [J]. , 1999, 14(4): 393-400.
[3] wang Xuejun; Shi Chunyi;. A Multiagent Dynamic interaction Testbed:Theoretic Framework, System Architecture and Experimentation [J]. , 1997, 12(2): 121-132.
Full text



[1] Hao Ruibing; Wu Jianping;. A Formal Approach to Protocol Interoperability Testing[J]. , 1998, 13(1): 79 -90 .
[2] Chen Gang;. Dependent Type System with Subtyping (I)Type Level Transitivity Elimination[J]. , 1998, 13(6): 564 -578 .
[3] SUN Ninghui;. Reference Implementation of Scalable I/O Low-Level API on Intel Paragon[J]. , 1999, 14(3): 206 -223 .
[4] Ying-Lei Song, Ji-Zhen Zhao, Chun-Mei Liu, Kan Liu, Russell Malmberg, and Li-Ming Cai. RNA Structural Homology Search with a Succinct Stochastic Grammar Model[J]. , 2005, 20(4): 454 -464 .
[5] Shan Wang, Xiao-Yong Du, Xiao-Feng Meng, and Hong Chen. Database Research: Achievements and Challenges[J]. , 2006, 21(5): 823 -837 .
[6] Qing Ai, Yan-Song Li, and Gui-Lu Long. Influences of Gate Operation Errors in the Quantum Counting Algorithm[J]. , 2006, 21(6): 927 -931 .
[7] Ji-Dong Chen and Xiao-Feng Meng. Indexing Future Trajectories of Moving Objects in a Constrained Network[J]. , 2007, 22(2): 245 -251 .
[8] Minghui Jiang and Joel Gillespie. Engineering the Divide-and-Conquer Closest Pair Algorithm[J]. , 2007, 22(4): 532 -540 .
[9] Xiao-Min Hu, Jun Zhang, and Yun Li. Orthogonal Methods Based Ant Colony Search for Solving Continuous Optimization Problems[J]. , 2008, 23(1): 2 -0 .
[10] Yueh-Yi Lai and Wen-Kai Tai. Transition Texture Synthesis[J]. , 2008, 23(2): 280 -289 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved