Journal of Computer Science and Technology, 2015, Vol. 30, Issue (4): 874-887. doi: 10.1007/s11390-015-1566-6

Special Issue: Artificial Intelligence and Pattern Recognition; Data Management and Data Mining

• Special Section on Data Management and Data Mining

Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine

Dong-Hong Han1,2 (韩东红), Member, CCF; Xin Zhang1 (张昕); Guo-Ren Wang1,2 (王国仁), Senior Member, CCF

1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
2. Key Laboratory of Medical Image Computing (NEU), Ministry of Education, Shenyang 110819, China
• Received: 2015-01-31; Revised: 2015-05-15; Online: 2015-07-05; Published: 2015-07-05
• About author: Dong-Hong Han received her M.S. and Ph.D. degrees in computer science and technology from Northeastern University, Shenyang, in 2002 and 2007, respectively. Currently, she is an associate professor in the College of Information Science and Engineering, Northeastern University, Shenyang. Her research interests include data stream management, data mining, and uncertain data management.
• Supported by: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61173029 and 61272182.

Conventional classification algorithms are not well suited to the characteristics of streaming data: inherent uncertainty, potential concept drift, large volume, and high velocity. Specialized algorithms are therefore needed to build efficient and accurate classifiers for uncertain data streams. In this paper, we first introduce the Distributed Extreme Learning Machine (DELM), which optimizes the large matrix operations that ELM performs on large datasets. We then present the Weighted Ensemble Classifier Based on Distributed ELM (WE-DELM), an online, one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. A probability world model is built to transform uncertain streaming data into certain streaming data, base classifiers are learned using DELM, and the weights of the base classifiers are updated dynamically according to their classification results. WE-DELM improves both the efficiency of model learning and the accuracy of classification. Experimental results show that WE-DELM achieves better performance on several evaluation criteria, including efficiency, accuracy, and speedup.
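The abstract describes DELM only as an optimization of ELM's large matrix operations. The following is a minimal pure-Python sketch of one common way such a distribution can work. It assumes the regularized least-squares form of ELM and a MapReduce-style split; this is an assumption and not necessarily the paper's exact DELM. The hidden layer has L random, untrained sigmoid units. Given the hidden-layer output matrix H and the one-hot target matrix T, the output weights are beta = (I/C + H^T H)^(-1) H^T T. Because H^T H and H^T T are sums over samples, each partition i can compute U_i = H_i^T H_i and V_i = H_i^T T_i locally, and a reduce step adds these matrices before solving. The function names (make_hidden_layer, map_partition, reduce_and_solve, predict), the ridge parameter C, and the toy data are hypothetical.

```python
# Hypothetical sketch: regularized ELM with a MapReduce-style split of
# H^T H and H^T T (assumed; not necessarily the paper's DELM).
import math
import random


def make_hidden_layer(n_features, n_hidden, seed=0):
    """Draw random input weights and biases; ELM never trains these."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
               for _ in range(n_hidden)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    return weights, biases


def hidden_output(x, weights, biases):
    """Sigmoid activations h(x) of the hidden layer for one sample x."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
            for row, b in zip(weights, biases)]


def map_partition(samples, targets, weights, biases):
    """Map step: one node computes U_i = H_i^T H_i and V_i = H_i^T T_i."""
    n_hidden, n_out = len(weights), len(targets[0])
    U = [[0.0] * n_hidden for _ in range(n_hidden)]
    V = [[0.0] * n_out for _ in range(n_hidden)]
    for x, t in zip(samples, targets):
        h = hidden_output(x, weights, biases)
        for a in range(n_hidden):
            for c in range(n_hidden):
                U[a][c] += h[a] * h[c]
            for c in range(n_out):
                V[a][c] += h[a] * t[c]
    return U, V


def solve(A, B):
    """Solve A X = B (A square, B a matrix) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the row with the largest |entry| up.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def reduce_and_solve(partials, C=1.0):
    """Reduce step: sum all (U_i, V_i), then solve (I/C + U) beta = V."""
    n_hidden = len(partials[0][0])
    n_out = len(partials[0][1][0])
    U = [[sum(P[0][a][c] for P in partials) + (1.0 / C if a == c else 0.0)
          for c in range(n_hidden)] for a in range(n_hidden)]
    V = [[sum(P[1][a][c] for P in partials) for c in range(n_out)]
         for a in range(n_hidden)]
    return solve(U, V)


def predict(x, weights, biases, beta):
    """Predicted class = index of the largest output h(x)^T beta."""
    h = hidden_output(x, weights, biases)
    scores = [sum(h[a] * beta[a][c] for a in range(len(h)))
              for c in range(len(beta[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy data: class 0 near (0, 0), class 1 near (1, 1); one-hot targets.
    X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0], [0.8, 0.9]]
    T = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
    W, b = make_hidden_layer(n_features=2, n_hidden=8, seed=42)
    # Two "nodes" each process half of the data; compare with one node.
    split = [map_partition(X[:3], T[:3], W, b), map_partition(X[3:], T[3:], W, b)]
    beta_split = reduce_and_solve(split, C=1.0)
    beta_whole = reduce_and_solve([map_partition(X, T, W, b)], C=1.0)
    diff = max(abs(p - q) for rp, rq in zip(beta_split, beta_whole)
               for p, q in zip(rp, rq))
    print("max |beta_split - beta_whole| =", diff)
    print("predictions:", [predict(x, W, b, beta_split) for x in X])
```

Only the L x L matrix U_i and the L x m matrix V_i leave each partition, so the data shipped to the reducer does not grow with the number of training samples. For a fixed C, splitting the data across partitions yields the same beta, up to floating-point rounding, as training on all of it at once.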


ISSN 1000-9000 (Print), 1860-4749 (Online); CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
Copyright ©2015 JCST, All Rights Reserved