Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (2): 295-304.doi: 10.1007/s11390-020-9999-y

• Special Section on Learning and Mining in Dynamic Environments • Previous Articles     Next Articles

Semi-Supervised Classification of Data Streams by BIRCH Ensemble and Local Structure Mapping

Yi-Min Wen1,2,3,*, Senior Member, CCF, Shuai Liu3, Student Member, CCF        

  1. 1 Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology Guilin 541004, China;
    2 Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China;
    3 School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
  • Received:2019-08-28 Revised:2020-01-20 Online:2020-03-05 Published:2020-03-18
  • Contact: Yi-Min Wen
  • About author:Yi-Min Wen received his M.S. degree in general mechanics and fundamentals of mechanics from Beihang University, Beijing, in 2000, and his Ph.D. degree in computer software and theory from Shanghai Jiao Tong University, Shanghai, in 2007. He is current a professor in the School of Computer Science and Information Security at Guilin University of Electronic Technology, Guilin. His research interests include machine learning, data mining, and computer vision.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant No. 61866007, the Natural Science Foundation of Guangxi Zhuang Autonomous Region of China under Grant No. 2018GXNSFDA138006, and Humanities and Social Sciences Research Projects of the Ministry of Education of China under Grant No. 17JDGC022.

Many researchers have applied clustering to handle semi-supervised classification of data streams with concept drifts. However, the generalization ability for each specific concept cannot be steadily improved, and the concept drift detection method without considering the local structural information of data cannot accurately detect concept drifts. This paper proposes to solve these problems by BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) ensemble and local structure mapping. The local structure mapping strategy is utilized to compute local similarity around each sample and combined with semi-supervised Bayesian method to perform concept detection. If a recurrent concept is detected, a historical BIRCH ensemble classifier is selected to be incrementally updated; otherwise a new BIRCH ensemble classifier is constructed and added into the classifier pool. The extensive experiments on several synthetic and real datasets demonstrate the advantage of the proposed algorithm.

Key words: semi-supervised classification; clustering; data stream; concept drift;

[1] Liu Q, Ma H P, Chen E H, Xiong H. A survey of contextaware mobile recommendations. International Journal of Information Technology & Decision Making, 2013, 12(1):139-172.
[2] Li Y, Si J, Zhou G J, Chen S C. FREL:A stable feature selection algorithm. IEEE Transactions on Neural Networks and Learning Systems, 2014, 26(7):1388-1402.
[3] Peng Y, Lu B L. Discriminative extreme learning machine with supervised sparsity preserving for image classification. Neurocomputing, 2017, 261:242-252.
[4] Li Y, Li T, Liu H. Recent advances in feature selection and its applications. Knowledge and Information Systems, 2017, 53(3):551-577.
[5] Li Y F, Liang D M. Safe semi-supervised learning:A brief introduction. Frontiers of Computer Science, 2019, 13(4):669-676.
[6] Noorbehbahani F, Fanian A, Mousavi S R, Hasannejad H. An incremental intrusion detection system using a new semi-supervised stream classification method. International Journal of Communication Systems, 2017, 30(4):1-26.
[7] Sedhai S, Sun A. Semi-supervised spam detection in Twitter stream. IEEE Transactions on Computational Social Systems, 2017, 5(1):169-175.
[8] Haque A, Khan L, Baron M. SAND:Semi-supervised adaptive novel class detection and classification over data stream. In Proc. the 30th AAAI Conference on Artificial Intelligence, February 2016, pp.1652-1658.
[9] Haque A, Khan L, Baron M, Thuraisingham B M, Aggarwal C C. Efficient handling of concept drift and concept evolution over stream data. In Proc. the 32nd International Conference on Data Engineering, May 2016, pp.481-492.
[10] Wang Y, Li T. Improving semi-supervised co-forest algorithm in evolving data streams. Applied Intelligence, 2018, 48(10):3248-3262.
[11] Hosseini M J, Gholipour A, Beigy H. An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams. Knowledge and Information Systems, 2016, 46(3):567-597.
[12] Wu X D, Li P P, Hu X G. Learning from concept drifting data streams with unlabeled data. Neurocomputing, 2012, 92:145-155.
[13] Li P P, Wu X D, Hu X G. Mining recurring concept drifts with limited labeled streaming data. ACM Transactions on Intelligent Systems and Technology, 2012, 3(2):Article No. 32.
[14] Masud M M, Gao J, Khan L et al. A practical approach to classify evolving data streams:Training with limited amount of labeled data. In Proc. the 8th IEEE International Conference on Data Mining, December 2008, pp.929-934.
[15] Masud M M, Woolam C, Gao J et al. Facing the reality of data stream classification:Coping with scarcity of labeled data. Knowledge and Information Systems, 2012, 33(1):213-244.
[16] Xu W H, Qin Z, Chang Y. Semi-supervised learning based ensemble classifier for stream data. Pattern Recognition and Artificial Intelligence, 2012, 25(2):292-299. (in Chinese)
[17] Zhang P, Zhu X Q, Tan J L, Guo L. Classifier and cluster ensembles for mining concept drifting data streams. In Proc. the 10th IEEE International Conference on Data Mining, December 2010, pp.1175-1180.
[18] Zhang T, Ramakrishnan R, Livny M. BIRCH:A new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1997, 1(2):141-182.
[19] Gao J, Fan W, Jiang J, Han J. Knowledge transfer via multiple model local structure mapping. In Proc. the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2008, pp.283-291.
[20] Li Y C, Wang Y L, Liu Q et al. Incremental semi-supervised learning on streaming data. Pattern Recognition, 2019, 88:383-396.
[21] Zhou Z H. When semi-supervised learning meets ensemble learning. Frontiers of Electrical and Electronic Engineering in China, 2011, 6(1):6-16.
[22] Zhang M L, Zhou Z H. Classifier ensemble with unlabeled data. arXiv:0909.3593, 2009.,August 2010.
[23] Zhang M L, Zhou Z H. Exploiting unlabeled data to enhance ensemble diversity. Data Mining and Knowledge Discovery, 2013, 26(1):98-129.
[24] Bifet A, Holmes G, Kirkby R, Pfahringer B. MOA:Massive online analysis. Journal of Machine Learning Research, 2010, 11:1601-1604.
[1] Zhi-Xin Qi, Hong-Zhi Wang, An-Jie Wang. Impacts of Dirty Data on Classification and Clustering Models: An Experimental Evaluation [J]. Journal of Computer Science and Technology, 2021, 36(4): 806-821.
[2] Shi-Ying Sheng, Sheng-Tao Chen, Xiao-Ju Dong, Chun-Yuan Wu, Xiao-Ru Yuan. Inverse Markov Process Based Constrained Dynamic Graph Layout [J]. Journal of Computer Science and Technology, 2021, 36(3): 707-718.
[3] Li Wang, Hao Zhang, Hao-Wu Chang, Qing-Ming Qin, Bo-Rui Zhang, Xue-Qing Li, Tian-Heng Zhao, Tian-Yue Zhang. GAEBic: A Novel Biclustering Analysis Method for miRNA-Targeted Gene Data Based on Graph Autoencoder [J]. Journal of Computer Science and Technology, 2021, 36(2): 299-309.
[4] Yong-Hao Wu, Zheng Li, Yong Liu, Xiang Chen. FATOC: Bug Isolation Based Multi-Fault Localization by Using OPTICS Clustering [J]. Journal of Computer Science and Technology, 2020, 35(5): 979-998.
[5] Punit Kumar, Atul Gupta. Active Learning Query Strategies for Classification, Regression, and Clustering: A Survey [J]. Journal of Computer Science and Technology, 2020, 35(4): 913-945.
[6] Xin Xu, Jiaheng Lu, Wei Wang. Hierarchical Clustering of Complex Symbolic Data and Application for Emitter Identification [J]. , 2018, 33(4): 807-822.
[7] Yan-Xia Xu, Wei Chen, Jia-Jie Xu, Zhi-Xu Li, Guan-Feng Liu, Lei Zhao. Discovering Functional Organized Point of Interest Groups for Spatial Keyword Recommendation [J]. Journal of Computer Science and Technology, 2018, 33(4): 697-710.
[8] Chao Ni, Wang-Shu Liu, Xiang Chen, Qing Gu, Dao-Xu Chen, Qi-Guo Huang. A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction [J]. , 2017, 32(6): 1090-1107.
[9] Mohamed Maher Ben Ismail, Ouiem Bchir. Automatic Fall Detection Using Membership Based Histogram Descriptors [J]. , 2017, 32(2): 356-367.
[10] Wen Liu, Yan-Ming Shen, Member, CCF, Peng Wang. An Efficient Approach of Processing Multiple Continuous Queries [J]. , 2016, 31(6): 1212-1227.
[11] Chen-Chen Sun, De-Rong Shen, Yue Kou, Tie-Zheng Nie, Ge Yu. Topological Features Based Entity Disambiguation [J]. , 2016, 31(5): 1053-1068.
[12] Yu Zhang, Miao Liu, Hai-Xia Xia. Clustering Context-Dependent Opinion Target Words in Chinese Product Reviews [J]. , 2015, 30(5): 1109-1119.
[13] Jing Zhou, Shan-Feng Zhu, Xiaodi Huang, Yanchun Zhang. Enhancing Time Series Clustering by Incorporating Multiple Distance Measures with Semi-Supervised Learning [J]. , 2015, 30(4): 859-873.
[14] Dong-Hong Han, Xin Zhang, Guo-Ren Wang. Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine [J]. , 2015, 30(4): 874-887.
[15] Chao Deng, Yi-Ci Cai, Qiang Zhou. Register Clustering Methodology for Low Power Clock Tree Synthesis [J]. , 2015, 30(2): 391-403.
Full text



[1] Zhou Di;. A Recovery Technique for Distributed Communicating Process Systems[J]. , 1986, 1(2): 34 -43 .
[2] Chen Shihua;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[3] Li Wanxue;. Almost Optimal Dynamic 2-3 Trees[J]. , 1986, 1(2): 60 -71 .
[4] Feng Yulin;. Recursive Implementation of VLSI Circuits[J]. , 1986, 1(2): 72 -82 .
[5] C.Y.Chung; H.R.Hwa;. A Chinese Information Processing System[J]. , 1986, 1(2): 15 -24 .
[6] Sun Zhongxiu; Shang Lujun;. DMODULA:A Distributed Programming Language[J]. , 1986, 1(2): 25 -31 .
[7] Jin Lan; Yang Yuanyuan;. A Modified Version of Chordal Ring[J]. , 1986, 1(3): 15 -32 .
[8] Zhang Cui; Zhao Qinping; Xu Jiafu;. Kernel Language KLND[J]. , 1986, 1(3): 65 -79 .
[9] Wang Jianchao; Wei Daozheng;. An Effective Test Generation Algorithm for Combinational Circuits[J]. , 1986, 1(4): 1 -16 .
[10] Chen Zhaoxiong; Gao Qingshi;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
  Copyright ©2015 JCST, All Rights Reserved