Semi-Supervised Classification of Data Streams by BIRCH Ensemble and Local Structure Mapping

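  • Summary: Semi-supervised learning matches real application scenarios more closely than fully supervised learning, and clustering algorithms can capture the intrinsic structure and distribution of data, so clustering-based semi-supervised classification of data streams has received wide attention. In most existing methods, however, labeled and unlabeled samples are hard to use together for detecting concept drift, and the classifier in the pool that corresponds to each concept is hard to update incrementally with both labeled and unlabeled samples so that its generalization ability keeps improving. If an algorithm can detect concept drift accurately and can keep updating the classifier for each concept with samples from that same concept, data streams with concept drift will be classified more accurately. This paper therefore proposes a semi-supervised data stream classification algorithm based on BIRCH ensemble and local structure mapping (SCBELS). In this algorithm, the local structure mapping strategy from transfer learning is used to compute the local similarity around each unlabeled sample and is combined with a semi-supervised Bayesian method to detect concept drift. If a recurrent concept is detected, a BIRCH ensemble classifier in the pool is selected and updated; otherwise, a new BIRCH ensemble classifier is trained and added to the pool. Experimental results on real and synthetic datasets show that SCBELS improves cumulative classification accuracy considerably over the compared algorithms. This is because SCBELS detects concept drift more accurately and keeps improving the generalization ability of each classifier, and these two effects reinforce each other. The proposed algorithm can therefore be applied effectively to data stream classification. The work also shows that algorithms from transfer learning can indeed be used for concept drift detection. However, to handle classifier updating and concept drift detection better, SCBELS introduces more parameters. Improving the time efficiency of semi-supervised concept drift detection is left as future work.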

     

    Abstract: Many researchers have applied clustering to semi-supervised classification of data streams with concept drift. However, existing methods cannot steadily improve the generalization ability for each specific concept, and drift detection methods that ignore the local structural information of the data cannot detect concept drift accurately. This paper addresses both problems with a BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) ensemble and local structure mapping. The local structure mapping strategy is used to compute the local similarity around each sample and is combined with a semi-supervised Bayesian method to detect concept drift. If a recurrent concept is detected, a historical BIRCH ensemble classifier is selected and incrementally updated; otherwise, a new BIRCH ensemble classifier is constructed and added to the classifier pool. Extensive experiments on several synthetic and real datasets demonstrate the advantage of the proposed algorithm.
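The update-or-create rule for the classifier pool can be illustrated with a minimal Python sketch. It shows only the control flow described in the abstract and is not the authors' implementation. The names `ConceptModel` and `ClassifierPool` and the `threshold` parameter are hypothetical. A running centroid stands in for a BIRCH ensemble classifier, and a plain centroid distance stands in for the drift test based on local structure mapping and the semi-supervised Bayesian method.

```python
import math
from dataclasses import dataclass, field


def centroid(points):
    """Mean vector of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


@dataclass
class ConceptModel:
    """Hypothetical stand-in for one BIRCH ensemble classifier."""
    center: list   # running mean of all samples absorbed for this concept
    count: int     # number of samples absorbed so far

    def update(self, chunk):
        """Fold a new chunk into the running mean, one sample at a time."""
        for x in chunk:
            self.count += 1
            self.center = [c + (xi - c) / self.count
                           for c, xi in zip(self.center, x)]


@dataclass
class ClassifierPool:
    threshold: float                          # assumed drift threshold
    models: list = field(default_factory=list)

    def process_chunk(self, chunk):
        """Return (index of the model used, True if it was newly created)."""
        c = centroid(chunk)
        # Placeholder drift test: distance from the chunk centroid
        # to the center of each stored concept.
        best, best_dist = None, math.inf
        for i, model in enumerate(self.models):
            d = math.dist(c, model.center)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= self.threshold:
            # Recurrent concept: update the historical model in place.
            self.models[best].update(chunk)
            return best, False
        # New concept: build a model from this chunk and add it to the pool.
        self.models.append(ConceptModel(center=c, count=len(chunk)))
        return len(self.models) - 1, True


pool = ClassifierPool(threshold=1.0)
chunk_a = [[0.0, 0.0], [0.2, 0.1]]
chunk_b = [[5.0, 5.0], [5.2, 4.9]]
chunk_a_again = [[0.1, 0.0], [0.0, 0.2]]
results = [pool.process_chunk(ch) for ch in (chunk_a, chunk_b, chunk_a_again)]
```

In this toy run the first two chunks each create a new model, and the third chunk is recognized as a recurrence of the first concept, so the existing model is updated instead of a third one being added. Replacing the placeholder distance test with a real drift detector would leave the pool logic unchanged.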

     
