Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine
Abstract: Conventional classification algorithms are poorly suited to uncertain streaming data, which combines inherent uncertainty with potential concept drift, large volume, and high velocity. Specialized algorithms are needed to train efficient and accurate classifiers for such streams. In this paper, we first introduce the Distributed Extreme Learning Machine (DELM), which optimizes the large matrix operations of the conventional Extreme Learning Machine (ELM) so that it can handle large datasets. We then present the Weighted Ensemble Classifier based on Distributed ELM (WE-DELM), an online, one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. WE-DELM builds a possible world model to transform uncertain streaming data into certain streaming data, trains its base classifiers with DELM, and dynamically updates the weight of each base classifier according to its classification results. WE-DELM improves both the efficiency of learning the model and the accuracy of classification. Experimental results show that WE-DELM outperforms the compared algorithms on several evaluation criteria, including efficiency, accuracy, and speedup.
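To make the possible world transformation mentioned in the abstract more concrete, the following is a minimal Python sketch of how one uncertain instance could be expanded into certain instances. It is an illustration under stated assumptions, not the paper's procedure: the function name `possible_worlds`, the representation of an uncertain attribute as a discrete list of (value, probability) pairs, and the independence of attributes are all assumptions made here.

```python
from itertools import product


def possible_worlds(uncertain_instance):
    """Enumerate the certain instances (possible worlds) of one uncertain instance.

    `uncertain_instance` holds one entry per attribute. Each entry is a list of
    (value, probability) pairs whose probabilities sum to 1.
    Assumption of this sketch: attributes are independent, so the probability
    of a world is the product of the probabilities of its chosen values.
    """
    worlds = []
    # Each combination picks exactly one (value, probability) pair per attribute.
    for combo in product(*uncertain_instance):
        values = tuple(value for value, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        worlds.append((values, prob))
    return worlds


# Hypothetical uncertain instance with three attributes.
instance = [
    [(1.0, 0.7), (2.0, 0.3)],  # attribute 0 takes 1.0 or 2.0
    [(5.0, 1.0)],              # attribute 1 is certain
    [(0.0, 0.5), (1.0, 0.5)],  # attribute 2 takes 0.0 or 1.0
]

worlds = possible_worlds(instance)
for values, prob in worlds:
    print(values, round(prob, 3))
```

Each resulting tuple is an ordinary certain instance that a conventional learner can consume, and the probabilities of all worlds for one uncertain instance sum to 1. The number of worlds grows multiplicatively with the number of uncertain attributes, which is one reason a scalable base learner matters in this setting.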