A semi-random multiple decision-tree algorithm for mining data streams
-
Abstract
Mining streaming data is an active topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, are relatively inefficient in both time and space because of the characteristics of streaming data. Random decision trees offer some advantages in time and space. This paper proposes SRMTDS (Semi-Random Multiple decision Trees for Data Streams), an incremental algorithm for mining data streams that is based on random decision trees. SRMTDS uses the Hoeffding bound inequality to choose the minimum number of split examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naïve Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS improves on VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams, in time, space, accuracy, and robustness to noise.
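As an illustration of how a Hoeffding bound can yield a minimum number of split examples, the sketch below uses the standard form of the bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and solves it for n. It is not taken from SRMTDS itself: the function names, the choice of R = 1 (the range of information gain for a two-class problem), and the sample values of delta and epsilon are assumptions made for this example only.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the observed mean of n
    independent samples of a variable with range `value_range` lies within
    epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_examples(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the Hoeffding bound is at most `epsilon`
    (the bound above solved for n and rounded up)."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Hypothetical settings: two classes (information gain range R = 1),
    # confidence 1 - 1e-7, tolerated error 0.05.
    n = min_examples(value_range=1.0, delta=1e-7, epsilon=0.05)
    print(n, hoeffding_epsilon(1.0, 1e-7, n))
```

A smaller epsilon or delta raises the required number of examples, so a leaf waits longer before it is split; a larger range R (more classes) has the same effect.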