

ConfDTree:Statistical Methods for Improving Decision Trees


     

    Abstract: Decision trees have three main disadvantages: reduced performance when the training set is small; rigid decision criteria; and the fact that a single "uncharacteristic" attribute might "derail" the classification process. In this paper we present ConfDTree (Confidence-Based Decision Tree), a post-processing method that enables decision trees to better classify outlier instances. This method, which can be applied to any decision tree algorithm, uses easy-to-implement statistical methods (confidence intervals and two-proportion z-tests) in order to identify hard-to-classify instances and to propose alternative routes. The experimental study indicates that the proposed post-processing method consistently and significantly improves the predictive performance of decision trees, particularly for small, imbalanced, or multi-class datasets, for which an average improvement of 5% to 9% in AUC is reported.
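The abstract does not spell out how the statistical tests are applied; as a point of reference, the two-proportion z-test it mentions is a standard procedure for deciding whether two observed proportions (e.g., the positive-class rates reached via the default branch versus an alternative route) differ significantly. The following is a minimal, stdlib-only sketch of that test; the function name and the example counts are illustrative, not taken from the paper.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-proportion z-test: do two success rates differ significantly?

    Returns (z_statistic, two_sided_p_value).
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF, via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical example: 40/50 positives reached by one branch vs. 25/50 by
# another; a small p-value suggests the branches lead to genuinely
# different class distributions.
z, p = two_proportion_z_test(40, 50, 25, 50)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A post-processing step in the spirit of the abstract could use such a test to decide whether an instance near a decision boundary should also be routed down an alternative branch, rather than committing to the single split dictated by the rigid decision criterion.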

     
