COSSET+: Crowdsourced Missing Value Imputation Optimized by Knowledge Base

  • Abstract: Crowdsourced missing value imputation is an emerging data cleaning method that can fill in missing values that automatic repair techniques can hardly handle. However, crowdsourcing is costly in both time and money. We therefore need to reduce the cost of crowdsourced imputation while preserving its accuracy. To this end, we present COSSET+, a crowdsourcing framework optimized by a knowledge base, which combines the advantages of a knowledge-base filter and a crowdsourcing platform to fill in missing values. Because the number of crowdsourced values affects the cost of COSSET+, we aim to select only the subset of missing values that is suitable for crowdsourcing. We prove that this crowd value selection problem is NP-hard and give an approximation algorithm for it. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach.

     
