Journal of Computer Science and Technology
Indexed by   SCIE, EI ...
Bimonthly    Since 1986
Journal of Computer Science and Technology 2017, Vol. 32, Issue 5: 845-857. DOI: 10.1007/s11390-017-1768-1
Special Section on Crowdsourced Data Management
COSSET+: Crowdsourced Missing Value Imputation Optimized by Knowledge Base
Hong-Zhi Wang, Member, CCF, ACM, IEEE, Zhi-Xin Qi, Ruo-Xi Shi, Jian-Zhong Li, Fellow, CCF, Member, ACM, Hong Gao, Senior Member, CCF, Member, ACM
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Abstract: Missing value imputation with crowdsourcing is a novel data cleaning method for obtaining missing values that can hardly be filled by automatic approaches. However, crowdsourcing is costly in both time and money. We therefore need to reduce the cost of crowdsourced imputation while guaranteeing its accuracy. To achieve this goal, we present COSSET+, a crowdsourcing framework optimized by a knowledge base. It combines the advantages of a knowledge-based filter and a crowdsourcing platform to fill missing values. Since the number of crowdsourced values affects the cost of COSSET+, we aim to select only a subset of the missing values to be crowdsourced. We prove that this crowd value selection problem is NP-hard and develop an approximation algorithm for it. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed approaches.
Keywords: crowdsourcing, missing value, imputation, knowledge base, optimization
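The two-stage idea in the abstract (fill what a knowledge base can answer, then send only a selected subset of the remaining values to the crowd) can be illustrated with a minimal sketch. The abstract does not state the selection objective or the approximation algorithm, so the code below substitutes a generic greedy benefit-per-cost heuristic under a budget; it is not the paper's algorithm. The names `kb_filter`, `select_for_crowd`, `key_field`, `benefit`, `cost`, and `budget` are all hypothetical, and the knowledge base is assumed to be a plain dictionary.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (row index, column name) of a missing value


def kb_filter(
    missing: List[Cell],
    rows: List[Dict[str, object]],
    kb: Dict[Tuple[object, str], object],
    key_field: str = "name",
) -> Tuple[Dict[Cell, object], List[Cell]]:
    """Fill cells the (assumed dictionary-shaped) knowledge base can answer.

    Returns the filled values and the cells that remain unresolved.
    """
    filled: Dict[Cell, object] = {}
    unresolved: List[Cell] = []
    for row_idx, col in missing:
        entity = rows[row_idx].get(key_field)
        value = kb.get((entity, col))
        if value is not None:
            filled[(row_idx, col)] = value
        else:
            unresolved.append((row_idx, col))
    return filled, unresolved


def select_for_crowd(
    unresolved: List[Cell],
    benefit: Dict[Cell, float],
    cost: Dict[Cell, float],
    budget: float,
) -> List[Cell]:
    """Greedy stand-in heuristic: take cells by benefit/cost ratio while the
    budget allows. Costs are assumed to be strictly positive."""
    ranked = sorted(unresolved, key=lambda c: benefit[c] / cost[c], reverse=True)
    chosen: List[Cell] = []
    spent = 0.0
    for cell in ranked:
        if spent + cost[cell] <= budget:
            chosen.append(cell)
            spent += cost[cell]
    return chosen
```

In this sketch, only the cells returned by `select_for_crowd` would be posted as crowdsourcing tasks, which is where the cost saving described in the abstract would come from.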
Received: 2017-04-01
Funding:

This work was supported by the National Natural Science Foundation of China under Grant Nos. U1509216 and 61472099, the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2015BAH10F01, the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province of China under Grant No. LC2016026, and MOE-Microsoft Key Laboratory of Natural Language Processing and Speech of Harbin Institute of Technology.

About author: Hong-Zhi Wang is a professor and doctoral supervisor of Harbin Institute of Technology, Harbin. He received his Ph.D. degree in computer science and technology from Harbin Institute of Technology, Harbin, in 2008. He was awarded Microsoft Fellowship, Chinese Excellent Database Engineer, and IBM Ph.D. Fellowship. His research interests include big data management, data quality, and graph data management.
Cite this article:
Hong-Zhi Wang, Zhi-Xin Qi, Ruo-Xi Shi, Jian-Zhong Li, Hong Gao. COSSET+: Crowdsourced Missing Value Imputation Optimized by Knowledge Base[J]. Journal of Computer Science and Technology, 2017, 32(5): 845-857
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1768-1