Journal of Computer Science and Technology
Journal of Computer Science and Technology, 2017, Vol. 32, Issue 5: 877-889. DOI: 10.1007/s11390-017-1770-7
Special Section on Crowdsourced Data Management
Improving the Quality of Crowdsourced Image Labeling via Label Similarity
Yi-Li Fang1,2, Hai-Long Sun1,2,*, Member, CCF, ACM, IEEE, Peng-Peng Chen1,2, Ting Deng1,2, Member, CCF, ACM
1 State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China;
2 School of Computer Science and Engineering, Beihang University, Beijing 100191, China

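Abstract (translated from the Chinese): Machine learning algorithms are the main approach to image labeling, and their accuracy depends heavily on the training set (a set of manually labeled images); crowdsourcing is an important way to obtain large numbers of manual image labels. However, for some special image labeling problems, such as labeling the breed of a dog in an image, current methods have difficulty obtaining high-quality results. We therefore further optimize the workflow, through task allocation and result aggregation, to improve the quality of the labeling results. In the task allocation stage, we design a two-step crowdsourcing scheme in which a decision mechanism based on information entropy determines whether to carry out the second crowdsourcing step. In the result aggregation stage, based on label similarity, we propose two result aggregation methods built on probabilistic graphical models to further improve result quality. Extensive comparative experiments on real and simulated data demonstrate the superiority of the proposed methods.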
Abstract: Crowdsourcing is an effective method for obtaining large databases of manually labeled images, which is especially important for image understanding with supervised machine learning algorithms. However, for several kinds of image labeling tasks, e.g., dog breed recognition, it is hard to achieve high-quality results. We therefore further optimize the crowdsourcing workflow, focusing on task allocation and result inference. For task allocation, we design a two-round crowdsourcing framework that contains a decision mechanism based on information entropy to determine whether to perform a second round of task allocation. For result inference, after quantifying the similarity of all labels, we propose two graphical models to describe the labeling process and design corresponding inference algorithms to further improve the quality of the image labeling results. Extensive experiments were conducted on real-world tasks on CrowdFlower and on synthetic datasets. The experimental results demonstrate the superiority of these approaches in comparison with state-of-the-art methods.
Keywords: Image Labeling, Crowdsourcing, Information Entropy, Label Similarity
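
The abstract does not spell out the entropy-based decision rule, so the following is only a minimal sketch of one way such a rule could look: compute the Shannon entropy of the labels collected in the first round and request a second round when it exceeds a cutoff. The function names, the `threshold` parameter, and the example labels are all illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def needs_second_round(labels, threshold=1.0):
    """Return True when first-round answers disagree enough to ask again.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    return label_entropy(labels) > threshold


# Hypothetical first-round answers from five workers for one image.
first_round = ["husky", "husky", "malamute", "husky", "samoyed"]
print(label_entropy(first_round))
print(needs_second_round(first_round))
```

Under this assumed rule, an image on which workers agree has low entropy and is settled after one round, while an image with scattered answers is sent out again; the paper's actual mechanism may differ.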
Received: 2017-03-02
Funding:

This work was supported in part by the National Key Research and Development Program of China under Grant No. 2016YFB1000804, the National Natural Science Foundation of China under Grant No. 61602023, the National Basic Research 973 Program of China under Grant Nos. 2014CB340304 and 2015CB358700, and the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2017ZX-14.

Corresponding author: Hai-Long Sun. Email: sunhl@act.buaa.edu.cn
About the author: Yi-Li Fang is a Ph.D. student in the School of Computer Science and Engineering, Beihang University, Beijing. His research interests mainly include crowd computing/crowdsourcing, social computing, and decision science.
Cite this article:
Yi-Li Fang, Hai-Long Sun, Peng-Peng Chen, Ting Deng. Improving the Quality of Crowdsourced Image Labeling via Label Similarity[J]. Journal of Computer Science and Technology, 2017, 32(5): 877-889
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1770-7