A Quality Control Method for Crowdsourced Image Labeling Based on Label Similarity

doi:10.1007/s11390-017-1770-7

Improving the Quality of Crowdsourced Image Labeling via Label Similarity

Abstract

Supervised machine learning is the main approach to image labeling, and its accuracy depends heavily on the training set, that is, on a collection of manually labeled images. Crowdsourcing is an effective way to obtain large collections of such images. However, for some special kinds of image labeling tasks, e.g. dog breed recognition, current methods struggle to achieve high-quality results. To improve the quality of the labeling results, we optimize two stages of the crowdsourcing workflow: task allocation and result inference. For task allocation, we design a two-round crowdsourcing framework in which a decision mechanism based on information entropy determines whether to perform a second round of task allocation. For result inference, we first quantify the similarity of all labels and then propose two graphical models that describe the labeling process, together with corresponding inference algorithms, to further improve the quality of the labeling results. Extensive experiments were conducted on real-world tasks on CrowdFlower and on synthetic datasets. The results demonstrate the superiority of these approaches over state-of-the-art methods.
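The abstract does not give the details of the entropy-based decision mechanism, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes the decision is made by comparing the Shannon entropy of the first-round votes for one image against a fixed threshold. The function names `vote_entropy` and `needs_second_round`, the threshold value, and the example labels are all hypothetical.

```python
import math
from collections import Counter


def vote_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of votes."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def needs_second_round(labels, threshold=1.0):
    """Hypothetical rule: ask for a second round when the votes are too uncertain.

    The threshold of 1.0 bit is an illustrative assumption, not a value
    taken from the paper.
    """
    return vote_entropy(labels) > threshold


# First-round votes for two images (made-up example data).
unanimous = ["husky", "husky", "husky", "husky"]
split = ["husky", "malamute", "samoyed", "husky"]

for votes in (unanimous, split):
    print(votes, round(vote_entropy(votes), 3), needs_second_round(votes))
```

Under this assumed rule, an image whose workers agree has zero entropy and is settled after one round, while an image whose votes are spread over several similar breeds exceeds the threshold and is sent out again.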
