Journal of Computer Science and Technology, 2018, Vol. 33, Issue (2): 351-365. doi: 10.1007/s11390-018-1823-6

Special topic: Data Management and Data Mining


Collusion-Proof Result Inference in Crowdsourcing

Peng-Peng Chen 1,2, Student Member, CCF, ACM; Hai-Long Sun 1,2,*, Member, CCF, ACM, IEEE; Yi-Li Fang 1,2,*, Member, CCF, ACM; Jin-Peng Huai 1,2, Fellow, CCF, Member, ACM, IEEE

  1 State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing 100191, China;
  2 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing 100191, China
  • Received: 2017-04-17 Revised: 2018-01-29 Online: 2018-03-05 Published: 2018-03-05
  • Contact: Hai-Long Sun E-mail: sunhl@buaa.edu.cn
  • About author: Peng-Peng Chen is a Ph.D. student in the School of Computer Science and Engineering, Beihang University, Beijing. His research interests mainly include crowd computing/crowdsourcing and social computing. He is a student member of CCF and ACM.
  • Supported by:

    This work was supported partly by the National Basic Research 973 Program of China under Grant Nos. 2015CB358700 and 2014CB340304, the National Natural Science Foundation of China under Grant No. 61421003, and the Open Fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2017ZX-14.

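In crowdsourcing, workers are usually assumed to process tasks and submit answers independently, which ensures the diversity of answers. In fact, recent studies show that implicit collaboration exists among workers on general-purpose platforms. To earn more reward for less effort, workers may collude by submitting repeated answers, which seriously damages the quality of the final crowdsourcing results. However, existing crowdsourcing methods do not consider the impact of collusion on result inference. This paper therefore proposes a crowdsourcing result inference method based on collusion detection. Using the worker performance change rate, it detects the repeated answers produced by collusion by computing the difference in mean worker performance before and after removing the repeated answers, and it incorporates the detection result into result inference to ensure the quality of the aggregated results. Extensive experimental evaluations were conducted on real-world data from crowdsourcing platforms and on synthetic data. The experimental results demonstrate the superiority of the proposed method.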

Abstract: In traditional crowdsourcing, workers are expected to provide independent answers to tasks so as to ensure the diversity of answers. However, recent studies show that the crowd is not a collection of independent workers; instead, workers communicate and collaborate with each other. To pursue more rewards with little effort, some workers may collude to provide repeated answers, which damages the quality of the aggregated results. Nonetheless, few efforts have considered the negative impact of collusion on result inference in crowdsourcing. In this paper, we are specifically concerned with the collusion-proof result inference problem for general crowdsourcing tasks on public platforms. To that end, we design a metric, the worker performance change rate, to identify colluded answers by computing the difference in mean worker performance before and after removing the repeated answers. We then incorporate the collusion detection result into existing result inference methods to guarantee the quality of the aggregated results even when collusion occurs. We conducted an extensive set of evaluations of our approach on real-world and synthetic datasets. The experimental results demonstrate the superiority of our approach in comparison with the state-of-the-art methods.
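The abstract does not give the formula for the worker performance change rate, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' method. It assumes that a worker's performance can be approximated by agreement with a majority-vote consensus, that the change rate is the relative difference in mean performance after the suspected repeated answers are removed, and that majority voting stands in for the "existing result inference methods". All function names, worker identifiers, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

# Each answer is a (worker, task, label) triple. Names and data are hypothetical.


def majority_vote(answers):
    """Return {task: label}; ties go to the smallest label for determinism."""
    votes = defaultdict(Counter)
    for worker, task, label in answers:
        votes[task][label] += 1
    return {task: min(counts, key=lambda lab: (-counts[lab], lab))
            for task, counts in votes.items()}


def mean_worker_performance(answers):
    """Assumed proxy: a worker's performance is agreement with the consensus."""
    consensus = majority_vote(answers)
    hits, totals = Counter(), Counter()
    for worker, task, label in answers:
        totals[worker] += 1
        hits[worker] += int(label == consensus[task])
    return sum(hits[w] / totals[w] for w in totals) / len(totals)


def performance_change_rate(answers, suspected_workers):
    """Relative change in mean performance once suspected copies are removed.

    Assumes at least one answer remains after removal.
    """
    kept = [a for a in answers if a[0] not in suspected_workers]
    before = mean_worker_performance(answers)
    after = mean_worker_performance(kept)
    return (after - before) / before, kept


# Toy data: c1, c2, c3 submit identical answers; w1 and w2 answer independently.
answers = [
    ("w1", "t1", "A"), ("w2", "t1", "A"),
    ("c1", "t1", "B"), ("c2", "t1", "B"), ("c3", "t1", "B"),
    ("w1", "t2", "A"), ("w2", "t2", "A"),
    ("c1", "t2", "B"), ("c2", "t2", "B"), ("c3", "t2", "B"),
]

# Treat c2 and c3 as copies of c1, keeping one representative answer per task.
rate, kept = performance_change_rate(answers, {"c2", "c3"})
print("change rate:", rate)
print("consensus before removal:", majority_vote(answers))
print("consensus after removal:", majority_vote(kept))
```

In this toy data the three identical answers outvote the two independent workers, so the consensus flips once the copies are dropped and the mean performance shifts. A noticeable shift of this kind is the signal the metric is meant to capture; how large a shift should count as collusion is not specified in the abstract and would need a threshold chosen for the data at hand.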

Copyright © Editorial Office of the Journal of Computer Science and Technology