Journal of Computer Science and Technology

   

When Crowdsourcing Meets Data Markets: A Fair Data Value Metric for Data Trading

Yang-Su Liu (刘洋溯), Zhen-Zhe Zheng (郑臻哲), Member, CCF, IEEE, Fan Wu (吴帆), Member, CCF, IEEE, and Gui-Hai Chen (陈贵海), Member, CCF, Fellow, IEEE   

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2022-05-21 Revised: 2023-02-13 Accepted: 2023-03-15
  • Contact: Zhen-Zhe Zheng E-mail: zhengzhenzhe@sjtu.edu.cn
  • About author: Zhen-Zhe Zheng is an assistant professor in the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai. He received the B.E. degree in Software Engineering from Xidian University in 2012, and the M.S. and Ph.D. degrees in Computer Science from Shanghai Jiao Tong University, Shanghai, in 2015 and 2018, respectively. He visited the University of Illinois at Urbana-Champaign (UIUC) as a Visiting Scholar from 2016 to 2018, and was then a Post Doc Research Associate there from 2018 to 2019. His research interests include game theory, networking and mobile computing, and online marketplaces. He is a recipient of the China Computer Federation (CCF) Excellent Doctoral Dissertation Award 2018, the Google Ph.D. Fellowship 2015, and the Microsoft Research Asia Ph.D. Fellowship 2015. He has served as a member of the technical program committees of several academic conferences, such as MobiHoc, AAAI, IoTDI, and MSN. He is a member of the ACM, IEEE, and CCF. For more information, please visit https://zhengzhenzhe220.github.io/.

Large quantities of high-quality data are critical to the success of machine learning in diverse applications. Data silos, in which data is difficult to circulate, pose a dilemma that emerging data markets attempt to break by facilitating data exchange on the Internet. Crowdsourcing, in turn, is one of the important methods for efficiently collecting large amounts of high-value data in data markets. In this paper, we investigate the joint problem of efficient data acquisition and fair budget distribution across crowdsourcing and data markets. We propose a new metric of data value: the reduction in uncertainty of a Bayesian machine learning model obtained by integrating the data into model training. Guided by this data value metric, we design a mechanism called Shapley value mechanism with Individual Rationality (SV-IR). SV-IR consists of a greedy algorithm with a constant approximation ratio that selects the most cost-efficient data brokers, and a fair compensation determination rule based on the Shapley value that respects the individual rationality constraints. We further propose a fair reward distribution method for the data holders with various effort levels under the charge of a data broker. We demonstrate the fairness of the compensation determination rule and the reward distribution rule by evaluating our mechanisms on two real-world datasets. The evaluation results also show that the selection algorithm in SV-IR approaches the optimal solution and outperforms the other existing methods.
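The abstract does not spell out the metric or the selection rule, so the following is only a minimal sketch of the general idea under stated assumptions. It stands in a Bayesian linear regression model (Gaussian prior with precision `alpha`, noise precision `beta`) for the unspecified Bayesian model, measures uncertainty as the entropy of the weight posterior, and picks brokers by marginal gain per unit cost within a budget. The names `uncertainty_reduction` and `greedy_select`, the parameters, and the toy data are all hypothetical, and this plain ratio greedy is not claimed to carry the approximation guarantee of the paper's algorithm.

```python
import numpy as np

def uncertainty_reduction(datasets, alpha=1.0, beta=1.0):
    """Entropy drop (in nats) of the weight posterior of a Bayesian linear
    model after observing the given feature matrices (assumed stand-in model)."""
    if not datasets:
        return 0.0
    d = datasets[0].shape[1]
    precision = alpha * np.eye(d)            # prior precision: alpha * I
    for X in datasets:
        precision += beta * X.T @ X          # Gaussian posterior update
    _, logdet_post = np.linalg.slogdet(precision)
    # H(prior) - H(posterior) = 0.5 * (logdet(posterior prec.) - logdet(prior prec.))
    return 0.5 * (logdet_post - d * np.log(alpha))

def greedy_select(broker_data, costs, budget, alpha=1.0, beta=1.0):
    """Repeatedly add the affordable broker with the best gain-per-cost ratio."""
    selected, spent = [], 0.0
    remaining = set(range(len(broker_data)))
    while remaining:
        base = uncertainty_reduction([broker_data[i] for i in selected], alpha, beta)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue                     # cannot afford this broker
            chosen = [broker_data[j] for j in selected + [i]]
            gain = uncertainty_reduction(chosen, alpha, beta) - base
            ratio = gain / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:                     # nothing affordable adds value
            break
        selected.append(best)
        spent += costs[best]
        remaining.remove(best)
    return selected, spent

# Toy example (hypothetical data): three brokers, unit costs, budget of 2.
brokers = [np.eye(2), 2 * np.eye(2), np.zeros((1, 2))]
print(greedy_select(brokers, costs=[1.0, 1.0, 1.0], budget=2.0))
```

In this toy setup the all-zero dataset contributes no uncertainty reduction, so it is never chosen regardless of budget, which illustrates why a model-based value can differ from a size-based one.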


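Structured Abstract

1. Context
Data has been called the "oil" of the digital information economy and plays a key role in the successful application of artificial intelligence across many fields. However, the increasingly pronounced "data silo" phenomenon makes it hard to circulate and share data among multiple parties, and this has become a bottleneck for the wider adoption of artificial intelligence. To break this dilemma, recently emerging data markets attempt to promote the exchange and circulation of data in a free market by quantifying the value of data. Crowdsourcing networks, one of the important methods for efficiently collecting large amounts of high-value data in data markets, also need a unified data valuation metric to measure the efficiency of data collection. Existing data valuation metrics in data markets and crowdsourcing networks, however, are mostly based on intrinsic attributes of the data, such as dataset size, data quality, and data collection cost. These valuation methods ignore the fact that the value of data varies sharply across application scenarios, and a fair data valuation metric that promotes efficient data exchange is still lacking.
2. Objective
In this paper, we aim to propose a new and fair data valuation metric, and to design a data acquisition and reward distribution mechanism for crowdsourcing networks and data markets.
3. Method
By integrating data into model training, we propose to use the uncertainty of the trained Bayesian machine learning model as a new data valuation metric, and we use it to measure data acquisition efficiency. Based on the new metric, we design a data acquisition and reward distribution mechanism for data markets. We first design a greedy algorithm with a constant approximation ratio that selects the most cost-efficient data brokers, together with a fair reward distribution rule based on the Shapley value, and we modify the data acquisition algorithm to satisfy the individual rationality constraints. We further propose a fair reward distribution method for crowdsourcing networks to reward data holders with different effort levels.
4. Result & Findings
We theoretically prove that the proposed approximation algorithm for data acquisition has a constant approximation ratio, and that the reward distribution mechanism satisfies fairness and individual rationality. We conduct extensive experiments on two real-world datasets. The evaluation results show that our data acquisition algorithm approaches the optimal solution and outperforms existing methods, and they also demonstrate the fairness of the proposed reward distribution mechanism.
5. Conclusions
The results indicate that the proposed data procurement and reward distribution mechanism has good theoretical properties and solves the joint problem of efficient data procurement and fair revenue distribution under individual rationality constraints. In addition, the proposed data valuation metric reflects well the value and quality of data in a specific application scenario.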
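As a rough illustration of a Shapley-value-based reward rule, the sketch below computes exact Shapley values for a small set of brokers and splits a budget in proportion to them. It only checks individual rationality (payment at least equal to cost) rather than enforcing it; how SV-IR actually handles that constraint is not described on this page. The function names, the toy value function, the costs, and the budget are all hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Probability weight of a coalition of size k preceding player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi

def budget_shares(phi, budget):
    """Split the budget in proportion to each player's Shapley value."""
    total = sum(phi.values())
    if total == 0:
        return {p: 0.0 for p in phi}
    return {p: budget * v / total for p, v in phi.items()}

def violates_ir(payments, costs):
    """Players whose payment falls below their cost (IR check only)."""
    return [p for p in payments if payments[p] < costs[p]]

# Toy value function (assumption): value saturates at 3 units of data.
sizes = {"A": 1, "B": 2, "C": 3}
toy_value = lambda s: min(sum(sizes[p] for p in s), 3)

phi = shapley_values(list(sizes), toy_value)
pay = budget_shares(phi, budget=6.0)
print(phi, pay, violates_ir(pay, {"A": 0.5, "B": 2.5, "C": 1.0}))
```

Exact enumeration visits every coalition, so it is practical only for a handful of players; larger instances would typically need sampling-based estimates.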

Key words: data trading; crowdsourcing; mechanism design; Shapley value


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP
