Modeling Topic-Based Human Expertise for Crowd Entity Resolution

doi:10.1007/s11390-018-1882-8

Abstract

Entity resolution (ER) aims to identify whether two entities in an ER task refer to the same real-world thing. Crowd ER uses humans, in addition to machine algorithms, to obtain the truths of ER tasks. However, inaccurate or erroneous results are likely when humans give unreliable judgments. Previous studies have found that correctly estimating human accuracy or expertise in crowd ER is crucial to truth inference. Most of them, however, assume that humans have consistent expertise across all tasks, ignoring the fact that humans may have varied expertise on different topics (e.g., music versus sport). In this paper, we deal with crowd ER in the Semantic Web area. We identify multiple topics of ER tasks and model human expertise on different topics. Furthermore, we leverage similar-task clustering to enhance the topic modeling and expertise estimation. We propose a probabilistic graphical model that computes ER task similarity, estimates human expertise, and infers the task truths in a unified framework. Our evaluation results on real-world and synthetic datasets show that, compared with several state-of-the-art approaches, our proposed model achieves higher accuracy on task truth inference and is more consistent with humans' actual expertise.
