User Account Linkage Across Multiple Platforms with Location Data

doi:10.1007/s11390-020-0250-7

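Chinese Abstract: Driven by the spread of GPS devices and wearable mobile devices in daily life, more and more location data are being collected, and cross-platform user linkage based on these data has attracted wide attention in recent years. Unlike existing work on user linkage across two platforms, this paper proposes a novel model, ULMP, which links users across three or more platforms. The work has strong practical value for applications such as cross-platform user behavior analysis and cross-platform prediction, but it is very challenging. The main reason is that the number of user combinations explodes during cross-platform linkage. To address this difficulty, this paper first proposes a reverse pruning strategy, GTkNN, which reduces the search space of the algorithm. It then proposes an algorithm based on kernel density estimation that combines temporal and spatial information to evaluate the similarity between users. Repeated experiments on different datasets show that the proposed method performs well and outperforms existing methods in the field in running time, precision, recall, and scalability.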
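To make the combination explosion concrete, the sketch below counts candidate account tuples under simplifying assumptions that are not taken from the paper. Each of m platforms holds n accounts, and a naive search considers every tuple with one account per platform. The pruned search assumes a hypothetical top-k step: each of the n accounts on an anchor platform keeps only its k best candidates on each of the other m - 1 platforms.

```python
def naive_combinations(n: int, m: int) -> int:
    """Tuples with one account from each of m platforms of n accounts each."""
    return n ** m


def pruned_combinations(n: int, m: int, k: int) -> int:
    """Hypothetical pruned count: each of the n anchor accounts keeps only
    its top-k candidates on each of the other m - 1 platforms."""
    return n * k ** (m - 1)


if __name__ == "__main__":
    n, k = 10_000, 10
    for m in (2, 3, 4):
        # Print the platform count, naive tuple count, and pruned tuple count.
        print(m, naive_combinations(n, m), pruned_combinations(n, m, k))
```

With n = 10,000 and k = 10, going from two to four platforms raises the naive count from 10^8 to 10^16 tuples, while the pruned count rises only from 10^5 to 10^7.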

Abstract: Linking user accounts that belong to the same user across different platforms using location data has received significant attention. Two factors drive this interest: GPS-enabled devices have become widespread, and many applications benefit from user account linkage (e.g., cross-platform user profiling and recommendation). Most existing studies focus only on user account linkage across two platforms. In contrast, we propose a novel model, ULMP (user account linkage across multiple platforms), which aims to link user accounts across multiple platforms effectively and efficiently using location data. Successful linkage across multiple platforms has clear practical value, but the task is far more challenging than linkage across two platforms. The main challenge is that the number of user combinations grows explosively as the number of platforms increases. To tackle this problem, we first propose a novel method, GTkNN, which prunes the search space by efficiently retrieving the top-k candidate user accounts, indexed with well-designed spatial and temporal index structures. Then, within the pruned space, we design a match score based on kernel density estimation that combines spatial and temporal information to identify the linked user accounts. Extensive experiments on four real-world datasets demonstrate that ULMP outperforms state-of-the-art methods in both effectiveness and efficiency.
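The abstract does not give the exact form of the match score. The following is therefore only a minimal sketch of the general idea, and it rests on several assumptions:

- Check-ins are hypothetical (x, y, t) tuples in planar distance units and hours.
- The spatio-temporal kernel is a product of Gaussian kernels with hypothetical bandwidths `h_s` and `h_t`.
- A pair score averages this kernel over all check-in pairs of two accounts.
- A multi-platform tuple is scored by the mean of its pairwise scores.
- A brute-force ranking stands in for GTkNN's index-based top-k retrieval.

The names `pair_score`, `top_k`, and `link` are illustrative and do not come from the paper.

```python
import math
from itertools import combinations, product


def kernel(a, b, h_s, h_t):
    """Product of Gaussian kernels on spatial distance and time difference."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    dt = a[2] - b[2]
    return math.exp(-d2 / (2 * h_s ** 2)) * math.exp(-dt ** 2 / (2 * h_t ** 2))


def pair_score(u, v, h_s=1.0, h_t=1.0):
    """Mean kernel value over all check-in pairs of two accounts."""
    total = sum(kernel(a, b, h_s, h_t) for a in u for b in v)
    return total / (len(u) * len(v))


def tuple_score(accounts, h_s=1.0, h_t=1.0):
    """Hypothetical multi-platform score: mean of all pairwise scores."""
    pairs = list(combinations(accounts, 2))
    return sum(pair_score(u, v, h_s, h_t) for u, v in pairs) / len(pairs)


def top_k(query, candidates, k, h_s=1.0, h_t=1.0):
    """Brute-force stand-in for index-based top-k candidate retrieval.
    `candidates` maps account name -> list of check-ins."""
    ranked = sorted(
        candidates,
        key=lambda name: pair_score(query, candidates[name], h_s, h_t),
        reverse=True,
    )
    return ranked[:k]


def link(query, platforms, k, h_s=1.0, h_t=1.0):
    """Shortlist k candidates per platform, then score every tuple in the
    pruned space and return the best one (one account name per platform)."""
    if not platforms:
        raise ValueError("need at least one other platform")
    shortlists = [top_k(query, accts, k, h_s, h_t) for accts in platforms]
    best, best_score = None, -1.0
    for names in product(*shortlists):
        accounts = [query] + [p[name] for p, name in zip(platforms, names)]
        score = tuple_score(accounts, h_s, h_t)
        if score > best_score:
            best, best_score = names, score
    return best, best_score


if __name__ == "__main__":
    query = [(0.0, 0.0, 0.0), (5.0, 5.0, 10.0)]
    platform_b = {"b1": [(0.1, 0.0, 0.5), (5.0, 5.1, 10.2)], "b2": [(20.0, 20.0, 3.0)]}
    platform_c = {"c1": [(30.0, 0.0, 1.0)], "c2": [(0.0, 0.2, 0.3), (4.9, 5.0, 9.8)]}
    print(link(query, [platform_b, platform_c], k=1))
```

Only the shortlisted tuples are enumerated, so the final scoring step scales with k^(m-1) per query rather than with the full platform sizes. The paper performs the shortlisting with spatial and temporal index structures instead of the linear scan used here.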