
Incremental User Identification Across Social Networks Based on User-Guider Similarity Index

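  • Summary: Cross-social-network user identification is an important task in social network analysis and an active research topic in data mining, machine learning, and related fields. Discovering and understanding the correspondence between users in different social networks helps integrate information across platforms, enables more intelligent and higher-quality services, and fully exploits the value of social networks.
    Existing cross-social-network user identification methods fall into three categories: attribute-based, topology-based, and hybrid. However, existing attribute-based and topology-based methods depend on specific schemas or seeds; if the required data schema is missing or the number of seeds is insufficient, these methods become unusable. Hybrid methods consider multiple user features and thus improve identification accuracy, but existing ones target only static networks and ignore the dynamic interactions between users, which yields outdated identification results.
    To address this, we propose an incremental cross-social-network user identification method based on a user-guider similarity index (CURIOUS), which identifies users effectively and adapts to dynamic changes in social networks. First, we propose a user-guider similarity index (USI) that does not depend on specific schemas or hash functions and can quickly locate users similar to a given user. Second, we propose a two-phase user identification strategy. The first phase is USI-based bidirectional user matching, which uses three strategies (interval overlap, similarity scaling, and backtracking) to address the error amplification problem. The second phase is seed-based user matching, which extends the matches produced in the first phase by propagating the mappings between seed users to unidentified user pairs in the social networks. Third, we propose incremental maintenance strategies for both USI and the identification results, which dynamically capture the current state of the social networks. Finally, we use three real-world social networks as experimental datasets and compare our method with traditional methods in terms of both effectiveness and performance. The results show that, because our method considers both user attributes and structural features and uses USI to speed up user matching, it improves precision, recall, and rank score by 0.19, 0.16, and 0.09 respectively over traditional methods, while reducing the time cost by 81%.
    At present, our method performs cross-social-network user identification in two phases. As future work, we will consider the interaction between these two phases and study collaborative multi-task execution in depth.
    We hope that the methods and techniques presented in this paper offer a useful reference for eliminating information silos.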

     

    Abstract: Identifying accounts across different online social networks that belong to the same user has attracted extensive attention. However, existing techniques rely on given user seeds and ignore the dynamic changes of online social networks, and therefore fail to generate high-quality identification results. To solve this problem, we propose an incremental user identification method based on a user-guider similarity index (called CURIOUS), which efficiently identifies users and captures changes in user features over time. Specifically, we first construct a novel user-guider similarity index (called USI) to speed up the matching between users. Second, we propose a two-phase user identification strategy consisting of USI-based bidirectional user matching and seed-based user matching, which is effective even for incomplete networks. Finally, we propose incremental maintenance for both USI and the identification results, which dynamically captures the current state of the social networks. We conduct experimental studies on three real-world social networks. The experiments demonstrate the effectiveness and efficiency of our method: compared with traditional methods, it improves precision, recall, and rank score by an average of 0.19, 0.16, and 0.09, respectively, and reduces the time cost by an average of 81%.
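
    The abstract only summarizes the seed-based matching phase. As a rough illustration of the general idea of propagating seed mappings to unidentified users, and not the CURIOUS algorithm itself, the Python sketch below counts how many already-matched neighbor pairs connect two unmatched accounts. The adjacency-dict input format, the function name `propagate_seeds`, and the `min_shared` threshold are all assumptions made for this example.

```python
from collections import defaultdict


def propagate_seeds(graph_a, graph_b, seeds, min_shared=2):
    """Extend seed mappings to unmatched users via shared matched neighbors.

    graph_a, graph_b: dict of user -> set of neighbors (assumed, undirected).
    seeds: dict mapping an account in network A to the same person's account in B.
    min_shared: assumed minimum number of matched neighbor pairs to accept a match.
    """
    matched = dict(seeds)
    while True:
        used_b = set(matched.values())
        # Score each unmatched (a, b) pair by the number of matched neighbor
        # pairs (n, matched[n]) with n adjacent to a and matched[n] adjacent to b.
        scores = defaultdict(int)
        for a, neighbors_a in graph_a.items():
            if a in matched:
                continue
            for n in neighbors_a:
                if n not in matched:
                    continue
                for b in graph_b.get(matched[n], ()):
                    if b not in used_b:
                        scores[(a, b)] += 1
        candidates = [(s, a, b) for (a, b), s in scores.items() if s >= min_shared]
        if not candidates:
            return matched
        # Accept only the best-scoring pair per round, then rescore so the
        # new match can serve as evidence for its neighbors.
        _, a, b = max(candidates)
        matched[a] = b


# Two small networks with the same shape; a1/b1 and a2/b2 are the known seeds.
graph_a = {"a1": {"a3"}, "a2": {"a3", "a4"},
           "a3": {"a1", "a2", "a4"}, "a4": {"a2", "a3"}}
graph_b = {"b1": {"b3"}, "b2": {"b3", "b4"},
           "b3": {"b1", "b2", "b4"}, "b4": {"b2", "b3"}}
result = propagate_seeds(graph_a, graph_b, {"a1": "b1", "a2": "b2"})
```

    In this toy input, a3 is matched first because both seeds are its neighbors; that new match then supplies the second piece of evidence needed to match a4. Accepting one pair per round is a conservative choice that limits how far an early wrong match can spread.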

     
