2013, Vol. 28, Issue (2): 394-400. doi: 10.1007/s11390-013-1339-z

Topic: Artificial Intelligence and Pattern Recognition

• Special Section on Selected Paper from NPC 2011 •


An Efficient and Spam-Robust Proximity Measure Between Communication Entities

Joo Hyuk Jeon1, Jihwan Song1, Jeong Eun Kwon2, Yoon Joon Lee1, Member, ACM, IEEE, Man Ho Park3 and Myoung Ho Kim1   

  • 1 Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea;
    2 Biz Solution Team, SK Telecom Information Technology R&D Center, Seoul 100-999, Korea;
    3 Mobile Communication Convergence Research Team, Electronics and Telecommunications Research Institute, Daejeon 305-700, Korea
  • Received: 2012-03-05 Revised: 2012-09-29 Online: 2013-03-05 Published: 2013-03-05

Abstract: Electronic communication service providers are obliged by their local laws to retain communication data for a certain amount of time. The retained communication data, or communication logs, are used in various applications such as crime detection, viral marketing, and analytical studies. Many of these applications rely on effective techniques for analyzing communication logs. In this paper, we focus on measuring the proximity between two communication entities, which is a fundamental and important step toward further analysis of communication logs, and propose a new proximity measure called ESP (Efficient and Spam-Robust Proximity measure). Our proposed measure considers only the (graph-theoretically) shortest paths between two entities and gives small values between spam-like entities and other entities. Thus, it is not only computationally efficient but also spam-robust. Through several experiments on real and synthetic datasets, we show that in most cases our proposed proximity measure is more accurate, computationally efficient, and spam-robust than the existing measures.

