2013, Vol. 28, Issue (2): 394-400. doi: 10.1007/s11390-013-1339-z

Topic: Artificial Intelligence and Pattern Recognition

• Special Section on Selected Paper from NPC 2011 •

An Efficient and Spam-Robust Proximity Measure Between Communication Entities

Joo Hyuk Jeon1, Jihwan Song1, Jeong Eun Kwon2, Yoon Joon Lee1, Member, ACM, IEEE, Man Ho Park3 and Myoung Ho Kim1   

  • 1 Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea;
    2 Biz Solution Team, SK Telecom Information Technology R&D Center, Seoul 100-999, Korea;
    3 Mobile Communication Convergence Research Team, Electronics and Telecommunications Research Institute, Daejeon 305-700, Korea
  • Received: 2012-03-05 Revised: 2012-09-29 Online: 2013-03-05 Published: 2013-03-05

Abstract: Local laws oblige electronic communication service providers to retain communication data for a certain period of time. The retained data, or communication logs, are used in applications such as crime detection, viral marketing, and analytical studies, many of which rely on effective techniques for analyzing communication logs. In this paper, we focus on measuring the proximity between two communication entities, a fundamental and important step toward further analysis of communication logs, and propose a new proximity measure called ESP (Efficient and Spam-Robust Proximity measure). The proposed measure considers only the (graph-theoretically) shortest paths between two entities and assigns small proximity values between spam-like entities and other entities. It is therefore both computationally efficient and spam-robust. Experiments on real and synthetic datasets show that, in most cases, the proposed measure is more accurate, more computationally efficient, and more spam-robust than existing measures.
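
The abstract does not give the ESP formula, so the sketch below is not ESP itself. It is a toy Python illustration of the two ideas the abstract names: restricting attention to shortest paths, and scoring spam-like senders low. Everything in it is an assumption made for illustration: the function names (`build_graph`, `edge_share`, `toy_proximity`), the use of hop count as path length, and the choice of "share of the sender's outgoing messages" as edge strength, under which an entity that sends to many recipients contributes only a small value per recipient.

```python
from collections import defaultdict, deque


def build_graph(log):
    """Aggregate (sender, receiver) log records into per-edge message counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for sender, receiver in log:
        counts[sender][receiver] += 1
    return counts


def edge_share(counts, u, v):
    """Hypothetical edge strength: the fraction of u's messages sent to v."""
    total = sum(counts[u].values())
    return counts[u][v] / total


def toy_proximity(counts, source, target):
    """Sum, over all hop-count shortest paths from source to target,
    of the product of edge shares along each path (assumed scoring rule)."""
    if source == target:
        return 1.0
    dist = {source: 0}
    score = {source: 1.0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break  # every shortest path into target has been accumulated
        for v in counts.get(u, {}):
            if v not in dist:
                dist[v] = dist[u] + 1
                score[v] = 0.0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # edge lies on a shortest path
                score[v] += score[u] * edge_share(counts, u, v)
    return score.get(target, 0.0)


# "a" writes mostly to "b"; "spam" writes once to each of ten recipients.
recipients = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
log = [("a", "b")] * 3 + [("a", "c")] + [("spam", r) for r in recipients]
graph = build_graph(log)
print(toy_proximity(graph, "a", "b"))     # 0.75
print(toy_proximity(graph, "spam", "b"))  # 0.1
```

In this toy setting the breadth-first search visits each edge at most once, which reflects why a shortest-path-only measure can be cheap to compute, and the broadcast sender receives a low score toward each of its recipients.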
