2013, Vol. 28, Issue (2): 394-400. doi: 10.1007/s11390-013-1339-z

Special Issue: Artificial Intelligence and Pattern Recognition

• Database and Data Management

An Efficient and Spam-Robust Proximity Measure Between Communication Entities

Joo Hyuk Jeon1, Jihwan Song1, Jeong Eun Kwon2, Yoon Joon Lee1, Member, ACM, IEEE, Man Ho Park3 and Myoung Ho Kim1   

  1 Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea;
  2 Biz Solution Team, SK Telecom Information Technology R&D Center, Seoul 100-999, Korea;
  3 Mobile Communication Convergence Research Team, Electronics and Telecommunications Research Institute, Daejeon 305-700, Korea
  • Received: 2012-03-05 Revised: 2012-09-29 Online: 2013-03-05 Published: 2013-03-05

Local laws oblige electronic communication service providers to retain communication data for a certain period. The retained data, also called communication logs, are used in applications such as crime detection, viral marketing, and analytical studies. Many of these applications rely on effective techniques for analyzing communication logs. In this paper, we focus on measuring the proximity between two communication entities, a fundamental step toward further analysis of communication logs, and propose a new proximity measure called ESP (Efficient and Spam-Robust Proximity measure). ESP considers only the (graph-theoretically) shortest paths between two entities and assigns small values to paths between spam-like entities and other entities. It is therefore both computationally efficient and spam-robust. Experiments on real and synthetic datasets show that, in most cases, ESP is more accurate, more computationally efficient, and more spam-robust than existing measures.
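The abstract does not give the ESP formula, so the following is only a minimal sketch of the general idea it describes: restrict attention to shortest paths and down-weight paths that run through spam-like entities. Everything here is an assumption made for illustration, not the authors' definition: the function names (`build_graph`, `shortest_paths`, `toy_proximity`), the unweighted undirected graph, the use of a node's degree as a stand-in for "spam-likeness", and the sample edge list.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) communication pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_paths(graph, source, target):
    """Return all shortest paths from source to target (breadth-first search)."""
    if source == target:
        return [[source]]
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break  # every predecessor of target has already been expanded
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                parents[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                parents[nbr].append(node)
    if target not in dist:
        return []

    paths = []

    def walk(node, suffix):
        # Follow parent links back to the source, collecting each full path.
        if node == source:
            paths.append([source] + suffix)
            return
        for parent in parents[node]:
            walk(parent, [node] + suffix)

    walk(target, [])
    return paths


def toy_proximity(graph, a, b):
    """Illustrative score (NOT the ESP formula): sum over shortest paths of the
    product of 1/degree over intermediate nodes, so paths through entities
    with many contacts contribute little."""
    score = 0.0
    for path in shortest_paths(graph, a, b):
        weight = 1.0
        for node in path[1:-1]:
            weight /= len(graph[node])
        score += weight
    return score


# Hypothetical log: "spam" contacts many entities, "bob" only two.
EDGES = [
    ("alice", "bob"), ("bob", "carol"),
    ("alice", "spam"), ("spam", "carol"),
    ("spam", "dave"), ("spam", "erin"), ("spam", "frank"), ("spam", "grace"),
]
GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    print(toy_proximity(GRAPH, "alice", "carol"))
```

In this toy graph, the path from `alice` to `carol` through `bob` (degree 2) contributes 1/2, while the equally short path through `spam` (degree 6) contributes only 1/6. That mirrors the abstract's claim that paths involving spam-like entities receive small values. The actual measure in the paper may weight paths quite differently.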

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences