2015, Vol. 30, Issue (5): 1097-1108. doi: 10.1007/s11390-015-1585-3

Special Issue: Artificial Intelligence and Pattern Recognition; Data Management and Data Mining

• Special Section on Social Media Processing •

Anomaly Detection in Microblogging via Co-Clustering

Wu Yang (杨武), Senior Member, CCF, Member, ACM; Guo-Wei Shen* (申国伟), Student Member, CCF, Member, ACM; Wei Wang (王巍); Liang-Yi Gong (宫良一); Miao Yu (于淼); Guo-Zhong Dong (董国忠)

  1. Information Security Research Center, Harbin Engineering University, Harbin 150001, China
  • Received: 2014-11-17 Revised: 2015-07-12 Online: 2015-09-05 Published: 2015-09-05
  • Contact: Guo-Wei Shen, E-mail: shenguowei@hrbeu.edu.cn
  • About author: Wu Yang received his Ph.D. degree in computer system architecture from Harbin Institute of Technology, Harbin, in 2005. He is currently a professor and doctoral supervisor at Harbin Engineering University. His main research interests include data mining, information security, and wireless sensor networks. He is a senior member of CCF and a member of ACM.
  • Supported by:

    This work was supported by the National Natural Science Foundation of China under Grant No. 61170242, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA012802, and the Fundamental Research Funds for the Central Universities of China under Grant No. HEUCF100605.

Traditional anomaly detection in microblogging focuses mostly on individual anomalous users or messages. Because anomalous users employ increasingly sophisticated techniques, such detection performs poorly. In this paper, we propose a framework for anomaly detection based on a bipartite graph and co-clustering. A bipartite graph between users and messages is built to model the homogeneous and heterogeneous interactions. The proposed co-clustering algorithm, based on nonnegative matrix tri-factorization, detects anomalous users and messages simultaneously. The homogeneous relations modeled by the bipartite graph are used as constraints to improve the accuracy of the co-clustering algorithm. Experimental results on a Sina Weibo dataset show that the proposed scheme detects both individual and group anomalies with high accuracy.
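
To make the co-clustering idea concrete, the following is a minimal sketch of plain nonnegative matrix tri-factorization applied to a user-message matrix. It is not the algorithm proposed in the paper: the constraints derived from homogeneous relations are omitted, the standard multiplicative update rules for the unconstrained squared-error objective are used instead, and the function name `nmtf_cocluster` and all of its parameters are hypothetical.

```python
import numpy as np

def nmtf_cocluster(X, k, l, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X (users x messages) as F @ S @ G.T.

    F (users x k) holds user-cluster weights, G (messages x l) holds
    message-cluster weights, and S (k x l) links the two sets of clusters.
    Illustrative sketch only: no constraints from homogeneous relations.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialization keeps all factors nonnegative.
    F = rng.random((n, k)) + 0.1
    S = rng.random((k, l)) + 0.1
    G = rng.random((m, l)) + 0.1
    # Track the Frobenius reconstruction error, starting with the initial guess.
    errors = [np.linalg.norm(X - F @ S @ G.T)]
    for _ in range(n_iter):
        # Multiplicative updates for min ||X - F S G^T||_F^2 with F, S, G >= 0.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors
```

Under these assumptions, a cluster label for each user can be read off as `F.argmax(axis=1)` and for each message as `G.argmax(axis=1)`, so both sides of the bipartite graph are grouped in one pass. How clusters are then judged anomalous is not described in the abstract and is left out of the sketch.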

ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved