2015, Vol. 30, Issue (5): 1097-1108. doi: 10.1007/s11390-015-1585-3

Topics: Artificial Intelligence and Pattern Recognition; Data Management and Data Mining

• Special Section on Selected Paper from NPC 2011 •

Anomaly Detection in Microblogging via Co-Clustering

Wu Yang (杨武), Senior Member, CCF, Member, ACM; Guo-Wei Shen* (申国伟), Student Member, CCF, Member, ACM; Wei Wang (王巍); Liang-Yi Gong (宫良一); Miao Yu (于淼); Guo-Zhong Dong (董国忠)

  1. Information Security Research Center, Harbin Engineering University, Harbin 150001, China
  • Received: 2014-11-17 Revised: 2015-07-12 Online: 2015-09-05 Published: 2015-09-05
  • Contact: Guo-Wei Shen E-mail: shenguowei@hrbeu.edu.cn
  • About author: Wu Yang received his Ph.D. degree in computer system architecture from Harbin Institute of Technology, Harbin, in 2005. He is currently a professor and doctoral supervisor at Harbin Engineering University. His main research interests include data mining, information security, and wireless sensor networks. He is a senior member of CCF and a member of ACM.
  • Supported by:

    This work was supported by the National Natural Science Foundation of China under Grant No. 61170242, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA012802, and the Fundamental Research Funds for the Central Universities of China under Grant No. HEUCF100605.

Abstract: Traditional anomaly detection in microblogging mostly detects anomalous users and anomalous messages separately. As anomalous users adopt increasingly intelligent techniques, the performance of such detection degrades markedly. In this paper, we propose a framework for anomaly detection based on a bipartite graph and co-clustering. A bipartite graph between users and messages is built to model the homogeneous and heterogeneous interactions. The proposed co-clustering algorithm, based on nonnegative matrix tri-factorization, detects anomalous users and messages simultaneously. The homogeneous relations modeled by the bipartite graph are used as constraints to improve the accuracy of the co-clustering algorithm. Experimental results on a Sina Weibo dataset show that the proposed scheme detects both individual and group anomalies with high accuracy.
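To make the idea of co-clustering by nonnegative matrix tri-factorization concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a small hypothetical user-by-message interaction matrix `X` and factorizes it as X ≈ F S Gᵀ with standard unconstrained multiplicative updates; the homogeneous-relation constraints described in the abstract are omitted, and the function names (`nmtf`, `mult_update`, and so on) and the toy data are illustrative only.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def mult_update(M, num, den, eps=1e-9):
    """Element-wise update M * num / den; nonnegative inputs stay nonnegative."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]


def sq_error(X, F, S, G):
    """Squared Frobenius norm of X - F S G^T."""
    R = matmul(matmul(F, S), transpose(G))
    return sum((x - r) ** 2 for rx, rr in zip(X, R) for x, r in zip(rx, rr))


def nmtf(X, k, l, iters=200, seed=0):
    """Factorize X (users x messages) into F (users x k), S (k x l), G (messages x l)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(0.1, 1.0) for _ in range(cols)] for _ in range(rows)]

    n, m = len(X), len(X[0])
    F, S, G = rand(n, k), rand(k, l), rand(m, l)
    Xt = transpose(X)
    errors = [sq_error(X, F, S, G)]
    for _ in range(iters):
        # F <- F * (X G S^T) / (F S G^T G S^T)
        GSt = matmul(G, transpose(S))
        F = mult_update(F, matmul(X, GSt),
                        matmul(F, matmul(transpose(GSt), GSt)))
        # G <- G * (X^T F S) / (G S^T F^T F S)
        FS = matmul(F, S)
        G = mult_update(G, matmul(Xt, FS),
                        matmul(G, matmul(transpose(FS), FS)))
        # S <- S * (F^T X G) / (F^T F S G^T G)
        Ft, Gt = transpose(F), transpose(G)
        num = matmul(matmul(Ft, X), G)
        den = matmul(matmul(matmul(Ft, F), S), matmul(Gt, G))
        S = mult_update(S, num, den)
        errors.append(sq_error(X, F, S, G))
    return F, S, G, errors


# Hypothetical toy data: rows are users, columns are messages,
# and 1 means the user interacted with the message.
X = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
F, S, G, errors = nmtf(X, k=2, l=2)

# Assign each user and each message to its strongest latent cluster.
user_cluster = [max(range(2), key=lambda c: row[c]) for row in F]
message_cluster = [max(range(2), key=lambda c: row[c]) for row in G]
print(user_cluster, message_cluster, errors[0], errors[-1])
```

The sketch only groups users and messages jointly. Deciding which cluster is anomalous, and adding constraints from user-user or message-message relations, would require additional steps that the abstract mentions but does not specify.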

Copyright © Editorial Office of Journal of Computer Science and Technology