2015, Vol. 30, Issue (5): 1082-1096. doi: 10.1007/s11390-015-1584-4
Topics: Artificial Intelligence and Pattern Recognition; Data Management and Data Mining
• Special Section on Selected Paper from NPC 2011 •
Xian Wu1 (吴贤), Wei Fan2 (范伟), Member, ACM, Jing Gao3 (高晶), Member, ACM, IEEE, Zi-Ming Feng1 (冯子明), Yong Yu1 (俞勇)
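This paper studies a special kind of microblog user: the "zombie user". Zombie users are created by marketing companies, either manually or automatically by programs. Unlike ordinary users, zombie users earn money by completing specific tasks. For example, a zombie user follows certain users to raise their measured popularity, or retweets certain microblogs to raise their measured influence. By artificially inflating follower counts and retweet counts, zombie users distort microblog data, which misleads ordinary users and also affects third-party applications built on that data. In this paper, we study how to detect zombie users. The challenge is that marketing companies use sophisticated strategies to operate zombie users so that they pass as normal users. To address this challenge, we use two kinds of information to detect zombie users: (1) the individual features of microblog users; (2) the social relationships between users. Using both, we propose a semi-supervised detection model that distinguishes zombie users from normal users. We apply the proposed model to Sina Weibo, one of the most popular microblogging platforms in China, and find that the detection F-measure can reach 0.9. To further speed up detection and reduce the cost of feature generation, we also propose a lightweight detection model, which uses fewer features to detect zombie users that retweet popular microblogs. In addition, we apply the proposed models to the 200 most-followed microbloggers and the 50 most-retweeted popular microblogs on Sina Weibo.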