Citation: Cheng Chen, Kui Wu, Venkatesh Srinivasan, Kesav Bharadwaj R. The Best Answers? Think Twice: Identifying Commercial Campaigns in the CQA Forums[J]. Journal of Computer Science and Technology, 2015, 30(4): 810-828. DOI: 10.1007/s11390-015-1562-x

The Best Answers? Think Twice: Identifying Commercial Campaigns in the CQA Forums

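  • Abstract (translated from the Chinese): A growing number of Internet users now search for information on community question answering (CQA) websites. Interacting with other people on these sites gives users a rare feeling of trust. In most cases, end users are seeking immediate help and look for the best answer first when they browse a CQA website. It is therefore imperative that users be made aware of any potential commercial campaign hidden behind the answers. Existing research focuses more on answer quality and does not meet this need. Textual similarity between questions and answers has been widely used in existing work, but this feature is no longer effective against commercial paid posters. To detect potential campaign answers effectively, a new detection system needs to consider more context information, such as campaign answer templates and user reputation. In this paper, we develop a detection system that automatically analyzes the hidden patterns of commercial campaigns and alerts end users as soon as a potential campaign is detected. Our detection method combines semantic analysis with the track records of paid posters and exploits features specific to CQA platforms, which differ from those of other types of forum websites (such as microblogs and news sites). Our system is adaptive: it can update the existing detection model in real time with newly detected campaign evidence. To validate the system, we collected three months of trace data from a popular Chinese CQA website. The validation results show that our system has great potential for adaptively detecting commercial campaigns on CQA platforms.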

     

    Abstract: More and more Internet users search for information on Community Question and Answer (CQA) websites, as interactive communication on such websites gives users a rare feeling of trust. More often than not, end users look for instant help when they browse CQA websites for the best answers. Hence, it is imperative that they be warned of any potential commercial campaigns hidden behind the answers. Existing research focuses more on the quality of answers and does not meet this need. Textual similarity between questions and answers is widely used in previous research, but this feature is no longer effective against commercial paid posters. More context information, such as writing templates and a user's reputation track record, needs to be combined into a new model to detect potential campaign answers. In this paper, we develop a system that automatically analyzes the hidden patterns of commercial spam and raises alarms instantaneously to end users whenever a potential commercial campaign is detected. Our detection method integrates semantic analysis with posters' track records and exploits the special features of CQA websites, which differ substantially from those of other types of forums such as microblogs or news reports. Our system is adaptive and accommodates new evidence uncovered by the detection algorithms over time. Validated with real-world trace data collected from a popular Chinese CQA website over a period of three months, our system shows great potential for adaptive detection of CQA spam.

     
