Application of Social Tags to Web Page Recommendation

李慧倩; 夏粉; 曾大军; 王飞跃; 毛文吉

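Summary: Practical value and application prospects
Social tagging systems, a typical application of Web 2.0 technology, are drawing growing attention from users, companies, and researchers. In such a system, a user can describe a Web page, picture, or video with a single character or word, so tags supply additional information about online content. Tags also reflect users' interests. Both the Internet industry and researchers therefore regard tags as potentially valuable for applications such as sharing Web resources, information retrieval, and knowledge discovery.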

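Contributions
In recent years, researchers studying tag data have found that a large proportion of the tags on the Web are noise, misspellings, or symbols with no semantic meaning, and that many distinct tags are in fact semantically similar or identical words. Both problems reduce the usability of tags. They also raise a central technical difficulty in applying tag data: how to use semantically similar tags to help information retrieval.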

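In this paper, building on existing work, we systematically explore how the semantic properties of tag data and users' browsing behavior can be used to improve recommendation algorithms. We explore three aspects. First, we discuss how to cluster tags, users, and Web pages in order to establish their similarity and semantic relationships. Second, we examine whether tags can assist information retrieval and help users locate the products that interest them. Finally, we discuss how users behave when they browse Web pages for information and when they tag resources, and how a model can use these behaviors to aid recommendation.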

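Methods
To meet these goals, we propose four graphical models. In each model, latent variables are used to build semantic clusters of tags, users, and Web pages. A model can specify the structural relationships between its variables (users, tags, and Web pages) and the clusters, and in this way simulate different patterns of browsing and tagging behavior. Based on these models, we use the EM algorithm to estimate each user's latent interest in Web pages and tags that the user has not yet visited, and we recommend the Web pages with the highest latent interest. By analyzing recommendation accuracy, our models make it possible to evaluate both the role of tags in recommendation and users' actual browsing patterns, so that tag information can be used reasonably and accurately to help users obtain information.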
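The general idea of latent-variable clustering fitted with EM can be illustrated with a minimal sketch. It assumes a single latent cluster z and the factorization P(u, p, t) = sum over z of P(z) P(u|z) P(p|z) P(t|z), a PLSA-style aspect model chosen here only for illustration; it is not one of the four models proposed in the paper, whose structures are not given in this summary. The function names `fit_aspect_model` and `recommend`, the smoothing constant, and the toy data are all hypothetical.

```python
import random


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def fit_aspect_model(triples, sizes, n_clusters=2, n_iter=100, seed=0):
    """Fit P(u,p,t) = sum_z P(z) P(u|z) P(p|z) P(t|z) with EM (illustrative only).

    triples: observed (user, page, tag) index triples.
    sizes:   (n_users, n_pages, n_tags).
    Returns (pz, cond) where cond[k][z][i] = P(item i of kind k | cluster z).
    """
    rng = random.Random(seed)
    pz = normalize([rng.random() + 0.1 for _ in range(n_clusters)])
    cond = [[normalize([rng.random() + 0.1 for _ in range(n)])
             for _ in range(n_clusters)] for n in sizes]
    eps = 1e-9  # assumed tiny pseudo-count so no probability is exactly zero
    for _ in range(n_iter):
        z_counts = [eps] * n_clusters
        counts = [[[eps] * n for _ in range(n_clusters)] for n in sizes]
        for triple in triples:
            # E-step: posterior over clusters for this observed triple.
            post = normalize([
                pz[z] * cond[0][z][triple[0]] * cond[1][z][triple[1]]
                * cond[2][z][triple[2]]
                for z in range(n_clusters)
            ])
            for z, q in enumerate(post):
                z_counts[z] += q
                for kind, item in enumerate(triple):
                    counts[kind][z][item] += q
        # M-step: re-estimate all parameters from the expected counts.
        pz = normalize(z_counts)
        cond = [[normalize(row) for row in kind_counts] for kind_counts in counts]
    return pz, cond


def recommend(user, visited, model, top_k=1):
    """Rank unvisited pages by P(user, page) = sum_z P(z) P(user|z) P(page|z)."""
    pz, (p_user, p_page, _p_tag) = model
    scores = {
        page: sum(pz[z] * p_user[z][user] * p_page[z][page]
                  for z in range(len(pz)))
        for page in range(len(p_page[0])) if page not in visited
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: users 0-2 use pages 0-1 with tags 0-1,
# users 3-4 use pages 2-3 with tags 2-3.
triples = [
    (0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1),
    (2, 1, 0), (2, 1, 1), (2, 0, 1),
    (3, 2, 2), (3, 3, 3), (4, 2, 2), (4, 3, 3),
]
model = fit_aspect_model(triples, sizes=(5, 4, 4))
print(recommend(user=0, visited={0}, model=model))
```

In this sketch the tag term only influences how observations are assigned to clusters; the ranking itself marginalizes tags out. Changing which variables depend on which clusters is the kind of structural choice the summary describes as distinguishing the proposed models.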

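Conclusions and open problems
By running recommendations on real data and comparing the results against the traditional collaborative filtering method, we demonstrate the effectiveness of tag systems in product recommendation and the necessity of clustering. By comparing the models with one another, we also identify which model of user browsing behavior is effective.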

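In the next stage of our work, we will further exploit the semantic similarity of tags and, through graphical models of user browsing, apply semantically similar tags to query expansion. We will also continue to study how similar users are organized and how they share information, and we plan to adopt graph-based propagation methods that exploit similar users to further improve recommendation efficiency.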

Abstract: Collaborative social annotation systems allow users to record and share their own keywords, or tags, attached to Web resources such as Web pages, photos, or videos. These annotations are a method for organizing and labeling information, and they have the potential to help users navigate the Web and locate the resources they need. However, because users post annotations without central control, problems such as spam and synonymous annotations arise. To use annotation information efficiently for knowledge discovery from the Web, it is advantageous to organize social annotations from a semantic perspective and embed them into knowledge discovery algorithms. This motivates Web page recommendation with annotations, in which users and Web pages are clustered so that semantically similar items can be related. In this paper we propose four graphical models that cluster users, Web pages, and annotations, and that recommend Web pages to a given user by first assigning items to the right cluster. The algorithms are then compared with the classical collaborative filtering recommendation method on a real-world data set. Our results indicate that the graphical models provide better recommendation performance and are robust enough for real applications.

Exploring Social Annotations with the Application to Web Page Recommendation