Content-Based Publish/Subscribe System for Web Syndication

doi:10.1007/s11390-016-1632-8

Abstract: Content syndication has become a popular way to deliver frequently updated information on the Web in a timely manner. Today, web syndication technologies such as RSS and Atom are used in a wide variety of applications, ranging from large-scale news broadcasting to medium-scale information sharing in scientific and professional communities. However, they exhibit serious limitations in dealing with information overload in Web 2.0. There is a vital need for efficient real-time filtering methods across feeds that allow users to effectively follow the information that interests them. In this paper, we investigate three indexing techniques for users' subscriptions, based on inverted lists or on an ordered trie, for exact and partial matching. We present analytical models for memory requirements and matching time, and we conduct a thorough experimental evaluation to show the impact of critical parameters of realistic web syndication workloads.
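To make the contrast between inverted-list and ordered-trie subscription indexes concrete, here is a minimal sketch under simplifying assumptions. It is not the paper's implementation. Both a subscription and a feed item are treated as sets of terms. Exact matching means every term of a subscription occurs in the item. Partial matching is modeled, hypothetically, as at least a `min_overlap` fraction of the subscription's terms occurring in the item. The class names `CountingInvertedIndex` and `OrderedTrie`, the lexicographic term order, and the overlap rule are all illustrative choices, not definitions taken from the paper.

```python
from collections import defaultdict


class CountingInvertedIndex:
    """Inverted lists: term -> ids of the subscriptions containing that term (illustrative)."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.sizes = {}  # subscription id -> number of distinct terms

    def add(self, sub_id, terms):
        terms = set(terms)
        self.sizes[sub_id] = len(terms)
        for t in terms:
            self.postings[t].append(sub_id)

    def match(self, item_terms, min_overlap=1.0):
        # For each subscription, count how many of its terms occur in the item.
        counts = defaultdict(int)
        for t in set(item_terms):
            for sub_id in self.postings.get(t, ()):
                counts[sub_id] += 1
        # min_overlap=1.0 -> exact match (all terms present);
        # min_overlap<1.0 -> hypothetical partial-match rule.
        return {s for s, c in counts.items() if c >= min_overlap * self.sizes[s]}


class OrderedTrie:
    """Each subscription is a path of terms sorted in one fixed global order (illustrative)."""

    def __init__(self):
        self.root = {"children": {}, "subs": []}

    def add(self, sub_id, terms):
        node = self.root
        # Lexicographic order stands in for the global term order (assumption).
        for t in sorted(set(terms)):
            node = node["children"].setdefault(t, {"children": {}, "subs": []})
        node["subs"].append(sub_id)

    def match(self, item_terms):
        # Exact matching only: a subscription is reached iff all its terms are in the item.
        item = set(item_terms)
        result, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            result.update(node["subs"])
            # Descend only along edges labeled with terms the item contains.
            for t, child in node["children"].items():
                if t in item:
                    stack.append(child)
        return result


if __name__ == "__main__":
    inv, trie = CountingInvertedIndex(), OrderedTrie()
    subs = {1: ["rss", "atom"], 2: ["atom"], 3: ["rss", "trie", "index"]}
    for sid, terms in subs.items():
        inv.add(sid, terms)
        trie.add(sid, terms)

    item = ["atom", "rss", "feed"]
    print(sorted(inv.match(item)))       # [1, 2]
    print(sorted(trie.match(item)))      # [1, 2]
    print(sorted(inv.match(item, 0.3)))  # [1, 2, 3]
```

The counting index visits every posting list touched by the item's terms. The trie instead shares common term prefixes among subscriptions and prunes whole subtrees whose edge term is absent from the item. This sharing is why memory use and matching time can differ between the two families of index.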
