Quantifying the Influence of Websites Based on Online Collective Attention Flow
Abstract: The availability of large-scale network data, such as online users' browsing records, communication records, and e-commerce transaction records, makes it possible to quantify the long-term, complex patterns of user interaction across websites. If we view the Web as a virtual living organism, then according to metabolic theory it must absorb “energy” to grow, reproduce, and develop. We are interested in two questions: (1) where does this “energy” come from, and (2) can websites use it to generate macro-level influence on the whole Web? Our data consist of the online behavior logs of more than 30,000 volunteer users, obtained from the China Internet Network Information Center (CNNIC). We treat a website's influence as its metabolism and the collective attention flow of online users as its energy, and we use an attention flow network built from the empirical data to study how collective attention is distributed and flows among websites. Unlike traditional studies, which focus on information flow, we study users' attention flow; this is not only a “reversed” way to study Web structure and attention transfer patterns, but also a first step toward understanding the underlying dynamics of the World Wide Web.
We find that the influence of a website scales sub-linearly with the time that collective attention dwells on it, which is not consistent with the intuition that the longer users dwell on a website, the greater its influence. Further analysis finds a super-linear scaling relationship between the influence of a website and the intensity of the attention flow passing through it; this is a website version of Kleiber's law. We also find that website development can be divided into three phases: the uncertain growth phase, the partially accelerating growth phase, and the fully accelerating growth phase. Finally, we compare the proposed attention flow model with widely used hyperlink analysis models and find that the attention flow network is an effective tool for evaluating and ranking websites.
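The sub-linear and super-linear relationships described above are power-law scalings of the form y = c·x^γ, with γ < 1 and γ > 1 respectively. As a minimal sketch of how such an exponent could be estimated, the following fits a straight line in log-log space by ordinary least squares. The function names, the tolerance, and all data values (including the exponents 0.8 and 1.2) are hypothetical and chosen only for illustration; they are not the paper's data, results, or necessarily its fitting procedure.

```python
import math


def fit_scaling_exponent(x, y):
    """Least-squares fit of log(y) = log(c) + gamma * log(x).

    Returns (gamma, c) for the assumed power law y = c * x**gamma.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    gamma = sxy / sxx
    c = math.exp(mean_y - gamma * mean_x)
    return gamma, c


def classify(gamma, tol=0.05):
    """Label an exponent relative to 1 (tol is an arbitrary illustrative margin)."""
    if gamma < 1 - tol:
        return "sub-linear"
    if gamma > 1 + tol:
        return "super-linear"
    return "approximately linear"


# Synthetic, made-up data: NOT taken from the CNNIC logs or the paper's results.
dwell_time = [10.0, 100.0, 1000.0, 10000.0]
influence_vs_time = [2.0 * t ** 0.8 for t in dwell_time]      # assumed exponent 0.8

flow_intensity = [10.0, 100.0, 1000.0, 10000.0]
influence_vs_flow = [0.5 * f ** 1.2 for f in flow_intensity]  # assumed exponent 1.2

gamma_time, c_time = fit_scaling_exponent(dwell_time, influence_vs_time)
gamma_flow, c_flow = fit_scaling_exponent(flow_intensity, influence_vs_flow)

print(classify(gamma_time))
print(classify(gamma_flow))
```

On real data the points would scatter around the fitted line, so the estimated exponent should be reported with a confidence interval before it is labeled sub-linear or super-linear.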