We use cookies to improve your experience with our site.

面向大规模时态网络的I/O高效早期突发凝聚子图发现算法

I/O Efficient Early Bursting Cohesive Subgraph Discovery in Massive Temporal Networks

  • 摘要: 1、研究背景
    时态网络是一种有效地把时态信息无损编码成图数据的方法。时态网络中的一个重要问题是寻找突发凝聚子图(BCS),该子图能以最快的速率累积其凝聚度,其在真实世界中有着大量的实际应用,如在社交媒体中表示紧急事件等。然而,现有的研究方法都要求(BCS)持续一定的时间段,这就忽略了BCS的时效性,避免了提前发现突发子图。此外,现有研究没有考虑当大规模的时态网络不能完全被加载到内存中的情况。
    2、目的
    众所周知的是,突发事件越早被发现,处理结果也就越好。因此,本文旨在设计一种可以在时态网络中及时表示突发事件的BCS模型,起到一定的预警作用。同时,本文还旨在开发高效的发现算法以满足在大规模时态网络中高效发现所提出模型的需求。
    3、方法
    为了尽快识别突发,本文基于k-core设计了早期突发凝聚子图(EBCS)模型,其被定义为同时满足凝聚性和突发性标准的最大连通子图。为了找到EBCS,本题首先构造了一个时间权重图(TWG),通过整合网络的拓扑信息和时间信息来测量突发程度。然后,1)本文提出了一种全局搜索(GS)算法,GS-EBCS,该算法通过迭代从TWG中移除节点来找到准确的EBCS;2)本文进一步提出了一种局部搜索(LS)算法,LS-EBCS,来查找EBCS,该算法首先从一个种子节点展开,直到获得一个候选k-core,然后在最佳时间复杂度内将该k-core细化成EBCS。此外,为了解决大规模时态网络无法完全被加载到内存中的问题,本题基于半外部模型,首先设计了一种I/O方法来构建TWG,然后开发了使用I/O的高效的GS和LS算法,即I/O-GS和I/O-LS,以找到EBCS。为了验证所提出的模型和算法,本文在4个真实的时态网络上进行了大量实验。
    4、结果
    实验表明,本文所提出的EBCS模型可以有效体现突发事件的及时性,所提出的算法是有效且高效的。例如,在DBLP数据集上,I/O-LS和LS的运行时间基本相当,而I/O-LS的最大内存使用量只有6.5MB,比LS的308.7MB小得多。而且,据我们所知,本文是第一个利用I/O方法解决大规模时态网络中CSM问题的工作。
    5、结论
    总的来说,本文基于对现有图模型和实际环境的综合分析所提出的EBCS模型能够及时体现突发事件,所提出的算法也能在大规模时态网络中高效地发现EBCS。模型及其发现算法适用于社交网络,商业网络等多种实际情况。此外,本文也在计划在未来使用Hadoop和Spark等分布式框架中实现EBCS的发现。

     

    Abstract: Temporal networks are an effective way to encode temporal information into graph data losslessly. Finding the bursting cohesive subgraph (BCS), which accumulates its cohesiveness at the fastest rate, is an important problem in temporal networks. The BCS has a large number of applications, such as representing emergency events in social media, traffic congestion in road networks and epidemic outbreak in communities. Nevertheless, existing methods demand the BCS lasting for a time interval, which neglects the timeliness of the BCS. In this paper, we design an early bursting cohesive subgraph (EBCS) model based on the k-core to enable identifying the burstiness as soon as possible. To find the EBCS, we first construct a time weight graph (TWG) to measure the bursting level by integrating the topological and temporal information. Then, we propose a global search algorithm, called GS-EBCS, which can find the exact EBCS by iteratively removing nodes from the TWG. Further, we propose a local search algorithm, named LS-EBCS, to find the EBCS by first expanding from a seed node until obtaining a candidate k-core and then refining the k-core to the result subgraph in an optimal time complexity. Subsequently, considering the situation that the massive temporal networks cannot be completely put into the memory, we first design an I/O method to build the TWG and then develop I/O efficient global search and local search algorithms, namely I/O-GS and I/O-LS respectively, to find the EBCS under the semi-external model. Extensive experiments, conducted on four real temporal networks, demonstrate the efficiency and effectiveness of our proposed algorithms. For example, on the DBLP dataset, I/O-LS and LS-EBCS have comparable running time, while the maximum memory usage of I/O-LS is only 6.5 MB, which is much smaller than that of LS-EBCS taking 308.7 MB.

     

/

返回文章
返回