A Cost-Efficient Approach to Storing Users' Data for Online Social Networks

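  • Abstract (translated from the Chinese): As more and more users communicate online through social media, online social networks are growing rapidly. Faced with the big data that users generate, data storage must be distributed, scalable, and cost-efficient. The key challenge is minimizing cost without degrading system performance. Although many storage systems today use distributed key-value storage, that technique is not suited to direct use in social network storage systems. Because social users' data are highly correlated, hash-based storage tends to cause frequent communication between servers, and the high communication cost greatly reduces the scalability of the storage system. Previous studies proposed two methods: partitioning the social graph and replicating data. However, data replication increases storage cost and affects communication cost. In this paper, we focus on how to combine partitioning and replication, from a data storage perspective, to minimize cost. Our proposed cost-efficient data storage approach supports the scaling of social network storage systems well. The approach jointly optimizes network partitioning and data replication so that the data of frequently interacting users are placed together, and the placement process always satisfies the system's load-balancing constraint. We conducted extensive experiments on a Facebook dataset to validate the proposed algorithm.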

     

    Abstract: As users increasingly befriend others and interact online via their social media accounts, online social networks (OSNs) are expanding rapidly. Given the big data that users generate, data storage must be distributed, scalable, and cost-efficient. One of the most significant challenges is minimizing cost without degrading system performance. Although many storage systems use distributed key-value stores, these cannot be applied directly to OSN storage systems: because users' data are highly correlated, hash-based storage leads to frequent inter-server communication, and the resulting high inter-server traffic cost reduces the scalability of the OSN storage system. Previous studies proposed network partitioning and data replication based on social graphs. However, data replication increases storage cost and affects traffic cost. Here, we consider how to minimize cost from the perspective of data storage by combining partitioning and replication. Our cost-efficient data storage approach supports scalable OSN storage systems. The proposed approach co-locates frequently interacting users by conducting partitioning and replication simultaneously while meeting load-balancing constraints. Extensive experiments on two real-world traces show that our approach achieves lower cost than state-of-the-art approaches. We therefore conclude that our approach enables economical and scalable OSN data storage.
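    The trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the toy graph, the function names (`hash_placement`, `remote_reads`, `server_loads`), and the cost model (one remote read per direction of a friendship whose endpoints sit on different servers) are all assumptions made for illustration. It compares hash placement with a hand-picked co-located placement, then shows how replicas can remove the remaining remote reads at the price of extra stored copies.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]  # undirected friendship between two user ids


def hash_placement(users: List[int], num_servers: int) -> Dict[int, int]:
    """Key-value style placement: the server is the user id modulo the server count."""
    return {u: u % num_servers for u in users}


def remote_reads(
    edges: List[Edge],
    placement: Dict[int, int],
    replicas: Optional[Dict[int, Set[int]]] = None,
) -> int:
    """Count friend reads that must cross servers (assumed cost model).

    Each friendship contributes two reads, one per direction. A read is local
    when the owner's master copy or one of its replicas is on the reader's server.
    """
    replicas = replicas or {}
    count = 0
    for u, v in edges:
        for reader, owner in ((u, v), (v, u)):
            server = placement[reader]
            is_local = (
                placement[owner] == server
                or server in replicas.get(owner, set())
            )
            if not is_local:
                count += 1
    return count


def server_loads(placement: Dict[int, int], num_servers: int) -> List[int]:
    """Number of master copies per server, used to check load balance."""
    loads = [0] * num_servers
    for server in placement.values():
        loads[server] += 1
    return loads


# Hypothetical toy graph: two tight friend groups joined by one friendship (2, 3).
users = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

hashed = hash_placement(users, 2)
grouped = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # hand-picked partition
extra_copies = {2: {1}, 3: {0}}  # replicate the two boundary users

print(remote_reads(edges, hashed))                 # 10
print(remote_reads(edges, grouped))                # 2
print(remote_reads(edges, grouped, extra_copies))  # 0
print(server_loads(hashed, 2), server_loads(grouped, 2))  # [3, 3] [3, 3]
```

    Both placements are equally balanced (three users per server), yet the co-located one cuts remote reads from 10 to 2. Two replicas remove the rest, which is where the storage-versus-traffic trade-off mentioned in the abstract appears: each replica adds storage cost in exchange for lower traffic.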

     
