计算机科学技术学报 ›› 2019,Vol. 34 ›› Issue (1): 234-252.doi: 10.1007/s11390-019-1907-y

所属专题: Artificial Intelligence and Pattern Recognition Data Management and Data Mining

• • 上一篇    

在线社交网络中一种经济有效的用户数据存储方法

Jing-Ya Zhou1,2,3, Member, CCF, ACM, Jian-Xi Fan1, Member, CCF, Cheng-Kuan Lin1, Member, CCF, and Bao-Lei Cheng1, Member, CCF   

  1. 1 School of Computer Science and Technology, Soochow University, Suzhou 215006, China;
    2 Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, China;
    3 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University Nanjing 210023, China
  • 收稿日期:2017-10-30 修回日期:2018-10-11 出版日期:2019-01-05 发布日期:2019-01-12
  • 作者简介:Jing-Ya Zhou received his B.S. degree in computer science from Anhui Normal University, Wuhu, in 2005, and his Ph.D. degree in computer science from Southeast University, Nanjing, in 2013. He is currently an assistant professor with the School of Computer Science and Technology, Soochow University, Suzhou. His research interests include cloud and edge computing, parallel and distributed systems, online social networks and data center networking.
  • 基金资助:
    The work was supported by the National Natural Science Foundation of China under Grant Nos. 61502328, 61672370, and 61572337, the Jiangsu Planned Projects for Postdoctoral Research Funds under Grant No. 1701173B, the Open Project Program of Jiangsu Provincial Key Laboratory for Computer Information Processing Technology under Grant No. KJS1740, and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 18KJA520009.

A Cost-Efficient Approach to Storing Users' Data for Online Social Networks

Jing-Ya Zhou1,2,3, Member, CCF, ACM, Jian-Xi Fan1, Member, CCF, Cheng-Kuan Lin1, Member, CCF, and Bao-Lei Cheng1, Member, CCF   

  1. 1 School of Computer Science and Technology, Soochow University, Suzhou 215006, China;
    2 Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, China;
    3 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University Nanjing 210023, China
  • Received:2017-10-30 Revised:2018-10-11 Online:2019-01-05 Published:2019-01-12
  • About author:Jing-Ya Zhou received his B.S. degree in computer science from Anhui Normal University, Wuhu, in 2005, and his Ph.D. degree in computer science from Southeast University, Nanjing, in 2013. He is currently an assistant professor with the School of Computer Science and Technology, Soochow University, Suzhou. His research interests include cloud and edge computing, parallel and distributed systems, online social networks and data center networking.
  • Supported by:
    The work was supported by the National Natural Science Foundation of China under Grant Nos. 61502328, 61672370, and 61572337, the Jiangsu Planned Projects for Postdoctoral Research Funds under Grant No. 1701173B, the Open Project Program of Jiangsu Provincial Key Laboratory for Computer Information Processing Technology under Grant No. KJS1740, and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 18KJA520009.

随着越来越多的用户通过社交媒体进行在线交流,在线社交网络正在快速发展壮大。面对用户产生的大数据,数据存储必须是分布式的、可扩展的并且经济有效。解决这一问题最重要的挑战是如何在不降低系统性能的前提下最小化成本代价。虽然目前许多存储系统采用分布式健值存储技术,但该技术并不适合直接应用到社交网络存储系统。这是因为社交用户的数据高度关联,哈希存储容易导致服务器间频繁通信,而高额的通信代价极大地降低了社交网络存储系统的可扩展性。以往研究提出对社交图采用网络划分,和对数据进行复制两种方法。然而,数据复制会增加存储成本,并影响通信成本。在本文中,我们关注于如何从数据存储的角度结合划分和复制以最小化成本。我们提出的经济有效的数据存储方法能够很好地支持社交网络存储系统的扩展。该方法利用网络划分和数据复制同步优化将经常发生交互的用户的数据放在一起,放置过程始终满足系统负载均衡的约束。我们在Facebook数据集上进行了大量实验以验证本文所提算法。

关键词: 在线社交网络, 服务器间通信代价, 存储代价, 网络划分, 数据复制

Abstract: As users increasingly befriend others and interact online via their social media accounts, online social networks (OSNs) are expanding rapidly. Confronted with the big data generated by users, it is imperative that data storage be distributed, scalable, and cost-efficient. Yet one of the most significant challenges about this topic is determining how to minimize the cost without deteriorating system performance. Although many storage systems use the distributed key value store, it cannot be directly applied to OSN storage systems. And because users' data are highly correlated, hash storage leads to frequent inter-server communications, and the high inter-server traffic costs decrease the OSN storage system's scalability. Previous studies proposed conducting network partitioning and data replication based on social graphs. However, data replication increases storage costs and impacts traffic costs. Here, we consider how to minimize costs from the perspective of data storage, by combining partitioning and replication. Our cost-efficient data storage approach supports scalable OSN storage systems. The proposed approach co-locates frequently interactive users together by conducting partitioning and replication simultaneously while meeting load-balancing constraints. Extensive experiments are undertaken on two realworld traces, and the results show that our approach achieves lower cost compared with state-of-the-art approaches. Thus we conclude that our approach enables economic and scalable OSN data storage.

Key words: online social network, inter-server traffic cost, storage cost, network partitioning, data replication

[1] Althoff T, Jindal P, Leskovec J. Online actions with offline impact:How online social networks influence online and offline user behavior. In Proc. the 10th ACM International Conference on Web Search and Data Mining, February 2017, pp.537-546.
[2] Peng S C, Yang A M, Cao L H, Yu S, Xie D Q. Social influence modeling using information theory in mobile social networks. Information Sciences, 2017, 379:146-159.
[3] Wang F, Wang H Y, Xu K, Wu J H, Jia X H. Characterizing information diffusion in online social networks with linear diffusive model. In Proc. the 33rd International Conference on Distributed Computing Systems, July 2013, pp.307-316.
[4] Al-Fares M, Loukissas A, Vahdat, A. A scalable, commodity data center network architecture. In Proc. the ACM SIGCOMM Conference on Data communication, August 2008, pp.63-74.
[5] Shvachko K, Kuang H, Radia S, Chansler R. The Hadoop distributed file system. In Proc. the 26th IEEE Symposium on Mass Storage Systems and Technologies, May 2010.
[6] Lakshman A, Malik P. Cassandra:A decentralized structured storage system. ACM SIGOPS Operating Systems Review, 2010, 44(2):35-40.
[7] Sumbaly R, Kreps J, Gao L, Feinberg A, Soman C, Shah S. Serving large-scale batch computed data with project Voldemort. In Proc. the 10th USENIX Conference on File and Storage Technologies, February 2012, pp.223-235.
[8] Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 1998, 20(1):359-392.
[9] Chen H H, Jin H, Wu S. Minimizing inter-server communications by exploiting self-similarity in online social networks. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(4):1116-1130.
[10] Liu G X, Shen H Y, Chandler H. Selective data replication for online social networks with distributed datacenters. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(8):2377-2393.
[11] Pujol J M, Erramilli V, Siganos G, Yang X, Laoutaris N, Chhabra P, Rodriguez P. The little engine(s) that could:Scaling online social networks. IEEE/ACM Transactions on Networking, 2012, 20(4):1162-1175.
[12] Tran D A, Zhang T. S-PUT:An EA-based framework for socially aware data partitioning. Computer Networks, 2014, 75:504-518.
[13] Yu B Y, Pan J P. Location-aware associated data placement for geo-distributed data intensive applications. In Proc. the 34th IEEE International Conference on Computer Communications, April 2015, pp.603-611.
[14] Zhou J Y, Fan J X, Cheng B L, Jia J C. Optimizing interserver communications by exploiting overlapping communities in online social networks. In Proc. the 16th International Conference on Algorithms and Architectures for Parallel Processing, December 2016, pp.231-244.
[15] Tran D A, Nguyen K, Pham C. S-CLONE:Socially-aware data replication for social networks. Computer Networks, 2012, 56(7):2001-2013
[16] Zhang J H, Chen J, Luo J Z, Song A B. Efficient locationaware data placement for data-intensive applications in geodistributed scientific data centers. Tsinghua Science and Technology, 2016, 21(5):471-481.
[17] Jiao L, Lit J, Du W, Fu X M. Multi-objective data placement for multi-cloud socially aware services. In Proc. the 33rd IEEE International Conference on Computer Communications, April 2014, pp.28-36.
[18] Gregory S. Finding overlapping communities in networks by label propagation. New Journal of Physics, 2010, 12(10):Article No. 103018.
[19] Lancichinetti A, Fortunato S, Kertész J. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 2009, 11(3):Article No. 033015.
[20] Qiao S, Han N, Zhang K, Zou L, Wang H, Alberto G L. Algorithm for detecting overlapping communities from complex network big data. Journal of Software, 2017, 28(3):631-647. (in Chinese)
[21] Wilson C, Sala A, Puttaswamy K P N, Zhao B Y. Beyond social graphs:User interactions in online social networks and their implications. ACM Transactions on the Web, 2012, 6(4):Article No. 17.
[22] Gjoka M, Kurant M, Butts C T, Markopoulou A. Walking in Facebook:A case study of unbiased sampling of OSNs. In Proc. the 29th IEEE International Conference on Computer Communications, March 2010, pp.2498-2506.
[23] Jiang J, Wilson C, Wang X, Sha W P, Huang P, Dai Y F, Zhao B Y. Understanding latent interactions in online social networks. ACM Transactions on the Web, 2013, 7(4):Article No. 18.
[24] Benevenuto F, Rodrigues T, Cha M, Almeida V A F. Characterizing user behavior in online social networks. In Proc. the 9th ACM SIGCOMM Conference on Internet Measurement, November 2009, pp.49-62.
[25] Roy A, Zeng H, Bagga J, Porter G, Snoeren A C. Inside the social network's (datacenter) network. In Proc. the 2015 ACM Conference on Special Interest Group on Data Communication, August 2015, pp.123-137.
[1] Jing Jiang, Zi-Fei Shan, Xiao Wang, Li Zhang, Ya-Fei Dai. 真实环境下虚假团体的测量研究[J]. , 2015, 30(6): 1344-1357.
[2] Da-Wei Sun (孙大为), Student Member, CCF, ACM, Gui-Ran Chang (常桂然), Shang Gao (高尚), Li-Zhong Jin (靳立忠), and Xing-Wei Wang, (王兴伟), Senior Member, CCF, ACM. 云计算环境中一种实现高可用性的动态数据复制策略[J]. , 2012, (2): 256-272.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 唐同诰; 招兆铿;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[2] 闵应骅;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[3] 王能斌; 刘小青; 刘光富;. A Software Tool for Constructing Traditional Chinese Medical Expert Systems[J]. , 1988, 3(3): 214 -220 .
[4] 蔡士杰; 张福炎;. A Fast Algorithm for Polygon Operations[J]. , 1991, 6(1): 91 -96 .
[5] 徐美瑞; 刘小林;. A VLSI Algorithm for Calculating the Tree to Tree Distance[J]. , 1993, 8(1): 68 -76 .
[6] 姜文彬;. A Method for Minimization Design of Two-Level Logic Networks Using Multiplexer Universal Logic Modules[J]. , 1994, 9(1): 92 -96 .
[7] 王晖; 刘大有; 王亚飞;. Sequential Back-Propagation[J]. , 1994, 9(3): 252 -260 .
[8] 黄国勇; 李三立;. TSP: A Heterogeneous Multiprocessor Supercomputing System Based on i860XP[J]. , 1994, 9(3): 285 -288 .
[9] 彭澄廉;. Combining Gprof and Event-Driven Monitoring for Analyzing Distributed Programs:A Rough View of NCSA Mosaic[J]. , 1996, 11(4): 427 -432 .
[10] 张永越; 彭振云; 游素亚; 徐光佑;. A Multi-View Face Recognition System[J]. , 1997, 12(5): 400 -407 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: