2012, Issue (2): 256-272. doi: 10.1007/s11390-012-1221-4


Modeling a Dynamic Data Replication Strategy to Increase System Availability in Cloud Computing Environments

Da-Wei Sun1 (孙大为), Student Member, CCF, ACM, Gui-Ran Chang2 (常桂然), Shang Gao3 (高尚), Li-Zhong Jin1 (靳立忠), and Xing-Wei Wang1,* (王兴伟), Senior Member, CCF, ACM

  1. School of Information Science and Engineering, Northeastern University, Shenyang 110819, China;
  2. Computing Center, Northeastern University, Shenyang 110819, China;
  3. School of Engineering and Information Technology, Deakin University, Geelong, Victoria 3217, Australia
  • Received: 2011-06-17 Revised: 2012-01-29 Online: 2012-03-05 Published: 2012-03-05
  • Supported by: the National Natural Science Foundation of China under Grant Nos. 61070162, 71071028 and 70931001, the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant Nos. 20110042110024 and 20100042110025, and the Fundamental Research Funds for the Central Universities of China under Grant Nos. N100604012, N090504003 and N090504006.

Abstract: Failures are the norm rather than the exception in cloud computing environments. To improve system availability, replicating popular data to multiple suitable locations is an advisable choice, as users can then access the data from a nearby site. This is not the case, however, when replicas must be kept as a fixed number of copies at several locations. Deciding which data to replicate, when to replicate it, how many replicas to create, and where to place them has become a challenge in cloud computing. This paper puts forward a dynamic data replication strategy, together with a brief survey of replication strategies suitable for distributed computing environments. The strategy includes: 1) analyzing and modeling the relationship between system availability and the number of replicas; 2) evaluating and identifying popular data, and triggering a replication operation when the popularity of a data item passes a dynamic threshold; 3) calculating a suitable number of copies to meet a reasonable system byte effective rate requirement, and placing the replicas among data nodes in a balanced way; 4) designing the dynamic data replication algorithm for a cloud. Experimental results demonstrate that the proposed strategy greatly improves data availability in the cloud while requiring only a small number of replicas.
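As a rough illustration of items 1) and 3) above, the sketch below relates availability to the number of replicas and then spreads the replicas over the least-loaded nodes. It is not the model or algorithm from the paper. It assumes that each replica is available independently with the same probability p, and it substitutes a simple "pick the least-loaded nodes" rule for the paper's balanced placement. The function names `availability`, `min_replicas`, and `place_replicas` are hypothetical.

```python
from typing import Dict, List


def availability(p: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is reachable.

    Assumption (not from the paper): every copy is available
    independently with the same probability p.
    """
    return 1.0 - (1.0 - p) ** replicas


def min_replicas(p: float, target: float) -> int:
    """Smallest replica count whose availability reaches `target`."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    count = 1
    while availability(p, count) < target:
        count += 1
    return count


def place_replicas(load: Dict[str, int], replicas: int) -> List[str]:
    """Choose the `replicas` least-loaded nodes, breaking ties by node name.

    `load` maps a node name to its current load; this stands in for
    whatever balance metric a real system would use.
    """
    if replicas > len(load):
        raise ValueError("more replicas requested than nodes available")
    return sorted(load, key=lambda node: (load[node], node))[:replicas]


if __name__ == "__main__":
    needed = min_replicas(p=0.9, target=0.995)
    nodes = {"n1": 5, "n2": 2, "n3": 7, "n4": 2}
    print(needed, place_replicas(nodes, needed))
```

Under the independence assumption, availability rises quickly with the first few replicas and then flattens, which is consistent with the abstract's claim that a small number of replicas can be enough.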

Copyright © Editorial Office of the Journal of Computer Science and Technology