2013, Vol. 28, Issue (6): 948-961. doi: 10.1007/s11390-013-1390-9

Topic: Data Management and Data Mining

• Special Section on Selected Paper from NPC 2011 •


An Energy-Aware Heuristic Scheduling for Data-Intensive Workflows in Virtualized Datacenters

Peng Xiao 1,2 (肖鹏), Student Member, CCF, ACM, IEEE; Zhi-Gang Hu 2,* (胡志刚), Senior Member, CCF; and Yan-Ping Zhang 2,3 (张艳平), Student Member, ACM, IEEE

  1 School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411104, China;
  2 School of Information Science and Engineering, Central South University, Changsha 410083, China;
  3 College of Computation and Bioinformatics, Technical University of Munich, Freising 85354, Germany
  • Received: 2012-09-02 Revised: 2013-05-06 Online: 2013-11-05 Published: 2013-11-05
  • About the authors: Peng Xiao received the M.S. degree in computer science from Xiamen University, China. He is currently a Ph.D. candidate at Central South University, Changsha, and a lecturer at Hunan Institute of Engineering, Xiangtan. His research interests include cloud computing, resource virtualization technology, and HPC energy-efficiency management.
    Zhi-Gang Hu received his M.S. and Ph.D. degrees in computer science from Central South University, Changsha. He is now a professor at Central South University. His current research interests include parallel and distributed computing, cloud computing, grid computing, green HPC policy and management, and power optimization in embedded systems.
    Yan-Ping Zhang received her M.S. degree in computer science from Central South University, Changsha. She is now a Ph.D. candidate at Technical University of Munich, Germany. Her research interests include cloud-based large-scale applications, virtual machine power models, and workflow scheduling models and algorithms.
  • Supported by: the National Natural Science Foundation of China under Grant Nos. 60970038 and 61272148, the Science and Technology Plan Project of Hunan Province of China under Grant No. 2012GK3075, and the Scientific Research Fund of Hunan Provincial Education Department of China under Grant No. 13B015.

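With the development of cloud computing, more and more data-intensive workflows are being deployed in virtualized computing environments, and the energy that cloud systems spend on data storage and access is growing as a result. To address this, the paper proposes a heuristic scheduling algorithm based on the "minimal data-accessing energy path", which aims to reduce the energy consumed while data-intensive workflows execute. The algorithm introduces data-access energy metrics into conventional DAG scheduling and, in view of the execution characteristics of virtualized environments, decomposes workflow scheduling into a two-phase virtual machine deployment process. The first phase reduces the energy overhead caused by data transfer latency by optimizing the configuration of the virtual machines' underlying storage devices; the second phase schedules the workflow's subtasks by evaluating the energy overhead of intermediate data. Experimental results show that the proposed algorithm effectively reduces the various kinds of wasted energy caused by dynamic intermediate data, thereby improving the overall energy efficiency of the virtual computing system. In addition, compared with other algorithms, it shows better robustness under I/O-intensive workloads.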

Abstract: With the development of cloud computing, more and more data-intensive workflows have been deployed on virtualized datacenters. As a result, the energy spent on massive data accessing grows rapidly. In this paper, an energy-aware scheduling algorithm is proposed. It introduces a novel heuristic called Minimal Data-Accessing Energy Path for scheduling data-intensive workflows, with the aim of reducing the energy consumed by intensive data accessing. Extensive experiments based on both synthetic and real workloads are conducted to investigate the effectiveness and performance of the proposed scheduling approach. The experimental results show that the proposed heuristic scheduling can significantly reduce the energy consumed in storing and retrieving the intermediate data generated during the execution of data-intensive workflows. In addition, it exhibits better robustness than existing algorithms when cloud systems are in the presence of I/O-intensive workloads.
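The abstract names the Minimal Data-Accessing Energy Path heuristic but does not define it here. As a purely illustrative sketch, and not the authors' algorithm, the Python below assumes a workflow modeled as a DAG whose edges carry a hypothetical energy cost (in joules) for storing and retrieving the intermediate data passed between two tasks, and finds the entry-to-exit path with the smallest total cost. The function name `min_energy_path`, the task names, and the cost values are all assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def min_energy_path(edges, source, sink):
    """Return (total_energy, path) for the cheapest source-to-sink path.

    edges maps (producer_task, consumer_task) to a hypothetical energy
    cost in joules for moving intermediate data along that dependency.
    """
    # Build a node -> predecessors map, the input TopologicalSorter expects.
    preds = {}
    for (u, v) in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)

    # Process tasks so that every predecessor is handled before its successors.
    order = list(TopologicalSorter(preds).static_order())

    best = {node: float("inf") for node in order}
    parent = {}
    best[source] = 0.0
    for v in order:
        for u in preds[v]:
            candidate = best[u] + edges[(u, v)]
            if candidate < best[v]:
                best[v] = candidate
                parent[v] = u

    if best[sink] == float("inf"):
        raise ValueError("sink is not reachable from source")

    # Walk the parent links back from the sink to recover the path.
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return best[sink], path


# Hypothetical four-task workflow; the costs are made up for illustration.
example_edges = {
    ("t1", "t2"): 4.0,
    ("t1", "t3"): 1.5,
    ("t2", "t4"): 1.0,
    ("t3", "t4"): 2.0,
}
print(min_energy_path(example_edges, "t1", "t4"))  # (3.5, ['t1', 't3', 't4'])
```

A scheduler built on this idea would presumably use such a path to decide which subtasks to place close to the storage holding their intermediate data; how the paper actually derives the costs and uses the path is described in the full text, not in this sketch.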

Copyright © Editorial Office of Journal of Computer Science and Technology