Carbon-Aware Energy Cost Optimization of Data Analytics Across Geo-Distributed Data Centers
Abstract
The number and scale of data centers worldwide are growing rapidly in the era of big data, leading to massive energy consumption and substantial carbon emissions. To make the information technology (IT) industry more efficient and sustainable, researchers have proposed scheduling the data or tasks of data analytics jobs to data centers with low electricity prices and low carbon emission rates. However, because geo-distributed data centers are highly heterogeneous and dynamic in resource capacity, electricity price, and carbon emission rate, it is difficult to optimize their electricity cost and carbon emissions over a long period. In this paper, we propose an Energy-aware Data backup and Job scheduling method with minimal Cost (EDJC) that minimizes the electricity cost of geo-distributed data analytics jobs while keeping each data center within its long-term carbon emission budget. Specifically, we first design a cost-effective data backup algorithm that generates a minimum-cost data backup strategy from historical job requirements. Then, given this data backup strategy, an online carbon-aware job scheduling algorithm computes the job scheduling strategy in each time slot. This algorithm uses Lyapunov optimization to decompose the long-term job scheduling problem into a series of real-time job scheduling subproblems, thereby minimizing the electricity cost while satisfying the carbon emission budget. Experimental results show that EDJC significantly reduces the total electricity cost of the data centers while meeting their carbon emission constraints.
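The Lyapunov decomposition mentioned in the abstract can be illustrated with a small sketch. The abstract does not give EDJC's actual formulation, so the code below is only a generic drift-plus-penalty scheduler under simplifying assumptions: every job consumes the same energy, any job may run at any data center (data backup placement is ignored), and each data center has a fixed per-slot carbon budget. All names (`DataCenter`, `schedule_slot`, the trade-off parameter `V`) are hypothetical and are not taken from the paper. Each data center keeps a virtual queue that accumulates carbon emitted above its budget; in every slot, jobs go to the data centers with the smallest weight `V * electricity cost + queue * carbon emitted`, which needs only the current prices and carbon rates.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """Hypothetical per-slot view of one data center (illustrative only)."""
    name: str
    price: float          # electricity price in this slot ($/kWh)
    carbon_rate: float    # carbon intensity in this slot (kgCO2/kWh)
    capacity: int         # maximum number of jobs it can run in this slot
    carbon_budget: float  # carbon allowed per slot (kgCO2)
    queue: float = 0.0    # virtual queue: accumulated carbon above budget


def schedule_slot(centers, num_jobs, energy_per_job, V):
    """Solve one real-time subproblem and update the virtual queues.

    Assumption: the per-slot objective is linear in the number of jobs
    placed at each data center, so filling data centers in order of
    increasing per-job weight minimizes it under the capacity limits.
    """
    def weight(dc):
        cost = dc.price * energy_per_job          # penalty term (electricity cost)
        carbon = dc.carbon_rate * energy_per_job  # drift term (carbon emitted)
        return V * cost + dc.queue * carbon

    assignment = {dc.name: 0 for dc in centers}
    remaining = num_jobs
    for dc in sorted(centers, key=weight):
        take = min(dc.capacity, remaining)
        assignment[dc.name] = take
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("not enough capacity for all jobs in this slot")

    # Queue update: grow by carbon emitted, shrink by the per-slot budget.
    for dc in centers:
        emitted = assignment[dc.name] * energy_per_job * dc.carbon_rate
        dc.queue = max(dc.queue + emitted - dc.carbon_budget, 0.0)
    return assignment
```

In this sketch a larger `V` favors low electricity cost, while a smaller `V` makes the scheduler react faster to carbon overshoot. For example, with one cheap but carbon-intensive data center and one expensive but clean one, the scheduler places jobs at the cheap site first, and once that site's virtual queue grows, the next slot's jobs shift to the clean site.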