
Accelerating DAG-Style Job Execution via Optimizing Resource Pipeline Scheduling

  • Abstract: 1. Background:
    With the explosive growth of the data volume that data centers must process, accelerating job execution in data centers has become an important research topic. Beyond increasing a data center's computing capacity, properly scheduling the execution of data-processing jobs is also key to improving execution efficiency. In particular, these jobs typically consist of multiple stages with complex precedence constraints between them. To speed up data-processing jobs, this paper focuses on the scheduling of the individual stages within a job.
    2. Objective:
    The main objective of this work is to accelerate the execution of data-processing jobs, i.e., to minimize their makespan.
    3. Method:
    The computation of a data-processing job can typically be represented as a directed acyclic graph (DAG). Based on this model, the scheduling of data-processing jobs can be abstracted as a DAG scheduling problem. We observe that the execution of each stage in the graph occupies different types of resources, and that these resources can be allocated, scheduled, and used in a pipelined fashion. By increasing the time during which the different resource types are used in parallel, the job makespan can be reduced effectively. We theoretically analyze the scheduling of ideal jobs that follow Amdahl's law: by avoiding contention for the same resource among different stages, our scheduling algorithm achieves an approximation ratio of 3/2. For non-ideal jobs, we adapt a reinforcement-learning (RL) based scheduler to perform dynamic scheduling, including dynamically adjusting each stage's priority and parallelism level.
    4. Results & Findings:
    In our experiments, we simulated DAGs of different structures using a real-world cluster trace dataset. Compared with the default scheduler, optimizing the resource pipeline improved average CPU utilization by 33%.
    5. Conclusions:
    Our theoretical model shows that a scheduling algorithm that avoids resource contention achieves an approximation ratio of 3/2. Experimental results show that the RL-based scheduler significantly improves CPU and network utilization and thereby reduces job makespan. However, the RL-based scheduler lacks a provable approximation ratio, i.e., a worst-case performance guarantee, and the deep network model used in RL lacks interpretability. Embedding more human expertise into the RL model to improve its reliability and interpretability is therefore a meaningful direction for future work.


    Abstract: The volume of data that big data clusters must process is growing rapidly, making time-efficient data analysis critical. However, simply adding more computation resources may not speed up the analysis significantly. Data analysis jobs usually consist of multiple stages organized as a directed acyclic graph (DAG), and the precedence relationships between stages pose scheduling challenges; general DAG scheduling is a well-known NP-hard problem. Moreover, we observe that in some parallel computing frameworks such as Spark, the execution of a stage in a DAG contains multiple phases that use different resources. Carefully arranging those phases in a pipeline can reduce resource idle time and improve average resource utilization. We therefore propose a resource pipeline scheme with the objective of minimizing the job makespan. For perfectly parallel stages, we propose a contention-free scheduler with detailed theoretical analysis. We further extend the contention-free scheduler to three-phase stages, considering that the computation phase of some stages can be partitioned. Additionally, job stages in real-world applications are usually not perfectly parallel, so parallelism levels must be adjusted frequently during DAG execution. Because reinforcement learning (RL) techniques can adjust the scheduling policy on the fly, we investigate an RL-based scheduler for jobs arriving online, which adapts to resource contention dynamically. We evaluate both the contention-free and RL-based schedulers on a Spark cluster, using a real-world cluster trace dataset to simulate different DAG styles. Evaluation results show that our pipelined scheme significantly improves CPU and network utilization.
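    The pipelining idea described above can be illustrated with a minimal sketch: assume each stage has a CPU phase followed by a network phase, and the two resource types can serve different stages concurrently (a two-machine flow-shop view). The phase durations below are hypothetical, and this is only an illustration of the overlap effect, not the paper's actual scheduling algorithm.

    ```python
    # Illustrative sketch: overlapping the network phase of one stage with the
    # CPU phase of the next reduces makespan versus fully serial execution.
    # Durations are hypothetical; this is not the paper's scheduler.

    def serial_makespan(stages):
        """Makespan when each stage finishes both phases before the next starts."""
        return sum(cpu + net for cpu, net in stages)

    def pipelined_makespan(stages):
        """Makespan with pipelined resources (standard flow-shop recurrence):
        the CPU runs stages back to back, and the network phase of each stage
        starts as soon as both its input is ready and the network is free."""
        cpu_done = net_done = 0
        for cpu, net in stages:
            cpu_done += cpu                          # CPU busy back to back
            net_done = max(net_done, cpu_done) + net # network waits for input
        return net_done

    stages = [(4, 2), (3, 3), (2, 4)]  # (cpu_time, network_time) per stage
    print(serial_makespan(stages))     # 18
    print(pipelined_makespan(stages))  # 14
    ```

    In this toy example, pipelining shrinks the makespan from 18 to 14 time units because the network is transferring one stage's output while the CPU computes the next stage, which is the source of the utilization gains the abstract reports.
    
    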


