Journal of Computer Science and Technology ›› 2022, Vol. 37 ›› Issue (4): 852-868. DOI: 10.1007/s11390-021-1488-4

Special Issue: Computer Networks and Distributed Computing

• Special Section of MASS 2020-2021 •

Accelerating DAG-Style Job Execution via Optimizing Resource Pipeline Scheduling

Yubin Duan1 (段钰斌), Student Member, IEEE, Ning Wang2 (王宁), Member, IEEE, and Jie Wu1,*, Fellow, IEEE        

  1 Department of Computer and Information Sciences, Temple University, Philadelphia 19122, U.S.A.
    2 Department of Computer Science, Rowan University, Glassboro 08028, U.S.A.
  • Received: 2021-04-06; Revised: 2021-09-01; Accepted: 2021-11-23; Online: 2022-07-25; Published: 2022-07-25
  • Contact: Jie Wu, E-mail: jiewu@temple.edu
  • About author: Jie Wu is the director of the Center for Networked Computing and Laura H. Carnell Professor at Temple University, Philadelphia. He also serves as the director of International Affairs at the College of Science and Technology. He served as the Chair of the Department of Computer and Information Sciences from the summer of 2009 to the summer of 2016, and as Associate Vice Provost for International Affairs from the fall of 2015 to the summer of 2017. Prior to joining Temple University, he was a program director at the National Science Foundation and a distinguished professor at Florida Atlantic University. His current research interests include mobile computing and wireless networks, routing protocols, cloud and green computing, network trust and security, and social network applications. Dr. Wu regularly publishes in scholarly journals, conference proceedings, and books. He serves on several editorial boards, including IEEE Transactions on Services Computing and the Journal of Parallel and Distributed Computing. Dr. Wu was general co-chair of IEEE MASS 2006, IEEE IPDPS 2008, IEEE ICDCS 2013, ACM MobiHoc 2014, ICPP 2016, and IEEE CNS 2016, as well as program co-chair of IEEE INFOCOM 2011 and CCF CNCC 2013. He was an IEEE Computer Society Distinguished Visitor, an ACM Distinguished Speaker, and the chair of the IEEE Technical Committee on Distributed Processing (TCDP). Dr. Wu is a CCF Distinguished Speaker and a fellow of the IEEE. He is the recipient of the 2011 China Computer Federation (CCF) Overseas Outstanding Achievement Award.
  • Supported by:
    This work was supported in part by the National Science Foundation of U.S.A. under Grant Nos. CNS 2128378, CNS 2107014, CNS 1824440, CNS 1828363, CNS 1757533, CNS 1629746, and CNS 1651947.

The volume of data processed in big data clusters is growing rapidly, which makes time-efficient data analysis critical. However, simply adding more computation resources may not speed up the analysis significantly. Data analysis jobs usually consist of multiple stages organized as a directed acyclic graph (DAG), and the precedence relationships between stages make scheduling challenging: general DAG scheduling is a well-known NP-hard problem. Moreover, we observe that in parallel computing frameworks such as Spark, the execution of a stage in a DAG contains multiple phases that use different resources. Carefully pipelining the execution of these phases reduces resource idle time and improves average resource utilization. We therefore propose a resource pipeline scheme with the objective of minimizing the job makespan. For perfectly parallel stages, we propose a contention-free scheduler and provide a detailed theoretical analysis. We further extend the contention-free scheduler to three-phase stages, in which the computation phase of some stages can be partitioned. In addition, job stages in real-world applications are usually not perfectly parallel, so the parallelism level must be adjusted frequently during DAG execution. Since reinforcement learning (RL) techniques can adjust the scheduling policy on the fly, we investigate an RL-based scheduler for jobs that arrive online; it adapts to resource contention dynamically. We evaluate both the contention-free and the RL-based schedulers on a Spark cluster, using a real-world cluster trace dataset to simulate different DAG styles. The results show that our pipelined scheme significantly improves CPU and network utilization.

Key words: data center cluster; directed acyclic graph scheduling; makespan minimization; pipeline
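
As a rough illustration of the resource-pipelining idea described above, the following sketch orders hypothetical two-phase stages (a CPU phase followed by a network phase) with Johnson's classic two-machine flow-shop rule and simulates the resulting pipelined makespan. The stage names and durations are invented, and the code is not the paper's contention-free scheduler; it only shows how overlapping two resources shortens the makespan compared with an arbitrary stage order.

# A minimal sketch in Python, assuming each stage has a CPU phase followed by a
# network phase and that stages are independent of each other. Johnson's rule
# (the classic two-machine flow-shop result) orders the stages so that the two
# resources overlap in a pipeline. Stage names and durations are hypothetical.

def johnson_order(stages):
    # stages: list of (name, cpu_time, net_time) tuples
    short_cpu = sorted([s for s in stages if s[1] <= s[2]], key=lambda s: s[1])
    short_net = sorted([s for s in stages if s[1] > s[2]], key=lambda s: s[2], reverse=True)
    return short_cpu + short_net

def pipelined_makespan(order):
    # The network phase of one stage may overlap with the CPU phase of the next.
    cpu_done = net_done = 0.0
    for _, cpu, net in order:
        cpu_done += cpu                           # CPU phases run back to back
        net_done = max(net_done, cpu_done) + net  # network starts once its stage's CPU phase ends
    return net_done

stages = [("s1", 3.0, 6.0), ("s2", 5.0, 2.0), ("s3", 1.0, 4.0), ("s4", 4.0, 4.0)]
print(pipelined_makespan(johnson_order(stages)))  # pipelined order: 17.0
print(pipelined_makespan(stages))                 # arbitrary order: 19.0

For the three-phase stages mentioned in the abstract, a two-machine rule like this is no longer optimal in general (the three-machine flow shop is NP-hard), which is one reason dedicated schedulers are needed for that setting.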
