Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (2): 453-467. doi: 10.1007/s11390-020-9702-3



Bigflow: A General Optimization Layer for Distributed Computing Frameworks

Yun-Cong Zhang1, Xiao-Yang Wang1,2,*, Cong Wang1, Yao Xu1, Jian-Wei Zhang1, Xiao-Dong Lin1, Guang-Yu Sun2, Member, CCF, IEEE, Gong-Lin Zheng1, Shan-Hui Yin1, Xian-Jin Ye1, Li Li1, Zhan Song1, Dong-Dong Miao1        

  1 Baidu Inc., Beijing 100193, China;
    2 Center for Energy-Efficient Computing and Applications, Peking University, Beijing 100871, China
  • Received: 2019-05-21  Revised: 2020-01-17  Online: 2020-03-05  Published: 2020-03-18
  • Contact: Xiao-Yang Wang  E-mail: yaoer@pku.edu.cn
  • About the author: Yun-Cong Zhang received his B.S. degree from Nanyang Institute of Technology, Nanyang. He is now working at Baidu in the field of distributed computing and autonomous driving. He used to lead a team that built a unified presentation layer for distributed systems in the Department of Infrastructure. He now works in the Planning & Control (PNC) team of the Intelligent Driving Group.
  • Supported by:
    This work is supported by the National Key Research and Development Project of China under Grant No. 2018YFB1003304 and Beijing Academy of Artificial Intelligence (BAAI).

As data volumes grow dramatically, distributed computing has become the standard way to process massive datasets in data centers. Although many computation models and dataset abstractions have been proposed, the vast majority adopt a single-layer data partitioning model, so developers frequently have to implement multi-level partitioning operations by hand, which both raises the development effort and tends to make programs hard to maintain. To address this problem, we propose the NDD programming model, which largely relieves programmers of the difficulties of data partitioning.
The NDD model lets users partition a dataset multiple times, producing a multi-layer partition structure; for example, records can first be grouped into coarse classes by color and then into finer classes by shape. Besides subdividing a dataset further, NDD also supports re-merging the lower-level sub-groups, returning to the partition structure one level up.
Based on the NDD model, we implement an open-source programming framework called Bigflow. It can be viewed as a general optimization layer for existing distributed systems and provides many automatic optimization strategies. Running Bigflow on top of Spark and Hadoop, we observed performance improvements of about 30% and 50%, respectively, on multi-level partitioning benchmarks, together with a substantial reduction in code size compared with plain Hadoop. Bigflow is now deployed in Baidu's data centers, processing on the order of 3 PB of data per day. According to user feedback, Bigflow greatly reduces the development burden while delivering very good application performance.
NDD works well on batch-oriented distributed engines such as Spark and Hadoop; we are currently exploring how to apply the same ideas to streaming engines such as Flink, so as to offer users the more convenient multi-layer partitioning model without sacrificing performance.
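
To make the single-layer limitation concrete, here is a minimal PySpark sketch of the composite-key workaround that the color-then-shape example above forces on an RDD user (the records and the counts-by-shape job are made-up illustrations, not taken from the paper); NDD is designed to express the same two-level structure directly.

    # Emulating two-level ("first by color, then by shape") partitioning on a
    # single-layer model by packing both levels into one composite key.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "composite-key-demo")
    items = sc.parallelize([
        ("red", "circle"), ("red", "square"),
        ("blue", "circle"), ("red", "circle"),
    ])

    # Both partition levels collapse into the composite key (color, shape).
    counts = (items
              .map(lambda cs: ((cs[0], cs[1]), 1))
              .reduceByKey(lambda a, b: a + b))

    # Merging the sub-groups back to the upper level (by color alone) takes
    # another explicit re-keying and shuffle -- the boilerplate NDD removes.
    by_color = (counts
                .map(lambda kv: (kv[0][0], (kv[0][1], kv[1])))
                .groupByKey()
                .mapValues(sorted))
    print(sorted(by_color.collect()))
    # [('blue', [('circle', 1)]), ('red', [('circle', 2), ('square', 1)])]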


Abstract: As data volumes grow rapidly, distributed computations are widely employed in data centers to provide cheap and efficient methods to process large-scale parallel datasets. Various computation models have been proposed to improve the abstraction of distributed datasets and to hide the details of parallelism. However, most of them follow the single-layer partitioning method, which prevents developers from expressing multi-level partitioning operations succinctly. To overcome this problem, we present the NDD (Nested Distributed Dataset) data model, a more compact and expressive extension of the Spark RDD (Resilient Distributed Dataset) that removes the burden on developers of manually writing the logic for multi-level partitioning cases. Based on the NDD model, we develop an open-source framework called Bigflow, which serves as an optimization layer over the computation engines of the most widely used processing frameworks. With the help of Bigflow, advanced optimization techniques that previously only experienced programmers could apply by hand are enabled automatically in a distributed data processing job. Currently, Bigflow processes about 3 PB of data daily in the data centers of Baidu. According to customer experience, it significantly reduces code length and improves performance over the intuitive programming style.
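
For contrast, the same two-level job can be written against the nested-dataset style of Bigflow's open-source Python API. This is an assumed sketch: the names base.Pipeline.create, parallelize, group_by, apply_values, transforms.count, and get follow the public examples of github.com/baidu/bigflow rather than code shown on this page, and details such as the backend string may differ by version.

    # Assumed sketch of a two-level grouping in Bigflow's Python API; method
    # names follow the project's public word-count examples, not this page.
    from bigflow import base, transforms

    pipeline = base.Pipeline.create('local')  # backend name is an assumption;
                                              # Spark/Hadoop backends also exist
    items = pipeline.parallelize([
        ('red', 'circle'), ('red', 'square'),
        ('blue', 'circle'), ('red', 'circle'),
    ])

    # First-level grouping by color yields a table whose values are nested
    # distributed datasets; the second-level grouping by shape then runs inside
    # each color group, with no composite keys and no manual re-merging.
    counts = (items
              .group_by(lambda cs: cs[0])
              .apply_values(lambda group:
                            group.group_by(lambda cs: cs[1])
                                 .apply_values(transforms.count)))
    print(counts.get())  # e.g., {'red': {'circle': 2, 'square': 1}, ...}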

Key words: distributed computing, programming model, optimization technique

[1] Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 2008, 51(1): 107-113.
[2] Zaharia M, Chowdhury M, Franklin M J, Shenker S, Stoica I. Spark: Cluster computing with working sets. In Proc. the 2nd USENIX Workshop on Hot Topics in Cloud Computing, June 2010, Article No. 5.
[3] Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin M J, Shenker S, Stoica I. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. the 9th USENIX Conference on Networked Systems Design and Implementation, April 2012, pp.15-28.
[4] Chambers C, Raniwala A, Perry F, Adams S, Henry R R, Bradshaw R, Weizenbaum N. FlumeJava: Easy, efficient data-parallel pipelines. In Proc. the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2010, pp.363-375.
[5] Meng X, Bradley J, Yavuz B et al. MLlib: Machine learning in Apache Spark. The Journal of Machine Learning Research, 2016, 17: Article No. 34.
[6] Parsian M. Data Algorithms: Recipes for Scaling up with Hadoop and Spark. O'Reilly Media Inc., 2015.
[7] Karau H, Warren R. High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark (1st edition). O'Reilly Media Inc., 2017.
[8] Akidau T, Bradshaw R, Chambers C et al. The dataflow model: A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proceedings of the VLDB Endowment, 2015, 8(12): 1792-1803.
[9] Isard M, Budiu M, Yu Y, Birrell A, Fetterly D. Dryad: Distributed data-parallel programs from sequential building blocks. ACM SIGOPS Operating Systems Review, 2007, 41(3): 59-72.
[10] Saha B, Shah H, Seth S, Vijayaraghavan G, Murthy A, Curino C. Apache Tez: A unifying framework for modeling and building data processing applications. In Proc. the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp.1357-1369.
[11] Gates A F, Natkovich O, Chopra S, Kamath P, Narayanamurthy S M, Olston C, Reed B, Srinivasan S, Srivastava U. Building a high-level dataflow system on top of MapReduce: The Pig experience. Proceedings of the VLDB Endowment, 2009, 2(2): 1414-1425.
[12] Thusoo A, Sarma J S, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hive: A warehousing solution over a Map-Reduce framework. Proceedings of the VLDB Endowment, 2009, 2(2): 1626-1629.
[13] Alexandrov A, Bergmann R, Ewen S et al. The Stratosphere platform for big data analytics. The VLDB Journal, 2014, 23(6): 939-964.
[14] Brown K J, Lee H, Rompf T, Sujeeth A K, de Sa C, Aberger C, Olukotun K. Have abstraction and eat performance, too: Optimized heterogeneous computing with parallel patterns. In Proc. the 2016 IEEE/ACM International Symposium on Code Generation and Optimization, March 2016, pp.194-205.
[15] Power R, Li J. Piccolo: Building fast, distributed programs with partitioned tables. In Proc. the 9th USENIX Symposium on Operating Systems Design and Implementation, October 2010, pp.293-306.
[16] Gunarathne T, Zhang B, Wu T L, Qiu J. Scalable parallel computing on clouds using Twister4Azure iterative MapReduce. Future Generation Computer Systems, 2013, 29(4): 1035-1048.
[17] Caneill M, de Palma N. Lambda-blocks: Data processing with topologies of blocks. In Proc. the 2018 IEEE International Congress on Big Data, July 2018, pp.9-16.