Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (2): 453-467.doi: 10.1007/s11390-020-9702-3

• Special Section of ChinaSys 2019 • Previous Articles     Next Articles

Bigflow: A General Optimization Layer for Distributed Computing Frameworks

Yun-Cong Zhang1, Xiao-Yang Wang1,2,*, Cong Wang1, Yao Xu1, Jian-Wei Zhang1, Xiao-Dong Lin1, Guang-Yu Sun2, Member, CCF, IEEE, Gong-Lin Zheng1, Shan-Hui Yin1, Xian-Jin Ye1, Li Li1, Zhan Song1, Dong-Dong Miao1        

  1. 1 Baidu Inc., Beijing 100193, China;
    2 Center for Energy-Efficient Computing and Applications, Peking University, Beijing 100871, China
  • Received:2019-05-21 Revised:2020-01-17 Online:2020-03-05 Published:2020-03-18
  • Contact: Xiao-Yang Wang
  • About author:Yun-Cong Zhang received his B.S. degree from Nanyang Institute of Technology, Nanyang. He is now working in Baidu in the field of distributed computing and autonomous driving. He used to lead a team to build a unified presentation layer for distributed systems in the Department of Infrastructure. Now he is working in the Planning & Control team (PNC) in the Intelligence Driving Group.
  • Supported by:
    This work is supported by the National Key Research and Development Project of China under Grant No. 2018YFB1003304 and Beijing Academy of Artificial Intelligence (BAAI).

As data volumes grow rapidly, distributed computations are widely employed in data-centers to provide cheap and efficient methods to process large-scale parallel datasets. Various computation models have been proposed to improve the abstraction of distributed datasets and hide the details of parallelism. However, most of them follow the single-layer partitioning method, which limits developers to express a multi-level partitioning operation succinctly. To overcome the problem, we present the NDD (Nested Distributed Dataset) data model. It is a more compact and expressive extension of Spark RDD (Resilient Distributed Dataset), in order to remove the burden on developers to manually write the logic for multi-level partitioning cases. Base on the NDD model, we develop an open-source framework called Bigflow, which serves as an optimization layer over computation engines from most widely used processing frameworks. With the help of Bigflow, some advanced optimization techniques, which may only be applied by experienced programmers manually, are enabled automatically in a distributed data processing job. Currently, Bigflow is processing about 3 PB data volumes daily in the data-centers of Baidu. According to customer experience, it can significantly save code length and improve performance over the intuitive programming style.

Key words: distributed computing; programming model; optimization technique;

[1] Dean J, Ghemawat S. MapReduce:Simplified data processing on large clusters. Communications of the ACM, 2008, 51(1):107-113.
[2] Zaharia M, Chowdhury M, Franklin M J, Shenker S, Stoica I. Spark:Cluster computing with working sets. In Proc. the 2nd USENIX Workshop on Hot Topics in Cloud Computing, June 2010, Article No. 5.
[3] Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin M J, Shenker S, Stoica I. Resilient distributed datasets:A fault-tolerant abstraction for in-memory cluster computing. In Proc. the 9th USENIX Conference on Networked Systems Design and Implementation, April 2012, pp.15-28.
[4] Chambers C, Raniwala A, Perry F, Adams S, Henry R R, Bradshaw R, Weizenbaum N. FlumeJava:Easy, efficient data-parallel pipelines. In Proc. the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2010, pp.363-375.
[5] Meng X, Bradley J, Yavuz B et al. MLlib:Machine learning in Apache Spark. The Journal of Machine Learning Research, 2016, 17:Article No. 34.
[6] Parsian M. Data Algorithms:Recipes for Scaling up with Hadoop and Spark. O'Reilly Media Inc., 2015.
[7] Karau H, Warren R. High Performance Spark:Best Practices for Scaling and Optimizing Apache Spark (1st edition). O'Reilly Media Inc., 2017.
[8] Akidau T, Bradshaw R, Chambers C et al. The dataflow model:A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proceedings of the VLDB Endowment, 2015, 8(12):1792-1803.
[9] Isard M, Budiu M, Yu Y, Birrell A, Fetterly D. Dryad:Distributed data-parallel programs from sequential building blocks. ACM SIGOPS Operating Systems Review, 2007, 44(3):59-72.
[10] Saha B, Shah H, Seth S, Vijayaraghavan G, Murthy A, Curino C. Apache tez:A unifying framework for modeling and building data processing applications. In Proc. the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp.1357-1369.
[11] Gates A F, Natkovich O, Chopra S, Kamath P, Narayanamurthy S M, Olston C, Reed B, Srinivasan S, Srivastava U. Building a high-level dataflow system on top of MapReduce:The pig experience. Proceedings of the VLDB Endowment, 2009, 2(2):1414-1425.
[12] Thusoo A, Sarma J S, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hive:A warehousing solution over a Map-Reduce framework. Proceedings of the VLDB Endowment, 2009, 2(2):1626-1629.
[13] Alexandrov A, Bergmann R, Ewen S et al. The stratosphere platform for big data analytics. The VLDB Journal, 2014, 23(6):939-964.
[14] Brown K J, Lee H, Rompf T, Sujeeth A K, de Sa C, Aberger C, Olukotun K. Have abstraction and eat performance, too:Optimized heterogeneous computing with parallel patterns. In Proc. the 2016 IEEE/ACM International Symposium on Code Generation and Optimization, March 2016, pp.194-205.
[15] Power R, Li J. Piccolo:Building fast, distributed programs with partitioned tables. In Proc. the 9th USENIX Symposium on Operating Systems Design and Implementation, October 2010, pp.293-306.
[16] Gunarathne T, Zhang B, Wu T L, Qiu J. Scalable parallel computing on clouds using Twister4Azure iterative MapReduce. Future Generation Computer Systems, 2013, 29(4):1035-1048.
[17] Caneill M, de Palma N. Lambda-blocks:Data processing with topologies of blocks. In Proc. the 2018 IEEE International Congress on Big Data, July 2018, pp.9-16.
[1] Jun-Hua Fang, Peng-Peng Zhao, An Liu, Zhi-Xu Li, Lei Zhao. Scalable and Adaptive Joins for Trajectory Data in Distributed Stream System [J]. Journal of Computer Science and Technology, 2019, 34(4): 747-761.
[2] Xin Bi, Xiang-Guo Zhao, Guo-Ren Wang. Efficient Processing of Distributed Twig Queries Based on Node Distribution [J]. , 2017, 32(1): 78-92.
[3] Li Shen, Fan Xu, Zhi-Ying Wang. Optimization Strategies Oriented to Loop Characteristics in Software Thread Level Speculation Systems [J]. , 2016, 31(1): 60-76.
[4] Dong-Hong Han, Xin Zhang, Guo-Ren Wang. Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine [J]. , 2015, 30(4): 874-887.
[5] Xiang-Ke Liao, Can-Qun Yang, Tao Tang Hui-Zhan Yi, Feng Wang, Qiang Wu, and Jingling Xue. OpenMC:Towards Simplifying Programming for TianHe Supercomputers [J]. , 2014, 29(3): 532-546.
[6] Jin Huang, Feng Zhao, Jian Chen, Member, CCF, Jian Pei, Senior Member, ACM, IEEE, and Jian Yin, Senior Member, CCF. Towards Progressive and Load Balancing Distributed Computation: A Case Study on Skyline Analysis [J]. , 2010, 25(3): 431-443.
[7] JIN Hai (金 海), ZOU DeQing (邹德清), CHEN HanHua (陈汉华), SUN JianHua (孙建华) and WU Song (吴 松). Fault-Tolerant Grid Architecture and Practice [J]. , 2003, 18(4): 0-0.
[8] LAN Youran;. A Dynamic Load Balancing Mechanism for Distributed Systems [J]. , 1996, 11(3): 195-207.
Full text



[1] Zhou Di;. A Recovery Technique for Distributed Communicating Process Systems[J]. , 1986, 1(2): 34 -43 .
[2] Chen Shihua;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[3] C.Y.Chung; H.R.Hwa;. A Chinese Information Processing System[J]. , 1986, 1(2): 15 -24 .
[4] Chen Shihua;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[5] Wang Jianchao; Wei Daozheng;. An Effective Test Generation Algorithm for Combinational Circuits[J]. , 1986, 1(4): 1 -16 .
[6] Chen Zhaoxiong; Gao Qingshi;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[7] Huang Heyan;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[8] Zheng Guoliang; Li Hui;. The Design and Implementation of the Syntax-Directed Editor Generator(SEG)[J]. , 1986, 1(4): 39 -48 .
[9] Min Yinghua; Han Zhide;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[10] Huang Xuedong; Cai Lianhong; Fang Ditang; Chi Bianjin; Zhou Li; Jiang Li;. A Computer System for Chinese Character Speech Input[J]. , 1986, 1(4): 75 -83 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
  Copyright ©2015 JCST, All Rights Reserved