|
Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (2): 453-467.doi: 10.1007/s11390-020-9702-3
• Special Section of ChinaSys 2019 • Previous Articles Next Articles
Yun-Cong Zhang1, Xiao-Yang Wang1,2,*, Cong Wang1, Yao Xu1, Jian-Wei Zhang1, Xiao-Dong Lin1, Guang-Yu Sun2, Member, CCF, IEEE, Gong-Lin Zheng1, Shan-Hui Yin1, Xian-Jin Ye1, Li Li1, Zhan Song1, Dong-Dong Miao1
[1] Dean J, Ghemawat S. MapReduce:Simplified data processing on large clusters. Communications of the ACM, 2008, 51(1):107-113. [2] Zaharia M, Chowdhury M, Franklin M J, Shenker S, Stoica I. Spark:Cluster computing with working sets. In Proc. the 2nd USENIX Workshop on Hot Topics in Cloud Computing, June 2010, Article No. 5. [3] Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin M J, Shenker S, Stoica I. Resilient distributed datasets:A fault-tolerant abstraction for in-memory cluster computing. In Proc. the 9th USENIX Conference on Networked Systems Design and Implementation, April 2012, pp.15-28. [4] Chambers C, Raniwala A, Perry F, Adams S, Henry R R, Bradshaw R, Weizenbaum N. FlumeJava:Easy, efficient data-parallel pipelines. In Proc. the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2010, pp.363-375. [5] Meng X, Bradley J, Yavuz B et al. MLlib:Machine learning in Apache Spark. The Journal of Machine Learning Research, 2016, 17:Article No. 34. [6] Parsian M. Data Algorithms:Recipes for Scaling up with Hadoop and Spark. O'Reilly Media Inc., 2015. [7] Karau H, Warren R. High Performance Spark:Best Practices for Scaling and Optimizing Apache Spark (1st edition). O'Reilly Media Inc., 2017. [8] Akidau T, Bradshaw R, Chambers C et al. The dataflow model:A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proceedings of the VLDB Endowment, 2015, 8(12):1792-1803. [9] Isard M, Budiu M, Yu Y, Birrell A, Fetterly D. Dryad:Distributed data-parallel programs from sequential building blocks. ACM SIGOPS Operating Systems Review, 2007, 44(3):59-72. [10] Saha B, Shah H, Seth S, Vijayaraghavan G, Murthy A, Curino C. Apache tez:A unifying framework for modeling and building data processing applications. In Proc. the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp.1357-1369. [11] Gates A F, Natkovich O, Chopra S, Kamath P, Narayanamurthy S M, Olston C, Reed B, Srinivasan S, Srivastava U. Building a high-level dataflow system on top of MapReduce:The pig experience. Proceedings of the VLDB Endowment, 2009, 2(2):1414-1425. [12] Thusoo A, Sarma J S, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hive:A warehousing solution over a Map-Reduce framework. Proceedings of the VLDB Endowment, 2009, 2(2):1626-1629. [13] Alexandrov A, Bergmann R, Ewen S et al. The stratosphere platform for big data analytics. The VLDB Journal, 2014, 23(6):939-964. [14] Brown K J, Lee H, Rompf T, Sujeeth A K, de Sa C, Aberger C, Olukotun K. Have abstraction and eat performance, too:Optimized heterogeneous computing with parallel patterns. In Proc. the 2016 IEEE/ACM International Symposium on Code Generation and Optimization, March 2016, pp.194-205. [15] Power R, Li J. Piccolo:Building fast, distributed programs with partitioned tables. In Proc. the 9th USENIX Symposium on Operating Systems Design and Implementation, October 2010, pp.293-306. [16] Gunarathne T, Zhang B, Wu T L, Qiu J. Scalable parallel computing on clouds using Twister4Azure iterative MapReduce. Future Generation Computer Systems, 2013, 29(4):1035-1048. [17] Caneill M, de Palma N. Lambda-blocks:Data processing with topologies of blocks. In Proc. the 2018 IEEE International Congress on Big Data, July 2018, pp.9-16. |
[1] | Jun-Hua Fang, Peng-Peng Zhao, An Liu, Zhi-Xu Li, Lei Zhao. Scalable and Adaptive Joins for Trajectory Data in Distributed Stream System [J]. Journal of Computer Science and Technology, 2019, 34(4): 747-761. |
[2] | Xin Bi, Xiang-Guo Zhao, Guo-Ren Wang. Efficient Processing of Distributed Twig Queries Based on Node Distribution [J]. , 2017, 32(1): 78-92. |
[3] | Li Shen, Fan Xu, Zhi-Ying Wang. Optimization Strategies Oriented to Loop Characteristics in Software Thread Level Speculation Systems [J]. , 2016, 31(1): 60-76. |
[4] | Dong-Hong Han, Xin Zhang, Guo-Ren Wang. Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine [J]. , 2015, 30(4): 874-887. |
[5] | Xiang-Ke Liao, Can-Qun Yang, Tao Tang Hui-Zhan Yi, Feng Wang, Qiang Wu, and Jingling Xue. OpenMC:Towards Simplifying Programming for TianHe Supercomputers [J]. , 2014, 29(3): 532-546. |
[6] | Jin Huang, Feng Zhao, Jian Chen, Member, CCF, Jian Pei, Senior Member, ACM, IEEE, and Jian Yin, Senior Member, CCF. Towards Progressive and Load Balancing Distributed Computation: A Case Study on Skyline Analysis [J]. , 2010, 25(3): 431-443. |
[7] | JIN Hai (金 海), ZOU DeQing (邹德清), CHEN HanHua (陈汉华), SUN JianHua (孙建华) and WU Song (吴 松). Fault-Tolerant Grid Architecture and Practice [J]. , 2003, 18(4): 0-0. |
[8] | LAN Youran;. A Dynamic Load Balancing Mechanism for Distributed Systems [J]. , 1996, 11(3): 195-207. |
|