|
Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (5): 1051-1070.doi: 10.1007/s11390-021-1242-y
Special Issue: Computer Architecture and Systems; Computer Networks and Distributed Computing
• Special Section of APPT 2021 (Part 1) • Previous Articles Next Articles
Yu-Wei Wu, Student Member, CCF, Qing-Gang Wang, Student Member, CCF Long Zheng*, Member, CCF, ACM, IEEE, Xiao-Fei Liao, Senior Member, CCF, Member, IEEE Hai Jin, Fellow, CCF, IEEE, Member, ACM, Wen-Bin Jiang, Member, CCF, ACM, IEEE Ran Zheng, Member, CCF, ACM, IEEE, and Kan Hu, Member, CCF, ACM, IEEE
[1] Quick L, Wilkinson P, Hardcastle D. Using Pregel-like large scale graph processing frameworks for social network analysis. In Proc. the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Aug. 2012, pp.457-463. DOI:10.1109/ASONAM.2012.254. [2] Aridhi S, Montresor A, Velegrakis Y. BLADYG:A novel block-centric framework for the analysis of large dynamic graphs. In Proc. the ACM Workshop on High Performance Graph Processing, May 2016, pp.39-42. DOI:10.1016/j.bdr.2017.05.003. [3] Wang Y, Davidson A, Pan Y, Wu Y, Riffel A, Owens J D. Gunrock:A high-performance graph processing library on the GPU. In Proc. the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Mar. 2016, Article No. 11. DOI:10.1145/2851141.2851145. [4] Warnke-Sommer J, Ali H. Graph mining for next generation sequencing:Leveraging the assembly graph for biological insights. BMC Genomics, 2016, 17(1):Article No. 340. DOI:10.1186/s12864-016-2678-2. [5] Dai G, Chi Y, Wang Y, Yang H. FPGP:Graph processing framework on FPGA-A case study of breadth-first search. In Proc. the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2016, pp.105-110. DOI:10.1145/2847263.2847339. [6] Engelhardt N, So H K H. GraVF:A vertex-centric distributed graph processing framework on FPGAs. In Proc. the 26th International Conference on Field Programmable Logic and Applications, Aug. 29-Sept. 2, 2016. DOI:10.1109/FPL.2016.7577360. [7] Nurvitadhi E, Weisz G, Wang Y, Hurkat S, Nguyen M, Hoe J C, Martínez J, Guestrin C. GraphGen:An FPGA framework for vertex-centric graph computation. In Proc. the 22nd IEEE Annual International Symposium on FieldProgrammable Custom Computing Machines, May 2014, pp.25-28. DOI:10.1109/FCCM.2014.15. [8] Oguntebi T, Olukotun K. GraphOps:A dataflow library for graph analytics acceleration. In Proc. the 2016 ACM/SIGDA International Symposium on FieldProgrammable Gate Arrays, Feb. 2016, pp.111-117. DOI:10.1145/2847263.2847337. [9] Zhou S, Chelmis C, Prasanna V K. High-throughput and energy-efficient graph processing on FPGA. In Proc. the 24th IEEE Annual International Symposium on FieldProgrammable Custom Computing Machines, May 2016, pp.103-110. DOI:10.1109/FCCM.2016.35. [10] Yao P, Zheng L, Liao X, Jin H, He B. An efficient graph accelerator with parallel data conflict management. In Proc. the 27th International Conference on Parallel Architectures and Compilation Techniques, Nov. 2018, Article No. 8. DOI:10.1145/3243176.3243201. [11] Yang C, Zheng L, Gui C, Jin H. Efficient FPGA-based graph processing with hybrid pull-push computational model. Frontiers Comput. Sci., 2020, 14(4):Article No. 144102. DOI:10.1007/s11704-019-9020-5. [12] Lv X, Xiao W, Zhang Y, Liao X, Jin H, Hua Q. An effective framework for asynchronous incremental graph processing. Frontiers Comput. Sci., 2019, 13(3):539-551. DOI:10.1007/s11704-018-7443-z. [13] Jin H, Yao P, Liao X. Towards dataflow based graph processing. Science China Information Sciences, 2017, 60(12):Article No. 126102. DOI:10.1007/s11432-017-9226-8. [14] Li Z, Ding Z. Distributed optimization on unbalanced graphs via continuous-time methods. Science China Information Sciences, 2018, 61(12):Article No. 129204. DOI:10.1007/s11432-018-9502-1. [15] Ahn J, Hong S, Yoo S, Mutlu O, Choi K. A scalable processing-in-memory accelerator for parallel graph processing. In Proc. the 42nd Annual International Symposium on Computer Architecture, Jun. 2015, pp.105-117. DOI:10.1145/2749469.2750386. [16] McSherry F, Isard M, Murray D G. Scalability! But at what COST? In Proc. the 15th USENIX Conference on Hot Topics in Operating Systems, May 2015. [17] Dai G, Huang T, Chi Y, Xu N, Wang Y, Yang H. ForeGraph:Exploring large-scale graph processing on multiFPGA architecture. In Proc. the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2017, pp.217-226. DOI:0.1145/3020078.3021739. [18] Dathathri R, Gill G, Hoang L, Dang H, Brooks A, Dryden N, Snir M, Pingali K. Gluon:A communication-optimizing substrate for distributed heterogeneous graph analytics. In Proc. the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, Jun. 2018, pp.752-768. DOI:10.1145/3192366.3192404. [19] Satish N, Sundaram N, Patwary M M A, Seo J, Park J, Hassaan M A, Sengupta S, Yin Z, Dubey P. Navigating the maze of graph analytics frameworks using massive graph datasets. In Proc. the 2014 ACM SIGMOD International Conference on Management of Data, Jun. 2014, pp.979-990. DOI:10.1145/2588555.2610518. [20] Khorasani F, Gupta R, Bhuyan L N. Scalable SIMDefficient graph processing on GPUs. In Proc. the 2015 International Conference on Parallel Architectures and Compilation Techniques, Oct. 2015, pp.39-50. DOI:10.1109/PACT.2015.15. [21] Fu H, Liao J, Yang J et al. The Sunway TaihuLight supercomputer:System and applications. Science China Information Sciences, 2016, 59(7):Article No. 072001. DOI:10.1007/s11432-016-5588-7. [22] Zhang F, Zheng L, Liao X, Lv X, Jin H, Xiao J. An effective 2-dimension graph partitioning for work stealing assisted graph processing on multi-FPGAs. IEEE Transactions on Big Data, DOI:10.1109/TBDATA.2020.3035090. [23] Engelhardt N, So H K H. GraVF-M:Graph processing system generation for multi-FPGA platforms. ACM Transactions on Reconfigurable Technology and Systems, 2019, 12(4):Article No. 21. DOI:10.1145/3357596. [24] Shao Z, Li R, Hu D, Liao X, Jin H. Improving performance of graph processing on FPGA-DRAM platform by twolevel vertex caching. In Proc. the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2019, pp.320-329. DOI:10.1145/3289602.3293900. [25] Zhou S, Kannan R, Prasanna V K, Seetharaman G, Wu Q. HitGraph:High-throughput graph processing framework on FPGA. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(10):2249-2264. DOI:10.1109/TPDS.2019.2910068. [26] Wang Q, Zheng L, Zhao J, Liao X, Jin H, Xue J. A conflictfree scheduler for high-performance graph processing on multi-pipeline FPGAs. ACM Transactions on Architecture and Code Optimization, 2020, 17(2):Article No. 14. DOI:10.1145/3390523. [27] Putnam A, Caulfield A M, Chung E S et al. A reconfigurable fabric for accelerating large-scale datacenter services. In Proc. the 41st ACM/IEEE International Symposium on Computer Architecture, Jun. 2014, pp.13-24. DOI:10.110-9/ISCA.2014.6853195. [28] Caulfield A M, Chung E S, Putnam A et al. Configurable clouds. IEEE Micro, 2017, 37(3):52-61. DOI:10.1109/MM.2017.51. [29] Zhou S, Prasanna V K. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In Proc. the 29th International Symposium on Computer Architecture and High Performance Computing, Oct. 2017, pp.137-144. DOI:10.1109/SBAC-PAD.2017.25. [30] Zhu X, Chen W, Zheng W, Ma X. Gemini:A computationcentric distributed graph processing system. In Proc. the 12th USENIX Symposium on Operating Systems Design and Implementation, Nov. 2016, pp.301-316. [31] Malewicz G, Austern M H, Bik A, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel:A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp.135-146. DOI:10.1145/1807167.1807184. [32] Eskandari N, Tarafdar N, Ly-Ma D, Chow P. A modular heterogeneous stack for deploying FPGAs and CPUs in the data center. In Proc. the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2019, pp.262-271. DOI:10.1145/3289602.3293909. [33] Gill G, Dathathri R, Hoang L, Pingali K. A study of partitioning policies for graph analytics on large-scale distributed platforms. Proceedings of the VLDB Endowment, 2018, 12(4):321-334. DOI:10.14778/3297753.3297754. [34] Zilberman N, Bracha G, Schzukin G. Stardust:Divide and conquer in the data center network. In Proc. the 16th USENIX Symposium on Networked Systems Design and Implementation, Feb. 2019, pp.141-160. [35] Snir M, Gropp W, Otto S, Huss-Lederman S, Dongarra J, Walker D. MPI-The Complete Reference:Volume 1, the MPI Core (2nd edition). MIT Publishers, 1998. [36] Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph:Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2012, pp.17-30. [37] Boman E G, Devine K D, Rajamanickam S. Scalable matrix computations on large scale-free graphs using 2D graph partitioning. In Proc. the International Conference on High Performance Computing, Networking, Storage and Analysis, Nov. 2013. DOI:10.1145/2503210.2503293. [38] Slota G M, Rajamanickam S, Devine K D, Madduri K. Partitioning trillion-edge graphs in minutes. In Proc. the 2017 IEEE International Parallel and Distributed Processing Symposium, May 29-June 2, 2017, pp.646-655. DOI:10.1109/IPDPS.2017.95. [39] Chen R, Shi J, Chen Y, Zang B, Guan H, Chen H. PowerLyra:Differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing, 2018, 5(3):Article No. 13. DOI:10.1145/3298989. [40] Leskovec J, Krevl A. SNAP datasets:Stanford large network dataset collection, 2014. https://snap.stanford.edu/data, May 2021. [41] Gonzalez J E, Xin R S, Dave A, Crankshaw D, Franklin M J, Stoica I. GraphX:Graph processing in a distributed dataflow framework. In Proc. the 11th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2014, pp.599-613. [42] Chi Y, Dai G, Wang Y, Sun G, Li G, Yang H. NXgraph:An efficient graph processing system on a single machine. In Proc. the 32nd IEEE International Conference on Data Engineering, May 2016, pp.409-420. DOI:10.1109/ICDE.2016.7498258. [43] Zhu X, Han W, Chen W. GridGraph:Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In Proc. the 2015 USENIX Annual Technical Conference, Jul. 2015, pp.375-386. [44] Roy A, Mihailovic I, Zwaenepoel W. X-Stream:Edgecentric graph processing using streaming partitions. In Proc. the 24th ACM Symposium on Operating Systems Principles, Nov. 2013, pp.472-488. DOI:10.1145/2517-349.2522740. [45] Shun J, Blelloch G E. Ligra:A lightweight graph processing framework for shared memory. In Proc. the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb. 2013, pp.135-146. DOI:10.1145/2442516.2442530. [46] DeLorimier M, Kapre N, Mehta N, Rizzo D, Eslick I, Rubin R, Uribe T E, Knight T F, DeHon A. GraphStep:A system architecture for sparse-graph algorithms. In Proc. the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Apr. 2006, pp.143-151. DOI:10.1109/FCCM.2006.45. [47] Betkaoui B,Wang Y, Thomas D B, Luk W. Parallel FPGAbased all pairs shortest paths for sparse networks:A human brain connectome case study. In Proc. the 22nd International Conference on Field Programmable Logic and Applications, Aug. 2012, pp.99-104. DOI:10.1109/FPL.2012.6339247. |
[1] | Jia-Jun Li, Ke Wang, Hao Zheng, and Ahmed Louri. GShuttle: Optimizing Memory Access Efficiency for Graph Convolutional Neural Network Accelerators [J]. Journal of Computer Science and Technology, 2023, 38(1): 115-127. |
[2] | Xiao-Bing Chen, Hao Qi, Shao-Hui Peng, Yi-Min Zhuang, Tian Zhi, and Yun-Ji Chen. Tetris: A Heuristic Static Memory Management Framework for Uniform Memory Multicore Neural Network Accelerators [J]. Journal of Computer Science and Technology, 2022, 37(6): 1255-1270. |
[3] | Lei Liu, Xiu Ma, Hua-Xiao Liu, Guang-Li Li, and Lei Liu. FlexPDA: A Flexible Programming Framework for Deep Learning Accelerators [J]. Journal of Computer Science and Technology, 2022, 37(5): 1200-1220. |
[4] | Yubin Duan, Ning Wang, and Jie Wu. Accelerating DAG-Style Job Execution via Optimizing Resource Pipeline Scheduling [J]. Journal of Computer Science and Technology, 2022, 37(4): 852-868. |
[5] | Xue-Qi Li, Guang-Ming Tan, Ning-Hui Sun. PIM-Align: A Processing-in-Memory Architecture for FM-Index Search Algorithm [J]. Journal of Computer Science and Technology, 2021, 36(1): 56-70. |
[6] | Gui-Juan Wang, Cheng-Kuan Lin, Jian-Xi Fan, Jing-Ya Zhou, Bao-Lei Cheng. Fault-Tolerant Hamiltonicity and Hamiltonian Connectivity of BCube with Various Faulty Elements [J]. Journal of Computer Science and Technology, 2020, 35(5): 1064-1083. |
[7] | Shu-Quan Wang, Lei Wang, Yu Deng, Zhi-Jie Yang, Sha-Sha Guo, Zi-Yang Kang, Yu-Feng Guo, Wei-Xia Xu. SIES: A Novel Implementation of Spiking Convolutional Neural Network Inference Engine on Field-Programmable Gate Array [J]. Journal of Computer Science and Technology, 2020, 35(2): 475-489. |
[8] | Peng Zhao, Chen Ding, Lei Liu, Jiping Yu, Wentao Han, Xiao-Bing Feng. Cacheap: Portable and Collaborative I/O Optimization for Graph Processing [J]. Journal of Computer Science and Technology, 2019, 34(3): 690-706. |
[9] | Chuang-Yi Gui, Long Zheng, Bingsheng He, Cheng Liu, Xin-Yu Chen, Xiao-Fei Liao, Hai Jin. A Survey on Graph Processing Accelerators: Challenges and Opportunities [J]. Journal of Computer Science and Technology, 2019, 34(2): 339-371. |
[10] | Shuai-Bing Lu, Jie Wu, Huan-Yang Zheng, Zhi-Yi Fang. On Maximum Elastic Scheduling in Cloud-Based Data Center Networks for Virtual Machines with the Hose Model [J]. Journal of Computer Science and Technology, 2019, 34(1): 185-206. |
[11] | Xi Wang, Jian-Xi Fan, Cheng-Kuan Lin, Jing-Ya Zhou, Zhao Liu. BCDC: A High-Performance, Server-Centric Data Center Network [J]. , 2018, 33(2): 400-416. |
[12] | Fa-Qiang Sun, Gui-Hai Yan, Xin He, Hua-Wei Li, Yin-He Han. CPicker: Leveraging Performance-Equivalent Configurations to Improve Data Center Energy Efficiency [J]. , 2018, 33(1): 131-144. |
[13] | Yi-Hong Gao, Hua-Dong Ma, Wu Liu. Minimizing Resource Cost for Camera Stream Scheduling in Video Data Center [J]. , 2017, 32(3): 555-570. |
[14] | Chao Li, Rui Wang, Yang Hu, Ruijin Zhou, Ming Liu, Long-Jun Liu, Jing-Ling Yuan, Tao Li, and De-Pei Qian. Towards Automated Provisioning and Emergency Handling in Renewable Energy Powered Datacenters [J]. , 2014, 29(4): 618-630. |
[15] | Peng Chen, Lei Zhang, Yin-He Han, Yun-Ji Chen. A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications [J]. , 2014, 29(2): 239-246. |
|
|