Citation: Li JJ, Wang K, Zheng H et al. GShuttle: Optimizing memory access efficiency for graph convolutional neural network accelerators. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 38(1): 115−127 Jan. 2023. DOI: 10.1007/s11390-023-2875-9.

GShuttle: Optimizing Memory Access Efficiency for Graph Convolutional Neural Network Accelerators

Funds: This work was supported by the U.S. National Science Foundation under Grant Nos. CCF-2131946, CCF-1953980, and CCF-1702980. Part of this work was conducted when Dr. Jia-Jun Li was a post-doctoral researcher at the HPCAT Laboratory, George Washington University.
  • Author Bios:

    Jia-Jun Li received his B.E. degree from the Department of Automation, Tsinghua University, Beijing, in 2013. He received his Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2019. From 2019 to 2021, he was a postdoctoral researcher with the Department of Electrical and Computer Engineering, George Washington University, Washington. He is currently an associate professor with the School of Astronautics, Beihang University, Beijing. His current research interests include machine learning and heterogeneous computer architecture.

    Ke Wang received his Ph.D. degree in computer engineering from the George Washington University, Washington, in 2022. He received his M.S. degree in electrical engineering from Worcester Polytechnic Institute, Worcester, in 2015, and his B.S. degree in electrical engineering from Peking University, Beijing, in 2013. He is currently an assistant professor in the Department of Electrical and Computer Engineering at the University of North Carolina at Charlotte. His research work focuses on parallel computing, computer architecture, interconnection networks, and machine learning.

    Hao Zheng received his Ph.D. degree in computer engineering from George Washington University, Washington, in 2021. He is an assistant professor in the Department of Electrical and Computer Engineering at the University of Central Florida, Orlando. His research interests are in the broad area of computer architecture and parallel computing, with emphasis on interconnection networks, AI chips for emerging applications, and energy-efficient manycore architecture designs.

    Ahmed Louri is the David and Marilyn Karlgaard Endowed Chair Professor of the Department of Electrical and Computer Engineering at the George Washington University, Washington, which he joined in August 2015. He is also the director of the High Performance Computing Architectures and Technologies Laboratory. Dr. Louri received his Ph.D. degree in computer engineering from the University of Southern California, Los Angeles in 1988. From 1988 to 2015, he was a professor of Electrical and Computer Engineering at the University of Arizona, Tucson, and during that time, he served six years (2000 to 2006) as the chair of the Computer Engineering Program. From 2010 to 2013, Dr. Louri served as a program director in the National Science Foundation’s (NSF) Directorate for Computer and Information Science and Engineering. He directed the core computer architecture program and was on the management team of several cross-cutting programs. Dr. Louri conducts research in the broad area of computer architecture and parallel computing, with emphasis on interconnection networks, optical interconnects for scalable parallel computing systems, reconfigurable computing systems, and power-efficient and reliable Network-on-Chips (NoCs) for multicore architectures. Recently he has been concentrating on: energy-efficient, reliable, and high-performance many-core architectures; accelerator-rich reconfigurable heterogeneous architectures; machine learning techniques for efficient computing, memory, and interconnect systems; emerging interconnect technologies (photonic, wireless, RF, hybrid) for NoCs; future parallel computing models and architectures (including convolutional neural networks, deep neural networks, and approximate computing); and cloud-computing and data centers. He is the recipient of the 2020 IEEE Computer Society Edward J. McCluskey Technical Achievement Award, for pioneering contributions to the solution of on-chip and off-chip communication problems for parallel computing and manycore architectures. Dr. Louri is a fellow of IEEE, and he is currently the Editor-in-Chief of the IEEE Transactions on Computers. More information can be found at https://hpcat.seas.gwu.edu/Director.html

  • Received Date: September 28, 2022
  • Revised Date: October 27, 2022
  • Accepted Date: December 31, 2022
  • Abstract: Graph convolutional neural networks (GCNs) have emerged as an effective approach to extending deep learning to graph data analytics, but they are computationally challenging given the irregular structure of graphs and the large number of nodes in a graph. GCNs involve chain sparse-dense matrix multiplications with six loops, which results in a large design space for GCN accelerators. Prior work on GCN acceleration either employs limited loop optimization techniques or determines the design variables by random sampling, and therefore can hardly exploit data reuse efficiently, which degrades system efficiency. To overcome this limitation, this paper proposes GShuttle, a GCN acceleration scheme that maximizes memory access efficiency to achieve high performance and energy efficiency. GShuttle systematically explores loop optimization techniques for GCN acceleration and quantitatively analyzes the design objectives (e.g., the required DRAM and SRAM accesses) through analytical calculation based on multiple design variables. GShuttle further employs two approaches, pruned search space sweeping and greedy search, to find the optimal design variables under given design constraints. We demonstrate the efficacy of GShuttle through evaluation on five widely used graph datasets. The experimental simulations show that GShuttle reduces the number of DRAM accesses by a factor of 1.5 and saves energy by a factor of 1.7 compared with state-of-the-art approaches.
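
To make the loop-tiling design space described in the abstract concrete, the Python sketch below treats a single GCN layer as a chain of two matrix multiplications and performs a pruned sweep over candidate tile sizes using a toy analytical DRAM-traffic model. The cost formula, the tile-size candidates, the SRAM-capacity constraint, and the helper names (gcn_layer, dram_traffic, sweep) are illustrative assumptions for this sketch, not GShuttle's actual cost model or search implementation.

import numpy as np
from itertools import product

def gcn_layer(A, X, W):
    """One GCN layer: O = A * (X * W), i.e., combination followed by
    aggregation. A is the (sparse) adjacency matrix, X the node features,
    W the layer weights."""
    return A @ (X @ W)

def dram_traffic(n, f, h, tn, tf, th):
    """Hypothetical word-count estimate of off-chip traffic when the
    (n x f) * (f x h) combination phase is tiled with sizes (tn, tf, th):
    X is re-read once per output-column tile, W once per output-row tile,
    and each output element is written once."""
    return (h / th) * n * f + (n / tn) * f * h + n * h

def sweep(n, f, h, sram_words):
    """Pruned sweep: enumerate candidate tile sizes, skip any combination
    whose three live tiles do not fit in on-chip SRAM, keep the cheapest."""
    best = None
    for tn, tf, th in product([16, 32, 64, 128], repeat=3):
        if tn * tf + tf * th + tn * th > sram_words:
            continue  # pruned: violates the SRAM capacity constraint
        cost = dram_traffic(n, f, h, tn, tf, th)
        if best is None or cost < best[0]:
            best = (cost, (tn, tf, th))
    return best

if __name__ == "__main__":
    # Tiny functional check of the chain multiplication (associativity).
    A = np.random.rand(8, 8)
    X = np.random.rand(8, 4)
    W = np.random.rand(4, 3)
    assert np.allclose(gcn_layer(A, X, W), (A @ X) @ W)

    # Cora-like dimensions: 2708 nodes, 1433 input features, 16 hidden units.
    print(sweep(n=2708, f=1433, h=16, sram_words=64 * 1024))

A real cost model would also account for the aggregation phase and the sparsity of A, which is what makes the six-loop design space large; the sketch only shows the shape of the pruned-sweep approach.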

  • [1]
    Jiang W W, Luo J Y. Graph neural network for traffic forecasting: A survey. arXiv: 2101.11174, 2021. https://arxiv.org/abs/2101.11174, Dec. 2022.
    [2]
    Shi W J, Rajkumar R. Point-GNN: Graph neural network for 3D object detection in a point cloud. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2020, pp.1711–1719. DOI: 10.1109/CVPR42600.2020.00178.
    [3]
    Wee C Y, Liu C Q, Lee A, Poh J S, Ji H, Qiu A Q, The Alzheimers Disease Neuroimage Initiative. Cortical graph neural network for AD and MCI diagnosis and transfer learning across populations. NeuroImage: Clinical, 2019, 23: 101929. DOI: 10.1016/j.nicl.2019.101929.
    [4]
    Zhang Z W, Cui P, Zhu W W. Deep learning on graphs: A survey. IEEE Trans. Knowledge and Data Engineering, 2022, 34(1): 249–270. DOI: 10.1109/TKDE.2020.2981333.
    [5]
    Yang H X. AliGraph: A comprehensive graph neural network platform. In Proc. the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Jul. 2019, pp.3165–3166. DOI: 10.1145/3292500.3340404.
    [6]
    Lerer A, Wu L, Shen J, Lacroix T, Wehrstedt L, Bose A, Peysakhovich A. PyTorch-BigGraph: A large-scale graph embedding system. arXiv: 1903.12287, 2019. https://arxiv.org/abs/1903.12287, Dec. 2022.
    [7]
    Yan M Y, Chen Z D, Deng L, Ye X C, Zhang Z M, Fan D R, Xie Y. Characterizing and understanding GCNs on GPU. IEEE Computer Architecture Letters, 2020, 19(1): 22–25. DOI: 10.1109/LCA.2020.2970395.
    [8]
    Zhang Z H, Leng J W, Ma L X, Miao Y S, Li C, Guo M Y. Architectural implications of graph neural networks. IEEE Computer Architecture Letters, 2020, 19(1): 59–62. DOI: 10.1109/LCA.2020.2988991.
    [9]
    Geng T, Li A, Shi R B, Wu C S, Wang T Q, Li Y F, Haghi P, Tumeo A, Che S, Reinhardt S, Herbordt M C. AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing. In Proc. the 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2020, pp.922–936. DOI: 10.1109/MICRO50266.2020.00079.
    [10]
    Ma Y F, Cao Y, Vrudhula S, Seo J S. Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks. In Proc. the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2017, pp.45–54. DOI: 10.1145/3020078.3021736.
    [11]
    Yan M Y, Deng L, Hu X, Liang L, Feng, Y J, Ye X C, Zhang Z M, Fan D R, Xie Y. HyGCN: A GCN accelerator with hybrid architecture. In Proc. the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb. 2020, pp.15–29. DOI: 10.1109/HPCA47549.2020.00012.
    [12]
    Li J J, Louri A, Karanth A, Bunescu R. GCNAX: A flexible and energy-efficient accelerator for graph convolutional neural networks. In Proc. the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Mar. 2021, pp.775–788. DOI: 10.1109/HPCA51647.2021.00070.
    [13]
    Galal S, Horowitz M. Energy-efficient floating-point unit design. IEEE Transactions on Computers, 2011, 60(7): 913-922.
    [14]
    Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. arXiv: 1609.02907, 2016. https://arxiv.org/abs/1609.02907, Dec. 2022.
    [15]
    Hamilton W L, Ying R, Leskovec J. Inductive representation learning on large graphs. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.1024–1034.
    [16]
    Xu K, Hu W H, Leskovec J, Jegelka S. How powerful are graph neural networks? arXiv: 1810.00826, 2018. https://arxiv.org/abs/1810.00826, Dec. 2022.
    [17]
    Allen J R, Kennedy K. Automatic loop interchange. In Proc. the 1984 SIGPLAN Symposium on Compiler Construction, Jun. 1984, pp.233–246. DOI: 10.1145/502874.502897.
    [18]
    Zhang C, Li P, Sun G Y et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proc. the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2015, pp.161–170. DOI: 10.1145/2684746.2689060.
    [19]
    Pugh W. Uniform techniques for loop optimization. In Proc. the 5th International Conference on Supercomputing, Jun. 1991, pp.341–352. DOI: 10.1145/109025.109108.
    [20]
    Pal S, Beaumont J, Park D H, Amarnath A, Feng S Y, Chakrabarti C, Kim H S, Blaauw D, Mudge T, Dreslinski R. OuterSPACE: An outer product based sparse matrix multiplication accelerator. In Proc. the 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb. 2018, pp.724–736. DOI: 10.1109/HPCA.2018.00067.
    [21]
    Nie J. Memory-driven data-flow optimization for neural processing accelerators [Ph.D. Thesis]. Princeton University, 2020. https://www.proquest.com/openview/41fe23f43fd65cafaa8c2e051aed4059/1?pq-origsite=gscholar&cbl=18750&diss=y, Jan. 2023.
    [22]
    Sen P, Namata G, Bilgic M et al. Collective classification in network data. AI Magazine, 2008, 29(3): 93-106. DOI: 10.1609/aimag.v29i3.2157.
    [23]
    Carlson A, Betteridge J, Kisiel B et al. Toward an architecture for never-ending language learning. In Proc. the 34th AAAI Conference on Artificial Intelligence, July 2010, pp.1306-1313.
    [24]
    Auten A, Tomei M, Kumar R. Hardware acceleration of graph neural networks. In Proc. the 57th ACM/IEEE Design Automation Conference (DAC), Jul. 2020. DOI: 10.1109/DAC18072.2020.9218751.
    [25]
    Liang S W, Wang Y, Liu C et al. EnGN: A high-throughput and energy-efficient accelerator for large graph neural networks. IEEE Trans. Computers, 2021, 70(9): 1511–1525. DOI: 10.1109/TC.2020.3014632.
    [26]
    Kiningham K, Re C, Levis P. GRIP: A graph neural network accelerator architecture. arXiv: 2007.13828, 2020. https://arxiv.org/abs/2007.13828v1, Dec. 2022.
    [27]
    Zeng H Q, Prasanna V. GraphACT: Accelerating GCN training on CPU-FPGA heterogeneous platforms. In Proc. the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2020, pp.255–265. DOI: 10.1145/3373087.3375312.
    [28]
    Shi F, Jin A Y, Zhu S C. VersaGNN: A versatile accelerator for graph neural networks. arXiv: 2105.01280, 2021. https://arxiv.org/abs/2105.01280, Dec. 2022.
