Journal of Computer Science and Technology ›› 2023, Vol. 38 ›› Issue (1): 115-127. DOI: 10.1007/s11390-023-2875-9
Special Section: Computer Architecture and Systems
Jia-Jun Li1 (李家军), Ke Wang2 (王可), Hao Zheng3 (郑皓), and Ahmed Louri4, Fellow, IEEE
1. Research Background
Graph convolutional neural networks (GCNs) are an important approach for applying deep learning to graph-structured data. However, graphs typically contain a very large number of nodes whose connectivity varies dramatically, so general-purpose computing platforms usually cannot process GCNs efficiently. Recently, researchers have proposed dedicated GCN accelerators to improve the performance and energy efficiency of GCN processing. None of these works, however, has systematically studied how loop optimization techniques affect accelerator energy efficiency through their impact on memory accesses. They adopt only a limited set of loop optimization techniques and can hardly exploit data reuse effectively, which increases the number of memory accesses. Since memory accesses consume far more energy than computation, this substantially degrades accelerator energy efficiency.
The goal of this work is to build a GCN acceleration framework that systematically explores memory access optimization techniques for GCNs, maximizing memory access efficiency and thereby improving the energy efficiency of GCN accelerators.
2. Methods
This work proposes GShuttle, a memory access optimization framework for GCN accelerators. GShuttle determines the best design parameters for an accelerator under given constraints using two algorithms: a greedy search (GShuttle-GS) and a pruned search space search (GShuttle-PSSS). We built a C++-based accelerator simulator to evaluate how well the proposed methods reduce memory accesses, and validated them on five graph datasets.
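To make the idea of searching design parameters under buffer constraints concrete, the following is a minimal C++ sketch, written for this summary rather than taken from the actual GShuttle-GS or GShuttle-PSSS implementation. It enumerates hypothetical power-of-two loop tile sizes (Tm, Tn, Tk) for a tiled matrix product, prunes tilings whose working set exceeds an assumed on-chip buffer, and keeps the tiling with the lowest estimated off-chip traffic under a deliberately crude reuse model. All names, dimensions, and the cost model are assumptions for illustration only.

// Minimal sketch of a constrained design-parameter search (illustrative only; not
// the actual GShuttle-GS/PSSS code). Tile sizes Tm/Tn/Tk, the buffer capacity,
// and the traffic model below are all assumed for this example.
#include <cmath>
#include <iostream>
#include <limits>

struct DesignPoint {
    int Tm = 1, Tn = 1, Tk = 1;
    double traffic = std::numeric_limits<double>::max();  // estimated off-chip words
};

// Crude analytical estimate of off-chip traffic for a tiled (M x K) * (K x N)
// product: each operand is re-fetched once per pass over the dimension it does
// not reuse, and partial sums are spilled once per K tile.
double estimateTraffic(int M, int N, int K, int Tm, int Tn, int Tk) {
    double passesN = std::ceil((double)N / Tn);
    double passesM = std::ceil((double)M / Tm);
    double passesK = std::ceil((double)K / Tk);
    return (double)M * K * passesN      // re-fetches of the left operand
         + (double)K * N * passesM      // re-fetches of the right operand
         + (double)M * N * passesK;     // partial-sum spills of the output
}

int main() {
    const int M = 2708, N = 1433, K = 16;   // hypothetical layer dimensions
    const double bufferWords = 64 * 1024;   // assumed on-chip buffer capacity (words)
    DesignPoint best;
    for (int Tm = 1; Tm <= M; Tm *= 2)
        for (int Tn = 1; Tn <= N; Tn *= 2)
            for (int Tk = 1; Tk <= K; Tk *= 2) {
                // Constraint: tiles of both operands and the output must fit on chip.
                double workingSet = (double)Tm * Tk + (double)Tk * Tn + (double)Tm * Tn;
                if (workingSet > bufferWords) continue;  // prune infeasible points
                double t = estimateTraffic(M, N, K, Tm, Tn, Tk);
                if (t < best.traffic) {
                    best.Tm = Tm; best.Tn = Tn; best.Tk = Tk; best.traffic = t;
                }
            }
    std::cout << "best tiling: Tm=" << best.Tm << ", Tn=" << best.Tn
              << ", Tk=" << best.Tk << ", estimated DRAM words=" << best.traffic << "\n";
    return 0;
}

An exhaustive enumeration like this grows combinatorially as more loop parameters and loop orders are considered, which is exactly why the paper turns to a greedy search and search space pruning instead of brute force.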
3. Experiments and Results
Experimental results show that GShuttle improves off-chip memory access efficiency by up to 70% compared with prior work.
4. Conclusions and Future Work
Most of the energy consumed by GCN accelerators goes to memory accesses, so improving memory access efficiency is key to improving accelerator energy efficiency. This paper proposes GShuttle, a GCN acceleration framework that formally defines the memory access optimization problem for GCN accelerators and presents two algorithms to solve it. Both algorithms quickly and efficiently find near-optimal design parameters for GCN accelerators. Experimental results show that GShuttle significantly improves accelerator memory access efficiency. Moreover, because the proposed memory access optimizations do not conflict with existing computation optimizations, GShuttle can potentially be applied to other GCN accelerators, such as HyGCN and AWB-GCN, to improve their memory access efficiency.
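As a rough, assumed illustration of what such a formal definition might look like (the exact formulation in the paper may differ), the search for design parameters can be written as a constrained minimization:

\[
\min_{\theta \in \Theta} \; V_{\mathrm{DRAM}}(\theta)
\quad \text{s.t.} \quad
S_{\mathrm{in}}(\theta) + S_{\mathrm{wgt}}(\theta) + S_{\mathrm{out}}(\theta) \le C_{\mathrm{buf}},
\]

where \(\theta\) ranges over the candidate design parameters (loop tile sizes, loop orders, and so on), \(V_{\mathrm{DRAM}}(\theta)\) is the estimated volume of off-chip accesses, the \(S(\theta)\) terms are the on-chip storage required by the input, weight, and output tiles, and \(C_{\mathrm{buf}}\) is the on-chip buffer capacity. GShuttle-GS and GShuttle-PSSS are then two strategies for searching the space \(\Theta\).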
<table class="reference-tab" style="background-color:#FFFFFF;width:914.104px;color:#333333;font-family:Calibri, Arial, 微软雅黑, "font-size:16px;"> <tbody> <tr class="document-box" id="b1"> <td valign="top" class="td1"> [1] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Jiang W W, Luo J Y. Graph neural network for traffic forecasting: A survey. arXiv: 2101.11174, 2021. <a href="https://arxiv.org/abs/2101.11174">https://arxiv.org/abs/2101.11174</a>, Dec. 2022. </div> </td> </tr> <tr class="document-box" id="b2"> <td valign="top" class="td1"> [2] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Shi W J, Rajkumar R. Point-GNN: Graph neural network for 3D object detection in a point cloud. In <i>Proc</i>. <i>the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>, Jun. 2020, pp.1711–1719. DOI: <a href="http://dx.doi.org/10.1109/CVPR42600.2020.00178">10.1109/CVPR42600.2020.00178</a>. </div> </td> </tr> <tr class="document-box" id="b3"> <td valign="top" class="td1"> [3] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Wee C Y, Liu C Q, Lee A, Poh J S, Ji H, Qiu A Q, The Alzheimers Disease Neuroimage Initiative. Cortical graph neural network for AD and MCI diagnosis and transfer learning across populations. <i>NeuroImage</i>: <i>Clinical</i>, 2019, 23: 101929. DOI: <a href="http://dx.doi.org/10.1016/j.nicl.2019.101929">10.1016/j.nicl.2019.101929</a>. </div> </td> </tr> <tr class="document-box" id="b4"> <td valign="top" class="td1"> [4] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Zhang Z W, Cui P, Zhu W W. Deep learning on graphs: A survey. <i>IEEE Trans</i>. <i>Knowledge and Data Engineering</i>, 2022, 34(1): 249–270. DOI: <a href="http://dx.doi.org/10.1109/TKDE.2020.2981333">10.1109/TKDE.2020.2981333</a>. </div> </td> </tr> <tr class="document-box" id="b5"> <td valign="top" class="td1"> [5] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Yang H X. AliGraph: A comprehensive graph neural network platform. In <i>Proc</i>. <i>the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining</i>, Jul. 2019, pp.3165–3166. DOI: <a href="http://dx.doi.org/10.1145/3292500.3340404">10.1145/3292500.3340404</a>. </div> </td> </tr> <tr class="document-box" id="b6"> <td valign="top" class="td1"> [6] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Lerer A, Wu L, Shen J, Lacroix T, Wehrstedt L, Bose A, Peysakhovich A. PyTorch-BigGraph: A large-scale graph embedding system. arXiv: 1903.12287, 2019. <a href="https://arxiv.org/abs/1903.12287">https://arxiv.org/abs/1903.12287</a>, Dec. 2022. </div> </td> </tr> <tr class="document-box" id="b7"> <td valign="top" class="td1"> [7] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Yan M Y, Chen Z D, Deng L, Ye X C, Zhang Z M, Fan D R, Xie Y. Characterizing and understanding GCNs on GPU. <i>IEEE Computer Architecture Letters</i>, 2020, 19(1): 22–25. DOI: <a href="http://dx.doi.org/10.1109/LCA.2020.2970395">10.1109/LCA.2020.2970395</a>. </div> </td> </tr> <tr class="document-box" id="b8"> <td valign="top" class="td1"> [8] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Zhang Z H, Leng J W, Ma L X, Miao Y S, Li C, Guo M Y. Architectural implications of graph neural networks. <i>IEEE Computer Architecture Letters</i>, 2020, 19(1): 59–62. 
DOI: <a href="http://dx.doi.org/10.1109/LCA.2020.2988991">10.1109/LCA.2020.2988991</a>. </div> </td> </tr> <tr class="document-box" id="b9"> <td valign="top" class="td1"> [9] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Geng T, Li A, Shi R B, Wu C S, Wang T Q, Li Y F, Haghi P, Tumeo A, Che S, Reinhardt S, Herbordt M C. AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing. In <i>Proc</i>. <i>the 53rd Annual IEEE/ACM International Symposium on Microarchitecture</i> (<i>MICRO</i>), Oct. 2020, pp.922–936. DOI: <a href="http://dx.doi.org/10.1109/MICRO50266.2020.00079">10.1109/MICRO50266.2020.00079</a>. </div> </td> </tr> <tr class="document-box" id="b10"> <td valign="top" class="td1"> [10] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Ma Y F, Cao Y, Vrudhula S, Seo J S. Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks. In <i>Proc</i>. <i>the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays</i>, Feb. 2017, pp.45–54. DOI: <a href="http://dx.doi.org/10.1145/3020078.3021736">10.1145/3020078.3021736</a>. </div> </td> </tr> <tr class="document-box" id="b11"> <td valign="top" class="td1"> [11] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Yan M Y, Deng L, Hu X, Liang L, Feng, Y J, Ye X C, Zhang Z M, Fan D R, Xie Y. HyGCN: A GCN accelerator with hybrid architecture. In <i>Proc</i>. <i>the</i> <i>2020 IEEE International Symposium on High Performance Computer Architecture</i> (<i>HPCA</i>), Feb. 2020, pp.15–29. DOI: <a href="http://dx.doi.org/10.1109/HPCA47549.2020.00012">10.1109/HPCA47549.2020.00012</a>. </div> </td> </tr> <tr class="document-box" id="b12"> <td valign="top" class="td1"> [12] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Li J J, Louri A, Karanth A, Bunescu R. GCNAX: A flexible and energy-efficient accelerator for graph convolutional neural networks. In <i>Proc</i>. <i>the</i> <i>2021 IEEE International Symposium on High-Performance Computer Architecture</i> (<i>HPCA</i>), Mar. 2021, pp.775–788. DOI: <a href="http://dx.doi.org/10.1109/HPCA51647.2021.00070">10.1109/HPCA51647.2021.00070</a>. </div> </td> </tr> <tr class="document-box" id="b13"> <td valign="top" class="td1"> [13] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Galal S, Horowitz M. Energy-efficient floating-point unit design. <i>IEEE Transactions on Computers</i>, 2011, 60(7): 913-922. </div> </td> </tr> <tr class="document-box" id="b14"> <td valign="top" class="td1"> [14] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. arXiv: 1609.02907, 2016. <a href="https://arxiv.org/abs/1609.02907">https://arxiv.org/abs/1609.02907</a>, Dec. 2022. </div> </td> </tr> <tr class="document-box" id="b15"> <td valign="top" class="td1"> [15] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Hamilton W L, Ying R, Leskovec J. Inductive representation learning on large graphs. In <i>Proc</i>. <i>the 31st International Conference on Neural Information Processing Systems</i>, Dec. 2017, pp.1024–1034. </div> </td> </tr> <tr class="document-box" id="b16"> <td valign="top" class="td1"> [16] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Xu K, Hu W H, Leskovec J, Jegelka S. 
How powerful are graph neural networks? arXiv: 1810.00826, 2018. <a href="https://arxiv.org/abs/1810.00826">https://arxiv.org/abs/1810.00826</a>, Dec. 2022. </div> </td> </tr> <tr class="document-box" id="b17"> <td valign="top" class="td1"> [17] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Allen J R, Kennedy K. Automatic loop interchange. In <i>Proc</i>. <i>the 1984 SIGPLAN Symposium on Compiler Construction</i>, Jun. 1984, pp.233–246. DOI: <a href="http://dx.doi.org/10.1145/502874.502897">10.1145/502874.502897</a>. </div> </td> </tr> <tr class="document-box" id="b18"> <td valign="top" class="td1"> [18] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Zhang C, Li P, Sun G Y et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In <i>Proc</i>. <i>the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays</i>, Feb. 2015, pp.161–170. DOI: <a href="http://dx.doi.org/10.1145/2684746.2689060">10.1145/2684746.2689060</a>. </div> </td> </tr> <tr class="document-box" id="b19"> <td valign="top" class="td1"> [19] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Pugh W. Uniform techniques for loop optimization. In <i>Proc</i>. <i>the 5th International Conference on Supercomputing</i>, Jun. 1991, pp.341–352. DOI: <a href="http://dx.doi.org/10.1145/109025.109108">10.1145/109025.109108</a>. </div> </td> </tr> <tr class="document-box" id="b20"> <td valign="top" class="td1"> [20] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Pal S, Beaumont J, Park D H, Amarnath A, Feng S Y, Chakrabarti C, Kim H S, Blaauw D, Mudge T, Dreslinski R. OuterSPACE: An outer product based sparse matrix multiplication accelerator. In <i>Proc</i>. <i>the 2018 IEEE International Symposium on High Performance Computer Architecture</i> (<i>HPCA</i>), Feb. 2018, pp.724–736. DOI: <a href="http://dx.doi.org/10.1109/HPCA.2018.00067">10.1109/HPCA.2018.00067</a>. </div> </td> </tr> <tr class="document-box" id="b21"> <td valign="top" class="td1"> [21] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Nie J. Memory-driven data-flow optimization for neural processing accelerators [Ph.D. Thesis]. Princeton University, 2020. <a href="https://www.proquest.com/openview/41fe23f43fd65cafaa8c2e051aed4059/1?pq-origsite=gscholar&cbl=18750&diss=y">https://www.proquest.com/openview/41fe23f43fd65cafaa8c2e051aed4059/1?pq-origsite=gscholar&cbl=18750&diss=y</a>, Jan. 2023. </div> </td> </tr> <tr class="document-box" id="b22"> <td valign="top" class="td1"> [22] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Sen P, Namata G, Bilgic M et al. Collective classification in network data. <i>AI Magazine</i>, 2008, 29(3): 93-106. DOI: <a href="https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/2157">10.1609/aimag.v29i3.2157</a>. </div> </td> </tr> <tr class="document-box" id="b23"> <td valign="top" class="td1"> [23] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Carlson A, Betteridge J, Kisiel B et al. Toward an architecture for never-ending language learning. In <i>Proc. the 34th AAAI Conference on Artificial Intelligence</i>, July 2010, pp.1306-1313. </div> </td> </tr> <tr class="document-box" id="b24"> <td valign="top" class="td1"> [24] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Auten A, Tomei M, Kumar R. 
Hardware acceleration of graph neural networks. In <i>Proc</i>. <i>the</i> <i>57th ACM/IEEE Design Automation Conference</i> (<i>DAC</i>), Jul. 2020. DOI: <a href="http://dx.doi.org/10.1109/DAC18072.2020.9218751">10.1109/DAC18072.2020.9218751</a>. </div> </td> </tr> <tr class="document-box" id="b25"> <td valign="top" class="td1"> [25] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Liang S W, Wang Y, Liu C et al. EnGN: A high-throughput and energy-efficient accelerator for large graph neural networks. <i>IEEE Trans</i>. <i>Computers</i>, 2021, 70(9): 1511–1525. DOI: <a href="http://dx.doi.org/10.1109/TC.2020.3014632">10.1109/TC.2020.3014632</a>. </div> </td> </tr> <tr class="document-box" id="b26"> <td valign="top" class="td1"> [26] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Kiningham K, Re C, Levis P. GRIP: A graph neural network accelerator architecture. arXiv: 2007.13828, 2020. <a href="https://arxiv.org/abs/2007.13828v1">https://arxiv.org/abs/2007.13828v1</a>, Dec. 2022. </div> </td> </tr> <tr class="document-box" id="b27"> <td valign="top" class="td1"> [27] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Zeng H Q, Prasanna V. GraphACT: Accelerating GCN training on CPU-FPGA heterogeneous platforms. In <i>Proc</i>. <i>the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays</i>, Feb. 2020, pp.255–265. DOI: <a href="http://dx.doi.org/10.1145/3373087.3375312">10.1145/3373087.3375312</a>. </div> </td> </tr> <tr class="document-box" id="b28"> <td valign="top" class="td1"> [28] </td> <td class="td2"> <div class="reference-en" style="margin:0px;padding:0px;"> Shi F, Jin A Y, Zhu S C. VersaGNN: A versatile accelerator for graph neural networks. arXiv: 2105.01280, 2021. <a href="https://arxiv.org/abs/2105.01280">https://arxiv.org/abs/2105.01280</a>, Dec. 2022. </div> </td> </tr> </tbody> </table> |