A Memory Access Optimization Method for Graph Neural Network Accelerators
GShuttle: Optimizing Memory Access Efficiency for Graph Convolutional Neural Network Accelerators
Abstract: Background: Graph convolutional neural networks (GCNs) are an important approach to applying deep learning to graph-structured data. However, graphs often contain a large number of nodes, and the connectivity of those nodes varies widely, so general-purpose computing platforms usually cannot process GCNs efficiently. Recently, researchers have proposed dedicated GCN accelerators to improve the performance and energy efficiency of GCN processing. None of these efforts, however, systematically studies how loop optimization techniques affect accelerator energy efficiency through their impact on memory accesses. They adopt only a limited set of loop optimization techniques and can hardly exploit data reuse effectively, which increases the number of memory accesses. Because a memory access consumes far more energy than a computation, this substantially degrades accelerator energy efficiency.

Objective: The goal of this work is to build a GCN acceleration framework that systematically explores memory access optimization techniques for GCNs and maximizes memory access efficiency, thereby improving the energy efficiency of GCN accelerators.

Methods: This work proposes GShuttle, a memory access optimization framework for GCN accelerators. GShuttle uses two algorithms, greedy search (GShuttle-GS) and pruned search space sweeping (GShuttle-PSSS), to determine the optimal design variables for an accelerator under given design constraints. A C++-based accelerator simulator was built to evaluate how well the proposed methods reduce memory accesses, and the evaluation was carried out on five graph datasets.

Results: The experiments show that GShuttle improves off-chip memory access efficiency by up to 70% compared with prior work.

Conclusions and Future Work: Most of the energy consumed by a GCN accelerator is spent on memory accesses, so improving memory access efficiency is key to improving accelerator energy efficiency. This paper proposes GShuttle, a GCN acceleration framework that formally defines the memory access optimization problem for GCN accelerators and presents two algorithms to solve it. Both algorithms quickly and efficiently find near-optimal design variables for a GCN accelerator. The experimental results show that GShuttle significantly improves accelerator memory access efficiency. Moreover, because the proposed memory access optimizations do not conflict with existing computation optimizations, GShuttle can potentially be applied to other graph neural network accelerators, such as HyGCN and AWB-GCN, to improve their memory access efficiency.

Abstract: Graph convolutional neural networks (GCNs) have emerged as an effective approach to extending deep learning to graph data analytics, but they are computationally challenging given the irregular structure of graphs and the large number of nodes in a graph. GCNs involve chained sparse-dense matrix multiplications with six loops, which results in a large design space for GCN accelerators. Prior work on GCN acceleration either employs limited loop optimization techniques or determines the design variables by random sampling, and can therefore hardly exploit data reuse efficiently, which degrades system efficiency. To overcome this limitation, this paper proposes GShuttle, a GCN acceleration scheme that maximizes memory access efficiency to achieve high performance and energy efficiency. GShuttle systematically explores loop optimization techniques for GCN acceleration and quantitatively analyzes the design objectives (e.g., required DRAM accesses and SRAM accesses) by analytical calculation based on multiple design variables. GShuttle further employs two approaches, pruned search space sweeping and greedy search, to find the optimal design variables under given design constraints. We demonstrate the efficacy of GShuttle by evaluation on five widely used graph datasets. The simulation experiments show that GShuttle reduces the number of DRAM accesses by a factor of 1.5 and saves energy by a factor of 1.7 compared with the state-of-the-art approaches.
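As a concrete reading of the phrase "chained sparse-dense matrix multiplications with six loops", the sketch below spells out one possible six-loop formulation of a single GCN layer, Out = A·(X·W). The dimension names (N, F, G), the dense storage of A, and the chosen multiplication order are illustrative assumptions for readability, not GShuttle's actual dataflow.

```cpp
#include <vector>

// Illustrative six-loop GCN layer: Out = A * (X * W).
// N = number of nodes, F = input feature length, G = output feature length.
// A is the (normalized) adjacency matrix, stored densely here for clarity;
// real accelerators exploit its sparsity.
using Mat = std::vector<std::vector<float>>;

Mat gcnLayer(const Mat& A, const Mat& X, const Mat& W, int N, int F, int G) {
    Mat B(N, std::vector<float>(G, 0.0f));    // B = X * W (combination phase)
    Mat Out(N, std::vector<float>(G, 0.0f));  // Out = A * B (aggregation phase)

    for (int n = 0; n < N; ++n)               // loop 1: nodes
        for (int g = 0; g < G; ++g)           // loop 2: output features
            for (int f = 0; f < F; ++f)       // loop 3: input features
                B[n][g] += X[n][f] * W[f][g];

    for (int i = 0; i < N; ++i)               // loop 4: destination nodes
        for (int j = 0; j < N; ++j)           // loop 5: source nodes (sparse in practice)
            for (int g = 0; g < G; ++g)       // loop 6: output features
                if (A[i][j] != 0.0f)
                    Out[i][g] += A[i][j] * B[j][g];
    return Out;
}
```

The abstract also mentions that GShuttle searches for optimal design variables under design constraints via pruned search space sweeping and greedy search. The following is a minimal sketch of the pruned-sweep idea only, assuming hypothetical tile-size variables (tn, tf, tg), a simplified DRAM-traffic model, and a single SRAM-capacity constraint; GShuttle's actual analytical model and variable set are more detailed.

```cpp
#include <limits>

// Hypothetical design variables: tile sizes over nodes (tn), input features (tf),
// and output features (tg).
struct Design { int tn, tf, tg; };

// Stand-in analytical model of off-chip traffic for a tiled X*W product: each X tile is
// refetched once per output-feature tile, each W tile once per node tile, and the result
// is written back once. This is only a placeholder for GShuttle's real cost model.
static double estimateDramAccesses(const Design& d, long N, long F, long G) {
    double xLoads    = double(N) * F * ((G + d.tg - 1) / d.tg);
    double wLoads    = double(F) * G * ((N + d.tn - 1) / d.tn);
    double outStores = double(N) * G;
    return xLoads + wLoads + outStores;
}

// Pruned search-space sweep: enumerate power-of-two tile sizes, discard any point whose
// on-chip footprint exceeds the SRAM budget, and keep the point with the lowest estimated traffic.
Design prunedSweep(long N, long F, long G, long sramWords) {
    Design best{1, 1, 1};
    double bestCost = std::numeric_limits<double>::max();
    for (long tn = 1; tn <= N; tn *= 2)
        for (long tf = 1; tf <= F; tf *= 2)
            for (long tg = 1; tg <= G; tg *= 2) {
                long footprint = tn * tf + tf * tg + tn * tg;  // X, W, and output tiles
                if (footprint > sramWords) continue;           // prune infeasible designs
                Design d{int(tn), int(tf), int(tg)};
                double cost = estimateDramAccesses(d, N, F, G);
                if (cost < bestCost) { bestCost = cost; best = d; }
            }
    return best;
}
```

A greedy variant would instead grow one tile dimension at a time in whichever direction most reduces the estimated traffic until the SRAM budget is exhausted, trading search completeness for speed.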