Design of a Processing-in-Memory Accelerator Architecture for the FM-Index Genomic Alignment Algorithm
PIM-Align: A Processing-in-Memory Architecture for FM-Index Search Algorithm
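Abstract: Background: In bioinformatics, the FM-index algorithm is an important algorithm in genomic data analysis. The massive volume of genomic data produced by sequencing technology poses a major challenge for FM-index-based alignment programs. Existing work has accelerated the FM-index algorithm using SIMD, FPGA, and ASIC techniques, and custom accelerators in particular have achieved good speedups. However, the large number of random memory accesses causes heavy data movement between processing units and memory in the conventional von Neumann architecture, and the available memory bandwidth limits how much of the algorithm's parallelism can be exploited.
Objective: In this paper, we argue that processing-in-memory (also called near-memory computing) is a viable solution to these challenges.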
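Methods: This paper quantitatively analyzes the computation and memory-access characteristics of the FM-index algorithm. Based on these characteristics and the properties of 3D-stacked memory, we design and implement an accelerator architecture for genomic sequence alignment. To fully exploit the higher and scalable memory bandwidth offered by 3D stacking, we propose (1) a new accelerator structure that makes full use of the available memory bandwidth; (2) a lightweight message-passing mechanism with non-blocking communication; and (3) decoupling of computation from memory access, combined with data prefetching.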
Results: Experiments show that, compared with the best available ASIC solution, the near-memory accelerator stays well within the power and area limits of the logic layer of the 3D-stacked memory, and improves performance by more than 20x over the original accelerator.
Abstract: Genomic sequence alignment is the most critical and time-consuming step in genomic analysis. Alignment algorithms generally follow a seed-and-extend model. Acceleration of the extension phase of sequence alignment (e.g., the Smith-Waterman algorithm) has been well explored in computing-centric architectures based on field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and graphics processing units (GPUs). Compared with the extension phase, the seeding phase is more critical and essential. However, the seeding phase is memory-bound on conventional systems: it is dominated by fine-grained random memory accesses and exposes limited parallelism. In this paper, we argue that the processing-in-memory (PIM) concept could be a viable solution to these problems. This paper describes "PIM-Align", an application-driven near-data processing architecture for sequence alignment. To achieve performance proportional to memory capacity by taking advantage of 3D-stacked dynamic random access memory (DRAM) technology, we propose a lightweight message mechanism between different memory partitions and a specialized hardware prefetcher for the memory access patterns of sequence alignment. Our evaluation shows that the proposed architecture achieves 20x and 1820x speedups over the best available ASIC implementation and over software running on a 32-thread CPU, respectively.
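To make the random-access behavior described above concrete, the following is a minimal software sketch of FM-index backward search, the textbook form of the seeding kernel. It is not the PIM-Align hardware design. The function names (`build_fm_index`, `backward_search`, `find_all`) and the naive, uncompressed index construction are assumptions introduced for illustration only. Each search step reads the occurrence table at two data-dependent positions (`lo` and `hi`), and this is the fine-grained, hard-to-predict access pattern that makes seeding memory-bound.

```python
def build_fm_index(text):
    """Build a naive FM-index (illustrative only; real indexes are sampled/compressed)."""
    text = text + "$"  # sentinel; '$' sorts before A, C, G, T in ASCII
    n = len(text)
    # Suffix array by direct sorting of suffixes (quadratic memory, fine for a demo).
    sa = sorted(range(n), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (n + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, c_table, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching pattern,
    plus the list of (lo, hi) table positions read at each step."""
    lo, hi = 0, n
    touched = []
    for ch in reversed(pattern):  # the pattern is consumed right to left
        if ch not in c_table:
            return 0, 0, touched
        touched.append((lo, hi))  # two data-dependent reads per character
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:  # empty interval: no occurrence
            return 0, 0, touched
    return lo, hi, touched


def find_all(pattern, text):
    """Return sorted start positions of pattern in text."""
    sa, c_table, occ = build_fm_index(text)
    lo, hi, _ = backward_search(pattern, c_table, occ, len(text) + 1)
    return sorted(sa[lo:hi])
```

For example, `find_all("CGT", "ACGTACGT")` returns `[1, 5]`. The positions recorded in `touched` depend on the result of the previous step, so the next read address is only known once the current step has finished. This serial dependence is what limits how well a conventional cache hierarchy can serve the workload.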