Reinventing Memory System Design for Many-Accelerator Architecture

Abstract: The many-accelerator architecture, which integrates general-purpose cores with many accelerator-like function units (FUs), is a compelling alternative to homogeneous chip multiprocessors (CMPs) owing to its superior power efficiency. However, such a processor exhibits a far more complicated memory access pattern than a general-purpose processor (GPP), because the abundant on-chip FUs generate highly concurrent memory streams with widely differing locality and bandwidth demands. These disordered streams interfere with one another and cannot be handled efficiently by the conventional main memory interface, whose data-fetching mode is inflexible. We therefore propose the Aggregation Memory System (AMS), which adapts to the characteristics of the memory streams from different FUs: through sub-rank binding, it intelligently interleaves each FU's data across memory devices, preserving locality while providing different FUs with different data-fetching sizes. Moreover, with an optimized memory scheduling policy, AMS batches requests that have no sub-rank conflict into a single read burst so that they can be served in parallel. Experimental results from trace-based simulation show that AMS brings both a conspicuous performance boost and significant energy savings.
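
The two mechanisms named in the abstract, sub-rank binding and conflict-free request batching, can be pictured with a short sketch. The Python below is a minimal illustration under assumed details: the `Request` fields, the `binding` table, and the helpers `bound_sub_rank` and `batch_conflict_free` (with its greedy packing rule) are all hypothetical stand-ins for exposition, not the paper's actual data structures or scheduling policy. The idea shown is that each FU's blocks stay on the sub-rank bound to that FU, and pending requests that target distinct sub-ranks are packed into one burst while conflicting ones wait.

```python
# Minimal illustration only: field names, the binding table, and the greedy
# batching rule are assumptions for exposition, not the paper's actual design.

from collections import namedtuple

# A pending memory request carries the issuing FU and the target address.
Request = namedtuple("Request", ["fu_id", "addr"])

def bound_sub_rank(req, binding):
    """Sub-rank binding (simplified): each FU's data blocks are mapped to the
    sub-rank assigned to that FU, so its stream keeps its locality instead of
    being striped across the whole rank. `binding` is a hypothetical
    FU -> sub-rank table."""
    return binding[req.fu_id]

def batch_conflict_free(pending, binding):
    """Greedily pack requests that target distinct sub-ranks into one burst so
    they can be served in parallel; requests that would conflict are deferred
    to a later burst."""
    burst, used, deferred = [], set(), []
    for req in pending:
        sr = bound_sub_rank(req, binding)
        if sr not in used:
            used.add(sr)
            burst.append(req)
        else:
            deferred.append(req)
    return burst, deferred

# Example: FUs 0-3 are bound to sub-ranks 0, 1, 1, 3 respectively.
binding = {0: 0, 1: 1, 2: 1, 3: 3}
pending = [Request(0, 0x1000), Request(1, 0x8000),
           Request(2, 0x9000), Request(3, 0xF000)]
burst, deferred = batch_conflict_free(pending, binding)
print([r.fu_id for r in burst])     # [0, 1, 3] served in one parallel burst
print([r.fu_id for r in deferred])  # [2] waits (sub-rank conflict with FU 1)
```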
