Improving the Performance and Energy Efficiency of Phase Change Memory Systems
Abstract: Phase change memory (PCM) is a promising technology for future main memory thanks to its better scalability and lower leakage power than DRAM (dynamic random-access memory). However, adopting PCM as main memory requires overcoming its write issues, such as long write latency and high write power. In this paper, we propose two techniques to improve the performance and energy efficiency of PCM memory systems. First, we propose a victim cache technique (RBC) that uses the existing buffer in the memory controller to reduce PCM memory accesses. The key idea is to reorganize the buffer into a victim cache structure that provides additional hits for the LLC (last-level cache). Second, we propose a chip parallelism-aware replacement policy (CPAR) for the victim cache to further improve performance. Instead of evicting one cache line at a time, CPAR evicts multiple cache lines that access different PCM chips in a single replacement. This reduces the frequency of victim cache evictions and improves the write parallelism of the PCM chips. The evaluation results show that, compared with the baseline, RBC improves PCM memory system performance by 5.4% on average and by up to 9.4%. Combining CPAR with RBC (RBC+CPAR) improves performance by 12.1% on average and by up to 19.0%. Moreover, RBC and RBC+CPAR reduce memory energy consumption by 8.3% and 6.6% on average, respectively.
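The replacement idea described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class `VictimCache`, the helper `chip_of`, the line-interleaved address-to-chip mapping, the LRU ordering, and the constants `NUM_CHIPS` and `LINE_SIZE` are all hypothetical choices made for illustration. On a replacement, the model scans lines from least to most recently used and evicts at most one line per PCM chip, so the resulting write-backs could in principle proceed on different chips in parallel.

```python
from collections import OrderedDict

NUM_CHIPS = 8    # assumed number of PCM chips (not from the paper)
LINE_SIZE = 64   # assumed cache line size in bytes (not from the paper)


def chip_of(addr, num_chips=NUM_CHIPS, line_size=LINE_SIZE):
    """Hypothetical mapping: consecutive cache lines interleave across chips."""
    return (addr // line_size) % num_chips


class VictimCache:
    """Toy victim cache with a chip parallelism-aware batch eviction."""

    def __init__(self, capacity, num_chips=NUM_CHIPS):
        self.capacity = capacity
        self.num_chips = num_chips
        self.lines = OrderedDict()  # addr -> data, least recently used first

    def lookup(self, addr):
        """Return the data on a hit (refreshing recency), or None on a miss."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return self.lines[addr]
        return None

    def insert(self, addr, data):
        """Insert a line evicted from the LLC.

        Returns the list of (addr, data) pairs written back to PCM,
        which is empty when no replacement was needed.
        """
        if addr in self.lines:
            self.lines[addr] = data
            self.lines.move_to_end(addr)
            return []
        evicted = []
        if len(self.lines) >= self.capacity:
            evicted = self._evict_batch()
        self.lines[addr] = data
        return evicted

    def _evict_batch(self):
        """Pick the oldest line for each distinct chip and evict them together."""
        chosen = {}  # chip index -> address of the oldest line on that chip
        for addr in self.lines:  # iterates from least recently used
            chip = chip_of(addr, self.num_chips)
            if chip not in chosen:
                chosen[chip] = addr
            if len(chosen) == self.num_chips:
                break
        # Removal happens only after the scan, so the dict is not
        # modified while it is being iterated.
        return [(a, self.lines.pop(a)) for a in chosen.values()]
```

In this model, one replacement frees several slots at once, so the next few insertions need no eviction; this mirrors the abstract's claim that CPAR reduces frequent victim cache evictions. When all resident lines map to the same chip, the batch degenerates to a single-line eviction.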