2015, Vol. 30, Issue (1): 110-120. doi: 10.1007/s11390-015-1508-3

Topic collection: Computer Architecture and Systems

• Special Section on Selected Paper from NPC 2011 •

Improving the Performance and Energy Efficiency of Phase Change Memory Systems

Qi Wang 1,2 (王琪), Jia-Rui Li 1,2 (李佳芮), Dong-Hui Wang 1 (王东辉)

  1. Digital System Integration Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2014-07-13 Revised: 2014-12-18 Online: 2015-01-05 Published: 2015-01-05
  • About the author: Qi Wang received her B.S. degree in electronic information science and technology from Lanzhou University in 2011. She is now a Ph.D. candidate at the Institute of Acoustics, Chinese Academy of Sciences, Beijing. Her research interests include VLSI design, computer architecture, and emerging memory technologies.
  • Supported by: the National Science and Technology Major Projects of China under Grant No. 2009ZX01 034-001-002-005 and the Knowledge Innovation Project of Institute of Acoustics, Chinese Academy of Sciences.

Abstract: Phase change memory (PCM) is a promising technology for future memory thanks to its better scalability and lower leakage power than DRAM (dynamic random-access memory). However, adopting PCM as main memory requires overcoming its write issues, such as long write latency and high write power. In this paper, we propose two techniques to improve the performance and energy efficiency of PCM memory systems. First, we propose a victim cache technique that uses the existing buffer in the memory controller to reduce PCM memory accesses. The key idea is to reorganize the buffer into a victim cache structure (RBC) that provides additional hits for the LLC (last level cache). Second, we propose a chip parallelism-aware replacement policy (CPAR) for the victim cache to further improve performance. Instead of evicting one cache line at a time, CPAR evicts multiple cache lines that access different PCM chips, which reduces the frequency of victim cache evictions and improves the write parallelism of the PCM chips. The evaluation results show that, compared with the baseline, RBC improves PCM memory system performance by 5.4% on average and by up to 9.4%. Combining CPAR with RBC (RBC+CPAR) improves performance by 12.1% on average and by up to 19.0%. Moreover, RBC and RBC+CPAR reduce memory energy consumption by 8.3% and 6.6% on average, respectively.
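
The replacement idea described in the abstract can be illustrated with a toy model. The following Python sketch is not the authors' implementation: the chip count, the line size, the address-to-chip interleaving in `chip_of`, the oldest-first ordering, and the dirty-bit handling are all assumptions chosen for illustration. It shows only the core notion of evicting a batch of lines that map to different PCM chips rather than a single line.

```python
from collections import OrderedDict

NUM_CHIPS = 4     # hypothetical number of PCM chips
LINE_BYTES = 64   # hypothetical cache line size


def chip_of(addr: int) -> int:
    """Hypothetical mapping: consecutive lines interleave across chips."""
    return (addr // LINE_BYTES) % NUM_CHIPS


class VictimCache:
    """Toy victim cache holding lines evicted from the LLC."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # addr -> dirty flag, oldest first

    def lookup(self, addr: int) -> bool:
        """Return True on a hit; the line is removed (it returns to the LLC)."""
        return self.lines.pop(addr, None) is not None

    def insert(self, addr: int, dirty: bool) -> list:
        """Add an LLC victim; return the addresses written back to PCM."""
        written = []
        if addr not in self.lines and len(self.lines) >= self.capacity:
            written = self._evict_batch()
        self.lines[addr] = dirty
        self.lines.move_to_end(addr)
        return written

    def _evict_batch(self) -> list:
        """Evict the oldest line of each chip, at most one line per chip."""
        chosen = {}
        for addr in self.lines:  # iterates oldest first
            chosen.setdefault(chip_of(addr), addr)
            if len(chosen) == NUM_CHIPS:
                break
        batch = list(chosen.values())
        # Only dirty lines need a PCM write; clean lines are simply dropped.
        writes = [a for a in batch if self.lines[a]]
        for a in batch:
            del self.lines[a]
        return writes
```

In this sketch, inserting into a full cache frees several entries at once, and the returned write-backs target distinct chips, so a memory controller could in principle issue them in parallel. How the actual CPAR policy selects and orders its victims is not specified in the abstract.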

Copyright © Editorial Office of Journal of Computer Science and Technology