2015, Vol. 30, Issue (1): 110-120. doi: 10.1007/s11390-015-1508-3

Special Issue: Computer Architecture and Systems

• Special Section on Computer Architecture and Systems for Big Data •

Improving the Performance and Energy Efficiency of Phase Change Memory Systems

Qi Wang1,2 (王琪), Jia-Rui Li1,2 (李佳芮), Dong-Hui Wang1 (王东辉)

  1 Digital System Integration Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
  2 University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2014-07-13 Revised: 2014-12-18 Online: 2015-01-05 Published: 2015-01-05
  • About author: Qi Wang received her B.S. degree in electronic information science and technology from Lanzhou University in 2011. She is now a Ph.D. candidate at the Institute of Acoustics, Chinese Academy of Sciences, Beijing. Her research interests include VLSI design, computer architecture, and emerging memory technologies.
  • Supported by:

    This work was supported by the National Science and Technology Major Projects of China under Grant No. 2009ZX01 034-001-002-005 and the Knowledge Innovation Project of Institute of Acoustics, Chinese Academy of Sciences.

Phase change memory (PCM) is a promising technology for future memory systems because it scales better and has lower leakage power than DRAM (dynamic random-access memory). However, adopting PCM as main memory requires overcoming its write issues, namely long write latency and high write power. In this paper, we propose two techniques to improve the performance and energy efficiency of PCM memory systems. First, we propose a victim cache technique that uses the existing buffer in the memory controller to reduce PCM memory accesses. The key idea is to reorganize the buffer into a victim cache structure (RBC) that provides additional hits for the LLC (last level cache). Second, we propose a chip parallelism-aware replacement policy (CPAR) for the victim cache to further improve performance. Instead of evicting one cache line at a time, CPAR evicts multiple cache lines that access different PCM chips. CPAR thereby reduces the frequency of victim cache evictions and improves the write parallelism of PCM chips. The evaluation results show that, compared with the baseline, RBC improves PCM memory system performance by up to 9.4% and by 5.4% on average. Combining CPAR with RBC (RBC+CPAR) improves performance by up to 19.0% and by 12.1% on average. Moreover, RBC and RBC+CPAR reduce memory energy consumption by 8.3% and 6.6% on average, respectively.
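
The eviction idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the address-to-chip mapping, the order in which candidate lines are examined, or the buffer organization. The sketch assumes a hypothetical mapping that interleaves consecutive cache lines across chips, an oldest-first scan, and a 64-byte line size. The class name `VictimBuffer` and all method names are made up for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed, not stated in the abstract)


class VictimBuffer:
    """Toy victim cache with chip-aware batch eviction (illustrative only)."""

    def __init__(self, capacity, num_chips):
        self.capacity = capacity
        self.num_chips = num_chips
        self.lines = OrderedDict()  # address -> None, oldest entry first

    def chip_of(self, addr):
        # Hypothetical mapping: consecutive lines interleave across chips.
        return (addr // LINE_SIZE) % self.num_chips

    def lookup(self, addr):
        """Return True on a hit; the line then moves back to the LLC."""
        if addr in self.lines:
            del self.lines[addr]
            return True
        return False

    def insert(self, addr):
        """Add a line evicted from the LLC; return lines written back to PCM."""
        evicted = []
        if addr not in self.lines and len(self.lines) >= self.capacity:
            evicted = self._evict_batch()
        self.lines[addr] = None
        self.lines.move_to_end(addr)
        return evicted

    def _evict_batch(self):
        # Walk from oldest to newest and keep the first line seen per chip,
        # so the batch contains at most one write for each PCM chip.
        batch = {}
        for addr in self.lines:
            chip = self.chip_of(addr)
            if chip not in batch:
                batch[chip] = addr
            if len(batch) == self.num_chips:
                break
        for addr in batch.values():
            del self.lines[addr]
        return list(batch.values())
```

Under these assumptions, a full buffer frees several slots per eviction instead of one, and the lines it writes back target distinct chips, so the writes could in principle proceed in parallel. The real policy may choose lines differently.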

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences