Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (1): 90-109. doi: 10.1007/s11390-020-0942-z

Special Issue: Computer Architecture and Systems

• Special Section on Memory-Centric System Research for High-Performance Computing •

Unimem: Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Main Memory for High Performance Computing

Kai Wu and Dong Li*        

  1. Department of Electrical Engineering and Computer Science, University of California Merced, Merced 95343, U.S.A.
  • Received: 2020-08-25 Revised: 2020-11-30 Online: 2021-01-05 Published: 2021-01-23
  • Contact: Dong Li
  • About author: Kai Wu is a Ph.D. candidate at the University of California Merced (UC Merced), Merced. Before coming to UC Merced, he earned his Master's degree in computer science and engineering from Michigan State University, East Lansing, in 2016. His research areas are computer systems and high performance computing (HPC), with a focus on hardware heterogeneity. He designs high performance computer systems with memory heterogeneity. His recent work focuses on designing system support for persistent memory-based big memory platforms. He has published in top-tier system/HPC conferences and journals, including FAST, HPCA, SC, PACT, ICPP, and CLUSTER.
  • Supported by:
    This work was partially supported by the U.S. National Science Foundation under Grant Nos. CNS-1617967, CCF-1553645, and CCF-1718194.

Non-volatile memory (NVM) provides a scalable and power-efficient solution to replace dynamic random access memory (DRAM) as main memory. However, because NVM has relatively high latency and low bandwidth, it is often paired with DRAM to build a heterogeneous memory system (HMS). As a result, the data objects of an application must be carefully placed on NVM and DRAM for the best performance. In this paper, we introduce a lightweight runtime solution that automatically and transparently manages data placement on HMS without requiring hardware modifications or disruptive changes to applications. Leveraging online profiling and performance models, the runtime solution characterizes the memory access patterns associated with data objects and minimizes unnecessary data movement. Our runtime solution effectively bridges the performance gap between NVM and DRAM. We demonstrate that using NVM to replace the majority of DRAM can be a feasible solution for future HPC systems with the assistance of software-based data management.
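
The abstract does not spell out the placement algorithm, so the following is only a minimal sketch of the general idea of profile-guided placement under a DRAM capacity limit. It is not the authors' method. The names `DataObject` and `place_objects`, the access-count metric, and the latency and migration-cost constants are all hypothetical, and a simple greedy heuristic stands in for whatever performance model the runtime actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    name: str
    size_bytes: int
    accesses: int  # hypothetical profiled count of main-memory accesses


def place_objects(objects, dram_capacity_bytes,
                  nvm_extra_ns_per_access=200.0,   # assumed NVM penalty
                  move_cost_ns_per_byte=0.5):      # assumed migration cost
    """Return a {name: "DRAM" | "NVM"} placement (illustrative heuristic)."""

    def net_benefit(obj):
        # Estimated time saved by serving accesses from DRAM,
        # minus the estimated cost of moving the object there.
        saved = obj.accesses * nvm_extra_ns_per_access
        cost = obj.size_bytes * move_cost_ns_per_byte
        return saved - cost

    # Skip objects whose migration would cost more than it saves:
    # this is the "avoid unnecessary data movement" part of the sketch.
    candidates = [o for o in objects
                  if o.size_bytes > 0 and net_benefit(o) > 0]
    # Prefer objects with the highest estimated benefit per byte of DRAM.
    candidates.sort(key=lambda o: net_benefit(o) / o.size_bytes,
                    reverse=True)

    placement = {o.name: "NVM" for o in objects}  # everything starts in NVM
    free = dram_capacity_bytes
    for obj in candidates:
        if obj.size_bytes <= free:
            placement[obj.name] = "DRAM"
            free -= obj.size_bytes
    return placement


if __name__ == "__main__":
    objs = [
        DataObject("A", size_bytes=100, accesses=1000),
        DataObject("B", size_bytes=100, accesses=10),
        DataObject("C", size_bytes=1000, accesses=1),
    ]
    print(place_objects(objs, dram_capacity_bytes=150))
```

In this toy run, the frequently accessed object `A` is moved to DRAM, `B` stays in NVM because the remaining DRAM is too small, and `C` stays in NVM because moving it would cost more than it saves under the assumed constants.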

Key words: data management; non-volatile memory; runtime system


ISSN 1000-9000(Print)

CN 11-2296/TP
