Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (1): 71-89. doi: 10.1007/s11390-021-0771-8

Special Topic: Computer Architecture and Systems

A Study on Modeling and Optimization of Memory Systems

Jason Liu¹, Pedro Espina¹, and Xian-He Sun², Fellow, IEEE

  1. School of Computing and Information Sciences, Florida International University, Miami, FL 33199, U.S.A.
  2. Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, U.S.A.
  • Received: 2020-07-02; Revised: 2020-11-19; Online: 2021-01-05; Published: 2021-01-23
  • About the author: Jason Liu is a University Eminent Scholar Chaired Professor at the School of Computing and Information Sciences, Florida International University (FIU) in Miami, Florida, USA. His research focuses on modeling and simulation, parallel discrete-event simulation, and performance modeling and simulation of computer systems and computer networks. He currently serves on the editorial boards of ACM Transactions on Modeling and Computer Simulation (TOMACS), SIMULATION: Transactions of the Society for Modeling and Simulation International, and IEEE Networking Letters. He is also on the Steering Committee of the ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (SIGSIM-PADS). He received an NSF CAREER award in 2006 and was named an ACM Distinguished Scientist in 2014.
  • Supported by:
    This work is supported in part by the U.S. National Science Foundation under Grant Nos. CCF-2008000, CNS-1730488, and CCF-2008907, and the U.S. Department of Homeland Security under Grant No. 2017-ST-062-000002.

Abstract: Accesses Per Cycle (APC), Concurrent Average Memory Access Time (C-AMAT), and Layered Performance Matching (LPM) are three memory performance models that consider both data locality and memory access concurrency. The APC model measures the throughput of a memory architecture and therefore reflects the quality of service (QoS) of a memory system. The C-AMAT model provides a recursive expression for the memory access delay and can therefore be used to identify potential bottlenecks in a memory hierarchy. The LPM method transforms a global memory system optimization into localized optimizations at each memory layer by matching the data access demands of the applications with the underlying memory system design. These three models were proposed separately in prior work. This paper reexamines them under one coherent mathematical framework. More specifically, we present a new memory-centric view of data accesses. We divide the memory cycles at each memory layer into four distinct categories and use them to recursively define the memory access latency and concurrency along the memory hierarchy. This perspective yields a clear formulation of memory performance that accounts for both locality and concurrency. Consequently, the performance model can be easily understood and applied in engineering practice. As such, the memory-centric approach helps establish a unified mathematical foundation for model-driven performance analysis and optimization of contemporary and future memory systems.

Key words: performance modeling, performance optimization, memory architecture, memory hierarchy, concurrent average memory access time
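
As a rough illustration of how locality and concurrency combine in these models, the sketch below computes conventional AMAT, C-AMAT, and APC for a single memory layer. It is not taken from this page: the formulas C-AMAT = H/C_H + pMR × pAMP/C_M and APC = 1/C-AMAT are assumed from the prior C-AMAT and APC publications, all function and parameter names are hypothetical, and the numbers are made up for the example.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Conventional average memory access time, in cycles (no concurrency)."""
    return hit_time + miss_rate * miss_penalty


def c_amat(hit_time, hit_concurrency, pure_miss_rate,
           pure_miss_penalty, pure_miss_concurrency):
    """Assumed single-layer C-AMAT form: H/C_H + pMR * pAMP / C_M.

    hit_concurrency (C_H) and pure_miss_concurrency (C_M) are the average
    numbers of overlapping hits and pure misses; a "pure" miss is assumed
    to mean a miss that has at least one cycle not hidden by hit activity.
    """
    if hit_concurrency <= 0 or pure_miss_concurrency <= 0:
        raise ValueError("concurrency values must be positive")
    return (hit_time / hit_concurrency
            + pure_miss_rate * pure_miss_penalty / pure_miss_concurrency)


def apc(c_amat_cycles):
    """Accesses per memory-active cycle, assumed to be the reciprocal of C-AMAT."""
    return 1.0 / c_amat_cycles


# Made-up layer parameters, chosen only to keep the arithmetic simple.
sequential = amat(hit_time=2, miss_rate=0.25, miss_penalty=8)
concurrent = c_amat(hit_time=2, hit_concurrency=2, pure_miss_rate=0.25,
                    pure_miss_penalty=8, pure_miss_concurrency=2)
throughput = apc(concurrent)
print(sequential, concurrent, throughput)
```

Under these assumptions, setting both concurrency values to 1 and treating every miss as a pure miss reduces C-AMAT to conventional AMAT, which is a convenient sanity check when experimenting with the parameters.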
