Journal of Computer Science and Technology, 2021, Vol. 36, Issue (1): 71-89. doi: 10.1007/s11390-021-0771-8
Special Issue: Computer Architecture and Systems
Special Section on Memory-Centric System Research for High-Performance Computing
Jason Liu, Pedro Espina, and Xian-He Sun, Fellow, IEEE