Journal of Computer Science and Technology, 2023, Vol. 38, Issue (1): 64-79. DOI: 10.1007/s11390-022-2911-1
Topics: Survey; Computer Architecture and Systems
Xian-He Sun (孙贤和), Fellow, IEEE, and Xiaoyang Lu (鲁潇阳), Member, IEEE
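Big data applications have surged and the memory-wall problem has worsened. As a result, the memory system, rather than the computing unit, has become the primary concern of computer research. More than three decades ago, the memory-bounded speedup model was the first model to identify data storage as the bottleneck of computing performance. The model provides a general method for calculating speedup and shows that achievable speedup is bounded by memory capacity. The community adopted it as soon as it was proposed. Several textbooks on parallel computing and advanced computer architecture immediately included it, and it became required material for graduate students in computer science. One of these textbooks is Prof. Kai Hwang's "Scalable Parallel Computing: Technology, Architecture, Programming". There the memory-bounded speedup model is called Sun-Ni's Law and is listed with Amdahl's Law and Gustafson's Law as the three well-known laws of scalable computing. Over the years, the model's influence has extended far beyond parallel computing into the fundamentals of computing. It has promoted the concept of data-centric computing and offered new insights for developing next-generation memory systems and optimization tools. It has also provided key ideas for tackling "big data" problems. In this article, we review the progress and impact of the memory-bounded speedup model and discuss its role and potential in the big data era.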
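The short Python sketch below makes the memory-bounded speedup concrete and shows how it relates to the other two laws. It assumes the G(p) formulation that Sun and Chen used when revisiting Amdahl's Law for multicore systems. Let f be the parallel fraction of the workload and p the number of processors. Suppose the workload grows by a factor G(p) when aggregate memory grows p-fold. The speedup is then S = ((1 - f) + f·G(p)) / ((1 - f) + f·G(p)/p). Setting G(p) = 1 recovers Amdahl's Law, and G(p) = p recovers Gustafson's Law. Two choices here are illustrative assumptions, not part of the abstract: the function names, and the matrix-multiplication growth G(p) = p^1.5 (memory grows as n^2 and work as n^3 for dense n×n matrices).

```python
from typing import Callable


def memory_bounded_speedup(f: float, p: int, g: Callable[[int], float]) -> float:
    """Memory-bounded (Sun-Ni) speedup under the G(p) formulation (a sketch).

    f -- parallelizable fraction of the original workload (0 <= f <= 1)
    p -- number of processors; aggregate memory is assumed to grow p-fold
    g -- G(p): factor by which the parallel workload grows when memory grows p-fold
    Assumes the sequential part (1 - f) does not grow with the problem size.
    """
    if not 0.0 <= f <= 1.0 or p < 1:
        raise ValueError("need 0 <= f <= 1 and p >= 1")
    scaled = g(p)
    # Scaled workload, in units of one-processor time on the original problem.
    work = (1.0 - f) + f * scaled
    # Parallel execution time of the scaled workload on p processors.
    parallel_time = (1.0 - f) + f * scaled / p
    return work / parallel_time


def amdahl(f: float, p: int) -> float:
    # Fixed-size speedup: the workload does not grow, so G(p) = 1.
    return memory_bounded_speedup(f, p, lambda _: 1.0)


def gustafson(f: float, p: int) -> float:
    # Fixed-time speedup: the workload grows linearly, so G(p) = p.
    return memory_bounded_speedup(f, p, lambda q: float(q))


def matmul_growth(p: int) -> float:
    # Hypothetical example: dense n x n matrix multiply needs O(n^2) memory and
    # O(n^3) work, so p-fold memory lets n grow by sqrt(p) and work by p^1.5.
    return p ** 1.5


if __name__ == "__main__":
    f, p = 0.9, 100
    print(f"Amdahl:         {amdahl(f, p):.2f}")
    print(f"Gustafson:      {gustafson(f, p):.2f}")
    print(f"Memory-bounded: {memory_bounded_speedup(f, p, matmul_growth):.2f}")
```

With f = 0.9 and p = 100, the script prints speedups of 9.17 (Amdahl), 90.10 (Gustafson), and 98.91 (memory-bounded with G(p) = p^1.5). When G(p) grows faster than p, the speedup exceeds Gustafson's bound. The cost is that the scaled problem takes longer to run than the original one did, which is the computing-memory trade-off the model captures.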