Journal of Computer Science and Technology, 2023, Vol. 38, Issue (1): 64-79. doi: 10.1007/s11390-022-2911-1

Special Issue: Surveys; Computer Architecture and Systems

• Special Issue in Honor of Professor Kai Hwang’s 80th Birthday •

The Memory-Bounded Speedup Model and Its Impacts in Computing

Xian-He Sun (孙贤和), Fellow, IEEE, and Xiaoyang Lu (鲁潇阳), Member, IEEE        

  1. Department of Computer Science, Illinois Institute of Technology, Chicago 60616, U.S.A.
  • Received: 2022-10-17 Revised: 2022-11-12 Accepted: 2022-12-01 Online: 2023-02-28 Published: 2023-02-28
  • Contact: Xian-He Sun, E-mail: sun@iit.edu
  • About author: Xian-He Sun is a University Distinguished Professor and the Ron Hochsprung Endowed Chair of the Department of Computer Science at the Illinois Institute of Technology (Illinois Tech), Chicago. Before joining Illinois Tech, he worked at DoE Ames National Laboratory, at ICASE, NASA Langley Research Center, and at Louisiana State University, Baton Rouge, and was an ASEE Fellow at Navy Research Laboratories. Dr. Sun is an IEEE Fellow and is known for his memory-bounded speedup model, also called Sun-Ni's Law, for scalable computing. His research interests include high-performance computing, memory and I/O systems, and performance evaluation and optimization. He has over 300 publications and six patents in these areas, and is currently leading multiple large, federally funded software development projects in HPC I/O systems. Dr. Sun is the Editor-in-Chief of IEEE Transactions on Parallel and Distributed Systems and a former chair of the Computer Science Department at Illinois Tech, Chicago. He received the Golden Core Award from the IEEE CS Society in 2017, the Overseas Outstanding Contributions Award from CCF in 2018, the ACM Karsten Schwan Best Paper Award from ACM HPDC in 2019, the Ron Hochsprung Endowed Chair from Illinois Tech in 2020, the First Prize Best Paper Award from ACM/IEEE CCGrid in 2021, and the CSE Distinguished Alumni Award from Michigan State University in 2022. More information about Dr. Sun can be found at his website: www.cs.iit.edu/~sun/.
  • Supported by:
    This work is supported in part by the U.S. National Science Foundation under Grant Nos. CCF-2029014 and CCF-2008907.

With the surge of big data applications and the worsening of the memory-wall problem, the memory system, rather than the computing unit, has become the commonly recognized major concern of computing. However, this "memory-centric" common understanding had a humble beginning. More than three decades ago, the memory-bounded speedup model was the first model to recognize memory as the bound of computing, and it provided a general bound of speedup and a computing-memory trade-off formulation. The memory-bounded model was well received even then. It was immediately introduced in several advanced computer architecture and parallel computing textbooks in the 1990s as a must-know for scalable computing. These include Prof. Kai Hwang's book "Scalable Parallel Computing", in which he introduced the memory-bounded speedup model as Sun-Ni's law, in parallel with Amdahl's law and Gustafson's law. Through the years, the impact of this model has grown far beyond parallel processing and into the fundamentals of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impact in depth to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking.

Key words: memory-bounded speedup; scalable computing; memory-wall; performance modeling and optimization; data-centric design
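For readers who want the three laws named in the abstract side by side, the following is a minimal Python sketch of the memory-bounded speedup in its commonly stated textbook form, S = (f + (1 - f)·G(p)) / (f + (1 - f)·G(p)/p). Here f is the sequential fraction of the workload, p is the number of processors, and G(p) is the factor by which the parallel workload grows when the available memory grows p times. The function names, the parameter names, and the sample values (f = 0.1, p = 16, and G(p) = p^1.5, which is often used for dense matrix multiplication) are assumptions chosen for illustration and are not taken from this page.

```python
def memory_bounded_speedup(f, p, g):
    """Memory-bounded (Sun-Ni) speedup.

    f: sequential fraction of the original workload (0 <= f <= 1)
    p: number of processors
    g: G(p), growth factor of the parallel workload when memory scales p times
    """
    scaled_work = f + (1.0 - f) * g          # work of the scaled problem
    parallel_time = f + (1.0 - f) * g / p    # time on p processors
    return scaled_work / parallel_time


def amdahl_speedup(f, p):
    # Fixed-size problem: the workload does not grow, so G(p) = 1.
    return memory_bounded_speedup(f, p, 1.0)


def gustafson_speedup(f, p):
    # Fixed-time scaling: the parallel workload grows linearly, so G(p) = p.
    return memory_bounded_speedup(f, p, float(p))


if __name__ == "__main__":
    f, p = 0.1, 16  # illustrative values only
    print("Amdahl:         ", amdahl_speedup(f, p))
    print("Gustafson:      ", gustafson_speedup(f, p))
    # Assumed growth function G(p) = p**1.5 (illustrative).
    print("Memory-bounded: ", memory_bounded_speedup(f, p, p ** 1.5))
```

Setting G(p) = 1 recovers Amdahl's law and G(p) = p recovers Gustafson's law, which is why the three laws are usually presented together; a G(p) that grows faster than p yields a speedup above the fixed-time case.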

ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 