<table class="reference-tab" style="background-color:#FFFFFF;width:914.104px;color:#333333;font-family:Calibri, Arial, 微软雅黑;font-size:16px;">
<tbody>
<tr class="document-box" id="b1">
<td valign="top" class="td1">
[1]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Wulf W A, McKee S A. Hitting the memory wall: Implications of the obvious. <i>ACM SIGARCH Computer Architecture News</i>, 1995, 23(1): 20-24. DOI: <a href="https://doi.org/10.1145/216585.216588">10.1145/216585.216588</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b2">
<td valign="top" class="td1">
[2]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Ni L M. Scalable problems and memory-bounded speedup. <i>Journal of Parallel and Distributed Computing</i>, 1993, 19(1): 27-37. DOI: <a href="https://doi.org/10.1006/jpdc.1993.1087">10.1006/jpdc.1993.1087</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b3">
<td valign="top" class="td1">
[3]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Ni L M. Another view on parallel speedup. In <i>Proc</i>.<i> the 1990 ACM/IEEE Conference on Supercomputing</i>, Nov. 1990, pp.324-333. DOI: <a href="https://doi.org/10.1109/SUPERC.1990.130037">10.1109/SUPERC.1990.130037</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b4">
<td valign="top" class="td1">
[4]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Amdahl G M. Validity of the single processor approach to achieving large scale computing capabilities. In <i>Proc</i>.<i> the </i><i>Spring Joint Computer Conference</i>, Apr. 1967, pp.483-485. DOI: <a href="https://doi.org/10.1145/1465482.1465560">10.1145/1465482.1465560</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b5">
<td valign="top" class="td1">
[5]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Gustafson J L. Reevaluating Amdahl’s law. <i>Communications of the ACM</i>, 1988, 31(5): 532-533. DOI: <a href="https://doi.org/10.1145/42411.42415">10.1145/42411.42415</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b6">
<td valign="top" class="td1">
[6]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Bashe C J, Johnson L R, Palmer J H, Pugh E W. IBM’s Early Computers. MIT Press, 1986.
</div>
</td>
</tr>
<tr class="document-box" id="b7">
<td valign="top" class="td1">
[7]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Chen Y. Reevaluating Amdahl’s law in the multicore era. <i>Journal of Parallel and Distributed Computing</i>, 2010, 70(2): 183-188. DOI: <a href="https://doi.org/10.1016/j.jpdc.2009.05.002">10.1016/j.jpdc.2009.05.002</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b8">
<td valign="top" class="td1">
[8]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Pan C Y, Naeemi A. System-level optimization and benchmarking of graphene PN junction logic system based on empirical CPI model. In <i>Proc. the IEEE International Conference on IC Design & Technology</i>, Jun. 2012. DOI: <a href="https://doi.org/10.1109/ICICDT.2012.6232850">10.1109/ICICDT.2012.6232850</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b9">
<td valign="top" class="td1">
[9]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Kogge P M. Hardware Evolution Trends of Extreme Scale Computing. Technical Report, University of Notre Dame, South Bend, 2011.
</div>
</td>
</tr>
<tr class="document-box" id="b10">
<td valign="top" class="td1">
[10]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Hennessy J L, Patterson D A. Computer Architecture: A Quantitative Approach (6th edition). Elsevier, 2017.
</div>
</td>
</tr>
<tr class="document-box" id="b11">
<td valign="top" class="td1">
[11]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Liu Y H, Sun X H. LPM: A systematic methodology for concurrent data access pattern optimization from a matching perspective. <i>IEEE Trans</i>.<i> Parallel and Distributed Systems</i>, 2019, 30(11): 2478-2493. DOI: <a href="https://doi.org/10.1109/TPDS.2019.2912573">10.1109/TPDS.2019.2912573</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b12">
<td valign="top" class="td1">
[12]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Lo Y J, Williams S, Straalen B V, Ligocki T J, Cordery M J, Wright N J, Hall M W, Oliker L. Roofline model toolkit: A practical tool for architectural and program analysis. In <i>Proc</i>.<i> the 5th International Workshop on Performance Modeling</i>,<i> Benchmarking and Simulation of High Performance Computer Systems</i>, Nov. 2014, pp.129-148. DOI: <a href="https://doi.org/10.1007/978-3-319-17248-4_7">10.1007/978-3-319-17248-4_7</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b13">
<td valign="top" class="td1">
[13]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Saini S, Chang J, Jin H Q. Performance evaluation of the Intel Sandy Bridge based NASA Pleiades using scientific and engineering applications. In <i>Proc</i>.<i> the 4th International Workshop on Performance Modeling</i>,<i> Benchmarking and Simulation of High Performance Computer Systems</i>, Nov. 2013, pp.25-51. DOI: <a href="https://doi.org/10.1007/978-3-319-10214-6_2">10.1007/978-3-319-10214-6_2</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b14">
<td valign="top" class="td1">
[14]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Gustafson J L. Toward a better parallel performance metric. <i>Parallel Computing</i>, 1991, 17(10/11): 1093-1109. DOI: <a href="https://doi.org/10.1016/S0167-8191(05)80028-6">10.1016/S0167-8191(05)80028-6</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b15">
<td valign="top" class="td1">
[15]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Kumar V, Singh V. Scalability of parallel algorithms for the all-pairs shortest-path problem. <i>Journal of Parallel and Distributed Computing</i>, 1991, 13(2): 124-138. DOI: <a href="https://doi.org/10.1016/0743-7315(91)90083-L">10.1016/0743-7315(91)90083-L</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b16">
<td valign="top" class="td1">
[16]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Kumar V, Grama A, Gupta A, Karypis G. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin-Cummings, 1994.
</div>
</td>
</tr>
<tr class="document-box" id="b17">
<td valign="top" class="td1">
[17]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Chen Y, Wu M. Scalability of heterogeneous computing. In <i>Proc. the International Conference on Parallel Processing (ICPP’05)</i>, Jun. 2005, pp.557-564. DOI: <a href="https://doi.org/10.1109/ICPP.2005.69">10.1109/ICPP.2005.69</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b18">
<td valign="top" class="td1">
[18]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Rover D T. Scalability of parallel algorithm-machine combinations. <i>IEEE Trans</i>.<i> Parallel and Distributed Systems</i>, 1994, 5(6): 599-613. DOI: <a href="https://doi.org/10.1109/71.285606">10.1109/71.285606</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b19">
<td valign="top" class="td1">
[19]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Pantano M, Fahringer T. Integrated range comparison for data-parallel compilation systems. <i>IEEE Trans</i>.<i> Parallel and Distributed Systems</i>, 1999, 10(5): 448-458. DOI: <a href="https://doi.org/10.1109/71.770134">10.1109/71.770134</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b20">
<td valign="top" class="td1">
[20]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H. Scalability versus execution time in scalable systems. <i>Journal of Parallel and Distributed Computing</i>, 2002, 62(2): 173-192. DOI: <a href="https://doi.org/10.1006/jpdc.2001.1773">10.1006/jpdc.2001.1773</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b21">
<td valign="top" class="td1">
[21]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Hill M D, Marty M R. Amdahl’s law in the multicore era. <i>Computer</i>, 2008, 41(7): 33-38. DOI: <a href="https://doi.org/10.1109/MC.2008.209">10.1109/MC.2008.209</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b22">
<td valign="top" class="td1">
[22]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Chen Y, Byna S. Scalable computing in the multicore era. In <i>Proc</i>.<i> the 2008 International Symposium on Parallel Architectures</i>,<i> Algorithms and Programming</i>, Sept. 2008.
</div>
</td>
</tr>
<tr class="document-box" id="b23">
<td valign="top" class="td1">
[23]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Dwork C, Goldberg A, Naor M. On memory-bound functions for fighting spam. In <i>Proc</i>.<i> the 23rd Annual International Cryptology Conference</i>, Aug. 2003, pp.426-444. DOI: <a href="https://doi.org/10.1007/978-3-540-45146-4_25">10.1007/978-3-540-45146-4_25</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b24">
<td valign="top" class="td1">
[24]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Abadi M, Burrows M, Manasse M, Wobber T. Moderately hard, memory-bound functions. <i>ACM Trans</i>.<i> Internet Technology</i>, 2005, 5(2): 299-327. DOI: <a href="https://doi.org/10.1145/1064340.1064341">10.1145/1064340.1064341</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b25">
<td valign="top" class="td1">
[25]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Hart P E, Nilsson N J, Raphael B. A formal basis for the heuristic determination of minimum cost paths. <i>IEEE Trans</i>.<i> Systems Science and Cybernetics</i>, 1968, 4(2): 100-107. DOI: <a href="https://doi.org/10.1109/TSSC.1968.300136">10.1109/TSSC.1968.300136</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b26">
<td valign="top" class="td1">
[26]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Korf R E. Depth-first iterative-deepening: An optimal admissible tree search. <i>Artificial Intelligence</i>, 1985, 27(1): 97-109. DOI: <a href="https://doi.org/10.1016/0004-3702(85)90084-0">10.1016/0004-3702(85)90084-0</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b27">
<td valign="top" class="td1">
[27]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Korf R E, Reid M, Edelkamp S. Time complexity of iterative-deepening-A*. <i>Artificial Intelligence</i>, 2001, 129(1/2): 199-218. DOI: <a href="https://doi.org/10.1016/S0004-3702(01)00094-7">10.1016/S0004-3702(01)00094-7</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b28">
<td valign="top" class="td1">
[28]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Russell S. Efficient memory-bounded search methods. In <i>Proc</i>.<i> the 10th European Conference on Artificial Intelligence</i>, Aug. 1992.
</div>
</td>
</tr>
<tr class="document-box" id="b29">
<td valign="top" class="td1">
[29]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Lovinger J, Zhang X Q. Enhanced simplified memory-bounded A star (SMA*+). In <i>Proc</i>.<i> the 3rd Global Conference on Artificial Intelligence</i>, Oct. 2017, pp.202-212. DOI: <a href="https://doi.org/10.29007/v7zc">10.29007/v7zc</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b30">
<td valign="top" class="td1">
[30]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Seuken S, Zilberstein S. Memory-bounded dynamic programming for DEC-POMDPs. In <i>Proc</i>.<i> the 20th International Joint Conference on Artificial Intelligence</i>, Jan. 2007, pp.2009-2015.
</div>
</td>
</tr>
<tr class="document-box" id="b31">
<td valign="top" class="td1">
[31]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Seuken S, Zilberstein S. Improved memory-bounded dynamic programming for decentralized POMDPs. arXiv: 1206.5295, 2012. <a href="https://arxiv.org/abs/1206.5295">https://arxiv.org/abs/1206.5295</a>, Dec. 2022.
</div>
</td>
</tr>
<tr class="document-box" id="b32">
<td valign="top" class="td1">
[32]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Chen Z Y, Zhang W X, Deng Y C, Chen D D, Li Q. RMB-DPOP: Refining MB-DPOP by reducing redundant inferences. arXiv: 2002.10641, 2020. <a href="https://doi.org/10.48550/arXiv.2002.10641">https://doi.org/10.48550/arXiv.2002.10641</a>, Dec. 2022.
</div>
</td>
</tr>
<tr class="document-box" id="b33">
<td valign="top" class="td1">
[33]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Brito I, Meseguer P. Improving DPOP with function filtering. In <i>Proc</i>.<i> the 9th</i> <i>International Conference on Autonomous Agents and Multiagent Systems</i>:<i> Volume 1</i>, May 2010, pp.141-148.
</div>
</td>
</tr>
<tr class="document-box" id="b34">
<td valign="top" class="td1">
[34]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Petcu A, Faltings B. ODPOP: An algorithm for open/distributed constraint optimization. In <i>Proc</i>.<i> the 21st National Conference on Artificial Intelligence</i>, Jul. 2006, pp.703-708.
</div>
</td>
</tr>
<tr class="document-box" id="b35">
<td valign="top" class="td1">
[35]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Petcu A, Faltings B. A hybrid of inference and local search for distributed combinatorial optimization. In <i>Proc. the IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT’07)</i>, Nov. 2007, pp.342-348. DOI: <a href="https://doi.org/10.1109/IAT.2007.12">10.1109/IAT.2007.12</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b36">
<td valign="top" class="td1">
[36]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Petcu A, Faltings B. MB-DPOP: A new memory-bounded algorithm for distributed optimization. In <i>Proc</i>.<i> the 20th International Joint Conference on Artificial Intelligence</i>, Jan. 2007, pp.1452-1457.
</div>
</td>
</tr>
<tr class="document-box" id="b37">
<td valign="top" class="td1">
[37]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Williams S W. Auto-tuning performance on multicore computers [Ph.D. Thesis]. University of California, Berkeley, 2008.
</div>
</td>
</tr>
<tr class="document-box" id="b38">
<td valign="top" class="td1">
[38]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Williams S, Waterman A, Patterson D. Roofline: An insightful visual performance model for multicore architectures. <i>Communications of the ACM</i>, 2009, 52(4): 65-76. DOI: <a href="https://doi.org/10.1145/1498765.1498785">10.1145/1498765.1498785</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b39">
<td valign="top" class="td1">
[39]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Lu X Y, Wang R J, Sun X H. APAC: An accurate and adaptive prefetch framework with concurrent memory access analysis. In <i>Proc. the 38th IEEE International Conference on Computer Design (ICCD)</i>, Oct. 2020, pp.222-229. DOI: <a href="https://doi.org/10.1109/ICCD50377.2020.00048">10.1109/ICCD50377.2020.00048</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b40">
<td valign="top" class="td1">
[40]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Lu X Y, Wang R J, Sun X H. Premier: A concurrency-aware pseudo-partitioning framework for shared last-level cache. In <i>Proc. the 39th IEEE International Conference on Computer Design (ICCD)</i>, Oct. 2021, pp.391-394. DOI: <a href="https://doi.org/10.1109/ICCD53106.2021.00068">10.1109/ICCD53106.2021.00068</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b41">
<td valign="top" class="td1">
[41]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Liu J, Espina P, Sun X H. A study on modeling and optimization of memory systems. <i>Journal of Computer Science and Technology</i>, 2021, 36(1): 71-89. DOI: <a href="https://doi.org/10.1007/s11390-021-0771-8">10.1007/s11390-021-0771-8</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b42">
<td valign="top" class="td1">
[42]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Glew A. MLP yes! ILP no. In <i>Proc. ASPLOS Wild and Crazy Idea Session</i>, Oct. 1998.
</div>
</td>
</tr>
<tr class="document-box" id="b43">
<td valign="top" class="td1">
[43]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Qureshi M K, Lynch D N, Mutlu O, Patt Y N. A case for MLP-aware cache replacement. In <i>Proc. the </i><i>33rd International Symposium on Computer Architecture (ISCA’06)</i>, Jun. 2006, pp.167-178. DOI: <a href="https://doi.org/10.1109/ISCA.2006.5">10.1109/ISCA.2006.5</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b44">
<td valign="top" class="td1">
[44]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Wang D W. Concurrent average memory access time. <i>Computer</i>, 2014, 47(5): 74-80. DOI: <a href="https://doi.org/10.1109/MC.2013.227">10.1109/MC.2013.227</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b45">
<td valign="top" class="td1">
[45]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Najafi H, Lu X, Liu J, Sun X H. A generalized model for modern hierarchical memory system. In <i>Proc. Winter Simulation Conference (WSC)</i>, Dec. 2022.
</div>
</td>
</tr>
<tr class="document-box" id="b46">
<td valign="top" class="td1">
[46]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Lu X, Wang R, Sun X H. CARE: A concurrency-aware enhanced lightweight cache management framework. In <i>Proc</i>.<i> the 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA)</i>, Feb. 25–Mar. 1, 2023.
</div>
</td>
</tr>
<tr class="document-box" id="b47">
<td valign="top" class="td1">
[47]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Yan L, Zhang M Z, Wang R J, Chen X M, Zou X Q, Lu X Y, Han Y H, Sun X H. CoPIM: A concurrency-aware PIM workload offloading architecture for graph applications. In <i>Proc. IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)</i>, Jul. 2021. DOI: <a href="https://doi.org/10.1109/ISLPED52811.2021.9502483">10.1109/ISLPED52811.2021.9502483</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b48">
<td valign="top" class="td1">
[48]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Zhang N, Jiang C T, Sun X H, Song S L. Evaluating GPGPU memory performance through the C-AMAT model. In <i>Proc</i>.<i> the Workshop on Memory Centric Programming for HPC</i>, Nov. 2017, pp.35-39. DOI: <a href="https://doi.org/10.1145/3145617.3158214">10.1145/3145617.3158214</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b49">
<td valign="top" class="td1">
[49]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Kannan S, Gavrilovska A, Schwan K, Milojicic D, Talwar V. Using active NVRAM for I/O staging. In <i>Proc</i>.<i> the 2nd International Workshop on Petascale Data Analytics</i>:<i> Challenges and Opportunities</i>, Nov. 2011, pp.15-22. DOI: <a href="https://doi.org/10.1145/2110205.2110209">10.1145/2110205.2110209</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b50">
<td valign="top" class="td1">
[50]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Caulfield A M, Grupp L M, Swanson S. Gordon: Using flash memory to build fast, power-efficient clusters for data-intensive applications. <i>ACM SIGPLAN Notices</i>, 2009, 44(3): 217-228. DOI: <a href="https://doi.org/10.1145/1508284.1508270">10.1145/1508284.1508270</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b51">
<td valign="top" class="td1">
[51]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Reed D A, Dongarra J. Exascale computing and big data. <i>Communications of the ACM</i>, 2015, 58(7): 56-68. DOI: <a href="https://doi.org/10.1145/2699414">10.1145/2699414</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b52">
<td valign="top" class="td1">
[52]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Shalf J, Dosanjh S, Morrison J. Exascale computing technology challenges. In<i> Proc</i>.<i> the 9th International Conference on High Performance Computing for Computational Science</i>, Jun. 2010. DOI: <a href="https://doi.org/10.1007/978-3-642-19328-6_1">10.1007/978-3-642-19328-6_1</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b53">
<td valign="top" class="td1">
[53]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Kougkas A, Devarajan H, Sun X H. Hermes: A heterogeneous-aware multi-tiered distributed I/O buffering system. In <i>Proc</i>.<i> the 27th International Symposium on High-Performance Parallel and Distributed Computing</i>, Jun. 2018, pp.219-230. DOI: <a href="https://doi.org/10.1145/3208040.3208059">10.1145/3208040.3208059</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b54">
<td valign="top" class="td1">
[54]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Kougkas A, Devarajan H, Sun X H. I/O acceleration via multi-tiered data buffering and prefetching. <i>Journal of Computer Science and Technology</i>, 2020, 35(1): 92-120. DOI: <a href="https://doi.org/10.1007/s11390-020-9781-1">10.1007/s11390-020-9781-1</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b55">
<td valign="top" class="td1">
[55]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Tissenbaum M, Sheldon J, Abelson H. From computational thinking to computational action. <i>Communications of the ACM</i>, 2019, 62(3): 34-36. DOI: <a href="https://doi.org/10.1145/3265747">10.1145/3265747</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b56">
<td valign="top" class="td1">
[56]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Liu Y H, Sun X H, Wang Y, Bao Y G. HCDA: From computational thinking to a generalized thinking paradigm. <i>Communications of the ACM</i>, 2021, 64(5): 66-75. DOI: <a href="https://doi.org/10.1145/3418291">10.1145/3418291</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b57">
<td valign="top" class="td1">
[57]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Owens J D, Houston M, Luebke D, Green S, Stone J E, Phillips J C. GPU computing. <i>Proceedings of the IEEE</i>, 2008, 96(5): 879-899. DOI: <a href="https://doi.org/10.1109/JPROC.2008.917757">10.1109/JPROC.2008.917757</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b58">
<td valign="top" class="td1">
[58]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. <i>Communications of the ACM</i>, 2008, 51(1): 107-113. DOI: <a href="https://doi.org/10.1145/1327452.1327492">10.1145/1327452.1327492</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b59">
<td valign="top" class="td1">
[59]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Momose H, Kaneko T, Asai T. Systems and circuits for AI chips and their trends. <i>Japanese Journal of Applied Physics</i>, 2020, 59(5): 050502. DOI: <a href="https://doi.org/10.35848/1347-4065/ab839f">10.35848/1347-4065/ab839f</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b60">
<td valign="top" class="td1">
[60]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Singh G, Alser M, Cali D S, Diamantopoulos D, Gómez-Luna J, Corporaal H, Mutlu O. FPGA-based near-memory acceleration of modern data-intensive applications. <i>IEEE Micro</i>, 2021, 41(4): 39-48. DOI: <a href="https://doi.org/10.1109/MM.2021.3088396">10.1109/MM.2021.3088396</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b61">
<td valign="top" class="td1">
[61]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Choi Y K, Santillana C, Shen Y J, Darwiche A, Cong J. FPGA acceleration of probabilistic sentential decision diagrams with high-level synthesis. <i>ACM Trans</i>.<i> Reconfigurable Technology and Systems</i>, 2022. DOI: <a href="https://doi.org/10.1145/3561514">10.1145/3561514</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b62">
<td valign="top" class="td1">
[62]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Ghose S, Boroumand A, Kim J S, Gómez-Luna J, Mutlu O. Processing-in-memory: A workload-driven perspective. <i>IBM Journal of Research and Development</i>, 2019, 63(6): Article No. 3. DOI: <a href="https://doi.org/10.1147/JRD.2019.2934048">10.1147/JRD.2019.2934048</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b63">
<td valign="top" class="td1">
[63]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Ghiasi N M, Park J, Mustafa H, Kim J, Olgun A, Gollwitzer A, Cali D S, Firtina C, Mao H Y, Alserr N A, Ausavarungnirun R, Vijaykumar N, Alser M, Mutlu O. GenStore: A high-performance in-storage processing system for genome sequence analysis. In <i>Proc</i>.<i> the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems</i>, Feb. 2022, pp.635-654. DOI: <a href="https://doi.org/10.1145/3503222.3507702">10.1145/3503222.3507702</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b64">
<td valign="top" class="td1">
[64]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Mutlu O. Intelligent architectures for intelligent computing systems. In<i> Proc. the 2021 Design</i>,<i> Automation & Test in Europe Conference & Exhibition (DATE)</i>, Feb. 2021, pp.318-323. DOI: <a href="https://doi.org/10.23919/DATE51398.2021.9474073">10.23919/DATE51398.2021.9474073</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b65">
<td valign="top" class="td1">
[65]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Sun X H, Liu Y H. Utilizing concurrency: A new theory for memory wall. In<i> Proc</i>.<i> the 29th International Workshop on Languages and Compilers for Parallel Computing</i>, Sept. 2016, pp.18-23. DOI: <a href="https://doi.org/10.1007/978-3-319-52709-3_2">10.1007/978-3-319-52709-3_2</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b66">
<td valign="top" class="td1">
[66]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Kougkas A, Devarajan H, Lofstead J, Sun X H. LABIOS: A distributed label-based I/O system. In <i>Proc</i>.<i> the 28th International Symposium on High-Performance Parallel and Distributed Computing</i>, Jun. 2019, pp.13-24. DOI: <a href="https://doi.org/10.1145/3307681.3325405">10.1145/3307681.3325405</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b67">
<td valign="top" class="td1">
[67]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Logan L, Garcia J C, Lofstead J, Sun X H, Kougkas A. LabStor: A modular and extensible platform for developing high-performance, customized I/O stacks in userspace. In <i>Proc</i>.<i> the ACM/IEEE International Conference for High Performance Computing</i>,<i> Networking</i>,<i> Storage and Analysis (SC’22)</i>, Nov. 2022, pp.309-323.
</div>
</td>
</tr>
<tr class="document-box" id="b68">
<td valign="top" class="td1">
[68]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Hwang K, Xu Z W. Scalable Parallel Computing: Technology, Architecture, Programming. McGraw-Hill, 1998.
</div>
</td>
</tr>
<tr class="document-box" id="b69">
<td valign="top" class="td1">
[69]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Hwang K. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 1993.
</div>
</td>
</tr>
</tbody>
</table>