<p>
<table class="reference-tab" style="background-color:#FFFFFF;width:914.104px;color:#333333;font-family:Calibri, Arial, 微软雅黑;font-size:16px;">
<tbody>
<tr class="document-box" id="b1">
<td valign="top" class="td1">
[1]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Dongarra J J, Meuer H W, Strohmaier E. Top500 supercomputer sites. <i>Supercomputer</i>, 1997, 13(1): 89–111.
</div>
</td>
</tr>
<tr class="document-box" id="b2">
<td valign="top" class="td1">
[2]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Vazhkudai S S, de Supinski B R, Bland A S <i>et al</i>. The design, deployment, and evaluation of the CORAL pre-exascale systems. In <i>Proc</i>. <i>the 2018 International Conference for High Performance Computing</i>, <i>Networking</i>, <i>Storage and Analysis</i>, Nov. 2018, pp.661–672. DOI: <a href="https://www.doi.org/10.1109/SC.2018.00055">10.1109/SC.2018.00055</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b3">
<td valign="top" class="td1">
[3]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Fu H H, Liao J F, Yang J Z <i>et al</i>. The Sunway TaihuLight supercomputer: System and applications. <i>Science China Information Sciences</i>, 2016, 59(7): 072001. DOI: <a class="mainColor ref-doi" href="http://dx.doi.org/10.1007/s11432-016-5588-7" target="_blank">10.1007/s11432-016-5588-7</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b4">
<td valign="top" class="td1">
[4]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Fu H H, Liao J F, Xue W <i>et al</i>. Refactoring and optimizing the community atmosphere model (CAM) on the Sunway TaihuLight supercomputer. In <i>Proc</i>. <i>the 2016</i> <i>International</i> <i>Conference</i> <i>for</i> <i>High</i> <i>Performance</i> <i>Computing</i>, <i>Networking</i>, <i>Storage</i> <i>and</i> <i>Analysis</i>, Nov. 2016, pp.969–980. DOI: <a href="https://www.doi.org/10.1109/SC.2016.82">10.1109/SC.2016.82</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b5">
<td valign="top" class="td1">
[5]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Neale R B, Gettelman A, Park S <i>et al</i>. Description of the NCAR community atmosphere model (CAM 5.0). No. NCAR/TN-486+STR, 2010. DOI: <a href="https://doi.org/10.5065/wgtk-4g06">10.5065/wgtk-4g06</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b6">
<td valign="top" class="td1">
[6]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Edwards H C, Trott C R, Sunderland D. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. <i>Journal of Parallel and Distributed Computing</i>, 2014, 74(12): 3202–3216. DOI: <a class="mainColor ref-doi" href="http://dx.doi.org/10.1016/j.jpdc.2014.07.003" target="_blank">10.1016/j.jpdc.2014.07.003</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b7">
<td valign="top" class="td1">
[7]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Trott C R, Lebrun-Grandié D, Arndt D <i>et al</i>. Kokkos 3: Programming model extensions for the exascale era. <i>IEEE Trans. Parallel and Distributed Systems</i>, 2022, 33(4): 805–817. DOI: <a class="mainColor ref-doi" href="http://dx.doi.org/10.1109/TPDS.2021.3097283" target="_blank">10.1109/TPDS.2021.3097283</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b8">
<td valign="top" class="td1">
[8]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Beckingsale D A, Burmark J, Hornung R <i>et al</i>. RAJA: Portable performance for large-scale scientific applications. In <i>Proc</i>. <i>the</i> <i>2019</i> <i>IEEE/ACM</i> <i>International</i> <i>Workshop</i> <i>on</i> <i>Performance</i>, <i>Portability</i> <i>and</i> <i>Productivity</i> <i>in</i> <i>HPC</i> (<i>P3HPC</i>), Nov. 2019, pp.71–81. DOI: <a href="https://www.doi.org/10.1109/P3HPC49587.2019.00012">10.1109/P3HPC49587.2019.00012</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b9">
<td valign="top" class="td1">
[9]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Reinders J, Ashbaugh B, Brodman J, Kinsner M, Pennycook J, Tian X M. Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems Using C++ and SYCL. Springer Nature, 2021. DOI: <a href="https://www.doi.org/10.1007/978-1-4842-5574-2">10.1007/978-1-4842-5574-2</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b10">
<td valign="top" class="td1">
[10]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Pennycook S J, Sewall J D, Lee V W. Implications of a metric for performance portability. <i>Future Generation Computer Systems</i>, 2019, 92: 947–958. DOI: <a href="https://doi.org/10.1016/j.future.2017.08.007">10.1016/j.future.2017.08.007</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b11">
<td valign="top" class="td1">
[11]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Lin W C, McIntosh-Smith S. Comparing Julia to performance portable parallel programming models for HPC. In <i>Proc</i>. <i>the 2021</i> <i>International</i> <i>Workshop</i> <i>on</i> <i>Performance</i> <i>Modeling</i>, <i>Benchmarking</i> <i>and</i> <i>Simulation</i> <i>of</i> <i>High</i> <i>Performance</i> <i>Computer</i> <i>Systems</i>, Nov. 2021, pp.94–105. DOI: <a href="https://www.doi.org/10.1109/PMBS54543.2021.00016">10.1109/PMBS54543.2021.00016</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b12">
<td valign="top" class="td1">
[12]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Ma Z X, He J A, Qiu J Z <i>et al</i>. BaGuaLu: Targeting brain scale pretrained models with over 37 million cores. In <i>Proc</i>. <i>the</i> <i>27th</i> <i>ACM</i> <i>SIGPLAN</i> <i>Symposium</i> <i>on</i> <i>Principles</i> <i>and</i> <i>Practice</i> <i>of</i> <i>Parallel</i> <i>Programming</i>, Apr. 2022, pp.192–204. DOI: <a href="https://www.doi.org/10.1145/3503221.3508417">10.1145/3503221.3508417</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b13">
<td valign="top" class="td1">
[13]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Zhang Y M, Lu K, Chen W G. Processing extreme-scale graphs on China’s supercomputers. <i>Communications of the ACM</i>, 2021, 64(11): 60–63. DOI: <a class="mainColor ref-doi" href="http://dx.doi.org/10.1145/3481614" target="_blank">10.1145/3481614</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b14">
<td valign="top" class="td1">
[14]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Zhang Y, Yang M, Baghdadi R, Kamil S, Shun J. GraphIt: A high-performance graph DSL. <i>Proceedings of the ACM on</i> <i>Programming Languages</i>, 2018, 2(OOPSLA): Article No. 121. DOI: <a href="http://doi.org/10.1145/3276491">10.1145/3276491</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b15">
<td valign="top" class="td1">
[15]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Ragan-Kelley J, Barnes C, Adams A, Paris S, Durand F, Amarasinghe S. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In <i>Proc. the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation</i>, Jun. 2013, pp.519–530. DOI: <a href="https://doi.org/10.1145/2499370.2462176">10.1145/2499370.2462176</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b16">
<td valign="top" class="td1">
[16]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Chen T Q, Moreau T, Jiang Z H <i>et al</i>. TVM: An automated end-to-end optimizing compiler for deep learning. In <i>Proc. the 13th USENIX Conference on Operating Systems Design and Implementation</i>, Oct. 2018, pp.579–594.
</div>
</td>
</tr>
<tr class="document-box" id="b17">
<td valign="top" class="td1">
[17]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Ben-Nun T, de Fine Licht J, Ziogas A N, Schneider T, Hoefler T. Stateful dataflow multigraphs: A data-centric model for performance portability on heterogeneous architectures. In <i>Proc</i>. <i>the</i> <i>2019</i> <i>International</i> <i>Conference</i> <i>for</i> <i>High</i> <i>Performance</i> <i>Computing</i>, <i>Networking</i>, <i>Storage</i> <i>and</i> <i>Analysis</i>, Nov. 2019, Article No. 81. DOI: <a href="https://www.doi.org/10.1145/3295500.3356173">10.1145/3295500.3356173</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b18">
<td valign="top" class="td1">
[18]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Ziogas A N, Ben-Nun T, Fernández G I, Schneider T, Luisier M, Hoefler T. A data-centric approach to extreme-scale <i>ab initio</i> dissipative quantum transport simulations. In <i>Proc</i>. <i>the 2019</i> <i>International</i> <i>Conference</i> <i>for</i> <i>High</i> <i>Performance</i> <i>Computing</i>, <i>Networking</i>, <i>Storage</i> <i>and</i> <i>Analysis</i>, Nov. 2019, Article No. 1. DOI: <a href="https://www.doi.org/10.1145/3295500.3357156">10.1145/3295500.3357156</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b19">
<td valign="top" class="td1">
[19]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Lattner C, Adve V. LLVM: A compilation framework for lifelong program analysis &amp; transformation. In <i>Proc</i>. <i>the 2004 International</i> <i>Symposium</i> <i>on</i> <i>Code</i> <i>Generation</i> <i>and</i> <i>Optimization</i>, Mar. 2004, pp.75–86. DOI: <a href="https://www.doi.org/10.1109/CGO.2004.1281665">10.1109/CGO.2004.1281665</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b20">
<td valign="top" class="td1">
[20]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Lattner C, Amini M, Bondhugula U, Cohen A, Davis A, Pienaar J, Riddle R, Shpeisman T, Vasilache N, Zinenko O. MLIR: A compiler infrastructure for the end of Moore’s law. arXiv: 2002.11054, 2020. <a href="https://arxiv.org/abs/2002.11054">https://arxiv.org/abs/2002.11054</a>, Mar. 2020.
</div>
</td>
</tr>
<tr class="document-box" id="b21">
<td valign="top" class="td1">
[21]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Gysi T, Müller C, Zinenko O, Herhut S, Davis E, Wicky T, Fuhrer O, Hoefler T, Grosser T. Domain-specific multi-level IR rewriting for GPU: The open earth compiler for GPU-accelerated climate simulation. <i>ACM Transactions on Architecture and Code Optimization</i>, 2021, 18(4): Article No. 51. DOI: <a class="mainColor ref-doi" href="http://dx.doi.org/10.1145/3469030" target="_blank">10.1145/3469030</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b22">
<td valign="top" class="td1">
[22]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
McCaskey A, Nguyen T. A MLIR dialect for quantum assembly languages. In <i>Proc</i>. <i>the</i> <i>2021</i> <i>IEEE</i> <i>International</i> <i>Conference</i> <i>on</i> <i>Quantum</i> <i>Computing</i> <i>and</i> <i>Engineering</i>, Oct. 2021, pp.255–264. DOI: <a href="https://www.doi.org/10.1109/QCE52317.2021.00043">10.1109/QCE52317.2021.00043</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b23">
<td valign="top" class="td1">
[23]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Yoo A B, Jette M A, Grondona M. SLURM: Simple Linux utility for resource management. In <i>Proc</i>. <i>the 9th International Workshop on</i> <i>Job</i> <i>Scheduling</i> <i>Strategies</i> <i>for</i> <i>Parallel</i> <i>Processing</i>, Jun. 2003, pp.44–60. DOI: <a href="https://www.doi.org/10.1007/10968987_3">10.1007/10968987_3</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b24">
<td valign="top" class="td1">
[24]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Bode B, Halstead D M, Kendall R <i>et al</i>. The portable batch scheduler and the Maui scheduler on Linux clusters. In <i>Proc</i>. <i>the 4th Annual Linux Showcase &amp; Conference</i>, Oct. 2000. DOI: <a href="https://www.doi.org/10.5555/1268379.1268406">10.5555/1268379.1268406</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b25">
<td valign="top" class="td1">
[25]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Vavilapalli V K, Murthy A C, Douglas C <i>et al</i>. Apache Hadoop YARN: Yet another resource negotiator. In <i>Proc</i>. <i>the</i> <i>4th</i> <i>Annual</i> <i>Symposium</i> <i>on</i> <i>Cloud</i> <i>Computing</i>, Oct. 2013, Article No. 5. DOI: <a href="https://www.doi.org/10.1145/2523616.2523633">10.1145/2523616.2523633</a>.
</div>
</td>
</tr>
<tr class="document-box" id="b26">
<td valign="top" class="td1">
[26]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Hindman B, Konwinski A, Zaharia M <i>et al</i>. Mesos: A platform for fine-grained resource sharing in the data center. In <i>Proc</i>. <i>the 8th USENIX Conference on Networked Systems Design and Implementation</i>, Mar. 2011, pp.295–308.
</div>
</td>
</tr>
<tr class="document-box" id="b27">
<td valign="top" class="td1">
[27]
</td>
<td class="td2">
<div class="reference-en" style="margin:0px;padding:0px;">
Tang X C, Wang H J, Ma X S <i>et al</i>. Spread-n-Share: Improving application performance and cluster throughput with resource-aware job placement. In <i>Proc</i>. <i>the</i> <i>International</i> <i>Conference</i> <i>for</i> <i>High</i> <i>Performance</i> <i>Computing</i>, <i>Networking</i>, <i>Storage</i> <i>and</i> <i>Analysis</i>, Nov. 2019, Article No. 12. DOI: <a href="https://www.doi.org/10.1145/3295500.3356152">10.1145/3295500.3356152</a>.
</div>
</td>
</tr>
</tbody>
</table>
</p>