|
计算机科学技术学报 ›› 2021,Vol. 36 ›› Issue (1): 123-139.doi: 10.1007/s11390-020-9826-z
所属专题: Computer Architecture and Systems
Jun-Shi Chen1, Member, CCF, Hong An1, Member, CCF, ACM, IEEE Wen-Ting Han1,*, Member, CCF, ACM, IEEE, Zeng Lin1, and Xin Liu2, Member, CCF
Jun-Shi Chen1, Member, CCF, Hong An1, Member, CCF, ACM, IEEE Wen-Ting Han1,*, Member, CCF, ACM, IEEE, Zeng Lin1, and Xin Liu2, Member, CCF
在分子动力学模拟中,粒子间的短程相互作用消耗了大部分的CPU 时间。其固有的计算稀疏性使得在新兴的众核架构上实现高性能的计算核心具有较大的挑战。在本文中,我们针对申威众核处理器架构提出了一种非常高效的粒子短程相互作用计算核心。目前在这款众核处理器上该算法的并行效率受到数据局部性差和写冲突的极大限制。为了增强数据局部性,我们提出了一种适应计算核心片上存储空间的超簇邻居列表。在处理器没有低开销锁机制的情况下,使用数据私有化力数组是避免写冲突的一种可行方法,但会带来较大的数据归约开销。为此我们提出了一种同时对硬件资源和计算任务进行划分的双切片分区方案,利用片上数据通信减少数据归约开销,同时保证负载平衡。此外,我们充分开发该处理器的SIMD并行性,并对作用力计算核心进行了指令重排优化。实验结果表明,在申威众核处理器上优化后的作用力计算核心性能相比之前提高226倍,其浮点计算性能达到理论值的20%。
[1] Hollingsworth S A, Dror R O. Molecular dynamics simulation for all. Neuron, 2018, 99(6):1129-1143. DOI:10.1016/j.neuron.2018.08.011. [2] Kumar S, Huang C, Zheng G et al. Scalable molecular dynamics with NAMD on the IBM Blue Gene/L system. IBM Journal of Research and Development, 2008, 52(1/2):177- 188. DOI:10.1147/rd.521.0177. [3] Shaw D E, Grossman J P, Bank J A et al. Anton 2:Raising the bar for performance and programmability in a specialpurpose molecular dynamics supercomputer. In Proc. the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2014, pp.41-53. DOI:10.1109/SC.2014.9. [4] Shaw D E, Deneroff M M, Dror R O et al. Anton, a special-purpose machine for molecular dynamics simulation. Commun. ACM, 2008, 51(7):91-97. DOI:10.1145/1364782.1364802. [5] Götz A W, Williamson M J, Xu D et al. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. generalized born. Journal of Chemical Theory and Computation, 2012, 8(5):1542-1555. DOI:10.1021/ct200909j. [6] Pennycook S J, Hughes C J, Smelyanskiy M, Jarvis S A. Exploring SIMD for molecular dynamics, using Intelr Xeonr and Intelr Xeon Phi coprocessors. In Proc. the 27th IEEE International Symposium on Parallel and Distributed Processing, May 2013, pp.1085-1097. DOI:10.1109/IPDPS.2013.44. [7] Wang H Q, Peng S L, Zhu X Q et al. A method to accelerate GROMACS in offload mode on Tianhe-2 supercomputer. In Proc. the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2015, pp.781-784. DOI:10.1109/CCGrid.2015.65. [8] Hu C J, Wang X M, Li J J et al. Kernel optimization for short-range molecular dynamics. Computer Physics Communications, 2017, 211:31-40. DOI:10.1016/j.cpc.2016.07.010. [9] Law T R, Hancox J, Wright S A, Jarvis S A. An algorithm for computing short-range forces in molecular dynamics simulations with non-uniform particle densities. Journal of Parallel and Distributed Computing, 2019, 130:1-11. DOI:10.1016/j.jpdc.2019.03.008. [10] Peng S L, Cui Y B, Yang S Y et al. A CPU/MIC collaborated parallel framework for GROMACS on Tianhe-2 supercomputer. IEEE/ACM Trans. Comput. Biology Bioinform., 2019, 16(2):425-433. DOI:10.1109/TCBB.2017.2713362. [11] Anderson J A, Lorenz C D, Travesset A. General purpose molecular dynamics simulations fully implemented on graphics processing units. Journal of Computational Physics, 2008, 227(10):5342-5359. DOI:10.1016/j.jcp.2008.01.047. [12] Friedrichs M S, Eastman P, Vaidyanathan V et al. Accelerating molecular dynamic simulation on graphics processing units. Journal of Computational Chemistry, 2009, 30(6):864-872. DOI:10.1002/jcc.21209. [13] Minkin A S, Knizhnik A A, Potapkin B V. GPU implementations of some many-body potentials for molecular dynamics simulations. Advances in Engineering Software, 2017, 111:43-51. DOI:10.1016/j.advengsoft.2016.05.013. [14] Spellings M, Marson R L, Anderson J A, Glotzer S C. GPU accelerated Discrete Element Method (DEM) molecular dynamics for conservative, faceted particle simulations. Journal of Computational Physics, 2017, 334:460-467. DOI:10.1016/j.jcp.2017.01.014. [15] Fu H H, Liao J F, Yang J Z et al. The Sunway TaihuLight supercomputer:System and applications. Science China Information Sciences, 2016, 59(7):Article No. 072001. DOI:10.1007/s11432-016-5588-7. [16] Dong W Q, Kang L T, Quan Z et al. Implementing molecular dynamics simulation on Sunway TaihuLight system. In Proc. the 18th IEEE International Conference on High Performance Computing and Communications, December 2016, pp.443-450. DOI:10.1109/HPCC-SmartCityDSS.2016.0070. [17] Dong W Q, Li K L, Kang L T, Quan Z, Li K Q. Implementing molecular dynamics simulation on the Sunway TaihuLight system with heterogeneous many-core processors. Concurrency and Computation:Practice and Experience, 2018, 30(16):Article No. e4468. DOI:10.1002/cpe.4468. [18] Yu Y, An H, Chen J S et al. Pipelining computation and optimization strategies for scaling GROMACS on the Sunway many-core processor. In Proc. the 17th International Conference on Algorithms and Architectures for Parallel Processing, August 2017, pp.18-32. DOI:10.1007/978-3-319-65482- 9_2. [19] Duan X H, Gao P, Zhang T J et al. Redesigning LAMMPS for peta-scale and hundred-billion-atom simulation on Sunway TaihuLight. In Proc. the International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2018, Article No. 12. DOI:10.1109/SC.2018.00015. [20] Páll S, Hess B. A flexible algorithm for calculating pair interactions on SIMD architectures. Computer Physics Communications, 2013, 184(12):2641-2650. DOI:10.1016/j.cpc.2013.06.003. [21] Abraham M J, Murtola T, Schulz R et al. GROMACS:High performance molecular simulations through multilevel parallelism from laptops to supercomputers. SoftwareX, 2015, 1/2:19-25. DOI:10.1016/j.softx.2015.06.001. [22] Phillips J C, Braun R, Wang W, Gumbart J et al. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 2005, 26:1781-1802. DOI:10.1002/jcc.20289. [23] Plimpton S. Fast parallel algorithms for short-range molecular dynamics. Journal of Computational Physics, 1995, 117:1-19. DOI:10.1006/jcph.1995.1039. [24] Yao Z H, Wang J S, Liu G R, Cheng M. Improved neighbor list algorithm in molecular simulations using cell decomposition and data sorting method. Computer Physics Communications, 2004, 161(1/2):27-35. DOI:10.1016/j.cpc.2004.04.004. [25] Nguyen T D. GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations. Computer Physics Communications, 2017, 212:113-122. DOI:10.1016/j.cpc.2016.10.020. [26] Jia Z, Maggioni M, Staiger B, Scarpazza D P. Dissecting the NVIDIA volta GPU architecture via microbenchmarking. arXiv:1804.06826, 2018. https://arxiv.org/abs/1804.06826, April 2020. [27] Kunaseth M, Richards D F, Glosli J N et al. Analysis of scalable data-privatization threading algorithms for hybrid MPI/OpenMP parallelization of molecular dynamics. The Journal of Supercomputing, 2013, 66(1):406-430. DOI:10.1007/s11227-013-0915-x. [28] Lin J, Xu Z G, Cai L J, Nukada A, Satoshi M. Evaluating the SW26010 many-core processor with a micro-benchmark suite for performance optimizations. Parallel Computing, 2018, 77:128-143. DOI:10.1016/j.parco.2018.06.001. |
No related articles found! |
版权所有 © 《计算机科学技术学报》编辑部 本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn 总访问量: |