Yan YJ, Li HB, Zhao T et al. 10-million atoms simulation of first-principle package LS3DF. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(1): 45−62 Jan. 2024. DOI: 10.1007/s11390-023-3011-6.

10-Million Atoms Simulation of First-Principle Package LS3DF

Funds: This work was supported by the National Key Research and Development Program of China under Grant No. 2021YFB0300600, the National Natural Science Foundation of China under Grant Nos. 92270206, T2125013, 62032023, 61972377, T2293702, and 12274360, the Chinese Academy of Sciences Project for Young Scientists in Basic Research under Grant No. YSBR-005, the Network Information Project of Chinese Academy of Sciences under Grant No. CASWX2021SF-0103, and the Key Research Program of Chinese Academy of Sciences under Grant No. ZDBSSSW-WHC002.
  • Author Bio:

    Yu-Jin Yan is currently a Ph.D. candidate in the State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing. She received her B.S. degree in mathematics and applied mathematics from Sichuan University, Chengdu, in 2017. Her current research interests include high-performance computing, massively parallel computing, and first-principles calculation.

    Hai-Bo Li received his Ph.D. degree in computational mathematics from Tsinghua University, Beijing, in 2021. He is currently a joint postdoctoral researcher at the Computing System Optimization Laboratory of Huawei Technologies, Beijing, and the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include numerical linear algebra, computational inverse problems, and machine learning.

    Tong Zhao received his Ph.D. degree in operations research and control theory from Fudan University, Shanghai, in 2021. He is currently a postdoctoral researcher at the State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include the fundamental theory of artificial intelligence, high-performance computing, and game theory.

    Lin-Wang Wang received his Ph.D. degree from Cornell University, Ithaca, in 1991. He worked at Cornell University (1991–1992) and the National Renewable Energy Laboratory (1992–1995) as a postdoctoral researcher, and then at Biosym/Molecular Simulations Inc. (1995–1996) and the National Renewable Energy Laboratory (1996–1999) as a staff scientist. Since 1999, he has worked at Lawrence Berkeley National Laboratory, where he is a senior staff scientist. He is currently a chief scientist at the Institute of Semiconductors, Chinese Academy of Sciences, Beijing. His research interests mainly focus on the development of ab initio electronic structure calculation methods and the applications of these methods in materials design and discovery.

    Lin Shi received his B.E. degree in physics from Southeast University, Nanjing, in 2002, and his Ph.D. degree in condensed matter physics from Tsinghua University, Beijing, in 2007. He is currently teaching at the School of Materials Science and Engineering, Yancheng Institute of Technology, Yancheng. His research interests include first-principles calculation and III-V semiconductors.

    Tao Liu received his B.E. degree in computer science from Harbin Engineering University, Harbin, in 2002. He is currently a senior engineer at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include HPC, machine learning, and AI for science applications.

    Guang-Ming Tan received his Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in March 2008. He is currently a researcher at the Institute of Computing, Chinese Academy of Sciences, Beijing. His research interests include parallel algorithm design and analysis, parallel programming and optimization, computer architecture, bioinformatics, and big data.

    Wei-Le Jia received his joint Ph.D. degree from the Computer Network Information Center, Chinese Academy of Sciences, Beijing, and Lawrence Berkeley National Laboratory, Berkeley, in 2016. He is currently an associate research fellow at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include high-performance computing, artificial intelligence, and massively parallel computing.

    Ning-Hui Sun received his Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in July 1999. He is currently the Academic Director of the Institute of Computing, Chinese Academy of Sciences, Beijing. His research interests include parallel processing architecture, distributed operating systems, performance evaluation, and file systems.

  • Corresponding author:

    Hai-Bo Li is responsible for algorithm design and participated in writing the paper; Wei-Le Jia is responsible for the overall design and guidance of the work and for algorithmic optimization; Ning-Hui Sun is the chief instructor of the work and is responsible for system optimization.

    jiaweile@ict.ac.cn

    snh@ict.ac.cn

  • Received Date: February 20, 2023
  • Accepted Date: April 24, 2023
  • The growing demand for semiconductor device simulation poses a major challenge for large-scale electronic structure calculations. Among the available methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and highly efficient implementation of LS3DF on a domestic heterogeneous supercomputer equipped with accelerators. On the algorithmic side, the original all-band conjugate gradient algorithm is refined to achieve faster convergence, and mixed-precision computing is adopted to increase overall efficiency. On the system side, the original two-layer parallel structure is replaced by a coarse-grained parallel method, and optimization strategies such as multi-stream execution, kernel fusion, and redundant computation removal are proposed to further increase utilization of the computational power provided by the heterogeneous machines. As a result, our optimized LS3DF can scale to a system of 10 million silicon atoms, attaining a peak performance of 34.8 PFLOPS (21.2% of the machine's peak). All the improvements can be adapted to next-generation supercomputers for larger simulations.
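
    The two performance figures in the abstract can be related by simple arithmetic. The sketch below is only an illustration, not a number reported by the authors: it assumes that "21.2% of the peak" means the fraction of the machine's theoretical peak, expressed in the same FLOPS metric as the 34.8 PFLOPS figure. The function name `implied_peak_pflops` is hypothetical.

    ```python
    def implied_peak_pflops(sustained_pflops: float, fraction_of_peak: float) -> float:
        """Return the machine peak implied by a sustained rate and its fraction of peak.

        Assumption (not stated in the source): both values refer to the same
        precision and the same set of nodes.
        """
        if not 0.0 < fraction_of_peak <= 1.0:
            raise ValueError("fraction_of_peak must be in (0, 1]")
        return sustained_pflops / fraction_of_peak


    # Values taken from the abstract.
    SUSTAINED_PFLOPS = 34.8   # attained performance
    FRACTION_OF_PEAK = 0.212  # 21.2% of the peak

    peak = implied_peak_pflops(SUSTAINED_PFLOPS, FRACTION_OF_PEAK)
    print(f"Implied machine peak: {peak:.1f} PFLOPS")
    ```

    Under that assumption, the implied peak of the partition used for the run is roughly 164 PFLOPS; the article itself should be consulted for the actual machine configuration.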
