Fu-Yuan Chen, Jian-Kuo Dong, Xing-Yu Wang, Zhen-Jiang Dong, Hua-Qun Wang. HI-SM4: Exploiting Domestic Hygon GPU for High-Performance SM4 Cryptographic Computing[J]. Journal of Computer Science and Technology. DOI: 10.1007/s11390-026-5659-1

HI-SM4: Exploiting Domestic Hygon GPU for High-Performance SM4 Cryptographic Computing

  • With the rapid rise of domestic accelerators and the growing demand for efficient cryptographic algorithm implementations, optimizing the performance of domestic commercial cryptography on homegrown hardware platforms has become a critical research challenge. This paper focuses on utilizing the Chinese Hygon DCU as an acceleration platform and proposes a high-performance SM4 encryption implementation. First, we design a collaborative CPU/DCU computing framework, where the CPU handles task scheduling while the DCU performs parallel execution of computationally intensive tasks. Additionally, we leverage the DCU’s data copy engine and employ HIP stream-based data transmission to enhance computational efficiency. Second, we propose an optimized hierarchical memory access strategy by merging multiple sets of computational data for unified transfer and employing a hierarchical storage scheme based on data characteristics to improve memory access efficiency. Furthermore, we employ GCN ISA inline assembly to achieve constant-time computation, which enhances both security and computational performance. Finally, we support both single-key and multi-key encryption modes, optimizing memory access and parallel computation for each. The experimental results show that our optimized SM4 implementation achieves a throughput of 161.81 Gbps in the multi-key encryption mode and up to 730.88 Gbps in the single-key encryption mode. Compared with the best existing SM4 implementations on CPU, FPGA, and GPU platforms, our approach achieves performance improvements of 28.22×, 4.50×, and 1.36×, respectively. Our research presents a feasible framework for efficiently implementing indigenous cryptographic algorithms on domestically developed hardware accelerators, thereby contributing to the advancement of high-performance cryptographic computation.
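The abstract mentions merging multiple sets of data for unified transfer and using stream-based transmission, but gives no implementation details. The following is only a minimal host-side sketch of that general idea in Python, not the authors' implementation. The function names `pack_messages` and `split_for_streams` are hypothetical, and the only fact taken as given is that SM4 operates on 128-bit (16-byte) blocks.

```python
BLOCK_BYTES = 16  # SM4 processes data in 128-bit blocks


def pack_messages(messages):
    """Concatenate block-aligned messages into one contiguous buffer.

    Returns the packed buffer and a list of (offset, length) spans so each
    message can be located again after a single bulk transfer.
    """
    buffer = bytearray()
    spans = []
    for msg in messages:
        if len(msg) % BLOCK_BYTES != 0:
            raise ValueError("message length must be a multiple of 16 bytes")
        spans.append((len(buffer), len(msg)))
        buffer += msg
    return bytes(buffer), spans


def split_for_streams(total_bytes, num_streams):
    """Divide a packed buffer into contiguous, block-aligned chunks.

    Each (offset, length) chunk could be handed to one asynchronous stream
    so that copies and kernel execution overlap (assumed usage).
    """
    total_blocks = total_bytes // BLOCK_BYTES
    base, extra = divmod(total_blocks, num_streams)
    chunks = []
    offset = 0
    for i in range(num_streams):
        # The first `extra` streams take one additional block each.
        blocks = base + (1 if i < extra else 0)
        chunks.append((offset, blocks * BLOCK_BYTES))
        offset += blocks * BLOCK_BYTES
    return chunks
```

In a real HIP program, each chunk would presumably correspond to one asynchronous copy and kernel launch on its own stream; how the paper actually partitions work is not stated in the abstract.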
