HI-SM4: Exploiting Domestic Hygon GPU for High-Performance SM4 Cryptographic Computing
Abstract
With the rapid rise of domestic accelerators and the growing demand for efficient cryptographic algorithm
implementations, optimizing the performance of domestic commercial cryptography on homegrown hardware platforms has
become a critical research challenge. This paper uses the Hygon DCU, a Chinese-made accelerator, as the acceleration platform
and proposes a high-performance SM4 encryption implementation. First, we design a collaborative CPU/DCU computing
framework, where the CPU handles task scheduling while the DCU performs parallel execution of computationally intensive
tasks. Additionally, we leverage the DCU’s data copy engine and employ HIP stream-based data transmission to enhance
computational efficiency. Second, we propose a hierarchical memory access strategy that merges multiple sets
of computational data into a single transfer and stores data in different memory levels according to its characteristics,
improving memory access efficiency. Third, we employ GCN ISA inline assembly to achieve constant-time computation,
which enhances both security and computational performance. Finally, we support both single-key and multi-key encryption
modes, optimizing memory access and parallel computation for each. The experimental results show that our optimized
SM4 implementation achieves a throughput of 161.81 Gbps in the multi-key encryption mode and up to 730.88 Gbps in the
single-key encryption mode. Compared with the best existing SM4 implementations on CPU, FPGA, and GPU platforms,
our approach achieves performance improvements of 28.22×, 4.50×, and 1.36×, respectively. Our research presents a feasible
framework for efficiently implementing indigenous cryptographic algorithms on domestically developed hardware accelerators,
thereby contributing to the advancement of high-performance cryptographic computation.