HI-SM3: High-Performance Implementation of SM3 Hash Function on Heterogeneous GPUs
Abstract
Hash functions underpin cryptographic primitives and applications such as digital signatures, key exchange, and blockchain technology. SM3, built on the Merkle-Damgård structure, is a core component of Chinese commercial cryptographic schemes. With the growth of Internet of Things (IoT) devices and the rapid evolution of blockchain technology, optimizing hash function performance has become increasingly important. In this paper, we introduce HI-SM3, a high-performance implementation framework that accelerates the SM3 cryptographic hash function on heterogeneous GPU (graphics processing unit) parallel computing devices. HI-SM3 optimizes the hash implementation along four dimensions: parallelism, register utilization, memory access, and instruction efficiency, yielding significant performance gains across various GPU platforms. On the NVIDIA RTX 4090 GPU, HI-SM3 achieves a peak throughput of 454.74 GB/s, more than 150 times that of OpenSSL on a high-end server CPU (E5-2699V3) with 16 cores. On the Hygon DCU accelerator, a Chinese domestic graphics card, it achieves 113.77 GB/s. Furthermore, compared with the fastest known GPU-based SM3 implementation, HI-SM3 delivers a 3.12x performance improvement on the same GPU platform. Even on embedded GPUs consuming less than 40 W, HI-SM3 attains a throughput of 5.90 GB/s, twice that of a server-level CPU. These results position HI-SM3 as a compelling solution for accelerating hash operations.