Cholesky Parallel Decomposition Optimization Algorithm Based on ScaLAPACK
-
Abstract
The Scalable Linear Algebra Package (ScaLAPACK) is a critical library for parallel computing on distributed-memory systems, and numerous scientific applications depend on its robust linear algebra operations. However, for specific computations such as the Cholesky decomposition, the native ScaLAPACK routines are not communication-optimal and do not fully exploit modern parallel architectures. This paper proposes a Parallel Cholesky Factorisation (PCF) optimization algorithm that addresses these limitations within the ScaLAPACK framework. The PCF algorithm improves performance and load balancing by differentiating the data partitions assigned to each process. It temporarily redistributes the computational workload to a root process, which performs a concentrated calculation and then redistributes the results, giving a more balanced utilization of CPU resources. Experiments were conducted on both Intel and Kunpeng processor platforms. The first set of experiments shows that, on the Intel platform under optimized process and thread configurations, the PCF algorithm achieves an average performance improvement of 30% over the native ScaLAPACK algorithm. A second, comparative experiment shows that the PCF algorithm on the Kunpeng platform achieves average computational efficiency increases of 200% and 30%, depending on the thread configuration, significantly outperforming the Intel Math Kernel Library (MKL) on the Intel platform. These results confirm that the proposed optimization improves both performance and portability across diverse modern computing environments.
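The gather, compute, and redistribute pattern described above can be illustrated with a minimal sketch. It is not the authors' PCF implementation and uses neither ScaLAPACK nor MPI: the processes are simulated in a single Python program, the row-block partition is an assumption made for illustration, and the names `split_rows` and `gather_factor_scatter` are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares of the entries already computed in row j.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def split_rows(matrix, nprocs):
    """Hypothetical partition: one contiguous slab of rows per simulated process."""
    base, extra = divmod(len(matrix), nprocs)
    slabs, start = [], 0
    for rank in range(nprocs):
        stop = start + base + (1 if rank < extra else 0)
        slabs.append(matrix[start:stop])
        start = stop
    return slabs


def gather_factor_scatter(local_slabs):
    """Gather the slabs on a simulated root, factor there, and hand back slabs of L."""
    gathered = [row for slab in local_slabs for row in slab]  # "gather" to the root
    factor = cholesky(gathered)  # concentrated computation on the root
    out, start = [], 0
    for slab in local_slabs:  # "scatter" using the original slab sizes
        out.append(factor[start:start + len(slab)])
        start += len(slab)
    return out


# Small symmetric positive definite test matrix, split across two simulated processes.
A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L_slabs = gather_factor_scatter(split_rows(A, nprocs=2))
L = [row for slab in L_slabs for row in slab]
print(L)  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

In a real distributed setting, the gather and scatter steps would be collective communication calls and the root would call an optimized LAPACK routine rather than the textbook loop used here.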
-
-