A Unified Co-Processor Architecture for Matrix Decomposition

Yong Dou; Jie Zhou; Gui-Ming Wu; Jing-Fei Jiang; Yuan-Wu Lei; Shi-Ce Ni

doi:10.1007/s11390-010-1068-5

Yong Dou, Jie Zhou, Gui-Ming Wu, Jing-Fei Jiang, Yuan-Wu Lei, Shi-Ce Ni. A Unified Co-Processor Architecture for Matrix Decomposition[J]. Journal of Computer Science and Technology, 2010, 25(4): 874-885. DOI: 10.1007/s11390-010-1068-5

Abstract

QR and LU decompositions are the most important matrix decomposition algorithms. Many studies accelerate these algorithms on FPGAs or ASICs on a case-by-case basis. In this paper, we propose a unified framework for matrix decomposition that combines three QR decomposition algorithms and the LU algorithm with pivoting into a single linear array structure. The QR and LU decomposition algorithms exhibit the same two-level loop structure and the same data dependencies. Exploiting these similarities, we derive a unified fine-grained algorithm for all four matrix decomposition algorithms. Furthermore, we present a unified co-processor structure with a scalable linear array of processing elements (PEs), in which the four types of PEs share the same memory channel structure and PE connections and differ only in the internal structure of the data path. Our unified co-processor, which uses IEEE 32-bit floating-point precision, is implemented and mapped onto a Xilinx Virtex5 FPGA chip. Experimental results show that our co-processors achieve speedups of 2.3x to 14.9x over a Pentium Dual CPU running two SSE threads.
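The shared two-level loop structure mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the paper's hardware algorithm: the abstract does not name the three QR variants, so modified Gram-Schmidt is used here purely as an assumed example, and the function names `lu_partial_pivot` and `qr_mgs` are hypothetical. In both routines the outer loop selects column k and the inner loop updates the trailing columns j > k using data produced at step k.

```python
import math

def lu_partial_pivot(a):
    """LU with partial pivoting on a copy of square matrix `a`.

    Returns (perm, lu): row i of `lu` corresponds to row perm[i] of `a`;
    the strict lower part holds L (unit diagonal implied), the rest holds U.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):                          # outer loop: pivot column k
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]             # row exchange
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                # multipliers (column k of L)
        for j in range(k + 1, n):               # inner loop: trailing columns
            for i in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu

def qr_mgs(a):
    """QR by modified Gram-Schmidt (assumed variant, for illustration only).

    Returns (q, r) as row-major lists with q having orthonormal columns.
    """
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]  # column storage
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):                          # outer loop: normalize column k
        r[k][k] = math.sqrt(sum(x * x for x in cols[k]))
        if r[k][k] == 0.0:
            raise ValueError("columns are linearly dependent")
        cols[k] = [x / r[k][k] for x in cols[k]]
        for j in range(k + 1, n):               # inner loop: trailing columns
            r[k][j] = sum(cols[k][i] * cols[j][i] for i in range(m))
            cols[j] = [cols[j][i] - r[k][j] * cols[k][i] for i in range(m)]
    q = [[cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

In both functions, step k must finish producing its column before the updates to columns j > k can proceed, which is the kind of common dependency pattern the abstract refers to when it describes mapping all four algorithms onto one linear array.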
