Yong Dou, Jie Zhou, Gui-Ming Wu, Jing-Fei Jiang, Yuan-Wu Lei, Shi-Ce Ni. A Unified Co-Processor Architecture for Matrix Decomposition[J]. Journal of Computer Science and Technology, 2010, 25(4): 874-885. DOI: 10.1007/s11390-010-1068-5

A Unified Co-Processor Architecture for Matrix Decomposition

  • QR and LU decompositions are among the most important matrix decomposition algorithms. Many studies accelerate these algorithms on FPGAs or ASICs on a case-by-case basis. In this paper, we propose a unified framework for matrix decomposition that combines three QR decomposition algorithms and the LU algorithm with pivoting into a single linear array structure. The QR and LU decomposition algorithms exhibit the same two-level loop structure and the same data dependencies. Exploiting these similarities, we derive a unified fine-grained algorithm for all four matrix decomposition algorithms. Furthermore, we present a unified co-processor structure with a scalable linear array of processing elements (PEs); the four types of PEs share the same memory channel structure and PE interconnections and differ only in the internal structure of their data paths. Our unified co-processor, which uses IEEE 32-bit floating-point precision, is implemented and mapped onto a Xilinx Virtex5 FPGA chip. Experimental results show that our co-processors achieve speedups of 2.3 to 14.9 times over a Pentium Dual CPU with double SSE threads.
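The shared two-level loop structure mentioned in the abstract can be illustrated with a small software sketch. This is not the paper's fine-grained algorithm or its PE data path; it only shows, under our own assumptions, how LU with partial pivoting and one QR variant (Householder reflections, chosen here as a representative) fit the same skeleton: an outer loop that derives a transform from pivot column k, and an inner loop that applies it to each trailing column j. All function names are hypothetical.

```python
import math

def decompose(a, make_transform, apply_transform):
    """Shared skeleton: outer loop over pivot columns, inner loop over trailing columns."""
    m, n = len(a), len(a[0])
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]  # column-major copy
    for k in range(min(m, n)):
        t = make_transform(cols[k], k)        # depends only on column k
        for j in range(k + 1, n):             # trailing columns are independent of each other
            apply_transform(t, cols[j], k)
    # Return only the upper-triangular factor (U for LU, R for QR).
    return [[cols[j][i] if j >= i else 0.0 for j in range(n)] for i in range(min(m, n))]

def lu_make(col, k):
    """Partial pivoting: pick the largest entry at or below row k, compute multipliers."""
    p = max(range(k, len(col)), key=lambda i: abs(col[i]))
    col[k], col[p] = col[p], col[k]
    if col[k] == 0.0:
        raise ZeroDivisionError("matrix is singular")
    return p, [col[i] / col[k] for i in range(k + 1, len(col))]

def lu_apply(t, c, k):
    """Apply the row swap and the elimination step to one trailing column."""
    p, mult = t
    c[k], c[p] = c[p], c[k]
    for i, m in enumerate(mult, start=k + 1):
        c[i] -= m * c[k]

def householder_make(col, k):
    """Build a Householder vector that zeroes column k below the diagonal."""
    x = col[k:]
    norm = math.sqrt(sum(e * e for e in x))
    alpha = -math.copysign(norm, x[0])
    v = x[:]
    v[0] -= alpha
    col[k:] = [alpha] + [0.0] * (len(x) - 1)
    return v, sum(e * e for e in v)

def householder_apply(t, c, k):
    """Reflect one trailing column: c <- c - 2 v (v.c) / (v.v) on rows k and below."""
    v, vv = t
    if vv == 0.0:
        return  # zero column: the reflection is the identity
    s = 2.0 * sum(vi * ci for vi, ci in zip(v, c[k:])) / vv
    for i, vi in enumerate(v):
        c[k + i] -= s * vi
```

For example, `decompose(a, lu_make, lu_apply)` and `decompose(a, householder_make, householder_apply)` run the same driver and differ only in the two callbacks, which loosely mirrors the abstract's point that the PEs differ only in their internal data paths.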
