Journal of Computer Science and Technology


TLP-LDPC: Three-Level Parallel FPGA Architecture for Fast Prototyping of LDPC Decoder Using High-Level Synthesis

Yi-fan Zhang1,2 (张一凡), Lei Sun1,2 (孙磊), and Qiang Cao1,2,* (曹强), Distinguished Member, CCF, Senior Member, IEEE, Member, ACM   

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  2. Wuhan National Laboratory for Optoelectronics, Wuhan 430074, China
  • Received: 2021-04-06; Revised: 2022-03-02; Accepted: 2022-04-12
  • Contact: Qiang Cao
  • About author: Qiang Cao is currently a full professor at the Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan. His research interests include computer architecture, large-scale storage systems, and performance evaluation. He is a distinguished member of the China Computer Federation (CCF), a senior member of the IEEE, and a member of the ACM.

Low-density parity-check (LDPC) codes, with their excellent error-correction capability, are widely used in both data communication and storage to build reliable cyber-physical systems that are resilient to real-world noise. Fast prototyping of field-programmable gate array (FPGA)-based decoders is essential to achieve high decoding performance while accelerating the development process. This paper proposes a three-level parallel architecture, TLP-LDPC, that achieves high throughput by fully exploiting the characteristics of both LDPC and the underlying hardware while scaling effectively to large FPGA platforms. The architecture comprises a low-level decoding unit, a mid-level multi-unit decoding core, and a high-level multi-core decoder. The low-level decoding unit is the basic LDPC computation component; it combines the features of the LDPC algorithm with FPGA-specific structures (e.g., look-up tables, LUTs) and eliminates potential data conflicts. The mid-level decoding core integrates input/output and multiple decoding units in a well-balanced pipeline. The high-level multi-core architecture makes full use of board-level resources to improve overall throughput. We develop LDPC C++ code with dedicated pragmas and use high-level synthesis (HLS) tools to implement the TLP-LDPC architecture. Experimental results show that TLP-LDPC achieves 9.63 Gbps end-to-end decoding throughput on a Xilinx Alveo U50 platform, 3.9x higher than existing HLS-based FPGA implementations.
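
The three levels described above can be pictured with a small software sketch. It is an illustration only, not the authors' code: the abstract does not name the decoding algorithm, so a min-sum check-node update is assumed for the decoding unit, and the sizes `kDegree`, `kUnitsPerCore`, and `kCores`, as well as the function names, are hypothetical. The HLS pragmas are shown as comments to mark where unrolling or pipelining would plausibly be requested in an HLS flow; on a CPU the loops simply run sequentially.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sizes, chosen only for illustration.
constexpr int kDegree = 4;        // messages entering one check node
constexpr int kUnitsPerCore = 2;  // decoding units inside one core
constexpr int kCores = 2;         // cores inside the decoder

using Msgs = std::array<int, kDegree>;
using Frame = std::array<Msgs, kUnitsPerCore>;

// Low level: one decoding unit (assumed min-sum check-node update).
// Each output takes the sign product and the minimum magnitude of all
// the OTHER inputs.
Msgs decoding_unit(const Msgs& in) {
    int min1 = INT_MAX, min2 = INT_MAX, min_idx = -1;
    bool neg = false;  // parity of the negative inputs
    for (int i = 0; i < kDegree; ++i) {
        // In an HLS flow one might request: #pragma HLS UNROLL
        int mag = std::abs(in[i]);
        neg ^= (in[i] < 0);
        if (mag < min1) { min2 = min1; min1 = mag; min_idx = i; }
        else if (mag < min2) { min2 = mag; }
    }
    Msgs out{};
    for (int i = 0; i < kDegree; ++i) {
        int mag = (i == min_idx) ? min2 : min1;  // exclude own magnitude
        bool s = neg ^ (in[i] < 0);              // exclude own sign
        out[i] = s ? -mag : mag;
    }
    return out;
}

// Mid level: one core that feeds several decoding units.
Frame decoding_core(const Frame& frame) {
    Frame out{};
    for (int u = 0; u < kUnitsPerCore; ++u) {
        // In an HLS flow one might request: #pragma HLS UNROLL
        out[u] = decoding_unit(frame[u]);
    }
    return out;
}

// High level: a decoder that hands frames to kCores cores in groups.
std::vector<Frame> decoder(const std::vector<Frame>& frames) {
    std::vector<Frame> out(frames.size());
    for (std::size_t base = 0; base < frames.size(); base += kCores) {
        for (std::size_t c = 0; c < kCores && base + c < frames.size(); ++c) {
            out[base + c] = decoding_core(frames[base + c]);  // core c
        }
    }
    return out;
}
```

For example, `decoding_unit({3, -1, 4, -2})` returns `{1, -2, 1, -1}`. Keeping each level behind its own function mirrors the idea that the levels can be tuned separately: the unit's arithmetic, the number of units per core, and the number of cores are independent knobs in this sketch.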


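This paper proposes a three-level parallel FPGA architecture for fast prototyping of high-performance LDPC decoders using HLS. The decoder achieves a measured decoding throughput of up to 9.63 Gbps, exceeding existing HLS-based FPGA LDPC decoder implementations and far exceeding CPU- and GPU-based LDPC decoder implementations. Because each level of the proposed architecture can be optimized relatively independently, adopting a better algorithm or adjusting the upper-level parallel design can yield higher decoding throughput and hardware efficiency.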

Key words: low-density parity-check (LDPC); high-level synthesis (HLS); field-programmable gate array (FPGA)


ISSN 1000-9000(Print)

CN 11-2296/TP
