Yi-Fan Zhang, Lei Sun, Qiang Cao. TLP-LDPC: Three-Level Parallel FPGA Architecture for Fast Prototyping of LDPC Decoder Using High-Level Synthesis[J]. Journal of Computer Science and Technology, 2022, 37(6): 1290-1306. DOI: 10.1007/s11390-022-1499-9

TLP-LDPC: Three-Level Parallel FPGA Architecture for Fast Prototyping of LDPC Decoder Using High-Level Synthesis

Funds: This work was partially supported by the National Key Research and Development Program of China under Grant No. 2018YFA0701800, the National Natural Science Foundation of China under Grant Nos. 61821003 and 62172175, and Alibaba Group through the Alibaba Innovative Research (AIR) Program.
  • Author Bio:

    Qiang Cao is currently a full professor at the Wuhan National Laboratory for Optoelectronics, Wuhan, and Huazhong University of Science and Technology, Wuhan. His research interests include computer architecture, large-scale storage systems, and performance evaluation. He is a distinguished member of CCF, a senior member of IEEE, and a member of ACM.

  • Corresponding author:

    Qiang Cao E-mail: caoqiang@hust.edu.cn

  • Received Date: April 05, 2021
  • Revised Date: March 01, 2022
  • Accepted Date: April 11, 2022
  • Published Date: December 08, 2022
  • Low-density parity-check (LDPC) codes, with their excellent error-correction capabilities, have been widely used in both data communication and storage to construct reliable cyber-physical systems that are resilient to real-world noise. Fast prototyping of field-programmable gate array (FPGA)-based decoders is essential to achieve high decoding performance while accelerating the development process. This paper proposes a three-level parallel architecture, TLP-LDPC, that achieves high throughput by fully exploiting the characteristics of both LDPC and the underlying hardware while scaling effectively to large FPGA platforms. The architecture comprises a low-level decoding unit, a mid-level multi-unit decoding core, and a high-level multi-core decoder. The low-level decoding unit is a basic LDPC computation component that combines the features of the LDPC algorithm with the specific structures of the FPGA (e.g., look-up tables, LUTs) and eliminates potential data conflicts. The mid-level decoding core integrates the input/output and multiple decoding units in a well-balanced pipelined fashion. The high-level multi-core architecture makes full use of board-level resources to improve the overall throughput. We develop LDPC C++ code with dedicated pragmas and leverage high-level synthesis (HLS) tools to implement the TLP-LDPC architecture. Experimental results show that TLP-LDPC achieves 9.63 Gbps end-to-end decoding throughput on a Xilinx Alveo U50 platform, 3.9x higher than existing HLS-based FPGA implementations.
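
    To give a rough feel for what "C++ code with dedicated pragmas" can look like at the two lower levels, here is a minimal sketch. It is not the authors' code. It assumes a min-sum check-node update (the abstract does not name the decoding algorithm), 8-bit quantized messages, and hypothetical names and sizes (`kDegree`, `kUnits`, `check_node_update`, `decoding_core`). The `#pragma HLS UNROLL` directives are Xilinx HLS hints; an ordinary C++ compiler ignores them, possibly with a warning.

    ```cpp
    #include <array>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>

    // Hypothetical parameters chosen for illustration; none come from the paper.
    constexpr std::size_t kDegree = 4;  // edges attached to one check node
    constexpr std::size_t kUnits = 8;   // decoding units grouped into one core
    using Llr = std::int8_t;            // assumed 8-bit log-likelihood ratio
    using Edges = std::array<Llr, kDegree>;

    // Low-level "decoding unit" (assumed min-sum): each outgoing message is the
    // sign product and minimum magnitude of all OTHER incoming messages.
    Edges check_node_update(const Edges& in) {
        int min1 = 127, min2 = 127;  // smallest and second-smallest magnitude
        std::size_t min_idx = 0;     // position of the smallest magnitude
        bool odd_negatives = false;  // true if an odd number of inputs is negative
        for (std::size_t i = 0; i < kDegree; ++i) {
    #pragma HLS UNROLL
            int mag = std::abs(static_cast<int>(in[i]));
            if (mag > 127) mag = 127;  // clamp the -128 corner case
            if (mag < min1) { min2 = min1; min1 = mag; min_idx = i; }
            else if (mag < min2) { min2 = mag; }
            odd_negatives ^= (in[i] < 0);
        }
        Edges out{};
        for (std::size_t i = 0; i < kDegree; ++i) {
    #pragma HLS UNROLL
            const int mag = (i == min_idx) ? min2 : min1;  // exclude own input
            const bool neg = odd_negatives ^ (in[i] < 0);  // exclude own sign
            out[i] = static_cast<Llr>(neg ? -mag : mag);
        }
        return out;
    }

    // Mid-level "decoding core": several independent units side by side.
    // Under HLS the unrolled loop would become kUnits parallel instances.
    void decoding_core(const std::array<Edges, kUnits>& in,
                       std::array<Edges, kUnits>& out) {
        assert(&in != &out);  // separate buffers avoid read/write conflicts
        for (std::size_t u = 0; u < kUnits; ++u) {
    #pragma HLS UNROLL
            out[u] = check_node_update(in[u]);
        }
    }
    ```

    For the inputs `{5, -3, 7, -2}`, `check_node_update` returns `{2, -2, 2, -3}`. The third level described in the abstract would replicate `decoding_core` across the board's resources; that step depends on the platform tooling and is left out of this sketch.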