Journal of Computer Science and Technology, 2021, Vol. 36, Issue (5): 1051-1070. doi: 10.1007/s11390-021-1242-y

Special Issue: Computer Architecture and Systems; Computer Networks and Distributed Computing

• Special Section of APPT 2021 (Part 1)

FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers

Yu-Wei Wu, Student Member, CCF; Qing-Gang Wang, Student Member, CCF; Long Zheng*, Member, CCF, ACM, IEEE; Xiao-Fei Liao, Senior Member, CCF, Member, IEEE; Hai Jin, Fellow, CCF, IEEE, Member, ACM; Wen-Bin Jiang, Member, CCF, ACM, IEEE; Ran Zheng, Member, CCF, ACM, IEEE; and Kan Hu, Member, CCF, ACM, IEEE

  1. National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2020-12-30 Revised: 2021-08-06 Online: 2021-09-30 Published: 2021-09-30
  • Supported by:
    This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1003502, and the National Natural Science Foundation of China under Grant Nos. 62072195, 61825202, 61832006, and 61628204.

Real-world graphs are growing rapidly, and their size can easily exceed the on-chip (or on-board) storage capacity of an accelerator, which makes processing large-scale graphs on a single Field Programmable Gate Array (FPGA) difficult. Multi-FPGA acceleration is therefore necessary. Many cloud providers (e.g., Amazon, Microsoft, and Baidu) now expose FPGAs to users in their data centers, providing opportunities to accelerate large-scale graph processing. In this paper, we present a communication library, called FDGLib, which can scale out any existing single-FPGA graph accelerator to a distributed version in a data center with minimal hardware engineering effort. FDGLib provides six APIs that can be integrated into any FPGA-based graph accelerator with only a few lines of code modification. Considering the torus-based FPGA interconnection in data centers, FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes. We integrate FDGLib into AccuGraph, a state-of-the-art graph accelerator. Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than ForeGraph, a state-of-the-art distributed FPGA-based graph accelerator, and Gemini, a distributed CPU-based graph system, respectively, with better scalability.

Key words: data center; accelerator; graph processing; distributed architecture; communication optimization

