Journal of Computer Science and Technology, 2011, Vol. 26, Issue (3): 434-447. doi: 10.1007/s11390-011-1145-4

Special Issue: Computer Architecture and Systems

Special Section on High-Performance Computing for Embedded Multi-Core Systems

A Resource-Efficient Communication Architecture for Chip Multiprocessors on FPGAs

Xiaofang (Maggie) Wang, Member, IEEE, and Swetha Thota   

  1. Department of Electrical and Computer Engineering, Villanova University, Villanova, PA 19085, U.S.A.
  Received: 2010-02-25; Revised: 2011-03-14; Online: 2011-05-05; Published: 2011-05-05

Significant advances in field-programmable gate arrays (FPGAs) have made it viable to explore innovative multiprocessor solutions on a single FPGA chip. For multiprocessors, an efficient communication network that matches the needs of the target application is critical to overall performance. Wormhole packet-switching network-on-chip (NoC) solutions are replacing conventional shared buses to address the scalability and complexity challenges that come with an increasing number of processing elements (PEs). However, the quest for high-performance networks has led to very complex and resource-expensive NoC designs, leaving little room for the actual computing resources, i.e., the PEs. Moreover, many of these techniques increase router resource usage while offering small or no performance gains when network traffic is light. We argue that computation is still the primary task of multiprocessors and that sufficient resources should be reserved for PEs. This paper presents the design and implementation of a novel resource-efficient communication network for multiprocessors on FPGAs. By introducing a new PE-router topology, we reduce both the number of routers required for a given number of PEs and the resource requirement of each router. Our communication network relies on the NEWS (north, east, west, south) channels to transfer packets in a pipelined fashion along the path determined by the routing network. Implementation results on various Xilinx FPGAs show good performance across the range of network loads typical of multiprocessor applications.
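
The abstract does not specify the routing algorithm or the router pipeline. Purely as a generic illustration of how packets can follow a routed path over NEWS channels in a pipelined (wormhole) fashion, the Python sketch below assumes dimension-ordered XY routing on a 2D mesh and an idealized, contention-free pipeline in which the head flit advances one router per hop and the remaining flits follow one cycle apart. The functions `xy_route` and `zero_load_cycles` and the `cycles_per_hop` parameter are hypothetical; this is a sketch under stated assumptions, not the authors' design.

```python
from typing import List, Tuple


def xy_route(src: Tuple[int, int], dst: Tuple[int, int]) -> List[str]:
    """Hypothetical helper: NEWS hops under assumed XY (dimension-ordered) routing.

    X (East/West) is resolved first, then Y (North/South).
    Assumes x grows toward East and y grows toward North.
    """
    x, y = src
    dx, dy = dst
    hops: List[str] = []
    # First travel along the X dimension.
    while x != dx:
        if dx > x:
            hops.append("E")
            x += 1
        else:
            hops.append("W")
            x -= 1
    # Then travel along the Y dimension.
    while y != dy:
        if dy > y:
            hops.append("N")
            y += 1
        else:
            hops.append("S")
            y -= 1
    return hops


def zero_load_cycles(hops: int, flits: int, cycles_per_hop: int = 1) -> int:
    """Idealized wormhole latency with no contention (assumed model).

    The head flit needs hops * cycles_per_hop cycles to reach the destination;
    each remaining flit arrives one cycle behind the previous one.
    """
    if hops < 0 or flits < 1 or cycles_per_hop < 1:
        raise ValueError("need hops >= 0, flits >= 1, cycles_per_hop >= 1")
    return hops * cycles_per_hop + (flits - 1)


if __name__ == "__main__":
    path = xy_route((0, 0), (2, 1))
    print(path, zero_load_cycles(len(path), flits=4))
```

For example, a packet from (0, 0) to (2, 1) takes the hops E, E, N, and a 4-flit packet over those 3 hops needs about 6 cycles in this model. Because flits are pipelined behind the head flit, latency grows roughly with hop count plus packet length rather than with their product, as it would under store-and-forward switching.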

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn