Journal of Computer Science and Technology, 2015, Vol. 30, Issue (2): 259-272. doi: 10.1007/s11390-015-1520-7

Special Issue: Computer Architecture and Systems; Computer Networks and Distributed Computing

• Special Section on Applications and Industry •

High Performance Interconnect Network for Tianhe System

Xiang-Ke Liao¹,² (廖湘科), Fellow, CCF, Member, ACM; Zheng-Bin Pang¹,² (庞征斌), Member, CCF, ACM; Ke-Fei Wang¹ (王克非); Yu-Tong Lu¹,³ (卢宇彤), Member, CCF, ACM; Min Xie¹,³ (谢旻); Jun Xia¹ (夏军); De-Zun Dong¹,² (董德尊), Member, CCF, ACM, IEEE; Guang Suo¹,³ (所光)

  • 1 College of Computer, National University of Defense Technology, Changsha 410073, China;
    2 Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China;
    3 State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China
  • Received: 2014-11-30 Revised: 2015-01-15 Online: 2015-03-05 Published: 2015-03-05
  • About author: Xiang-Ke Liao received his B.S. degree from Tsinghua University, Beijing, in 1985, and his M.S. degree from National University of Defense Technology (NUDT), Changsha, in 1988, both in computer science. He is currently a professor and the dean of the College of Computer, NUDT. His research interests include high performance computing systems, operating systems, and parallel and distributed computing. Prof. Liao is a fellow of CCF.
  • Supported by:

    This work was partially supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA01A301 and the National Natural Science Foundation of China under Grant No. 61120106005.

In this paper, we present the Tianhe-2 interconnect network and message passing services. We describe the architecture of the router and network interface chips, and highlight a set of hardware and software features that effectively support high performance communication, including remote direct memory access, collective optimization, hardware-enabled reliable end-to-end communication, and user-level message passing services. Measured hardware performance results are also presented.


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China