2014, Vol. 29, Issue (2): 281-292. doi: 10.1007/s11390-014-1430-0

Special Issue: Computer Architecture and Systems; Computer Networks and Distributed Computing

• Special Section on Cloud-Sea Computing Systems •

A High-Performance and Cost-Efficient Interconnection Network for High-Density Servers

Wen-Tao Bao1,2 (包雯韬), Student Member, CCF, ACM, IEEE, Bin-Zhang Fu1 (付斌章), Member, CCF, ACM, IEEE, Ming-Yu Chen1,2 (陈明宇), Member, CCF, ACM, IEEE, and Li-Xin Zhang1,2 (张立新), Member, ACM, IEEE

  1 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
  2 University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2013-11-14; Revised: 2014-01-09; Online: 2014-03-05; Published: 2014-03-05
  • About author: Wen-Tao Bao received the B.S. degree from Jilin University, Changchun, in 2012. She is currently pursuing her M.S. degree at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. Her research interests include high-performance and highly reliable interconnection networks.
  • Supported by:

    This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010401, the National Natural Science Foundation of China under Grant Nos. 61202056, 61331008, 61221062, and the Huawei Research Program of China under Grant No. YBCB2011030.

The high-density server features low power, low volume, and high computational density. With the rising use of high-density servers in data-intensive and large-scale web applications, a high-performance and cost-efficient intra-server interconnection network is required. Most state-of-the-art high-density servers adopt a fully-connected intra-server network to attain high network performance. Unfortunately, this solution costs too much because of the high degree of each node. In this paper, we exploit the theoretically optimized Moore graph to interconnect the chips within a server. Considering the suitable size of applications, a 50-node Moore graph, called the Hoffman-Singleton graph, is adopted. In practice, multiple chips should be integrated onto one processor board, which means that the original graph should be partitioned into homogeneous connected subgraphs. However, the existing partition scheme does not consider this problem and thus generates heterogeneous subgraphs. To address this problem, we propose two equivalent-partition schemes for the Hoffman-Singleton graph. In addition, a logic-based and minimal routing mechanism, which is both time and area efficient, is proposed. Finally, we compare the proposed network architecture with its counterparts, namely the fully-connected, Kautz, and Torus networks. The results show that our proposed network achieves performance competitive with the fully-connected network at a cost close to that of Torus.
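The size and degree figures behind the abstract's argument can be checked with a short Python sketch. It is not taken from the paper: it builds the Hoffman-Singleton graph with the well-known pentagon/pentagram construction, which is an assumption about how to obtain the graph and not one of the authors' partition schemes. The names `moore_bound`, `hoffman_singleton`, and `eccentricity` are illustrative.

```python
from collections import deque
from itertools import product


def moore_bound(degree: int, diameter: int) -> int:
    """Largest possible node count for a graph of given degree and diameter."""
    return 1 + degree * sum((degree - 1) ** i for i in range(diameter))


def hoffman_singleton() -> dict:
    """Build the graph from five pentagons P_h and five pentagrams Q_i."""
    adj = {(kind, g, j): set() for kind in "PQ" for g in range(5) for j in range(5)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for g, j in product(range(5), repeat=2):
        link(("P", g, j), ("P", g, (j + 1) % 5))  # pentagon: j adjacent to j+1
        link(("Q", g, j), ("Q", g, (j + 2) % 5))  # pentagram: j adjacent to j+2
    for h, i, j in product(range(5), repeat=3):
        # vertex j of pentagon h is adjacent to vertex h*i + j of pentagram i
        link(("P", h, j), ("Q", i, (h * i + j) % 5))
    return adj


def eccentricity(adj: dict, src) -> int:
    """Longest shortest-path distance from src, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


graph = hoffman_singleton()
num_nodes = len(graph)
degrees = {len(nbrs) for nbrs in graph.values()}
num_links = sum(len(nbrs) for nbrs in graph.values()) // 2
diameter = max(eccentricity(graph, v) for v in graph)
full_mesh_links = num_nodes * (num_nodes - 1) // 2  # fully-connected baseline

print(moore_bound(7, 2), num_nodes, degrees, num_links, diameter, full_mesh_links)
```

Under these assumptions the graph has 50 nodes, each of degree 7, with 175 links and diameter 2, so it meets the Moore bound for degree 7 and diameter 2. A fully-connected network of the same 50 nodes would need degree 49 and 1225 links, which illustrates the cost gap the abstract describes.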


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology