Zheng Cao, Xiao-Li Liu, Qiang Li, Xiao-Bing Liu, Zhan Wang, Xue-Jun An. An Intra-Server Interconnect Fabric for Heterogeneous Computing[J]. Journal of Computer Science and Technology, 2014, 29(6): 976-988. DOI: 10.1007/s11390-014-1483-0

An Intra-Server Interconnect Fabric for Heterogeneous Computing

Funds: This work was supported by the National Natural Science Foundation of China under Grant No. 61100014.
  • Author Bio:

    Zheng Cao received his Ph.D. degree in computer science from the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing, in 2009. He is an associate professor at ICT, CAS. His main research interests include high performance computer architecture and high performance interconnection networks. He is a member of CCF and ACM.

  • Received Date: December 15, 2013
  • Revised Date: July 03, 2014
  • Published Date: November 04, 2014
  • With the growing diversity of application needs and computing units, servers with heterogeneous processors are becoming increasingly widespread. However, the conventional SMP/ccNUMA server architecture introduces a communication bottleneck between heterogeneous processors and uses them only as coprocessors, which limits the efficiency and flexibility with which they can be used. To solve this problem, this paper proposes an intra-server interconnect fabric that supports both intra-server peer-to-peer interconnection and I/O resource sharing among heterogeneous processors. By connecting processors and I/O devices with the proposed fabric, heterogeneous processors can communicate directly with each other and run in stand-alone mode with shared intra-server resources. We design the proposed fabric by extending the de facto system I/O bus protocol PCIe (Peripheral Component Interconnect Express) and implement it in a single chip, cZodiac. By making full use of PCIe's original advantages, the interconnection and I/O sharing mechanisms are lightweight and efficient. Evaluations carried out on both an FPGA (Field Programmable Gate Array) prototype and a cycle-accurate simulator demonstrate that our design is feasible and scalable. In addition, our design is suitable not only for heterogeneous servers but also for high-density servers.