2015, Vol. 30, Issue (1): 97-109. doi: 10.1007/s11390-015-1507-4

Topic: Computer Architecture and Systems

• Special Section on Selected Paper from NPC 2011 •

Adapting Memory Hierarchies for Emerging Datacenter Interconnects

Tao Jiang¹,² (江涛), Member, CCF, ACM, IEEE; Rui Hou¹ (侯锐), Member, CCF, ACM, IEEE; Jian-Bo Dong¹ (董建波), Member, CCF, ACM, IEEE; Lin Chai¹,² (柴琳); Sally A. McKee³, Member, ACM, IEEE; Bin Tian⁴ (田斌), Member, CCF; Li-Xin Zhang¹ (张立新), Member, ACM, IEEE; Ning-Hui Sun¹ (孙凝晖), Fellow, CCF, Member, ACM, IEEE

    1 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    2 University of Chinese Academy of Sciences, Beijing 100049, China;
    3 Computer Science and Engineering, Chalmers University of Technology, Gothenburg 41296, Sweden;
    4 National High Performance Integrated Circuit Design Center (Shanghai), Shanghai 201204, China
  • Received: 2014-07-14 Revised: 2014-12-15 Online: 2015-01-05 Published: 2015-01-05
  • About the author: Tao Jiang received his M.S. degree in computer architecture from the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing, in 2007. He is an assistant professor at ICT, CAS. His main research interests include computer architecture and operating systems. He is a member of CCF, ACM, and IEEE.
  • Supported by:

    This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010401, and the National Natural Science Foundation of China under Grant Nos. 61100010, 61402438, and 61402439.

Abstract: Efficient resource utilization requires that emerging datacenter interconnects support both high-performance communication and efficient remote resource sharing. These goals require that the network be more tightly coupled with the CPU chips. Designing a new interconnection technology thus requires considering not only the interconnect itself, but also the design of the processors that will rely on it. In this paper, we study the implications of the memory hierarchy for the design of high-speed datacenter interconnects, particularly as they affect remote memory access, and we use PCIe as the vehicle for our investigations. To that end, we build three complementary platforms: a PCIe-interconnected prototype server, with which we measure and analyze current bottlenecks; a software simulator, which lets us model microarchitectural and cache hierarchy changes; and an FPGA prototype system running Thunder, a streamlined, switchless, customized protocol, with which we study hardware optimizations outside the processor. We highlight several architectural modifications that better support remote memory access and communication, and we quantify their impact and limitations.
