2015, Vol. 30, Issue (1): 97-109. doi: 10.1007/s11390-015-1507-4
Topic: Computer Architecture and Systems
• Special Section on Selected Paper from NPC 2011 •
Tao Jiang1,2(江涛), Member, CCF, ACM, IEEE, Rui Hou1(侯锐), Member, CCF, ACM, IEEE, Jian-Bo Dong1(董建波), Member, CCF, ACM, IEEE, Lin Chai1,2(柴琳), Sally A. McKee3, Member, ACM, IEEE, Bin Tian4(田斌), Member, CCF, Li-Xin Zhang1(张立新), Member, ACM, IEEE, Ning-Hui Sun1(孙凝晖), Fellow, CCF, Member, ACM, IEEE
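To improve the utilization of resources across nodes, new data center interconnects must support high-performance communication and remote resource sharing, and must be coupled more tightly with the CPU chip. Designing new interconnect technology therefore requires considering not only the network itself but also the design of the corresponding processor. This paper studies the impact of the memory system hierarchy on the design of high-speed data center interconnects, in particular its impact on remote memory access performance. We implement three complementary evaluation platforms: a PCIe-interconnected server prototype, which we use to analyze and evaluate the bottlenecks of current technology; a software simulator, which we use to model microarchitectural and cache hierarchy optimizations; and an FPGA prototype system, which includes Thunder, a fully pipelined, switchless custom high-speed communication protocol, and which we use to study hardware optimizations outside the processor cores. We propose several architectural optimizations to better support remote memory access and inter-node communication, and we experimentally quantify their performance impact and limitations.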