Tao Jiang, Rui Hou, Jian-Bo Dong, Lin Chai, Sally A. McKee, Bin Tian, Li-Xin Zhang, Ning-Hui Sun. Adapting Memory Hierarchies for Emerging Datacenter Interconnects[J]. Journal of Computer Science and Technology, 2015, 30(1): 97-109. DOI: 10.1007/s11390-015-1507-4

Adapting Memory Hierarchies for Emerging Datacenter Interconnects

Abstract: Efficient resource utilization requires that emerging datacenter interconnects support both high-performance communication and efficient remote resource sharing. These goals require that the network be more tightly coupled with the CPU chips. Designing a new interconnection technology therefore requires considering not only the interconnect itself but also the design of the processors that will rely on it. In this paper, we study the implications of the memory hierarchy for the design of high-speed datacenter interconnects, particularly as they affect remote memory access, and we use PCIe as the vehicle for our investigation. To that end, we build three complementary platforms: a PCIe-interconnected prototype server with which we measure and analyze current bottlenecks; a software simulator that lets us model microarchitectural and cache hierarchy changes; and an FPGA prototype system implementing Thunder, a streamlined, switchless, customized protocol, with which we study hardware optimizations outside the processor core. We highlight several architectural modifications that better support remote memory access and inter-node communication, and we quantify their performance impact and limitations.
