Fang Zheng, Hong-Liang Li, Hui Lv, Feng Guo, Xiao-Hong Xu, Xiang-Hui Xie. Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture[J]. Journal of Computer Science and Technology, 2015, 30(1): 145-162. DOI: 10.1007/s11390-015-1510-9

Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture

Abstract: Advances in semiconductor technology have made many-core processors widely used in high performance computing. However, many applications still run inefficiently because of the memory wall, which has become a bottleneck for many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture, deeply fused many-core (DFMC), for high performance computing systems. DFMC integrates management processing elements (MPEs) and computing processing elements (CPEs): heterogeneous processor cores that target different application features but share a unified ISA (instruction set architecture), a unified execution model, and cache-coherent shared memory. DFMC alleviates the memory wall problem through a set of cooperative computing techniques among the CPEs, including multi-pattern data stream transfer, an efficient register-level communication mechanism, and fast hardware synchronization. These techniques improve on-chip data reuse and optimize memory access performance. We implement a full-system prototype on FPGA with four MPEs and 256 CPEs. Experimental results show that the cooperative computing techniques have a significant effect: DGEMM (double-precision matrix multiplication) achieves an efficiency of 94%, FFT (fast Fourier transform) reaches 207 GFLOPS, and FDTD (finite-difference time-domain) reaches 27 GFLOPS.
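The abstract attributes the gains to better on-chip data reuse. As a rough illustration of why reuse matters for a kernel like DGEMM, the sketch below uses generic loop tiling in plain Python and counts how many matrix elements are fetched from a simulated main memory. It is not the DFMC mechanism, and the function name `tiled_matmul`, the `tile` parameter, and the load counter are all hypothetical, introduced here only for illustration.

```python
def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) tile by tile.

    Returns (c, loads), where `loads` counts the elements copied from
    the simulated main memory into simulated on-chip buffers.
    Illustrative only: generic blocking, not the DFMC hardware scheme.
    """
    assert n % tile == 0, "tile size must divide n"
    c = [[0.0] * n for _ in range(n)]
    loads = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # Copy one tile of A and one tile of B "on chip".
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                loads += 2 * tile * tile
                # Each on-chip element is reused `tile` times below.
                for i in range(tile):
                    for j in range(tile):
                        acc = 0.0
                        for k in range(tile):
                            acc += a_tile[i][k] * b_tile[k][j]
                        c[i0 + i][j0 + j] += acc
    return c, loads


n = 8
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

c_untiled, loads_untiled = tiled_matmul(a, b, n, tile=1)
c_tiled, loads_tiled = tiled_matmul(a, b, n, tile=4)
print(loads_untiled, loads_tiled)
```

In this model the number of loads is 2n³/tile, so a larger tile held on chip reduces main-memory traffic proportionally while producing the same result.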

     
