Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture

Fang Zheng; Hong-Liang Li; Hui Lv; Feng Guo; Xiao-Hong Xu; Xiang-Hui Xie

doi:10.1007/s11390-015-1510-9

Fang Zheng, Hong-Liang Li, Hui Lv, Feng Guo, Xiao-Hong Xu, Xiang-Hui Xie. Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture[J]. Journal of Computer Science and Technology, 2015, 30(1): 145-162. DOI: 10.1007/s11390-015-1510-9

Abstract

Due to advances in semiconductor technology, many-core processors have been widely used in high performance computing. However, many applications still run inefficiently because of the memory wall, which has become a bottleneck in many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture, named deeply fused many-core (DFMC), for high performance computing systems. DFMC integrates management processing elements (MPEs) and computing processing elements (CPEs), heterogeneous processor cores that target different application features while sharing a unified ISA (instruction set architecture), a unified execution model, and shared memory that supports cache coherence. The DFMC processor alleviates the memory wall problem by combining a series of cooperative computing techniques for CPEs, including multi-pattern data stream transfer, an efficient register-level communication mechanism, and a fast hardware synchronization technique. These techniques improve on-chip data reuse and optimize memory access performance. We describe an implementation of a full system prototype based on FPGA with four MPEs and 256 CPEs. Our experimental results show that the cooperative computing techniques of CPEs have a significant effect: DGEMM (double-precision matrix multiplication) achieves an efficiency of 94%, FFT (fast Fourier transform) reaches a performance of 207 GFLOPS, and FDTD (finite-difference time-domain) reaches a performance of 27 GFLOPS.
