Citation: Jiang-Zhou He, Wen-Guang Chen, Guang-Ri Chen, Wei-Min Zheng, Zhi-Zhong Tang, Han-Dong Ye. OpenMDSP: Extending OpenMP to Program Multi-Core DSPs[J]. Journal of Computer Science and Technology, 2014, 29(2): 316-331. DOI: 10.1007/s11390-014-1433-x

OpenMDSP: Extending OpenMP to Program Multi-Core DSPs

Funds: This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901 and the National Natural Science Foundation of China under Grant No. 61103021.
More Information
  • Author Bio:

    Jiang-Zhou He received the B.S. degree in computer science from Tsinghua University in 2007. He is currently a Ph.D. candidate in computer science and technology at Tsinghua University. His research interests include parallel and distributed computing, compiler technology, and programming models. He is a student member of CCF.

  • Received Date: March 25, 2013
  • Revised Date: November 28, 2013
  • Published Date: March 04, 2014
  • Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing, among other fields. Compared with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach to programming multi-core DSPs relies on proprietary vendor software development kits (SDKs), which provide only low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data-parallel code with them is tedious and error-prone. We believe that a high-level and portable parallel programming model for multi-core DSPs is desirable. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP: 1) data placement directives, which allow programmers to control the placement of global variables conveniently; 2) distributed array directives, which divide a whole array into sections and promote the sections into core-local memory to improve performance; and 3) stream access directives, which promote large arrays into core-local memory section by section during parallel loop processing while hiding the latency of data movement behind the direct memory access (DMA) engine of the DSP. We implement the compiler and runtime system for OpenMDSP on the Freescale MSC8156. Benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 with six threads.
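
    The three directive classes summarized in the abstract target the same worksharing style as standard OpenMP. The sketch below is only an illustration of where such annotations would sit in ordinary OpenMP C code; the "#pragma mdsp" spelling, the place/distribute/stream clause names, and the scale kernel are assumptions made for this sketch, not the syntax or examples defined in the paper. Only the plain OpenMP pragma is standard.

    /* Hypothetical illustration of the three OpenMDSP directive classes.
     * The "#pragma mdsp ..." lines are placeholders kept as comments so the
     * file still compiles as plain C/OpenMP. */
    #include <stddef.h>

    #define NTAPS 64

    float coeff[NTAPS];              /* small, hot coefficient table          */
    /* #pragma mdsp place(coeff)        1) data placement (hypothetical)      */

    float window[8][1024];           /* larger per-core working set           */
    /* #pragma mdsp distribute(window)  2) distributed array (hypothetical)   */

    void scale(const float *in, float *out, size_t n)
    {
        /* 3) stream access (hypothetical): promote in/out into core-local
         * memory section by section, using DMA to hide transfer latency. */
        /* #pragma mdsp stream(in, out) */
        #pragma omp parallel for
        for (size_t i = 0; i < n; i++)
            out[i] = coeff[i % NTAPS] * in[i];
    }

    In the model the abstract describes, such annotations would let the compiler and runtime keep coeff in core-local memory, partition window across the cores, and move in/out through core-local buffers during the parallel loop instead of accessing slow shared memory element by element.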
  • [1]
    Karam L, AlKamal I, Gatherer A, Frantz G, Anderson D, Evans B. Trends in multicore DSP platforms. Signal Process-ing Magazine, IEEE, 2009, 26(6): 38-49.
    [2]
    Zyren J. Overview of the 3GPP long term evolution physical layer, 2007. http://www.freescale.com/files/wireless comm/ doc/white paper/3GPPEVOLUTIONWP.pdf, Nov. 2013.
    [3]
    Reid A D, Flautner K, Grimley-Evans E, Lin Y. SoC-C: Ef-ficient programming abstractions for heterogeneous multicore systems on chip. In Proc. the 2008 CASES, October 2008, pp.95-104.
    [4]
    Thies W, Karczmarek M, Amarasinghe S. StreamIt: A lan-guage for streaming applications. In Proc. Int. Conf. Com-piler Construction, April 2002, pp.179-196.
    [5]
    Liao C, Hernandez O, Chapman B, Chen W, Zheng W. OpenUH: An optimizing, portable OpenMP compiler: Re-search Articles. Concurrency and Computation: Practice & Experience, 2007, 19(18): 2317-2332.
    [6]
    Dave C, Bae H, Min S, Lee S, Eigenmann R, Midkiff S. Ce-tus: A source-to-source compiler infrastructure for multicores. Computer, 2009, 42(11): 36-42.
    [7]
    Parr T, Quong R. ANTLR: A predicated-LL(k) parser gene-rator. Software { Practice & Experience, 1995, 25(7): 789-810.
    [8]
    Tian X, Girkar M, Shah S et al. Compiler and runtime support for running OpenMP programs on Pentium-and Itanium-architectures. In Proc. the 17th Parallel and Dis-tributed Processing Symposium, April 2003, pp.9-18.
    [9]
    Müller M S. Some simple OpenMP optimization techniques. In Lecture Notes in Computer Science 2104, Eigenmann R, Voss M, (eds.), Springer, 2001, pp.31-39.
    [10]
    Tian X, Girkar M, Bik A, Saito H. Practical compiler tech-niques on effcient multithreaded code generation for OpenMP programs. Computer Journal, 2005, 48(5): 588-601.
    [11]
    Chapman B M, Huang L. Enhancing OpenMP and its im-plementation for programming multicore systems. In Proc. Parallel Computing: Architectures, Algorithms and Applica-tions, September 2007, pp.3-18.
    [12]
    O'Brien K, O'Brien K M, Sura Z et al. Supporting OpenMP on cell. Int. J. Parallel Programming, 2008, 36(3): 289-311.
    [13]
    Wei H, Yu J. Loading OpenMP to Cell: An effective compiler framework for heterogeneous multi-core chip. In Proc. the 3rd International Workshop on OpenMP, June 2007, pp.129-133.
    [14]
    Lee S, Min S, Eigenmann R. OpenMP to GPGPU: A com-piler framework for automatic translation and optimization. In Proc. the 14th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Feb. 2009, pp.101-110.
    [15]
    Lee S, Eigenmann R. OpenMPC: Extended OpenMP pro-gramming and tuning for GPUs. In Proc. the 2010 Conf. High Performance Computing Networking, Storage and Anal-ysis, Nov. 2010.
    [16]
    Liu F, Chaudhary V. Extending OpenMP for heterogeneous chip multiprocessors. In Proc. the 32nd International Con-ference on Parallel Processing, October 2003, pp.161-168.
    [17]
    Liu F, V. Chaudhary. A practical OpenMP compiler for sys-tem on chips. In Lecture Notes in Computer Science 2716, Voss M (ed.), Springer, 2003, pp.54-68.
    [18]
    Kimura K, Mase M, Mikami H et al. OSCAR API for real-time low-power multicores and its performance on multicores and SMP servers. In Lecture Notes in Computer Science 5898, Gao G, Pollock L, Cavazos J, Li X (eds.), Springer, 2009, pp.188-202.
    [19]
    Hayashi A,Wada Y,Watanabe T et al. Parallelizing compiler framework and API for power reduction and software produc-tivity of real-time heterogeneous multicores. In Lecture Notes in Computer Science 6548, Cooper K, Mellor-Crummey J, Sarkar V (eds.), Springer, 2010, pp.184-198.
    [20]
    Leupers R, Castrillón J. MPSoC programming using the MAPS compiler. In Proc. the 15th Asia and South Pacific Design Automation Conference, January 2010, pp.897-902.
    [21]
    Kwon S, Kim Y, Jeun W, Ha S, Paek Y. A retargetable parallel-programming framework for MPSoC. ACM Trans. Design Autom. Electr. Syst., 2008, 13(3): Article No.39.
    [22]
    Kennedy K, Koelbel C, Zima H P. The rise and fall of High Performance Fortran: An historical object lesson. In Proc. the 3rd ACM SIGPLAN Conf. History of Programming Lan-guages, June 2007, Article No. 7.
    [23]
    El-Ghazawi T, Carlson W, Sterling T et al. UPC: Distributed Shared Memory Programming. Wiley-Interscience, 2003.
    [24]
    Numrich R W, Reid J. Co-array Fortran for parallel program-ming. ACM Fortran Forum, 1998, 17(2): 1-31.
