Jiang-Zhou He, Wen-Guang Chen, Guang-Ri Chen, Wei-Min Zheng, Zhi-Zhong Tang, Han-Dong Ye. OpenMDSP: Extending OpenMP to Program Multi-Core DSPs[J]. Journal of Computer Science and Technology, 2014, 29(2): 316-331. DOI: 10.1007/s11390-014-1433-x


OpenMDSP: Extending OpenMP to Program Multi-Core DSPs


     

Abstract: Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing, among other fields. Compared with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, typically including on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach to programming multi-core DSPs relies on proprietary vendor software development kits (SDKs), which provide only low-level, non-portable primitives. These SDKs are acceptable for writing coarse-grained task-level parallel code, but writing fine-grained data-parallel code with them is tedious and error-prone. We believe that a high-level, portable parallel programming model for multi-core DSPs is desirable. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to bridge the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP: 1) data placement directives, which let programmers conveniently control where global variables are placed; 2) distributed array directives, which divide a large array into sections and promote each section into the core-local memory of a different core to improve performance; and 3) stream access directives, which promote large arrays into core-local memory section by section during the execution of a parallel loop, using the direct memory access (DMA) engine of the DSP to hide the latency of data movement. We implement the compiler and runtime system for OpenMDSP on the FreeScale MSC8156. The benchmarking results show that seven of the nine benchmarks achieve a speedup of more than 5 when using six threads.
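The idea behind the distributed array and stream access directives can be illustrated with a rough sketch in plain C and standard OpenMP. This is not OpenMDSP syntax, which the abstract does not show. The names `CHUNK`, `fetch_chunk`, `scale_streamed`, and `scale_parallel` are hypothetical, a small stack array stands in for core-local memory, and a synchronous `memcpy` stands in for a DMA transfer that would overlap with computation on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16  /* hypothetical capacity of one local buffer, in elements */

/* Stand-in for a DMA transfer from shared to core-local memory (assumption:
   a real DSP would start this asynchronously and wait for it later). */
static void fetch_chunk(float *local, const float *shared, size_t n) {
    memcpy(local, shared, n * sizeof *local);
}

/* Stream access idea: process a large array section by section through two
   small buffers, fetching the next section before computing on the current. */
void scale_streamed(const float *in, float *out, size_t n, float k) {
    float buf[2][CHUNK];                      /* models core-local memory */
    size_t nchunks = (n + CHUNK - 1) / CHUNK;
    if (nchunks == 0) return;

    fetch_chunk(buf[0], in, n < CHUNK ? n : CHUNK);
    for (size_t c = 0; c < nchunks; ++c) {
        size_t base = c * CHUNK;
        size_t len = (n - base < CHUNK) ? n - base : CHUNK;

        if (c + 1 < nchunks) {                /* prefetch the next section */
            size_t nbase = base + CHUNK;
            size_t nlen = (n - nbase < CHUNK) ? n - nbase : CHUNK;
            fetch_chunk(buf[(c + 1) & 1], in + nbase, nlen);
        }

        const float *cur = buf[c & 1];        /* compute on the current one */
        for (size_t i = 0; i < len; ++i)
            out[base + i] = k * cur[i];
    }
}

/* Distributed array idea: split the array into contiguous sections and let
   each OpenMP thread work on its own section. */
void scale_parallel(const float *in, float *out, size_t n, float k,
                    int nsections) {
    assert(nsections > 0);
    size_t per = (n + (size_t)nsections - 1) / (size_t)nsections;

    #pragma omp parallel for
    for (int s = 0; s < nsections; ++s) {
        size_t lo = (size_t)s * per;
        if (lo >= n) continue;                /* no elements left for s */
        size_t len = (n - lo < per) ? n - lo : per;
        scale_streamed(in + lo, out + lo, len, k);
    }
}
```

Calling `scale_parallel(in, out, n, 2.0f, 6)` doubles every element of `in` into `out` using six sections. Written by hand, the buffer bookkeeping has to be repeated for every loop, which is the kind of tedium the abstract says directives are meant to remove.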

     
