Citation: Jiang-Zhou He, Wen-Guang Chen, Guang-Ri Chen, Wei-Min Zheng, Zhi-Zhong Tang, Han-Dong Ye. OpenMDSP: Extending OpenMP to Program Multi-Core DSPs[J]. Journal of Computer Science and Technology, 2014, 29(2): 316-331. DOI: 10.1007/s11390-014-1433-x

OpenMDSP: Extending OpenMP to Program Multi-Core DSPs

Funds: This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901 and the National Natural Science Foundation of China under Grant No. 61103021.
More Information
  • Author Bio:

    Jiang-Zhou He received the B.S. degree in computer science from Tsinghua University in 2007. He is currently a Ph.D. candidate in computer science and technology at Tsinghua University. His research interests include parallel and distributed computing, compiler technology, and programming models. He is a student member of CCF.

  • Received Date: March 25, 2013
  • Revised Date: November 28, 2013
  • Published Date: March 04, 2014
  • Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing, among other fields. Compared with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach to programming multi-core DSPs relies on proprietary vendor software development kits (SDKs), which provide only low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data-parallel code with them is tedious and error-prone. We believe that a high-level and portable parallel programming model for multi-core DSPs is desirable. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP: 1) data placement directives, which allow programmers to control the placement of global variables conveniently; 2) distributed array directives, which divide a whole array into sections and promote the sections into core-local memory to improve performance; and 3) stream access directives, which promote large arrays into core-local memory section by section during parallel loop processing while hiding the latency of data movement behind the direct memory access (DMA) engine of the DSP. We implement the compiler and runtime system for OpenMDSP on the Freescale MSC8156. Benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 with six threads.
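
    The three directive classes summarized in the abstract target the same worksharing style as standard OpenMP. The sketch below is only an illustration of where such annotations would sit in ordinary OpenMP C code; the "#pragma mdsp" spelling, the place/distribute/stream clause names, and the scale kernel are assumptions made for this sketch, not the syntax or examples defined in the paper. Only the plain OpenMP pragma is standard.

    /* Hypothetical illustration of the three OpenMDSP directive classes.
     * The "#pragma mdsp ..." lines are placeholders kept as comments so the
     * file still compiles as plain C/OpenMP. */
    #include <stddef.h>

    #define NTAPS 64

    float coeff[NTAPS];              /* small, hot coefficient table          */
    /* #pragma mdsp place(coeff)        1) data placement (hypothetical)      */

    float window[8][1024];           /* larger per-core working set           */
    /* #pragma mdsp distribute(window)  2) distributed array (hypothetical)   */

    void scale(const float *in, float *out, size_t n)
    {
        /* 3) stream access (hypothetical): promote in/out into core-local
         * memory section by section, using DMA to hide transfer latency. */
        /* #pragma mdsp stream(in, out) */
        #pragma omp parallel for
        for (size_t i = 0; i < n; i++)
            out[i] = coeff[i % NTAPS] * in[i];
    }

    In the model the abstract describes, such annotations would let the compiler and runtime keep coeff in core-local memory, partition window across the cores, and move in/out through core-local buffers during the parallel loop instead of accessing slow shared memory element by element.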
  • [1]
    Karam L, AlKamal I, Gatherer A, Frantz G, Anderson D, Evans B. Trends in multicore DSP platforms. Signal Process-ing Magazine, IEEE, 2009, 26(6): 38-49.
    [2]
    Zyren J. Overview of the 3GPP long term evolution physical layer, 2007. http://www.freescale.com/files/wireless comm/ doc/white paper/3GPPEVOLUTIONWP.pdf, Nov. 2013.
    [3]
    Reid A D, Flautner K, Grimley-Evans E, Lin Y. SoC-C: Ef-ficient programming abstractions for heterogeneous multicore systems on chip. In Proc. the 2008 CASES, October 2008, pp.95-104.
    [4]
    Thies W, Karczmarek M, Amarasinghe S. StreamIt: A lan-guage for streaming applications. In Proc. Int. Conf. Com-piler Construction, April 2002, pp.179-196.
    [5]
    Liao C, Hernandez O, Chapman B, Chen W, Zheng W. OpenUH: An optimizing, portable OpenMP compiler: Re-search Articles. Concurrency and Computation: Practice & Experience, 2007, 19(18): 2317-2332.
    [6]
    Dave C, Bae H, Min S, Lee S, Eigenmann R, Midkiff S. Ce-tus: A source-to-source compiler infrastructure for multicores. Computer, 2009, 42(11): 36-42.
    [7]
    Parr T, Quong R. ANTLR: A predicated-LL(k) parser gene-rator. Software { Practice & Experience, 1995, 25(7): 789-810.
    [8]
    Tian X, Girkar M, Shah S et al. Compiler and runtime support for running OpenMP programs on Pentium-and Itanium-architectures. In Proc. the 17th Parallel and Dis-tributed Processing Symposium, April 2003, pp.9-18.
    [9]
    Müller M S. Some simple OpenMP optimization techniques. In Lecture Notes in Computer Science 2104, Eigenmann R, Voss M, (eds.), Springer, 2001, pp.31-39.
    [10]
    Tian X, Girkar M, Bik A, Saito H. Practical compiler tech-niques on effcient multithreaded code generation for OpenMP programs. Computer Journal, 2005, 48(5): 588-601.
    [11]
    Chapman B M, Huang L. Enhancing OpenMP and its im-plementation for programming multicore systems. In Proc. Parallel Computing: Architectures, Algorithms and Applica-tions, September 2007, pp.3-18.
    [12]
    O'Brien K, O'Brien K M, Sura Z et al. Supporting OpenMP on cell. Int. J. Parallel Programming, 2008, 36(3): 289-311.
    [13]
    Wei H, Yu J. Loading OpenMP to Cell: An effective compiler framework for heterogeneous multi-core chip. In Proc. the 3rd International Workshop on OpenMP, June 2007, pp.129-133.
    [14]
    Lee S, Min S, Eigenmann R. OpenMP to GPGPU: A com-piler framework for automatic translation and optimization. In Proc. the 14th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Feb. 2009, pp.101-110.
    [15]
    Lee S, Eigenmann R. OpenMPC: Extended OpenMP pro-gramming and tuning for GPUs. In Proc. the 2010 Conf. High Performance Computing Networking, Storage and Anal-ysis, Nov. 2010.
    [16]
    Liu F, Chaudhary V. Extending OpenMP for heterogeneous chip multiprocessors. In Proc. the 32nd International Con-ference on Parallel Processing, October 2003, pp.161-168.
    [17]
    Liu F, V. Chaudhary. A practical OpenMP compiler for sys-tem on chips. In Lecture Notes in Computer Science 2716, Voss M (ed.), Springer, 2003, pp.54-68.
    [18]
    Kimura K, Mase M, Mikami H et al. OSCAR API for real-time low-power multicores and its performance on multicores and SMP servers. In Lecture Notes in Computer Science 5898, Gao G, Pollock L, Cavazos J, Li X (eds.), Springer, 2009, pp.188-202.
    [19]
    Hayashi A,Wada Y,Watanabe T et al. Parallelizing compiler framework and API for power reduction and software produc-tivity of real-time heterogeneous multicores. In Lecture Notes in Computer Science 6548, Cooper K, Mellor-Crummey J, Sarkar V (eds.), Springer, 2010, pp.184-198.
    [20]
    Leupers R, Castrillón J. MPSoC programming using the MAPS compiler. In Proc. the 15th Asia and South Pacific Design Automation Conference, January 2010, pp.897-902.
    [21]
    Kwon S, Kim Y, Jeun W, Ha S, Paek Y. A retargetable parallel-programming framework for MPSoC. ACM Trans. Design Autom. Electr. Syst., 2008, 13(3): Article No.39.
    [22]
    Kennedy K, Koelbel C, Zima H P. The rise and fall of High Performance Fortran: An historical object lesson. In Proc. the 3rd ACM SIGPLAN Conf. History of Programming Lan-guages, June 2007, Article No. 7.
    [23]
    El-Ghazawi T, Carlson W, Sterling T et al. UPC: Distributed Shared Memory Programming. Wiley-Interscience, 2003.
    [24]
    Numrich R W, Reid J. Co-array Fortran for parallel program-ming. ACM Fortran Forum, 1998, 17(2): 1-31.
