›› 2015,Vol. 30 ›› Issue (1): 57-73.doi: 10.1007/s11390-015-1504-7

所属专题: Computer Architecture and Systems

• Special Section on Selected Paper from NPC 2011 • 上一篇    下一篇

一种支持高效流水线并行的系统强制的确定性数据流

Yu Zhang(张昱), Member, CCF, Zhao-Peng Li(李兆鹏), Member, CCF, Hui-Fang Cao(曹慧芳)   

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • 收稿日期:2014-07-16 修回日期:2014-10-14 出版日期:2015-01-05 发布日期:2015-01-05
  • 作者简介:Yu Zhang is an associate professor in the School of Computer Science and Technology at University of Science and Technology of China, Hefei. Her research spans programming languages, runtime systems, and operating systems, with a particular focus on systems that transparently improve reliability, security, and performance. She is a member of CCF.
  • 基金资助:

    This work was supported in part by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901, the National Natural Science Foundation of China under Grant No. 61229201, and the China Postdoctoral Science Foundation under Grant No. 2012M521250.

System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism

Yu Zhang(张昱), Member, CCF, Zhao-Peng Li(李兆鹏), Member, CCF, Hui-Fang Cao(曹慧芳)   

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • Received:2014-07-16 Revised:2014-10-14 Online:2015-01-05 Published:2015-01-05
  • About author:Yu Zhang is an associate professor in the School of Computer Science and Technology at University of Science and Technology of China, Hefei. Her research spans programming languages, runtime systems, and operating systems, with a particular focus on systems that transparently improve reliability, security, and performance. She is a member of CCF.
  • Supported by:

    This work was supported in part by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901, the National Natural Science Foundation of China under Grant No. 61229201, and the China Postdoctoral Science Foundation under Grant No. 2012M521250.

流水线并行是新兴应用程序中的一种流行的并行编程模式.然而,直接在传统的多线程共享内存模型上编写流水线并行的应用程序是很困难的,并且容易出错.我们提出DStream,它是一个C编程库,提供了确定性的线程和数据流两个高级抽象以表示流水线中每个阶段的线程和它们之间的通信.确定性数据流构建于我们提出的单生产/多消费(SPMC)虚拟内存模型之上,它通过把将同步集成到虚拟内存来保证对共享内存的确定性访问.我们基于SPMC内存探索了多种策略以有效地实现DStream,从而,在相邻两个阶段的线程之间,生产者可以异步地发布无限个数的数据项,消费者也可以异步地依次取得数据项.我们使用DStream成功地移植了两个典型的流水线并行应用——ferret和dedup,并且总结了移植的相关规则.实验结果表明,使用DStream移植后的ferret和其Pthreads、TBB版本运行时间相当,而使用DStream移植后的dedup在16和32核时分别比Pthreads快2.56倍、7.06倍,比TBB分别快1.06、3.9倍.

Abstract: Pipeline parallelism is a popular parallel programming pattern for emerging applications. However, programming pipelines directly on conventional multithreaded shared memory is difficult and error-prone. We present DStream, a C library that provides high-level abstractions of deterministic threads and streams for simply representing pipeline stage workers and their communications. The deterministic stream is established atop our proposed single-producer/multi-consumer (SPMC) virtual memory, which integrates synchronization with the virtual memory model to enforce determinism on shared memory accesses. We investigate various strategies on how to efficiently implement DStream atop the SPMC memory, so that an infinite sequence of data items can be asynchronously published (fixed) and asynchronously consumed in order among adjacent stage workers. We have successfully transformed two representative pipeline applications - ferret and dedup using DStream, and conclude conversion rules. An empirical evaluation shows that the converted ferret performed on par with its Pthreads and TBB counterparts in term of running time, while the converted dedup is close to 2.56X, 7.05X faster than the Pthreads counterpart and 1.06X, 3.9X faster than the TBB counterpart on 16 and 32 CPUs, respectively.

[1] McCool M, Reinders J, Robison A D. Structured Parallel Programming: Patterns for Efficient Computation. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2012.

[2] Artho C, Havelund K, Biere A. High-level data races. In Proc. the 1st International Workshop on Veri cation and Validation of Enterprise Information Systems, April 2003, pp.82-93.

[3] Lee E. The problem with threads. Computer, 2006, 39(5): 33-42.

[4] Lu S, Park S, Seo E, Zhou Y. Learning from mistakes — A comprehensive study on real world concurrency bug characteristics. In Proc. the 13th International Conference on Architectural Support for Programming Languages and Operating Systems(ASPLOS), March 2008, pp.329-339.

[5] Zhang Y, Ford B. A virtual memory foundation for scalable deterministic parallelism. In Proc. the 2nd APSys, July 2011, pp.7:1-7:5.

[6] Zhang Y, Ford B. Lazy tree mapping: Generalizing and scaling deterministic parallelism. In Proc. the 4th AsiaPaci c Workshop on Systems (APSys), July 2013, pp.20:1-(\d)0:7.

[7] Bienia C, Kumar S, Singh J P et al. The PARSEC benchmark suite: Characterization and architectural implications. In Proc. the 17th PACT, October 2008, pp.72-81.

[8] Reed E C, Chen N, Johnson R E. Expressing pipeline parallelism using TBB constructs: A case study on what works and what doesn't. In Proc. SPLASH, October 2011, pp.133-(\d)38.

[9] Liu T, Curtsinger C, Berger E. Dthreads: Efficient deterministic multithreading. In Proc. the 23rd SOSP, Oct. 2011, pp.327-336.

[10] Aviram A, Weng S C, Hu S, Ford B. Efficient systemenforced deterministic parallelism. In Proc. the 9th OSDI, October 2010, pp.193-206.

[11] Merrifield T, Eriksson J. Conversion: Multi-version concurrency control for main memory segments. In Proc. the 8th EuroSys, April 2013, pp.127-139.

[12] Aviram A, Ford B, Zhang Y.Workspace consistency: A programming model for shared memory parallelism. In Proc. the 2nd WoDet, March 2011.

[13] Thies W, Karczmarek M, Amarasinghe S. StreamIt: A language for streaming applications. In Proc. the 11th CC, April 2002, pp.179-196.

[14] Olszewski M, Ansel J, Amarasinghe S. Kendo: Efficient deterministic multithreading in software. In Proc. the 14th ASPLOS, March 2009, pp.97-108.

[15] Berger E D, Yang T, Liu T, Novark G. Grace: Safe multithreaded programming for C/C++. In Proc. the 24th OOPSLA, October 2009, pp.81-96.

[16] Bergan T, Anderson O, Devietti J, Ceze L, Grossman D. CoreDet: A compiler and runtime system for deterministic multithreaded execution. In Proc. the 15th ASPLOS, March 2010, pp.53-64.

[17] Cui H, Simsa J, Lin Y H et al. Parrot: A practical runtime for deterministic, stable, and reliable threads. In Proc. the 24th SOSP, November 2013, pp.388-405.

[18] Olszewski M, Ansel J, Amarasinghe S. Scaling deterministic multithreading. In Proc. the 2nd WoDet, March 2011.

[19] Gordon M I, Thies W, Amarasinghe S. Exploiting coarsegrained task, data, and pipeline parallelism in stream programs. In Proc. the 12th ASPLOS, October 2006, pp.151-162.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 沈一栋;. Form alizing Incomplete Knowledge in Incomplete Databases[J]. , 1992, 7(4): 295 -304 .
[2] 沈一栋;. A Fixpoint Semantics for Stratified Databases[J]. , 1993, 8(2): 12 -21 .
[3] 倪彬; 冯玉琳;. Dynamic Checking Frameworkfor Java Beaus Semantic Constraints[J]. , 1999, 14(4): 408 -413 .
[4] . 以客户端为核心的面向服务应用的适应性调度[J]. , 2006, 21(4): 537 -546 .
[5] Rui-Song Zhang, Wei-Ze Quan, Lu-Bin Fan, Li-Ming Hu, Dong-Ming Yan. 基于通道和像素相关性的计算机生成图像与自然图像鉴别[J]. 计算机科学技术学报, 2020, 35(3): 592 -602 .
[6] Jie Tang, Chen Liu, Shao-Shan Liu, Zhi-Min Gu, Jean-Luc Gaudiot. 多核服务分工:Intel SCC下的XML数据解析研究[J]. , 2013, 28(1): 3 -13 .
[7] Jie Yan, Guang-Ming Tan, and Ning-Hui Sun. 非规则网格上的Sn Sweep并行算法[J]. , 2013, 28(4): 657 -670 .
[8] Wu Yang Guo-Wei Shen, Wei Wang, Liang-Yi Gong, Miao Yu, Guo-Zhong Dong. 基于联合聚类的微博异常检测[J]. , 2015, 30(5): 1097 -1108 .
[9] Jie Tang, Xiao-Yan Zhu. Preface[J]. , 2015, 30(5): 1036 -1038 .
[10] Tao Xie. Preface[J]. , 2016, 31(5): 849 -850 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: