Yu Zhang, Zhao-Peng Li, Hui-Fang Cao. System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism[J]. Journal of Computer Science and Technology, 2015, 30(1): 57-73. DOI: 10.1007/s11390-015-1504-7

System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism


    Abstract: Pipeline parallelism is a popular parallel programming pattern for emerging applications. However, programming pipelines directly on conventional multithreaded shared memory is difficult and error-prone. We present DStream, a C library that provides high-level abstractions of deterministic threads and streams for representing pipeline stage workers and the communication between them. The deterministic stream is built atop our proposed single-producer/multi-consumer (SPMC) virtual memory, which integrates synchronization with the virtual memory model to enforce determinism on shared memory accesses. We investigate various strategies for efficiently implementing DStream atop the SPMC memory, so that between adjacent stage workers an infinite sequence of data items can be asynchronously published (fixed) by the producer and asynchronously consumed in order by the consumers. We have successfully converted two representative pipeline applications, ferret and dedup, to DStream, and summarize the conversion rules. An empirical evaluation shows that the converted ferret performs on par with its Pthreads and TBB counterparts in terms of running time, while the converted dedup is close to 2.56X and 7.05X faster than the Pthreads counterpart, and 1.06X and 3.9X faster than the TBB counterpart, on 16 and 32 CPUs, respectively.
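The single-producer/multi-consumer idea described in the abstract can be illustrated with a small user-space sketch in C. This is not the DStream API and does not use the SPMC virtual memory mechanism; it is a minimal approximation built on C11 atomics. All names (`stream_t`, `consumer_t`, `stream_publish`, `stream_try_consume`, `STREAM_CAPACITY`) are hypothetical, the fixed capacity is an assumption made for brevity (the abstract describes an unbounded sequence), and the sketch assumes every consumer reads every item in publication order through its own cursor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed bound, chosen only to keep the sketch short. */
#define STREAM_CAPACITY 1024

/* Append-only stream. Exactly one thread (the producer) may publish. */
typedef struct {
    int items[STREAM_CAPACITY];
    atomic_size_t published;   /* number of items visible to consumers */
} stream_t;

/* Each consumer owns a private cursor, so consumers never write shared state. */
typedef struct {
    stream_t *stream;
    size_t next;               /* index of the next item this consumer reads */
} consumer_t;

void stream_init(stream_t *s) {
    atomic_init(&s->published, 0);
}

void consumer_init(consumer_t *c, stream_t *s) {
    assert(s != NULL);
    c->stream = s;
    c->next = 0;
}

/* Producer side: write the slot first, then make it visible with a release
 * store. Once published, an item is never modified again. */
bool stream_publish(stream_t *s, int value) {
    size_t n = atomic_load_explicit(&s->published, memory_order_relaxed);
    if (n == STREAM_CAPACITY) {
        return false;          /* sketch limitation: stream is full */
    }
    s->items[n] = value;
    atomic_store_explicit(&s->published, n + 1, memory_order_release);
    return true;
}

/* Consumer side: non-blocking read of the next item in publication order.
 * Returns false when this consumer has already seen everything published. */
bool stream_try_consume(consumer_t *c, int *out) {
    size_t n = atomic_load_explicit(&c->stream->published, memory_order_acquire);
    if (c->next >= n) {
        return false;
    }
    *out = c->stream->items[c->next];
    c->next++;
    return true;
}
```

Because only the producer writes to a slot and consumers read a slot only after it has been published, each consumer observes the same sequence of values regardless of thread scheduling. A blocking consume operation and unbounded storage, both of which the abstract implies, are left out of this sketch.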

     
