System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism
Journal of Computer Science and Technology
Indexed by   SCIE, EI ...
Bimonthly    Since 1986
Journal of Computer Science and Technology, 2015, Vol. 30, Issue 1: 57-73. DOI: 10.1007/s11390-015-1504-7
Special Section on Computer Architecture and Systems for Big Data
System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism
Yu Zhang(张昱), Member, CCF, Zhao-Peng Li(李兆鹏), Member, CCF, Hui-Fang Cao(曹慧芳)
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China

Abstract: Pipeline parallelism is a popular parallel programming pattern for emerging applications. However, programming pipelines directly on conventional multithreaded shared memory is difficult and error-prone. We present DStream, a C library that provides high-level abstractions of deterministic threads and streams to represent pipeline stage workers and their communications simply. The deterministic stream is built atop our proposed single-producer/multi-consumer (SPMC) virtual memory, which integrates synchronization with the virtual memory model to enforce determinism on shared memory accesses. We investigate strategies for efficiently implementing DStream atop the SPMC memory, so that an infinite sequence of data items can be asynchronously published (fixed) and asynchronously consumed, in order, among adjacent stage workers. We have successfully converted two representative pipeline applications, ferret and dedup, to use DStream, and we summarize the conversion rules. An empirical evaluation shows that the converted ferret performs on par with its Pthreads and TBB counterparts in terms of running time, while the converted dedup is close to 2.56X and 7.05X faster than the Pthreads counterpart, and 1.06X and 3.9X faster than the TBB counterpart, on 16 and 32 CPUs, respectively.
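The stream semantics described in the abstract (a single producer fixes items in order, and every consumer reads them in that same order) can be illustrated with a minimal C sketch. This is not the DStream API and it does not use SPMC virtual memory: it assumes plain C11 atomics and Pthreads, and all names (`stream_t`, `stream_publish`, `stream_next`, `run_stream_demo`) as well as the fixed capacity standing in for an unbounded stream are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define STREAM_CAP 1024  /* demo-only bound; assumption, not from the paper */
#define N_CONSUMERS 3

/* Hypothetical append-only stream: one writer, many independent readers. */
typedef struct {
    int items[STREAM_CAP];
    atomic_size_t published;  /* number of items fixed so far */
    atomic_bool closed;       /* set once the producer is done */
} stream_t;

/* Producer side: write the slot, then publish it. A published slot is never
 * written again, so readers may access it without locking. */
static void stream_publish(stream_t *s, int v) {
    size_t i = atomic_load_explicit(&s->published, memory_order_relaxed);
    assert(i < STREAM_CAP);
    s->items[i] = v;
    atomic_store_explicit(&s->published, i + 1, memory_order_release);
}

/* Consumer side: each reader owns its cursor and sees items in publish order.
 * Returns 1 with the next item in *out, or 0 once the stream is drained. */
static int stream_next(stream_t *s, size_t *cursor, int *out) {
    for (;;) {
        if (*cursor < atomic_load_explicit(&s->published, memory_order_acquire)) {
            *out = s->items[(*cursor)++];
            return 1;
        }
        if (atomic_load_explicit(&s->closed, memory_order_acquire) &&
            *cursor >= atomic_load_explicit(&s->published, memory_order_acquire))
            return 0;
        sched_yield();  /* nothing new yet; let other threads run */
    }
}

typedef struct { stream_t *s; int n; } producer_arg_t;
typedef struct { stream_t *s; unsigned long digest; } consumer_arg_t;

static void *producer_main(void *p) {
    producer_arg_t *a = p;
    for (int i = 0; i < a->n; i++) stream_publish(a->s, i * i);
    atomic_store_explicit(&a->s->closed, true, memory_order_release);
    return NULL;
}

static void *consumer_main(void *p) {
    consumer_arg_t *a = p;
    size_t cursor = 0;
    int v;
    unsigned long h = 0;
    /* Order-sensitive digest: any reordering or loss changes the result. */
    while (stream_next(a->s, &cursor, &v)) h = h * 31u + (unsigned long)v;
    a->digest = h;
    return NULL;
}

/* Returns 1 if every consumer observed exactly the published sequence,
 * 0 if not, and -1 on invalid input or thread-creation failure. */
int run_stream_demo(int n) {
    if (n < 0 || n > STREAM_CAP) return -1;
    stream_t s;
    atomic_init(&s.published, 0);
    atomic_init(&s.closed, false);

    pthread_t prod, cons[N_CONSUMERS];
    producer_arg_t pa = { &s, n };
    consumer_arg_t ca[N_CONSUMERS];
    int started = 0, ok = 1;

    for (; started < N_CONSUMERS; started++) {
        ca[started].s = &s;
        ca[started].digest = 0;
        if (pthread_create(&cons[started], NULL, consumer_main, &ca[started])) break;
    }
    if (started < N_CONSUMERS || pthread_create(&prod, NULL, producer_main, &pa)) {
        /* Unblock any consumers that did start, then report failure. */
        atomic_store_explicit(&s.closed, true, memory_order_release);
        ok = -1;
    } else {
        pthread_join(prod, NULL);
    }
    for (int i = 0; i < started; i++) pthread_join(cons[i], NULL);
    if (ok < 0) return -1;

    unsigned long ref = 0;  /* sequential reference digest */
    for (int i = 0; i < n; i++) ref = ref * 31u + (unsigned long)(i * i);
    for (int i = 0; i < N_CONSUMERS; i++)
        if (ca[i].digest != ref) return 0;
    return 1;
}
```

Calling `run_stream_demo(1000)` from a `main` function starts one producer and three consumers and checks that each consumer's digest matches a sequential reference. Because consumers only ever read slots that have already been fixed, the result does not depend on thread scheduling, which is the property the abstract attributes to deterministic streams. The paper enforces this at the virtual memory level rather than by programmer discipline as in this sketch.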
Keywords: deterministic parallelism; pipeline parallelism; single-producer/multi-consumer; virtual memory
Received: 2014-07-16
Fund:

This work was supported in part by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901, the National Natural Science Foundation of China under Grant No. 61229201, and the China Postdoctoral Science Foundation under Grant No. 2012M521250.

About author: Yu Zhang is an associate professor in the School of Computer Science and Technology at the University of Science and Technology of China, Hefei. Her research spans programming languages, runtime systems, and operating systems, with a particular focus on systems that transparently improve reliability, security, and performance. She is a member of CCF.
Cite this article:   
Yu Zhang, Zhao-Peng Li, Hui-Fang Cao. System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism[J]. Journal of Computer Science and Technology, 2015, 30(1): 57-73
URL:  
http://jcst.ict.ac.cn:8080/jcst/EN/10.1007/s11390-015-1504-7