›› 2015, Vol. 30 ›› Issue (1): 57-73.doi: 10.1007/s11390-015-1504-7

Special Issue: Computer Architecture and Systems

• Special Section on Computer Architecture and Systems for Big Data • Previous Articles     Next Articles

System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism

Yu Zhang(张昱), Member, CCF, Zhao-Peng Li(李兆鹏), Member, CCF, Hui-Fang Cao(曹慧芳)   

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • Received:2014-07-16 Revised:2014-10-14 Online:2015-01-05 Published:2015-01-05
  • About author:Yu Zhang is an associate professor in the School of Computer Science and Technology at University of Science and Technology of China, Hefei. Her research spans programming languages, runtime systems, and operating systems, with a particular focus on systems that transparently improve reliability, security, and performance. She is a member of CCF.
  • Supported by:

    This work was supported in part by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901, the National Natural Science Foundation of China under Grant No. 61229201, and the China Postdoctoral Science Foundation under Grant No. 2012M521250.

Pipeline parallelism is a popular parallel programming pattern for emerging applications. However, programming pipelines directly on conventional multithreaded shared memory is difficult and error-prone. We present DStream, a C library that provides high-level abstractions of deterministic threads and streams for simply representing pipeline stage workers and their communications. The deterministic stream is established atop our proposed single-producer/multi-consumer (SPMC) virtual memory, which integrates synchronization with the virtual memory model to enforce determinism on shared memory accesses. We investigate various strategies on how to efficiently implement DStream atop the SPMC memory, so that an infinite sequence of data items can be asynchronously published (fixed) and asynchronously consumed in order among adjacent stage workers. We have successfully transformed two representative pipeline applications - ferret and dedup using DStream, and conclude conversion rules. An empirical evaluation shows that the converted ferret performed on par with its Pthreads and TBB counterparts in term of running time, while the converted dedup is close to 2.56X, 7.05X faster than the Pthreads counterpart and 1.06X, 3.9X faster than the TBB counterpart on 16 and 32 CPUs, respectively.

[1] McCool M, Reinders J, Robison A D. Structured Parallel Programming: Patterns for Efficient Computation. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2012.

[2] Artho C, Havelund K, Biere A. High-level data races. In Proc. the 1st International Workshop on Veri cation and Validation of Enterprise Information Systems, April 2003, pp.82-93.

[3] Lee E. The problem with threads. Computer, 2006, 39(5): 33-42.

[4] Lu S, Park S, Seo E, Zhou Y. Learning from mistakes — A comprehensive study on real world concurrency bug characteristics. In Proc. the 13th International Conference on Architectural Support for Programming Languages and Operating Systems(ASPLOS), March 2008, pp.329-339.

[5] Zhang Y, Ford B. A virtual memory foundation for scalable deterministic parallelism. In Proc. the 2nd APSys, July 2011, pp.7:1-7:5.

[6] Zhang Y, Ford B. Lazy tree mapping: Generalizing and scaling deterministic parallelism. In Proc. the 4th AsiaPaci c Workshop on Systems (APSys), July 2013, pp.20:1-(\d)0:7.

[7] Bienia C, Kumar S, Singh J P et al. The PARSEC benchmark suite: Characterization and architectural implications. In Proc. the 17th PACT, October 2008, pp.72-81.

[8] Reed E C, Chen N, Johnson R E. Expressing pipeline parallelism using TBB constructs: A case study on what works and what doesn't. In Proc. SPLASH, October 2011, pp.133-(\d)38.

[9] Liu T, Curtsinger C, Berger E. Dthreads: Efficient deterministic multithreading. In Proc. the 23rd SOSP, Oct. 2011, pp.327-336.

[10] Aviram A, Weng S C, Hu S, Ford B. Efficient systemenforced deterministic parallelism. In Proc. the 9th OSDI, October 2010, pp.193-206.

[11] Merrifield T, Eriksson J. Conversion: Multi-version concurrency control for main memory segments. In Proc. the 8th EuroSys, April 2013, pp.127-139.

[12] Aviram A, Ford B, Zhang Y.Workspace consistency: A programming model for shared memory parallelism. In Proc. the 2nd WoDet, March 2011.

[13] Thies W, Karczmarek M, Amarasinghe S. StreamIt: A language for streaming applications. In Proc. the 11th CC, April 2002, pp.179-196.

[14] Olszewski M, Ansel J, Amarasinghe S. Kendo: Efficient deterministic multithreading in software. In Proc. the 14th ASPLOS, March 2009, pp.97-108.

[15] Berger E D, Yang T, Liu T, Novark G. Grace: Safe multithreaded programming for C/C++. In Proc. the 24th OOPSLA, October 2009, pp.81-96.

[16] Bergan T, Anderson O, Devietti J, Ceze L, Grossman D. CoreDet: A compiler and runtime system for deterministic multithreaded execution. In Proc. the 15th ASPLOS, March 2010, pp.53-64.

[17] Cui H, Simsa J, Lin Y H et al. Parrot: A practical runtime for deterministic, stable, and reliable threads. In Proc. the 24th SOSP, November 2013, pp.388-405.

[18] Olszewski M, Ansel J, Amarasinghe S. Scaling deterministic multithreading. In Proc. the 2nd WoDet, March 2011.

[19] Gordon M I, Thies W, Amarasinghe S. Exploiting coarsegrained task, data, and pipeline parallelism in stream programs. In Proc. the 12th ASPLOS, October 2006, pp.151-162.
No related articles found!
Full text



[1] Shen Yidong;. Form alizing Incomplete Knowledge in Incomplete Databases[J]. , 1992, 7(4): 295 -304 .
[2] Shen Yidong;. A Fixpoint Semantics for Stratified Databases[J]. , 1993, 8(2): 12 -21 .
[3] NI Bin; FENG Yulin;. Dynamic Checking Frameworkfor Java Beaus Semantic Constraints[J]. , 1999, 14(4): 408 -413 .
[4] Jing Wang, Li-Yong Zhang, Yan-Bo Han. Client-Centric Adaptive Scheduling of Service-Oriented Applications[J]. , 2006, 21(4): 537 -546 .
[5] Rui-Song Zhang, Wei-Ze Quan, Lu-Bin Fan, Li-Ming Hu, Dong-Ming Yan. Distinguishing Computer-Generated Images from Natural Images Using Channel and Pixel Correlation[J]. Journal of Computer Science and Technology, 2020, 35(3): 592 -602 .
[6] Jie Tang, Chen Liu, Shao-Shan Liu, Zhi-Min Gu, and Jean-Luc Gaudiot. Pinned OS/Services: A Case Study of XML Parsing on Intel SCC[J]. , 2013, 28(1): 3 -13 .
[7] Jie Yan, Guang-Ming Tan, and Ning-Hui Sun. Optimizing Parallel Sn Sweeps on Unstructured Grids for Multi-Core Clusters[J]. , 2013, 28(4): 657 -670 .
[8] Wu Yang Guo-Wei Shen, Wei Wang, Liang-Yi Gong, Miao Yu, Guo-Zhong Dong. Anomaly Detection in Microblogging via Co-Clustering[J]. , 2015, 30(5): 1097 -1108 .
[9] Jie Tang, Xiao-Yan Zhu. Preface[J]. , 2015, 30(5): 1036 -1038 .
[10] Tao Xie. Preface[J]. , 2016, 31(5): 849 -850 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved