CHAUS:Scalable VM-Based Channels for Unbounded Streaming

Yu Zhang; Yu-Fen Yu; Hui-Fang Cao; Jian-Kang Chen; Qi-Liang Zhang

doi:10.1007/s11390-017-1801-4

Volume 32 Issue 6

November 2017

Turn off MathJax

Article Contents

Abstract

References

Journal of Computer Science and Technology > 2017 > 32(6): 1288-1304. > DOI: 10.1007/s11390-017-1801-4 CSTR: 32374.14.s11390-017-1801-4

Yu Zhang, Yu-Fen Yu, Hui-Fang Cao, Jian-Kang Chen, Qi-Liang Zhang. CHAUS:Scalable VM-Based Channels for Unbounded Streaming[J]. Journal of Computer Science and Technology, 2017, 32(6): 1288-1304. DOI: 10.1007/s11390-017-1801-4

Citation:

Previous Article Next Article

PDF

CHAUS:Scalable VM-Based Channels for Unbounded Streaming

Yu Zhang Senior Member, CCF, Member, ACM, IEEE,
Yu-Fen Yu ,
Hui-Fang Cao ,
Jian-Kang Chen ,
Qi-Liang Zhang

School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China

Funds: This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000403.

More Information

Author Bio:
Yu Zhang is an associate professor in School of Computer Science and Technology,University of Science and Technology of China (USTC),Hefei.She received her Ph.D.degree in computer software and theory from USTC,Hefei,in 2005.Her research interests are software security,the construction and evaluation of efficient parallel systems.
Received Date: September 10, 2016
Revised Date: June 02, 2017
Published Date: November 04, 2017

Abstract

Abstract

Stream processing is a special form of the dataflow execution model that offers extensive opportunities for optimization and automatic parallelism. A streaming application is represented by a graph of computation stages that communicate with each other via FIFO channels. In shared-memory environment, an FIFO channel is classically a common, fixed-size synchronized buffer shared between the producer and the consumer. As the number of concurrent stage workers increases, the synchronization overheads, such as contention and waiting times, rise sharply and severely impair application performance. In this paper, we present a novel multithreaded model which isolates memory between threads by default and provides a higher level abstraction for scalable unicast or multicast communication between threads-CHAUS (Channel for Unbounded Streaming). The CHAUS model hides the underlying synchronization details, but requires the user to declare producer-consumer relationship of a channel in advance. It is the duty of the runtime system to ensure reliable data transmission at data item granularity as declared. To achieve unbounded buffer for streaming and reduce the synchronization overheads, we propose a virtual memory based solution to implement a scalable CHAUS channel. We check the programmability of CHAUS by successfully porting dedup and ferret from PARSEC as well as implementing MapReduce library with Phoenix-like API. The experimental results show that workloads built with CHAUS run faster than those with Pthreads, and CHAUS has the best scalability compared with two Pthread versions. There are three workloads whose CHAUS versions only spend no more than 0.17x runtime of Pthreads on both 16 and 32 cores.
- streaming,
- thread model,
- pipeline parallelism,
- unbounded channel,
- virtual memory

FullText(HTML)

References (21)

References

[1]	Bienia C, Kumar S, Singh J P, Li K. The PARSEC benchmark suite:Characterization and architectural implications. In Proc. the 17th Int. Conf. Parallel Architectures and Compilation Techniques, October 2008, pp.72-81.
[2]	Michael M M, Scott M L. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proc. the 15th Annual ACM Symp. Principles of Distributed Computing, May 1996, pp.267-275.
[3]	Tsigas P, Zhang Y. A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems. In Proc. the 13th Annual ACM Symp. Parallel Algorithms and Architectures, July 2001, pp.134-143.
[4]	Kogan A, Petrank E. Wait-free queues with multiple enqueuers and dequeuers. In Proc. the 16th ACM Symp. Principles and Practice of Parallel Programming, February 2011.
[5]	Morrison A, Afek Y. Fast concurrent queues for x86 processors. In Proc. the 18th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, February 2013, pp.103-112.
[6]	Hendler D, Incze I, Shavit N, Tzafrir M. Flat combining and the synchronization-parallelism tradeoff. In Proc. the 22nd Annual ACM Symp. Parallelism in Algorithms and Architectures, June 2010, pp.355-364.
[7]	Feldman S D, Bhat A, LaBorde P, Yi Q, Dechev D. Effective use of non-blocking data structures in a deduplication application. In Proc. the Companion Publication for Conf. Systems, Programming, & Applications:Software for Humanity, October 2013, pp.133-142.
[8]	Thies W, Karczmarek M, Amarasinghe S. StreamIt:A language for streaming applications. In Proc. the 11th Int. Conf. Compiler Construction, April 2002, pp.179-196.
[9]	Thakur R, Rabenseifner R, Gropp W. Optimization of collective communication operations in MPICH. The International Journal of High Performance Computing Applications, 2005, 19(1):49-66.
[10]	Yoo R M, Romano A, Kozyrakis C. Phoenix rebirth:Scalable MapReduce on a large-scale shared-memory system. In Proc. IEEE Int. Symp. Workload Characterization, October 2009, pp.198-207.
[11]	Gloger W. Dynamic memory allocator implementations in Linux system libraries. http://www.malloc.de/papers/malloc-slides.html, May 2006.
[12]	Evans J. A scalable concurrent malloc (3) implementation for FreeBSD. https://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf, July 2017.
[13]	de Melo A C. Performance counters on Linux the new tools. http://vger.kernel.org/~acme/perf/perf.pdf, July 2017.
[14]	Clements A T, Kaashoek M F, Zeldovich N. Scalable address spaces using RCU balanced trees. In Proc. the 17th Int. Conf. Architectural Support for Programming Languages and Operating Systems, March 2012, pp.199-210.
[15]	Ranger C, Raghuraman R, Penmetsa A, Bradski G, Kozyrakis C. Evaluating MapReduce for multi-core and multiprocessor systems. In Proc. the 13th Int. Symp. High Performance Computer Architecture, February 2007, pp.13-24.
[16]	Zhang Y, Cao H F. DMR:A deterministic MapReduce for multicore systems. International Journal of Parallel Programming, 2017, 45(1):128-141.
[17]	Lu M, Zhang L, Huynh H P, Ong Z, Liang Y, He B S, Goh R S, Huynh R. Optimizing the MapReduce framework on Intel Xeon Phi coprocessor. In Proc. IEEE Int. Conf. Big Data, October 2013, pp.125-130.
[18]	Reed E C, Chen N, Johnson R E. Expressing pipeline parallelism using TBB constructs:A case study on what works and what doesn't. In Proc. the Compilation of the Co-Located Workshops on DSM'11, TMC'11, AGERE! 2011, AOOPES'11, NEAT'11, & VMIL'11, October 2011, pp.133-138.
[19]	Gordon M I, Thies W, Amarasinghe S. Exploiting coarsegrained task, data, and pipeline parallelism in stream programs. In Proc. the 12th Int. Conf. Architectural Support for Programming Languages and Operating Systems, October 2006, pp.151-162.
[20]	McCool M, Robison A D, Reinders J. Structured Parallel Programming:Patterns for Efficient Computation. Amsterdam:Elsevier, 2012.
[21]	Bienia C, Li K. Characteristics of workloads using the pipeline programming model. In Proc. the Int. Conf. Computer Architecture, June 2010, pp.161-171.

Relative Articles

[1]	Sai Sha, Jing-Yuan Hu, Ying-Wei Luo, Xiao-Lin Wang, Zhenlin Wang. Huge Page Friendly Virtualized Memory Management[J]. Journal of Computer Science and Technology, 2020, 35(2): 433-452. DOI: 10.1007/s11390-020-9693-0
[2]	Yu Zhang, Zhao-Peng Li, Hui-Fang Cao. System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism[J]. Journal of Computer Science and Technology, 2015, 30(1): 57-73. DOI: 10.1007/s11390-015-1504-7
[3]	Philip Machanick. The Value of a Small Microkernel for Dreamy Memory and the RAMpageMemory Hierarchy[J]. Journal of Computer Science and Technology, 2005, 20(5): 586-595.
[4]	WEN Jirong, CHEN Hong, WANG Shan. POTENTIAL: A Highly Adaptive Core of Parallel Database System[J]. Journal of Computer Science and Technology, 2000, 15(6): 527-541.
[5]	WEN Jirong, CHEN Hong, WANG Shan. POTENTIAL:A Highly Adaptive Core of Parallel Database System[J]. Journal of Computer Science and Technology, 2000, 15(6).
[6]	GAO Shuming, WAN Huagen, PENG Qunsheng. Constraint-Based Virtual Solid Modeling[J]. Journal of Computer Science and Technology, 2000, 15(1): 56-63.
[7]	Cai Yong, Heng Phengann, Wu Enhua, Liu Xuehui, Li Hongju, Sun Qingjie. An Image-Based Virtual Reality Prototype System[J]. Journal of Computer Science and Technology, 1998, 13(5): 475-480.
[8]	Hu Weiwu, Shi Weisong, Tang Zhimin. A Framework of Memory Consistency Models[J]. Journal of Computer Science and Technology, 1998, 13(2): 110-124.
[9]	Wang Xiaoming, Yang Qiaolin. Using Virtual ATE Model to Migrate Test Programs[J]. Journal of Computer Science and Technology, 1995, 10(4): 289-297.
[10]	Zhou Jianqiang, Xie Li, Dai Fei, Sun Zhongxiu. Adaptive Memory Coherence Algorithms in DSVM[J]. Journal of Computer Science and Technology, 1994, 9(4): 365-372.

Supplements (0)

Cited By

Get Citation

PDF

XML

Article views (25) PDF downloads (609)

Indexed in:

CHAUS:Scalable VM-Based Channels for Unbounded Streaming

Abstract

References

Related Articles

Catalog

Related

Home

Overview

Resources

Contents

Indexed in:

CHAUS:Scalable VM-Based Channels for Unbounded Streaming

Abstract

References

Related Articles

Catalog

Related

Home

Overview

Resources

Contents

Export File

Citation

Format

Content