2016, Vol. 31, Issue (1): 60-76. doi: 10.1007/s11390-016-1612-z

Special Issue: Software Systems

• Special Section on Computer Architecture and Systems with Emerging Technologies •

Optimization Strategies Oriented to Loop Characteristics in Software Thread Level Speculation Systems

Li Shen (Member, CCF, ACM), Fan Xu, and Zhi-Ying Wang (Member, CCF, ACM, IEEE)

  1 State Key Laboratory of High Performance Computing, Changsha 410073, China;
  2 School of Computer, National University of Defense Technology, Changsha 410073, China
  • Received: 2015-05-13; Revised: 2015-09-28; Online: 2016-01-05; Published: 2016-01-05
  • About author: Li Shen received his B.S., M.S. and Ph.D. degrees in computer science and technology from the National University of Defense Technology (NUDT), Changsha, in 1997, 2000, and 2003, respectively. Currently, he is an associate professor of the School of Computer, NUDT. His research interests include programming model and compiler design, high performance processor architecture, virtualization technologies, and performance evaluation and workload characterization. He is a member of CCF and ACM.
  • Supported by: This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010905 and the National Natural Science Foundation of China under Grant Nos. 61272143 and 61472431.

Thread-level speculation (TLS) provides not only a simple parallel programming model but also an effective mechanism for exploiting thread-level parallelism. The performance of software speculative parallel models is limited by high global overheads, which arise differently in different types of loops. These loops usually differ in their dependence characteristics and therefore call for different optimization strategies. In this paper, we propose three optimization techniques, each reducing a different component of the global overhead and each targeting the requirements of a different type of loop. Inter-thread fetching reduces the high mis-speculation rate of loops with frequent dependencies; out-of-order committing reduces the control overhead of loops with infrequent dependencies; and enhanced dynamic task granularity resizing reduces the control overhead and optimizes the global overhead of loops whose dependence characteristics change. All three techniques have been implemented in HEUSPEC, a software TLS system. Experimental results indicate that they meet the demands of different groups of benchmarks. Combining these techniques improves the performance of all benchmarks and achieves a higher average speedup.
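To make the idea of dynamic task granularity resizing more concrete, the following is a minimal sketch of one possible resizing policy. It is not HEUSPEC's actual algorithm, which the abstract does not specify. It assumes a hypothetical rule in which the number of loop iterations assigned to each speculative task is doubled after a clean commit (fewer commits, so less control overhead) and halved after a mis-speculation (less wasted work when a task is squashed). The names `next_granularity` and `simulate`, the bounds, and the doubling/halving factors are all illustrative assumptions.

```python
def next_granularity(current, misspeculated, lo=1, hi=64):
    """Hypothetical policy (an assumption, not taken from the paper):
    halve the task size after a mis-speculation, double it after a
    clean commit, and clamp the result to the range [lo, hi]."""
    if misspeculated:
        return max(lo, current // 2)
    return min(hi, current * 2)


def simulate(outcomes, start=8, lo=1, hi=64):
    """Return the sequence of task sizes (iterations per speculative task)
    produced by feeding a list of per-task outcomes to the policy.
    Each outcome is True if that task mis-speculated, False otherwise."""
    sizes = [start]
    for misspeculated in outcomes:
        sizes.append(next_granularity(sizes[-1], misspeculated, lo, hi))
    return sizes


if __name__ == "__main__":
    # Two clean commits, then two mis-speculations, then a clean commit.
    print(simulate([False, False, True, True, False]))
```

Under this assumed policy, a loop whose dependence behavior shifts between phases would see its task size grow during dependence-free phases and shrink when conflicts start to appear.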


ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences