Processing math: 100%
We use cookies to improve your experience with our site.

Indexed in:

SCIE, EI, Scopus, INSPEC, DBLP, CSCD, etc.

Submission System
(Author / Reviewer / Editor)
Zhiyuan Li. Simultaneous Minimization of Capacity and Conflict Misses[J]. Journal of Computer Science and Technology, 2007, 22(4): 497-504.
Citation: Zhiyuan Li. Simultaneous Minimization of Capacity and Conflict Misses[J]. Journal of Computer Science and Technology, 2007, 22(4): 497-504.

Simultaneous Minimization of Capacity and Conflict Misses

More Information
  • Received Date: November 10, 2006
  • Revised Date: May 07, 2007
  • Published Date: July 14, 2007
  • Loop tiling (or loop blocking) is a well-known loop transformationto improve temporal locality in nested loops which perform matrixcomputations. When targeting caches that have low associativities, oneof the key challenges for loop tiling is to simultaneously minimizecapacity misses and conflict misses. This paper analyzes the effect ofthe tile size and the array-dimension size on capacity misses andconflict misses. The analysis supports the approach of combiningtile-size selection (to minimize capacity misses) with array padding (tominimize conflict misses).
  • [1]
    Jingling Xue. Loop Tiling for Parallelism. Kluwer Academic Publishers, 2000.
    [2]
    Boulet P, Dongarra J, Robert Y \it et al. \rm Static tiling for heterogeneous computing platforms. -\it Parallel Computing}, 1999, 25(5): 547568.
    [3]
    Lam M S, Rothberg E E, Wolf M E. The cache performance and optimizations of blocked algorithms. In -\it Proc. the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems}, Santa Clara, CA, April 1991, pp.6374.
    [4]
    Chame J, Moon S. A tile selection algorithm for data locality and cache interference. In -\it Proc. the Thirteenth ACM International Conference on Supercomputing}, Rhodes, Greece, June 1999, pp.492499.
    [5]
    Coleman S, McKinley K S. Tile size selection using cache organization and data layout. In -\it Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation}, La Jolla, CA, June 1995, pp.279290.
    [6]
    Panda P, Nakamura H, Dutt N \it et al. \rm Augmenting loop tiling with data alignment for improved cache performance. -\it IEEE Transactions on Computers}, February 1999, 48(2): 142149.
    [7]
    Rivera G, Tseng C W. A comparison of compiler tiling algorithms. In -\it Proc. the Eighth International Conference on Compiler Construction}, Amsterdam, The Netherlands, March 1999, pp.168182.
    [8]
    Hong J W, Kung H. I/O complexity: The red-blue pebble game. In -\it Proc. the Thirteenth Annual ACM Symposium on Theory of Computing}, Milwaukee, Wisconsin, May 1981, pp.326333.
    [9]
    Song Y H, Li Z Y. New tiling techniques to improve cache temporal locality. In -\it Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation}, Atlanta, GA, May 1999, pp.215228.
    [10]
    Bacon D, Chow J H, Ju D \it et al. \rm A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness. In -\it Proc. CASCON'94}, Toronto, Ontario, October, 1994, pp.270282.
    [11]
    Li Z Y, Song Y H. Automatic tiling of iterative stencil loops. \it ACM Trans. Programming Languages and Systems, \rm November 2004, 26(6): 9751028.
    [12]
    Object-Oriented Scientific Computing. http://www.oonume\-rics.org/blitz/benchmarks/. Blitz++.
    [13]
    Admas J C. MUDPACK: Multigrid software for elliptic partial differential equations. http://www.scd.ucar.edu/css/software/ mudpack/.
    [14]
    Ghosh S, Martonosi M, Malik S. Precise miss analysis for program transformations with caches of arbitrary associativity. In -\it Proc. the Eighth ACM Conference on Architectural Support for Programming Languages and Operating Systems}, San Jose, California, October 1998, pp.228239.
    [15]
    Rivera G, Tseng C W. Tiling optimizations for 3D scientific computations. In -\it Proc. IEEE/ACM SC 2000}, November 2000.
  • Related Articles

    [1]Shuai Han, Xian-Min Liu, Jian-Zhong Li. Efficient Partitioning Method for Optimizing the Compression on Array Data[J]. Journal of Computer Science and Technology, 2022, 37(5): 1049-1067. DOI: 10.1007/s11390-022-2371-7
    [2]Tang Weiyu, Shi Wu, Zang Binxu, Zhu Chuanqi. Exploiting Loop Parallelism with Redundant Execution[J]. Journal of Computer Science and Technology, 1997, 12(2): 105-112.
    [3]Chung-Han CHEN. Embedding Binary Tree in VLSI/WSI Processor Array[J]. Journal of Computer Science and Technology, 1996, 11(3): 326-336.
    [4]LIN Hua, LU Mi, Jesse Z.FANG. A Direct Approach for Finding Loop Transformation Matrices[J]. Journal of Computer Science and Technology, 1996, 11(3): 237-256.
    [5]Ma Guangsheng, Zhang Zhongwei, Huang Shaobin. A New Method of Solving Kernels in Algebraic Decomposition for the Synthesis of Logic Cell Array[J]. Journal of Computer Science and Technology, 1995, 10(6): 569-573.
    [6]Tang Zhimin, Xia Peisu. A Maximum Time Difference Pipelined Arithmetic Unit Based on CMOS Gate Array[J]. Journal of Computer Science and Technology, 1995, 10(2): 97-103.
    [7]Liu Hong, Wang Wenhong, Zhang Defu. A Methodology for Mapping and Partitioning Arbitrary N-Dimensional Nested Loops into 2-Dimensional VLSI Arrays[J]. Journal of Computer Science and Technology, 1993, 8(3): 31-42.
    [8]Xu Shuruen. On the Equivalence of Some Models of Computation[J]. Journal of Computer Science and Technology, 1988, 3(4): 306-309.
    [9]Sun Yongqiang. Verification of Systolic Array:An FP Functional Approach[J]. Journal of Computer Science and Technology, 1988, 3(2): 81-101.
    [10]Xia Peisu, Fang Xinwo, Wang Yuxiang, Yan Kaiming, Zhang Tingjun, Liu Yulan, Zhao Chunying, Sun Jizhong. Design of Array Processor Systems[J]. Journal of Computer Science and Technology, 1987, 2(3): 163-173.

Catalog

    Article views (23) PDF downloads (11068) Cited by()
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return