• Articles •     Next Articles

Simultaneous Minimization of Capacity and Conflict Misses

Zhiyuan Li   

  1. Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, U.S.A.
  • Received:2006-11-11 Revised:2007-05-08 Online:2007-07-10 Published:2007-07-10

Loop tiling (or loop blocking) is a well-known loop transformation to improve temporal locality in nested loops which perform matrix computations. When targeting caches that have low associativities, one of the key challenges for loop tiling is to simultaneously minimize capacity misses and conflict misses. This paper analyzes the effect of the tile size and the array-dimension size on capacity misses and conflict misses. The analysis supports the approach of combining tile-size selection (to minimize capacity misses) with array padding (to minimize conflict misses).

Key words: timed Petri net; multimedia synchronization; asynchronization user interaction;

[1] Jingling Xue. Loop Tiling for Parallelism. Kluwer Academic Publishers, 2000.

[2] Boulet P, Dongarra J, Robert Y \it et al. \rm Static tiling for heterogeneous computing platforms. -\it Parallel Computing}, 1999, 25(5): 547$\sim$568.

[3] Lam M S, Rothberg E E, Wolf M E. The cache performance and optimizations of blocked algorithms. In -\it Proc. the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems}, Santa Clara, CA, April 1991, pp.63$\sim$74.

[4] Chame J, Moon S. A tile selection algorithm for data locality and cache interference. In -\it Proc. the Thirteenth ACM International Conference on Supercomputing}, Rhodes, Greece, June 1999, pp.492$\sim$499.

[5] Coleman S, McKinley K S. Tile size selection using cache organization and data layout. In -\it Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation}, La Jolla, CA, June 1995, pp.279$\sim$290.

[6] Panda P, Nakamura H, Dutt N \it et al. \rm Augmenting loop tiling with data alignment for improved cache performance. -\it IEEE Transactions on Computers}, February 1999, 48(2): 142$\sim$149.

[7] Rivera G, Tseng C W. A comparison of compiler tiling algorithms. In -\it Proc. the Eighth International Conference on Compiler Construction}, Amsterdam, The Netherlands, March 1999, pp.168$\sim$182.

[8] Hong J W, Kung H. I/O complexity: The red-blue pebble game. In -\it Proc. the Thirteenth Annual ACM Symposium on Theory of Computing}, Milwaukee, Wisconsin, May 1981, pp.326$\sim$333.

[9] Song Y H, Li Z Y. New tiling techniques to improve cache temporal locality. In -\it Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation}, Atlanta, GA, May 1999, pp.215$\sim$228.

[10] Bacon D, Chow J H, Ju D \it et al. \rm A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness. In -\it Proc. CASCON'94}, Toronto, Ontario, October, 1994, pp.270$\sim$282.

[11] Li Z Y, Song Y H. Automatic tiling of iterative stencil loops. \it ACM Trans. Programming Languages and Systems, \rm November 2004, 26(6): 975$\sim$1028.

[12] Object-Oriented Scientific Computing. http://www.oonume\-rics.org/blitz/benchmarks/. Blitz++.

[13] Admas J C. MUDPACK: Multigrid software for elliptic partial differential equations. http://www.scd.ucar.edu/css/software/ mudpack/.

[14] Ghosh S, Martonosi M, Malik S. Precise miss analysis for program transformations with caches of arbitrary associativity. In -\it Proc. the Eighth ACM Conference on Architectural Support for Programming Languages and Operating Systems}, San Jose, California, October 1998, pp.228$\sim$239.

[15] Rivera G, Tseng C W. Tiling optimizations for 3D scientific computations. In -\it Proc. IEEE/ACM SC 2000}, November 2000.
[1] LIANG Yongquan; SHI Zhongzhi;. A Multimedia Synchronization Model Based on Timed Petri Net [J]. , 1999, 14(3): 276-282.
[2] Song Jun; Gu Guanqun;. Modeling Distributed Multimedia Synchronization with DSPN [J]. , 1998, 13(5): 448-454.
[3] Wang Jian; Christine Eisenbeis; Su Bogong;. Using Timed Petri Net to Model Instruction-Level Loop Scheduling with Resource Constraints [J]. , 1994, 9(2): 128-143.
Full text



[1] Min Yinghua;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[2] Min Yinghua; Yashwant K. Malaiya; Jin Boping;. Aliasing Errors in Parallel Signature Analyzers[J]. , 1990, 5(1): 24 -40 .
[3] Zhang Xing er; Zhu Xiaojun; Li Jianxin; Dong Jianning;. Source-to-Source Conversion Based on Formal Definition[J]. , 1991, 6(2): 178 -184 .
[4] Sui Yuefei;. The Polynomially Exponential Time Restrained Analytical Hierarchy[J]. , 1991, 6(3): 282 -284 .
[5] Li Weidong; Wei Daozheng;. Test Derivation Through Critical Path Transitions[J]. , 1992, 7(1): 12 -18 .
[6] Yan Yong; Jin Canming;. A Theory for the Initial Allocating of Real Time Tasks in Distributed Systems[J]. , 1992, 7(2): 185 -188 .
[7] Ma Jun; Ma Shaohan;. An O(k~2n~2) Algorithm to Find a k-Partition in a k-Connected Graph[J]. , 1994, 9(1): 86 -91 .
[8] Wang Hui; Liu Dayou; Wang Yafei;. Sequential Back-Propagation[J]. , 1994, 9(3): 252 -260 .
[9] Tian Zengping; Wang Yujun; Qu Yunyao; Shi Baile;. On the Expressive Power of F-Logic Language[J]. , 1997, 12(6): 510 -519 .
[10] ZHAO YiXin (赵邑新), YIN Xia (尹 霞) and WU JianPing (吴建平). Problems in the Information Dissemination of the Internet Routing[J]. , 2003, 18(2): 0 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved