›› 2010, Vol. 25 ›› Issue (2): 232-245.

Special Issue: Computer Architecture and Systems

• Special Section on CPU Researches in China • Previous Articles     Next Articles

Managing Data-Objects in Dynamically Reconfigurable Caches

Xue-Jun Yang (杨学军), Member, CCF, ACM, IEEE, Jun-Jie Wu (吴俊杰), Student Member, CCF, ACM, IEEE, Kun Zeng (曾坤), and Yu-Hua Tang (唐玉华)   

  1. National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha 410073, China
  • Received:2009-03-05 Revised:2009-10-09 Online:2010-03-05 Published:2010-03-05
  • About author:bf Xue-Jun Yang is a professor and Ph.D. supervisor of National University of Defense Technology. His main research interests include parallel computer architecture, parallel operating system and parallel compilation.
    Jun-Jie Wu is a Ph.D. candidate in computer science and technology in National University of Defense Technology. His current research interests focus on computer architecture and compilation technology.
    Kun Zeng is a Ph.D. candidate in computer science and technology from National University of Defense Technology. His main research interest is computer architecture.
    Yu-Hua Tang is a professor of National University of Defense Technology. Her main research interests include computer architecture, computer network, and large scale integrated circuit.
  • Supported by:

    This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 60621003, 60873014.

The widening gap between processor and memory speeds makes cache an important issue in the computer system design. Compared with work set of programs, cache resource is often rare. Therefore, it is very important for a computer system to use cache efficiently. Toward a dynamically reconfigurable cache proposed recently, DOOC (Data-Object Oriented Cache), this paper proposes a quantitative framework for analyzing the cache requirement of data-objects, which includes cache capacity, block size, associativity and coherence protocol. And a kind of graph coloring algorithm dealing with the competition between data-objects in the DOOC is proposed as well. Finally, we apply our approaches to the compiler management of DOOC. We test our approaches on both a single-core platform and a four-core platform. Compared with the traditional caches, the DOOC in both platforms achieves an average reduction of 44.98% and 49.69% in miss rate respectively. And its performance is very close to the ideal optimal cache.


[1] Wu J, Yang X, Zeng K, Zhang B, Feng Q, Liu G, Tang Y. DOOC: A software/hardware co-managed cache architecture for reducing cache thrashing. Journal of Computer Research and Development, 2008, 45(12): 2020-2032. (in Chinese)

[2] Wolf M E, Lam M S. A data locality optimizing algorithm. In Proc. the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, Toronto, Canada, June 26-28, 1991, pp.30-44.
[3] Kandemir M, Choudhary A, Ramanujam J, Banerjee P. Reducing false sharing and improving spatial locality in a unified compilation framework. IEEE Trans. Parallel Distrib. Syst., 2003, 14(4): 337-354.

[4] Chaitin G J. Register allocation & spilling via graph coloring. In Proc. the 1982 SIGPLAN Symposium on Compiler Construction, New York, USA, June 23-25, 1982, pp.98-101.

[5] Briggs P, Cooper K D, Torczon L. Improvements to graph coloring register allocation. ACM Trans. Program. Lang. Syst., 1994, 16(3): 428-455.

[6] Chow F C, Hennessy J L. The priority-based coloring approach to register allocation. ACM Trans. Program. Lang. Syst., 1990, 12(4): 501-536.

[7] George L, Appel A W. Iterated register coalescing. ACM Trans. Program. Lang. Syst., 1996, 18(3): 300-324.

[8] Li L, Gao L, Xue J. Memory coloring: A compiler approach for scratchpad memory management. In Proc. the 14th International Conference on Parallel Architectures and Compilation Techniques, Saint Louis, USA, Sept. 17-21, 2005, pp.329338.

[9] Austin T, Larson E, Ernst D. Simplescalar: An infrastructure for computer system modeling. Computer, 2002, 35(2): 59-67.

[10] Wu J, Zhang B, Zhou H, Yang X. SimOpt: A high-efficient opt cache simulator. In Proc. China National Computer Conference, Xi’an, China, Sept. 26-28, 2008, p.50.

[11] Belady L A. A study of replacement algorithms for virtualstorage computer. IBM Systems Journal, 1966, 5(2): 78-101.

[12] Magnusson P S, Christensson M, Eskilson J, Forsgren D, H?allberg G, H¨ogberg J, Larsson F, Moestedt A, Werner B. Simics: A full system simulation platform. Computer, 2002, 35(2): 50-58.

[13] Bailey D H, Barszcz E, Barton J T, Browning D S, Carter R L, Dagum D, Fatoohi R A, Frederickson P O, Lasinski T A, Schreiber R S, Simon H D, Venkatakrishnan V, Weeratunga S K. The NAS parallel benchmarks. The International Journal of Supercomputer Applications, 1991, 5(3): 63-73.

[14] Lee C, Potkonjak M, Mangione-Smith W H. Mediabench: A tool for evaluating and synthesizing multimedia and communications systems. In Proc. the 30th Annual IEEE/ACM International Symposium on Microarchitecture, Research Triangle Park, USA, Dec. 1-3, 1997, pp.330-335.

[15] Wu J, Pan X, Liu G, Zhang B, Yang X. Parallel data reuse theory for OpenMP applications. In Proc. the 10th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Daegu, South Korea, May 27-29, 2009, pp.516-523.

[16] Wang L, Yang X, Xue J, Deng Y, Yan X, Tang T, Nguyen Q H. Optimizing scientific application loops on stream processors. In Proc. the International Conference on Language, Compilers, and Tools for Embedded Systems, Tucson, USA, June 12-13, 2008, pp.161-170.

[17] Yang X, Wang L, Xue J, Deng Y, Zhang Y. Comparability graph coloring for optimizing utilization of stream register files in stream processors. In Proc. the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Raleigh, USA, Feb. 14-18, 2009, pp.111-120.

[18] Hennessy J L, Patterson D A. Computer Architecture: A Quantitative Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2007, pp.302-303.

[19] Bacon D F, Graham S L, Sharp O J. Compiler transformations for high-performance computing. ACM Comput. Surv., 1994, 26(4): 345-420.

[20] Rivera G, Tseng C W. Data transformations for eliminating conflict misses. In Proc. the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (PLDI 1998), Montreal, Canada, June 17-19, 1998, pp.38-49.

[21] Callahan D, Kennedy K, Porterfield A. Software prefetching. In Proc. the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), Santa Clara, USA, April 8-11, 1991, pp.40-52.

[22] Gonz′alez A, Aliagas C, Valero M. A data cache with multiple caching strategies tuned to different types of locality. In Proc. the 9th International Conference on Supercomputing (ICS 1995), Barcelona, Spain, June 3-7, 1995, pp.338-347.

[23] Kim S, Vijaykrishnan N, Kandemir M, Sivasubramaniam A, Irwin M J, Geethanjali E. Power-aware partitioned cache architectures. In Proc. the 2001 International Symposium on Low Power Electronics and Design, Huntington Beach, USA, Aug. 6-7, 2001, pp.64-67.

[24] Lee H H S, Smelyanskiy M, Tyson G S, Newburn C J. Stack value file: Custom microarchitecture for the stack. In Proc. the 7th International Symposium on High-Performance Computer Architecture, Nuevo Leone, Mexico, Jan. 20-24, 2001, p.5.

[25] Juan T, Royo D, Navarro J J. Dynamic cache splitting. In Proc. the XV International Conference of the Chilean Computer Science Society, Arica, Chile, Oct. 1-3, 1995, pp.253262.

[26] Petrov P, Orail?glu A. Towards effective embedded processors in codesigns: Customizable partitioned caches. In Proc. the Ninth International Symposium on Hardware/Software Codesign, Copenhagen, Denmark, April 25-27, 2001, pp.79-84.

[27] Ranganathan P, Adve S, Jouppi N P. Reconfigurable caches and their application to media processing. In Proc. the 27th Annual International Symposium on Computer Architecture, Vancouver, Canada, June 10-14, 2000, pp.214-224.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Liu Mingye; Hong Enyu;. Some Covering Problems and Their Solutions in Automatic Logic Synthesis Systems[J]. , 1986, 1(2): 83 -92 .
[2] Chen Shihua;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[3] Gao Qingshi; Zhang Xiang; Yang Shufan; Chen Shuqing;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[4] Chen Zhaoxiong; Gao Qingshi;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[5] Huang Heyan;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[6] Min Yinghua; Han Zhide;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[7] Tang Tonggao; Zhao Zhaokeng;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[8] Min Yinghua;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[9] Zhu Hong;. Some Mathematical Properties of the Functional Programming Language FP[J]. , 1987, 2(3): 202 -216 .
[10] Li Minghui;. CAD System of Microprogrammed Digital Systems[J]. , 1987, 2(3): 226 -235 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved