Special Issue: Computer Architecture and Systems

CASA: A New IFU Architecture for Power-Efficient Instruction Cache and TLB Designs

Han-Xin Sun, Kun-Peng Yang, Yu-Lai Zhao, Dong Tong, and Xu Cheng   

  1. Microprocessor Research and Development Center, Peking University, Beijing 100871, China
  • Received: 2007-01-07 Revised: 2007-08-09 Online: 2008-01-15 Published: 2008-01-10

The instruction fetch unit (IFU) usually dissipates a considerable portion of total chip power. In traditional IFU architectures, the fetch address must be sent to the instruction cache and TLB arrays as soon as it is generated. Because the power-saving logic can do only limited work between fetch address generation and the instruction fetch itself, previous power-saving approaches are usually constrained by unnecessary restrictions imposed by traditional IFU architectures. In this paper, we present CASA, a new power-aware IFU architecture that effectively reduces these restrictions and provides sufficient time and information for the power-saving logic of both the instruction cache and the TLB. By analyzing, recording, and utilizing key information about the dynamic instruction flow early in the front-end pipeline, CASA creates the opportunity to maximize power efficiency and minimize performance overhead. Compared to the baseline configuration, the leakage and dynamic power of the instruction cache are reduced by 89.7% and 64.1% respectively, and the dynamic power of the instruction TLB is reduced by 90.2%. Meanwhile, the worst-case performance degradation is only 0.63%. Compared to previous state-of-the-art power-saving approaches, the CASA-based approach saves IFU power more effectively, incurs less performance overhead, and achieves better scalability. We hope that CASA will stimulate further work on architectural solutions to power-efficient IFU designs.

Key words: 3D manipulation; solid modeling; virtual environment; constraint;



ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China