›› 2015,Vol. 30 ›› Issue (2): 391-403.doi: 10.1007/s11390-015-1531-4

所属专题: Computer Architecture and Systems

• Special Section on Selected Paper from NPC 2011 • 上一篇    下一篇

用于低功耗时钟树综合的寄存器结群方法

Chao Deng(邓超), Student Member, IEEE, Yi-Ci Cai(蔡懿慈), Senior Member, CCF, ACM, IEEE, Qiang Zhou(周强), Senior Member, CCF, ACM, IEEE   

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • 收稿日期:2014-03-26 修回日期:2014-11-24 出版日期:2015-03-05 发布日期:2015-03-05
  • 作者简介:Chao Deng received his B.S. degree in computer science and technology from Harbin Institute of Technology (HIT), Harbin, in 2010. He is currently pursuing his Ph.D. degree from the EDA Lab, Tsinghua University, Beijing. His research interests include clock network synthesis and optimization.
  • 基金资助:

    This work was supported by the National Natural Science Foundation of China under Grant No. 61274031.

Register Clustering Methodology for Low Power Clock Tree Synthesis

Chao Deng(邓超), Student Member, IEEE, Yi-Ci Cai(蔡懿慈), Senior Member, CCF, ACM, IEEE, Qiang Zhou(周强), Senior Member, CCF, ACM, IEEE   

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received:2014-03-26 Revised:2014-11-24 Online:2015-03-05 Published:2015-03-05
  • About author:Chao Deng received his B.S. degree in computer science and technology from Harbin Institute of Technology (HIT), Harbin, in 2010. He is currently pursuing his Ph.D. degree from the EDA Lab, Tsinghua University, Beijing. His research interests include clock network synthesis and optimization.
  • Supported by:

    This work was supported by the National Natural Science Foundation of China under Grant No. 61274031.

时钟网络的功耗在集成电路芯片总功耗中占有很大的比例。因此,在高性能集成电路设计中,时钟网络的功耗优化已经成为一个非常重要的目标。传统的方法是通过时钟绕线和缓冲器规划的优化策略解决这一问题。与此相比,本文提出了一种新的寄存器结群策略用来减少时钟网络的功耗。本文共提出了三种不同的寄存器结群算法,分别是KMR,KSR和GSR,并且对它们进行了综合的研究。同时,本文还提出了一个缓冲器分配算法,目的是用最小的功耗代价满足在寄存器簇内部的翻转速率要求。本文将提出的算法集成到一个经典的时钟树综合流程中,在ISPD2010 时钟网络综合竞赛的测试样例上对寄存器结群算法进行了验证。实验结果表明本文提出的三种寄存器结群算法均取得了20%以上的功耗优化,并且对时钟偏差和最大时延没有负面影响。作为三种算法中最有效的一个,GSR算法取得了31%的功耗优化,同时还取得了4%的时钟偏差优化和5%的最大时延优化。并且,集成了本文提出的寄存器结群算法之后,时钟树综合流程[1]的总运行时间减少了接近一个数量级。

Abstract: Clock networks dissipate a significant fraction of the entire chip power budget. Therefore, the optimization for power consumption of clock networks has become one of the most important objectives in high performance IC designs. In contrast to most of the traditional studies that handle this problem with clock routing or buffer insertion strategy, this paper proposes a novel register clustering methodology in generating the leaf level topology of the clock tree to reduce the power consumption. Three register clustering algorithms called KMR, KSR and GSR are developed and a comprehensive study of them is discussed in this paper. Meanwhile, a buffer allocation algorithm is proposed to satisfy the slew constraint within the clusters at a minimum cost of power consumption. We integrate our algorithms into a classical clock tree synthesis (CTS) flow to test the register clustering methodology on ISPD 2010 benchmark circuits. Experimental results show that all the three register clustering algorithms achieve more than 20% reduction in power consumption without affecting the skew and the maximum latency of the clock tree. As the most effective method among the three algorithms, GSR algorithm achieves a 31% reduction in power consumption as well as a 4% reduction in skew and a 5% reduction in maximum latency. Moreover, the total runtime of the CTS flow with our register clustering algorithms is significantly reduced by almost an order of magnitude.

[1] Pedram M, Rabaey J M. Power Aware Design Methodologies. Kluwer Academic Publisher, 2002.

[2] Cheon Y, Ho P H, Kahng A B, Reda S, Wang Q. Poweraware placement. In Proc. the 42nd Annual Design Au-tomation Conference, Jun. 2005, pp.795-800.

[3] Donno M, Macii E, Mazzoni L. Poweraware clock tree planning. In Proc. the 2004 International Symposium on Phys-ical Design, April 2004, pp.138-147.

[4] Lam T K, Yang X, Tang W C, Wu Y L. On applying erroneous clock gating conditions to further cut down power. In Proc. the 16th Asia and South Paci c Design Automation Conference, Jan. 2011, pp.509-514.

[5] Lu J, Mao X, Taskin B. Clock mesh synthesis with gated local trees and activity driven register clustering. In Proc. IEEE/ACM International Conference on Computer-Aided Design, Nov. 2012, pp.691-697.

[6] Igarashi M, Usami K, Nogami K et al. A low-power design method using multiple supply voltages. In Proc. the 1997 International Symposium on Low Power Electronics and Design, Aug. 1997, pp.36-41.

[7] Lin K Y, Lin H T, Ho T Y. An efficient algorithm of adjustable delay buffer insertion for clock skew minimization in multiple dynamic supply voltage designs. In Proc. the 16th Asia and South Paci c Design Automation Confer-ence, Jan. 2011, pp.825-830.

[8] Li L, Sun J, Lu Y, Zhou H, Zeng X. Low power discrete voltage assignment under clock skew scheduling. In Proc. the 16th Asia and South Paci c Design Automation Con-ference, Jan. 2011, pp.515-520.

[9] Chao T H, Hsu Y C, Ho J M, Kahng A. Zero skew clock routing with minimum wirelength. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Pro-cessing, 1992, 39(11): 799-814.

[10] LiuWH, Li Y L, Chen H C. Minimizing clock latency range in robust clock tree synthesis. In Proc. the 15th Asia and South Paci c Design Automation Conference, Jan. 2010, pp.389-394.

[11] Shih X W, Cheng C C, Ho Y K, Chang Y W. Blockageavoiding buffered clocktree synthesis for clock latency-range and skew minimization. In Proc. the 15th Asia and South Paci c Design Automation Conference, Jan. 2010, pp.395- 400.

[12] Lee D J, Markov I L. Contango: Integrated optimization of soc clock network. In Proc. the 2010 Conference on Design, Automation and Test in Europe, Mar. 2010, pp.1468-1473.

[13] Rakai L, Farshidi A, Behjat L,Westwick D. Buffer sizing for clock networks using robust geometric programming considering variations in buffer sizes. In Proc. the 2013 ACM International Symposium on Physical Design, Mar. 2013, pp.154-161.

[14] Singh J, Nookala V, Luo Z Q, Sapatnekar S. Robust gate sizing by geometric programming. In Proc. the 42nd Annual Design Automation Conference, Jun. 2005, pp.315-320.

[15] Vittal A, Marek-Sadowska M. Lowpower buffered clock tree design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1997, 16(9): 965-975.

[16] Lillis J, Cheng C K, Lin T T Y. Optimal wire sizing and buffer insertion for low power and a generalized delay model. IEEE Journal of Solid-State Circuits, 1996, 31(3): 437-447.

[17] Lu Y, Sze C, Hong X, Zhou Q, Cai Y, Huang L, Hu J. Register placement for low power clock network. In Proc. the 2005 Asia and South Paci c Design Automation Confer-ence, Jan. 2005, pp.588-593.

[18] Hou W, Liu D, Ho P H. Automatic register banking for lowpower clock trees. In Proc. the 10th International Sympo-sium on Quality Electronic Design, Mar. 2009, pp.647-652.

[19] Papa D, Alpert C, Sze C, Li Z, Viswanathan N, Nam G J, Markov I L. Physical synthesis with clock-network optimization for large systems on chips. IEEE Micro, 2011, 31(4): 51-62.

[20] Mehta A D, Chen Y P, Menezes N, Wong D, Pilegg L. Clustering and load balancing for buffered clock tree synthesis. In Proc. the 1997 IEEE International Conference on Com-puter Design: VLSI in Computers and Processors, Oct. 1997, pp.217-223.

[21] Shelar R S. An efficient clustering algorithm for low power clock tree synthesis. In Proc. the 2007 International Sym-posium on Physical Design, Mar. 2007, pp.181-188.

[22] Mitchell T. Machine Learning. McGraw Hill, 1997.

[23] Selim S Z, Ismail M A. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Ma-chine Intelligence, 1984, PAMI6(1): 81-87.

[24] Niu F, Zhou Q, Yao H, Cai Y, Yang J, Sze C N. Obstacleavoiding and slew-constrained buffered clock tree synthesis for skew optimization. In Proc. the 21st Edition of the Great Lakes Symposium on VLSI, May 2011, pp.199-204.

[25] Cormen T H, Leiserson C E, Rivest R L, Stein C. Introduction to Algorithms. Prentice-Hall India, 1998.

[26] Hu S, Alpert C J, Hu J, Karandikar S K, Li Z, Shi W, Sze C N. Fast algorithms for slew-constrained minimum cost buffering. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, 2007, 26(11): 2009- 2022.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 沈理;. Testability Analysis at Switch Level for CMOS Circuits[J]. , 1990, 5(2): 197 -202 .
[2] 金志权; 柳诚飞; 孙钟秀; 周晓方; 陈佩佩; 顾建明;. Design and Implementation of a Heterogeneous Distributed Database System[J]. , 1990, 5(4): 363 -373 .
[3] 韩建超; 史忠植;. Formalizing Default Reasoning[J]. , 1990, 5(4): 374 -378 .
[4] 黄志毅; 胡守仁;. Detection of And-Parallelism in Logic Programs[J]. , 1990, 5(4): 379 -387 .
[5] 张幸儿; 朱晓军; 李建新; 董建宁;. Source-to-Source Conversion Based on Formal Definition[J]. , 1991, 6(2): 178 -184 .
[6] 廖先湜; 金兰;. A Mechanism Supporting the Client/Server Relationship in the Operating System of Distributed System “THUDS”[J]. , 1991, 6(3): 256 -262 .
[7] 张钹; 张铃;. A Relation Matrix Approach to Labelling Temporal Relations in Scheduling[J]. , 1991, 6(4): 339 -346 .
[8] 黎仁蔚; 何锫; 张文辉;. An Introduction to IN CAPS System[J]. , 1993, 8(1): 26 -37 .
[9] 韩启龙; 陆汝占; 孙永强;. An Improved Bottom-up Method for Implementing Equational Programming Language[J]. , 1994, 9(1): 63 -69 .
[10] 张中;. Simulation of ATPG Neural Network and Its Experimental Results[J]. , 1995, 10(4): 310 -324 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: