Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 61274031.
Abstract: Clock networks dissipate a significant fraction of the total chip power budget, so reducing clock network power has become one of the most important objectives in high-performance IC design. Most traditional studies address this problem through clock routing or buffer insertion strategies. In contrast, this paper proposes a novel register clustering methodology that generates the leaf-level topology of the clock tree so as to reduce power consumption. Three register clustering algorithms, called KMR, KSR and GSR, are developed and studied comprehensively. In addition, a buffer allocation algorithm is proposed to satisfy the slew constraint within each cluster at minimum cost in power consumption. We integrate our algorithms into a classical clock tree synthesis (CTS) flow and test the register clustering methodology on the ISPD 2010 benchmark circuits. Experimental results show that all three register clustering algorithms achieve more than a 20% reduction in power consumption without affecting the skew or the maximum latency of the clock tree. The most effective of the three, the GSR algorithm, achieves a 31% reduction in power consumption together with a 4% reduction in skew and a 5% reduction in maximum latency. Moreover, the total runtime of the CTS flow with our register clustering algorithms is reduced by almost an order of magnitude.
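Abstract: With the rapid development of very large scale integration (VLSI) manufacturing processes and the continual increase in integration density, digital integrated circuit design faces many challenges. Clock tree synthesis is an important part of digital back-end design, and existing clock tree synthesis algorithms are beginning to suffer from lower iteration efficiency and slower convergence. This paper therefore proposes a synchronous, concurrent, hierarchical clock-tree clustering algorithm, Synchronous Clock-tree Hierarchical Partitioning and Clustering (SC-HPC). From a system-optimization perspective, SC-HPC splits the original register clustering process into two steps: coarse clustering and fine clustering. Coarse clustering divides the placed registers into N large clusters and then assigns the refinement of these N clusters to user-schedulable threads for accelerated processing. Fine clustering partitions the registers more finely according to the maximum-fanout rule of the buffers. Experimental results show that, compared with existing methods, SC-HPC reduces the number of buffers (by more than 30%) and the program runtime (by more than 20%).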
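The two-step structure described in the SC-HPC abstract (coarse clustering into N groups, then per-group fine clustering under a maximum-fanout limit, dispatched to threads) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which clustering methods SC-HPC uses, so plain k-means for the coarse step and sort-and-chunk for the fine step are stand-in assumptions, and all names (`coarse_cluster`, `fine_cluster`, `hierarchical_cluster`, `max_fanout`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import math
import random


def coarse_cluster(points, n_clusters, iters=20, seed=0):
    """Coarse step: plain k-means on (x, y) register locations (an assumed stand-in)."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    for _ in range(iters):
        groups = [[] for _ in range(n_clusters)]
        # Assign every register to its nearest center.
        for p in points:
            idx = min(range(n_clusters), key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def fine_cluster(group, max_fanout):
    """Fine step (assumed): sort along the wider axis, then cut into chunks of <= max_fanout."""
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(group, key=lambda p: p[axis])
    return [ordered[i:i + max_fanout] for i in range(0, len(ordered), max_fanout)]


def hierarchical_cluster(points, n_clusters, max_fanout, workers=4):
    """Run the coarse step once, then refine each coarse cluster in a worker thread."""
    coarse = coarse_cluster(points, n_clusters)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        refined = list(pool.map(lambda g: fine_cluster(g, max_fanout), coarse))
    # Flatten the per-thread results into one list of leaf-level clusters.
    return [cluster for sub in refined for cluster in sub]


if __name__ == "__main__":
    rng = random.Random(1)
    registers = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
    leaves = hierarchical_cluster(registers, n_clusters=4, max_fanout=8)
    print(len(leaves), "leaf clusters; largest has", max(len(c) for c in leaves), "registers")
```

The coarse clusters are independent of each other, which is what makes the refinement step straightforward to hand out to separate workers. In CPython, threads will not speed up pure-Python, CPU-bound refinement because of the global interpreter lock, so this sketch shows only the task decomposition, not the runtime gains reported in the abstract.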