
1 Low-power Clock Trees for CPUs
Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov
Dept. of EECS, University of Michigan
ICCAD 2010, Dong-Jin Lee, University of Michigan

2 Outline
■Motivation and challenges
■Modeling and objectives
−Local skew with variation
−Local-skew slack
−Modeling process variation
■Proposed methodology and techniques
−Initial tree construction and buffer insertion
−Robustness improvements
−Wire snaking and delay buffer insertion
■Empirical validation
■Summary

3 Motivation
■Clock networks
−Contribute a significant fraction of dynamic power
−A limiting factor in high-performance CPUs and SoCs
■Challenges
−Interconnect is lagging in performance while transistors continue scaling
−Multi-objective optimization
– Traditional clock network synthesis constraints
– The increasing impact of process variation
– Power-performance-cost trade-offs

4 Tree vs Mesh
■Objectives
−Minimize skew of a high-performance clock tree
−Minimize the impact of PVT variations
−Clock trees vs meshes, subject to skew < 7.5ps
[Figure: trees, meshes and ideal clock networks compared on robustness and power efficiency]

5 Our Contributions
■The notion of local-skew slack for clock trees
■A tabular technique to estimate the impact of variations
■A path-based technique to enhance the robustness
■A time-budgeting algorithm for clock-tree tuning with minimal power resources
■Fine tuning of clock trees : accurate, fast, power efficient
■Implementation : Contango2.0
■Strong empirical results : low skew, robustness, low power

6 Modeling and Objectives

7 Local Skew
■Main objective (concept)
−Minimize local skew in the presence of variation
■Definition: Skew
−Ψ : clock tree
−λ(s_i) : the clock latency (insertion delay) at sink s_i ∈ Ψ
−[equation not preserved in transcript]
■Definition: Global Skew (ω^Ψ)
−[equation not preserved in transcript]
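
The two formulas on this slide were images and are missing from the transcript. Assuming the usual definitions, which match the symbols listed above (pairwise skew as the absolute latency difference, global skew as its maximum over all sink pairs), they would read roughly as follows:

```latex
\[
\mathrm{skew}(s_i, s_j) = \left| \lambda(s_i) - \lambda(s_j) \right|,
\qquad
\omega^{\Psi} = \max_{s_i,\, s_j \in \Psi} \left| \lambda(s_i) - \lambda(s_j) \right|
\]
```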

8 Local Skew
■Definition: The worst nominal local skew (ω^Ψ_Δ)
−Δ : local skew distance bound
−dist(s_i,s_j) : Manhattan distance between s_i and s_j ∈ Ψ
−[equation not preserved in transcript]
■Definition: The worst local skew with variation (ω^Ψ_{Δ,ν,y})
−ν : variation model
−y : yield (0 < y ≤ 1)
−f(t) : the cumulative distribution function of ω^Ψ_{Δ,ν}
−[equation not preserved in transcript]
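
A minimal sketch of the nominal quantities, assuming the worst nominal local skew is the largest latency difference over sink pairs whose Manhattan distance is at most Δ. The exact formula, including whether the distance bound is strict, is not in the transcript. The function names and the (x, y, latency) tuple layout are illustrative and not taken from Contango2.0.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def worst_local_skew(sinks, delta):
    """Largest |latency_i - latency_j| over sink pairs within distance delta.

    sinks: list of (x, y, latency) tuples (assumed layout).
    Returns 0.0 when no pair is within the distance bound.
    """
    worst = 0.0
    for (x1, y1, l1), (x2, y2, l2) in combinations(sinks, 2):
        if manhattan((x1, y1), (x2, y2)) <= delta:  # assumption: non-strict bound
            worst = max(worst, abs(l1 - l2))
    return worst


def global_skew(sinks):
    """Largest latency difference over all sink pairs."""
    latencies = [latency for _, _, latency in sinks]
    return max(latencies) - min(latencies)


# Two nearby sinks and one distant sink (made-up coordinates and latencies).
example = [(0, 0, 100.0), (10, 0, 103.0), (500, 500, 110.0)]
print(worst_local_skew(example, delta=50), global_skew(example))
```

In this toy input the distant sink dominates global skew but does not affect local skew, which is the distinction the slide draws.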

9 Modeling and Objectives - Example
■Worst local skew with variation (ω^Ψ_{Δ,ν,y})
−Probability density function of ω^Ψ_{Δ,ν}
−Ω_Δ = 7.5ps, y = 95%, ω^Ψ_{Δ,ν,y} < Ω_Δ
−ω^Ψ_{Δ,ν,y} = 6.05ps
[Figure: PDF, CDF and inverse CDF of ω^Ψ_{Δ,ν} (in ps) with Ω_Δ marked; at y = 0.95, ω^Ψ_{Δ,ν,y} = 6.05ps]
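
The example reads the skew at yield y off the inverse CDF. A sketch of the same step on Monte-Carlo samples, assuming ω^Ψ_{Δ,ν,y} is the smallest skew value t with f(t) ≥ y and that f is approximated by the empirical distribution of the samples. The helper names are hypothetical.

```python
import math


def skew_at_yield(samples, y):
    """Smallest sample t such that at least a fraction y of samples are <= t."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < y <= 1:
        raise ValueError("yield must satisfy 0 < y <= 1")
    ordered = sorted(samples)
    k = math.ceil(y * len(ordered))  # number of samples that must be <= t
    return ordered[k - 1]


def meets_limit(samples, y, limit):
    """True when the skew at yield y stays strictly below the local skew limit."""
    return skew_at_yield(samples, y) < limit


# Made-up local-skew samples in ps, one per Monte-Carlo run.
runs = [3.0, 1.0, 4.0, 2.0]
print(skew_at_yield(runs, 0.75), meets_limit(runs, 0.75, 7.5))
```

With the numbers from the slide, a tree passes when the value returned for y = 0.95 is below Ω_Δ = 7.5ps, as 6.05ps is.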

10 Optimization Objectives
■Building variation-tolerant clock trees
−such that ω^Ψ_{Δ,ν,y} < Ω_Δ (Ω_Δ : local skew limit)
−subject to slew constraints
■Minimizing clock-tree power
[Figure: ω^Ψ_{Δ,ν,y} (in ps) shown against the limit Ω_Δ]

11 Local-skew Slack σ(s) for sink s ∈ Ψ
■Definition
−σ(s) is the minimum amount of additional delay for s, so that the tree satisfies ω^Ψ_Δ < Ω_Δ
■Example (Ω_Δ = 5 ps) [figure not preserved in transcript]
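
One way to make the definition concrete is to treat it as a system of difference constraints: for every pair of nearby sinks (a, b), the adjusted latencies must differ by at most Ω_Δ, and every added delay must be non-negative. The sketch below finds the smallest such delays by repeated relaxation. This is an illustrative formulation and not necessarily the slack-computation algorithm used in Contango2.0. It uses ≤ where the slide uses <, and the function and argument names are hypothetical.

```python
def local_skew_slacks(latency, local_pairs, limit):
    """Smallest non-negative delay to add at each sink so that every local
    pair differs by at most `limit` after the delays are added.

    latency: dict sink -> nominal latency (ps)
    local_pairs: iterable of (a, b) sink pairs within the distance bound
    limit: local skew limit (ps), assumed positive
    """
    slack = {s: 0.0 for s in latency}
    pairs = list(local_pairs)
    # Bellman-Ford style relaxation. With a positive limit there are no
    # positive cycles, so len(latency) rounds are enough to converge.
    for _ in range(len(latency)):
        changed = False
        for a, b in pairs:
            for early, late in ((a, b), (b, a)):
                # The early sink must not trail the late one by more than limit.
                need = latency[late] + slack[late] - limit - latency[early]
                if need > slack[early] + 1e-12:
                    slack[early] = need
                    changed = True
        if not changed:
            break
    return slack


# Made-up latencies in ps with a 5 ps limit, as in the slide's example setting.
print(local_skew_slacks({"A": 10, "B": 18, "C": 14}, [("A", "B"), ("B", "C")], 5))
```

Here only sink A needs extra delay (3 ps) to come within 5 ps of B. Sinks B and C already satisfy the limit.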

12 Modeling Process Variation
■Impact of variation on skew(s_i,s_j) depends on the length of the tree path between s_i and s_j, and on the number and type of buffers on that path
■Notation
−T : technology node
−B : buffer and wire library
−ν : variation model
■Variation-estimation table Ξ_{T,B,ν,y}[w,b,t]
−worst-case increase in skew (with probability y) between two sinks connected by a tree path of length w with b buffers of type t
[Figure: example tree path annotated with w (tree path length), b (number of buffers, here 2) and t (buffer type)]

13 Modeling Process Variation
■varEst(s_i,s_j)
−the worst-case variational skew(s_i,s_j)
−[equation not preserved in transcript]
■Key constraint
−[equation not preserved in transcript]
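
Both formulas on this slide are missing from the transcript. The sketch below assumes that varEst(s_i,s_j) is a lookup in the variation-estimation table Ξ[w,b,t] for the tree path between the two sinks, and that the key constraint requires nominal skew plus varEst to stay below Ω_Δ. The table values, the breakpoints and the rounding-up rule are placeholders for illustration and not data from the talk.

```python
import bisect

# Hypothetical table: (path-length breakpoint, number of buffers, buffer type)
# -> worst-case skew increase in ps. Placeholder numbers only.
BREAKPOINTS = [1000, 2000]
TABLE = {
    (1000, 2, "small"): 1.5,
    (2000, 2, "small"): 2.5,
    (1000, 2, "large"): 1.0,
    (2000, 2, "large"): 1.8,
}


def var_est(table, breakpoints, w, b, t):
    """Look up the estimate for path length w, rounding w up to the next
    breakpoint (conservative if variation grows with path length)."""
    i = bisect.bisect_left(breakpoints, w)
    if i == len(breakpoints):
        raise ValueError("path is longer than the table covers")
    return table[(breakpoints[i], b, t)]


def satisfies_key_constraint(lat_i, lat_j, estimate, limit):
    """Assumed form of the key constraint: nominal skew + varEst < limit."""
    return abs(lat_i - lat_j) + estimate < limit


est = var_est(TABLE, BREAKPOINTS, w=1500, b=2, t="small")
print(est, satisfies_key_constraint(100.0, 103.0, est, limit=7.5))
```

A table like this would be filled once per technology environment, so that checking a sink pair costs a lookup instead of a Monte-Carlo simulation.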

14 Initial Tree Construction
■ZST-DME algorithm* based on Elmore delay
■A simple and robust technique for obstacle avoidance**
■Initial buffer insertion
−t_0 : the initial buffer type for initial buffer insertion
−Use variation-estimation table with path lengths from initial tree
−Once t_0 is determined, we adapt the fast variant of van Ginneken’s algorithm*** for initial buffer insertion
−Minimize insertion delay, reliable slew rate
* : J.-H. Huang et al., “On Bounded-Skew Routing Tree Problem,” DAC‘95
** : D.-J. Lee et al., “Contango: Integrated Optimization of SoC Clock Networks,” DATE‘10
*** : W. Shi et al., “A Fast Algorithm for Optimal Buffer Insertion,” Trans. on CAD 24(6), 2005

15 Robustness Improvement
■Improve robustness after initial buffer insertion so that ω^Ψ_{Δ,ν,y} < Ω_Δ holds after skew optimization
■[equation not preserved in transcript]
■The target buffer type for a tree path between sinks s_i and s_j, t(s_i,s_j), is defined as the smallest t such that [inequality not preserved in transcript]
−choosing smaller buffers reduces capacitance
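
The inequality that defines the target buffer type is missing from the transcript. As a sketch, assume it bounds the estimated variational skew of the path by some budget. The selection then reduces to scanning buffer types from smallest to largest and stopping at the first one that fits. The type names, the estimates and the budget below are hypothetical.

```python
def target_buffer_type(types_by_size, estimate, budget):
    """Return the smallest buffer type whose estimated variational skew is
    below the budget, or None if no type qualifies.

    types_by_size: buffer types ordered from smallest to largest
    estimate: callable mapping a buffer type to an estimated skew (ps)
    budget: allowed variational skew for this tree path (ps)
    """
    for t in types_by_size:
        if estimate(t) < budget:
            return t
    return None


# Placeholder estimates: larger buffers are assumed to be more robust.
estimates = {"x1": 4.0, "x2": 2.5, "x4": 1.5}
print(target_buffer_type(["x1", "x2", "x4"], estimates.get, budget=3.0))
```

Stopping at the first type that fits reflects the slide's remark that smaller buffers reduce capacitance.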

16 Local Skew Optimization : Wire Snaking
■Local-skew optimization techniques
−based on the optimal tuning amount from the slack computation algorithms with varEst(s_i,s_j)
■Improved wire snaking algorithm
−speed, accuracy and routing resources
■Example on edge e, with T^i_target(e) ≥ T^i_actual(e) in every iteration i
−Iteration 1: T^1_target(e) : 11ps, T^1_actual(e) : 7ps
−Iteration 2: T^2_target(e) : 4ps, T^2_actual(e) : 3ps
−Iteration 3: T^3_target(e) : 1ps, T^3_actual(e) : 1ps
−Cumulative T_actual(e) : 7ps, 10ps, 11ps against T_target(e) : 11ps

17 Delay Model for Wire Snaking
■α : to keep T^i_actual(e) ≤ T^i_target(e) efficiently
■Delay model for wire snaking aims for T^i_actual(e) to satisfy the above inequality with the highest α possible
■Look-up tables for length estimation
−to enhance the quality of estimation by wire snaking
−a set of SPICE simulations for each technology environment, which includes technology model, types of buffers and wires, and variation specification
■We achieved α values between 60% and 70% for the ISPD 2010 CNS contest benchmarks
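
The inequality involving α is missing from the transcript. A plausible reading, used here as an assumption, is α · T^i_target(e) ≤ T^i_actual(e) ≤ T^i_target(e): each snaking round never overshoots and realizes at least a fraction α of what it aims for. Under the pessimistic case where every round realizes exactly α of the remaining target, the sketch below shows how quickly the remainder shrinks. The function name and the tolerance are illustrative.

```python
def snaking_rounds(target_ps, alpha, tolerance_ps, max_rounds=50):
    """Simulate iterative wire snaking where each round achieves exactly
    alpha times the remaining target delay (pessimistic model).

    Returns (list of per-round achieved delays, remaining delay).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    remaining = target_ps
    achieved = []
    while remaining > tolerance_ps and len(achieved) < max_rounds:
        step = alpha * remaining  # never exceeds the remaining target
        achieved.append(step)
        remaining -= step
    return achieved, remaining


steps, left = snaking_rounds(target_ps=11.0, alpha=0.65, tolerance_ps=0.5)
print(len(steps), round(left, 3))
```

For comparison, the first iteration of the 11ps example on the wire-snaking slide achieves 7ps, a ratio of about 0.64, which falls inside the reported 60% to 70% range.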

18 Optimal Node Selection for Wire Snaking
■Wire snaking at buffer outputs is more accurate than at other nodes
■Limiting wire snaking to buffer outputs reduces the number of SPICE calls
■Example [figure not preserved in transcript]

19 Delay Buffer Insertion
■Highly unbalanced sink capacitances or layout obstacles may result in significant local skew
■Delay buffer insertion
−Skew can be reduced by the delay of the inserted buffer
−Further precise wire snaking is possible because the inserted buffer isolates the target node
■Example [figure not preserved in transcript]

20 ISPD’10 Clock Network Synthesis Contest
■45nm 2GHz CPU benchmarks from IBM and Intel
■Evaluation
−Monte-Carlo SPICE simulations with PVT variations
−Skew and slew constraints (7.5ps, 100ps)
−Objective : total capacitance, a proxy for dynamic power
■A rare opportunity to compare multiple strategies for clock-network synthesis

21 Example of Our Clock Tree
■ispd10cns07 [figure not preserved in transcript]

22 Empirical Validation
■ISPD 2010 benchmarks
−2.6 ps nominal local skew
−Smaller capacitance than CNSrouter and NTUclock by 4.22× and 4.13× resp.
−Our clock trees yield > 95%, while CNSrouter violates yield constraints on 3 benchmarks and NTUclock on 7

23 ICCAD 2010 Proceedings
■Local skew constraints are all cleared
■Smaller capacitance than NTU and CUHK by 2.09× and 4.24× resp.
■More robust with smaller capacitance

        NTU                  CUHK                 Contango2
Bench   ω^Ψ_{Δ,ν,y}   Cap.   ω^Ψ_{Δ,ν,y}   Cap.   ω^Ψ_{Δ,ν,y}   Cap.
cns01   7.16          445    7.23          1168   7.01          198
cns02   7.33          934    7.35          2100   7.34          376
cns03   4.88          184    3.95          94     4.18          56
cns04   4.09          196    7.25          125    4.46          72
cns05   3.81          89     7.27          74     4.41          38
cns06   7.49          16     6.79          87     6.05          48
cns07   6.24          23     5.97          128    4.58          73
cns08   5.47          23     5.37          97     5.15          52
Avg.    5.81          2.09   6.40          4.24   5.40          1.0
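
The Avg. row of this table can be checked with a few lines of arithmetic. The values below are the per-benchmark entries as read from the flattened table on this slide. The check suggests that the skew averages are plain means and that the 2.09× and 4.24× capacitance figures are ratios of total capacitance relative to Contango2, and not means of per-benchmark ratios. The dictionary and function names are illustrative.

```python
# Per-benchmark values for cns01..cns08, as read from the slide's table.
cap = {
    "NTU": [445, 934, 184, 196, 89, 16, 23, 23],
    "CUHK": [1168, 2100, 94, 125, 74, 87, 128, 97],
    "Contango2": [198, 376, 56, 72, 38, 48, 73, 52],
}
skew = {
    "NTU": [7.16, 7.33, 4.88, 4.09, 3.81, 7.49, 6.24, 5.47],
    "CUHK": [7.23, 7.35, 3.95, 7.25, 7.27, 6.79, 5.97, 5.37],
    "Contango2": [7.01, 7.34, 4.18, 4.46, 4.41, 6.05, 4.58, 5.15],
}


def cap_ratio(tool, baseline="Contango2"):
    """Total capacitance of `tool` divided by total capacitance of the baseline."""
    return sum(cap[tool]) / sum(cap[baseline])


def mean_skew(tool):
    """Arithmetic mean of the per-benchmark skew values."""
    return sum(skew[tool]) / len(skew[tool])


for tool in cap:
    print(tool, round(mean_skew(tool), 2), round(cap_ratio(tool), 2))
```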

24 Skew Profiles for Contango2 & CNSrouter
■Probability density functions (PDF) for skew on ISPD’10 benchmarks [figure not preserved in transcript]

25 Trade-off - Power vs Robustness to Variations
■Under tight local skew constraints, large buffers ensure robustness but increase capacitance
−Much capacitance can be saved when local skew constraints are loose
■Experiments on ispd10cns08 [figure not preserved in transcript]

26 Summary
■A tree solution for CPU clock routing
−Improves power consumption under tight skew constraints in the presence of variation
−Clock trees can be tuned to have nominal skew below 5 ps and low total skew in the presence of variation
−4× capacitance improvement on average over mesh structures
■Our clock trees have a higher yield than meshes
−meshes are not as easy to tune for nominal skew

27 Questions and Answers
Thank you! Questions?
