1 ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time Jeng-Liang Tsai Tsung-Hao Chen Charlie Chung-Ping Chen (National.


1 ε-Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time
Jeng-Liang Tsai, Tsung-Hao Chen, Charlie Chung-Ping Chen (National Taiwan University)
University of Wisconsin-Madison
http://vlsi.ece.wisc.edu

2 Outline
- Background
  - Motivation and contribution
  - Literature overview
- ClockTune algorithm
  - Problem formulation
  - ClockTune algorithm overview
  - Optimality and complexity analysis
- Experimental results
  - Runtime, memory usage, and optimality
  - Power/Delay trade-off
  - Incremental refinement

3 Motivation
- Clock skew → cycle time penalty
  - Start with a zero-skew clock tree
  - Minimizing clock delay reduces system-level skew (Kuh, et al. [DAC '90])
- Clock tree is power-hungry (30% in Intel McKinley, 0.18 μm / 1 GHz / 130 W)
  - P = f C V^2
  - Minimize switching capacitance (wiring area)
- Stability affects design convergence
  - Allow incremental refinement to accommodate local changes
- Interconnect delay dominates total delay
  - Wire-sizing is effective in reducing interconnect delay

4 Motivation
- Non-convex zero-skew constraints
  - No known algorithm solves the zero-skew wire-sizing problem optimally in polynomial runtime
- Hence, a good clock tree wire-sizing algorithm should
  - Minimize delay and power
  - Guarantee optimality and runtime
  - Have good stability

5 Contribution
- First ε-optimal algorithm for the clock min-delay/power zero-skew wire-sizing optimization problem
- Provides the complete (sampled) solution set of delay/power/area trade-off information for design planning
- Efficient pseudo-polynomial runtime (6170-branch clock tree in 6 minutes, within 1% of optimal)
- Runtime vs. optimality trade-off
- Incremental clock re-balancing to speed up design convergence

6 Literature Overview
- "Reliable non-zero skew clock tree using wire width optimization", Pillage, et al. [DAC '93]
  - Iteratively optimizes skew and delay using adjoint sensitivity analysis
  - Aimed at reliable clock trees under process variation
- Deferred Merging Embedding (DME) algorithm, Kahng, et al. [TCAD '92]
  - Bottom-up merging segment construction, top-down embedding
- Integrated Deferred Merging Embedding (IDME) algorithm, Wong, et al. [ISPD '00]
  - Handles simultaneous routing, buffer insertion, and wire-sizing
  - Merging segment set (MSS): a set of line samples of a merging region
  - No optimality guarantee
  - The size of the MSS grows exponentially
- "Process variation aware clock tree routing", Lu, et al. [ISPD '03]
  - Based on DME/BST

7 Outline
- Background
  - Motivation and contribution
  - Literature overview
- ClockTune algorithm
  - Problem formulation
  - ClockTune algorithm overview
  - Optimality and complexity analysis
- Experimental results
  - Runtime, memory usage, and optimality
  - Power/Delay trade-off
  - Incremental refinement

8 Problem formulation
- min-ZSWS (Zero Skew Wire Sizing) problem
  - Given a clock routing, minimize the delay/area objective subject to the zero-skew constraints: the delays along P_i and P_j are equal for every pair of leaf nodes i and j, where P_i, P_j are the paths from v to leaf nodes i and j
- Zero-skew constraints are non-convex
  - No known algorithm solves the problem optimally in polynomial runtime

9 DC region approach
- Clock Delay and wiring Capacitance are the top concerns
- Define f : R^N → R^2 such that f_Y(w) = Delay(T_v(w)), f_X(w) = Capacitance(T_v(w))
  - DC region of node v: the projection of the feasible region
  - Choose a d-c pair from the DC region on R^2
[Figure: the feasible region and its projection, the DC region]
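As a rough illustration of a d-c pair, the Python sketch below samples the width of a single wire and maps each width to a (capacitance, delay) point. It assumes an Elmore delay model, which the slides do not name, and reads r_0 and c_0 from the experimental setup as ohms per square and farads per μm^2. The names `dc_pair` and `sample_dc_points`, the wire length, and the load are hypothetical. A single wire has no zero-skew constraint, so every sampled width is feasible here.

```python
# Assumed units: R0 in ohm/square, C0 in F/um^2 (values taken from the setup slide).
R0 = 0.03
C0 = 2e-16

def dc_pair(width_um, length_um, load_f):
    """Return (capacitance, delay) for one wire driving a load, Elmore model (assumed)."""
    wire_r = R0 * length_um / width_um        # wider wire, lower resistance
    wire_c = C0 * length_um * width_um        # wider wire, higher capacitance
    delay = wire_r * (wire_c / 2.0 + load_f)  # distributed RC wire plus lumped load
    return wire_c + load_f, delay

def sample_dc_points(length_um, load_f, w_min=1.0, w_max=4.0, q=7):
    """Sample q widths uniformly in [w_min, w_max]; return (width, cap, delay) tuples."""
    widths = [w_min + i * (w_max - w_min) / (q - 1) for i in range(q)]
    return [(w,) + dc_pair(w, length_um, load_f) for w in widths]

if __name__ == "__main__":
    # Hypothetical wire: 1000 um long, driving a 50 fF sink.
    for width, cap, delay in sample_dc_points(1000.0, 50e-15):
        print(f"w = {width:.1f} um  C = {cap * 1e15:7.1f} fF  D = {delay * 1e12:.3f} ps")
```

In this model, widening the wire lowers delay and raises capacitance, which is the tension the DC region is meant to expose.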

10 ClockTune algorithm overview
- Phase 1: bottom-up construction of DC regions for every node
- Phase 2: top-down embedding after delay/power trade-off
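The two phases can be sketched in Python on a toy tree. This is a simplified illustration and not the authors' implementation. It assumes an Elmore delay model, solves each wire width in closed form instead of sampling widths, and keeps only the minimum-capacitance point per delay sample, which can discard embeddings a full DC region would retain. `Node`, `solve_width`, `bottom_up`, `top_down`, and the example tree are hypothetical.

```python
R0, C0 = 0.03, 2e-16        # assumed units: ohm/square and F/um^2
W_MIN, W_MAX = 1.0, 4.0     # wire width bounds in um (w_m, w_M)

class Node:
    def __init__(self, length=0.0, load=0.0, children=()):
        self.length = length            # wire length from parent to this node (um)
        self.load = load                # sink capacitance in F (leaves only)
        self.children = list(children)
        self.table = {}                 # delay sample -> (min capacitance, child picks)
        self.width = None               # width of the wire above this node, set top-down

def solve_width(length, edge_delay, cap_below):
    """Width whose Elmore delay on this wire equals edge_delay, or None if out of range."""
    intrinsic = R0 * C0 * length ** 2 / 2.0      # width-independent part of the delay
    if edge_delay <= intrinsic:
        return None
    width = R0 * length * cap_below / (edge_delay - intrinsic)
    return width if W_MIN <= width <= W_MAX else None

def bottom_up(node, delay_samples):
    """Phase 1: for each delay sample, record the cheapest zero-skew embedding found."""
    if not node.children:
        node.table = {0.0: (node.load, [])}
        return
    for child in node.children:
        bottom_up(child, delay_samples)
    for d in delay_samples:
        total_cap, picks = 0.0, []
        for child in node.children:
            best = None
            for d_child, (cap_child, _) in child.table.items():
                width = solve_width(child.length, d - d_child, cap_child)
                if width is None:
                    continue
                cap = cap_child + C0 * child.length * width
                if best is None or cap < best[0]:
                    best = (cap, d_child, width)
            if best is None:
                break                   # this child cannot meet delay d
            total_cap += best[0]
            picks.append(best[1:])
        else:
            node.table[d] = (total_cap, picks)   # every child reaches delay d exactly

def top_down(node, d):
    """Phase 2: commit the widths recorded for the chosen delay sample d."""
    for child, (d_child, width) in zip(node.children, node.table[d][1]):
        child.width = width
        top_down(child, d_child)

def subtree_cap(node):
    return node.load + sum(C0 * c.length * c.width + subtree_cap(c) for c in node.children)

def sink_delays(node, t=0.0):
    """Elmore delay from the root to every sink under the committed widths."""
    if not node.children:
        return [t]
    delays = []
    for c in node.children:
        edge = (R0 * c.length / c.width) * (C0 * c.length * c.width / 2.0 + subtree_cap(c))
        delays += sink_delays(c, t + edge)
    return delays

# Hypothetical 4-sink tree: lengths in um, sink loads in F.
a = Node(2000.0, children=[Node(1000.0, 50e-15), Node(1100.0, 40e-15)])
b = Node(2000.0, children=[Node(900.0, 60e-15), Node(1000.0, 50e-15)])
root = Node(children=[a, b])

samples = [i * 0.05e-12 for i in range(2001)]    # 0 to 100 ps in 0.05 ps steps
bottom_up(root, samples)
min_delay = min(root.table)                                  # fastest embedding
min_cap = min(root.table, key=lambda d: root.table[d][0])    # lowest-capacitance embedding
top_down(root, min_cap)
print("skew after embedding:", max(sink_delays(root)) - min(sink_delays(root)))
```

The root table plays the role of the sampled trade-off curve: any key can be passed to `top_down`, so a designer could pick `min_delay`, `min_cap`, or a point in between.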

11 Optimality analysis
- Embeddings that do not fall on the delay samples are omitted
  - Propagated error
  - Delay sampling error
  - Wire width sampling error (detailed in the paper)

12 Optimality analysis
- Error is bounded
  - The bound is stated in terms of the delay sampling resolution (subscript d) and the wire width sampling resolution (subscript w)
  - The constants in the bound (k and one other) are related to l, r_0, c_0, w_m, w_M, …
- Generally speaking, the error is reduced by about half when the resolution is doubled
[Plot: error vs. resolution]

13 Optimality/runtime trade-off
- Controlling the sampling resolution trades off optimality against runtime and memory

14 Complexity analysis
- Runtime
  - Bottom-up phase takes O(n p max(p, q))
  - Top-down phase takes O(n p)
  - Overall: O(n p max(p, q))
- Memory
  - O(n p)
where
  n: number of nodes of the clock tree
  p: number of delay samples taken at each node
  q: number of wire width samples taken at each level-2 node
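To see how these bounds scale, the short Python sketch below evaluates them with all constants dropped. The function name `clocktune_cost` is hypothetical, and using 6170 (the branch count of the largest benchmark) as a stand-in for n is an assumption made only for illustration.

```python
def clocktune_cost(n, p, q):
    """Return (runtime_units, memory_units) from the stated bounds, constants dropped."""
    return n * p * max(p, q), n * p

base = clocktune_cost(6170, 256, 256)
finer = clocktune_cost(6170, 512, 512)        # p and q doubled
bigger = clocktune_cost(2 * 6170, 256, 256)   # tree size doubled

print("doubling p and q:", finer[0] / base[0], "x runtime,", finer[1] / base[1], "x memory")
print("doubling n:      ", bigger[0] / base[0], "x runtime,", bigger[1] / base[1], "x memory")
```

Under these bounds, doubling both sampling resolutions quadruples the runtime bound and doubles the memory bound, while doubling the tree size doubles both.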

15 Outline
- Background
  - Motivation and contribution
  - Related works, problem formulation
- ClockTune Algorithm
  - Design space projection
  - Algorithm overview
  - Optimality and complexity analysis
- Experimental Results
  - Runtime, memory usage, and optimality
  - Power/Delay trade-off
  - Incremental refinement

16 Experimental setup
- ClockTune is implemented in C++ and executed on a 533 MHz Pentium III PC with 128 MB of memory
- Benchmarks r1-r5 from Tsay et al. [ICCAD '91]
- Initial routing generated by the BB+DME algorithm with minimum wire width w = 1 μm
- ClockTune uses w_m = 1 μm, w_M = 4 μm
- p: number of delay samples taken at every node
- q: number of wire width samples taken at every level-2 node
- r_0 = 0.03, c_0 = 2 × 10^-16 /μm^2

17 Runtime and memory usage
- Runtime and memory usage are linear in problem size when p, q are fixed
- Within 1% optimality when p, q = 256 (runtime < 6 minutes, memory ~ 64 MB)

p, q = 256   # sink nodes   # branches   Runtime (s)   Memory (MB)   Optimality
r1                  267          527          24.1           6.0        0.38%
r2                  598         1185          61.0          12.5        0.71%
r3                  862         1710         100.0          14.4        0.46%
r4                 1903         3787         202.4          38.0        0.57%
r5                 3101         6170         339.2          64.0        0.93%
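One way to check the linearity claim against the reported numbers is to normalize runtime and memory by the branch count. The Python sketch below does that with the values from this slide. The helper name `per_branch` is hypothetical, and converting MB to KB with a factor of 1024 is an assumption.

```python
# (benchmark, sink nodes, branches, runtime in s, memory in MB) as reported for p, q = 256
ROWS = [
    ("r1", 267, 527, 24.1, 6.0),
    ("r2", 598, 1185, 61.0, 12.5),
    ("r3", 862, 1710, 100.0, 14.4),
    ("r4", 1903, 3787, 202.4, 38.0),
    ("r5", 3101, 6170, 339.2, 64.0),
]

def per_branch(rows):
    """Map benchmark name to (milliseconds per branch, kilobytes per branch)."""
    return {name: (1e3 * t / b, 1024.0 * m / b) for name, _, b, t, m in rows}

for name, (ms, kb) in per_branch(ROWS).items():
    print(f"{name}: {ms:5.1f} ms/branch  {kb:5.1f} KB/branch")
```

If scaling is close to linear, both ratios should stay within a narrow band while the branch count grows by more than a factor of ten.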

18 Optimality results
- Optimality
  - Error below 1% with p = q = 256
  - Error is reduced to about half when the resolution is doubled

19 Power/Delay trade-off
[Plot for r5: capacitance (0.2~1.1 nF) vs. delay (5~150 ns), with the minimum-power and minimum-delay points marked]
- 15:1 delay:power trade-off

20 Incremental refinement
- The DC region captures the design space
  - Enables incremental refinement

21 Conclusion & Future Work
- Provide a zero-skew clock tree wire-sizing algorithm which
  - Minimizes delay and area ε-optimally
  - Guarantees pseudo-polynomial runtime and memory usage
  - Provides delay/power trade-off information to designers
  - Speeds up design convergence by allowing clock tree re-balancing with minimum changes
- Better delay model
- Buffer insertion/sizing capability

22 Thank you!

