Power Optimal Dual-Vdd Buffered Tree Considering Buffer Stations and Blockages. King Ho Tam and Lei He, Electrical Engineering Department, University of California, Los Angeles.


1 Power Optimal Dual-Vdd Buffered Tree Considering Buffer Stations and Blockages
King Ho Tam and Lei He
Electrical Engineering Department, University of California, Los Angeles
Sponsors: NSF CAREER, UC MICRO (Fujitsu, Intel and Mindspeed), and IBM Faculty Partner Award

2 Motivation
- Increasing interconnect power
  - 35% of cells are buffers at 65nm technology [Saxena, TCAD 04]
- Previous work
  - Power-optimal single-Vdd buffer insertion [Lillis, JSSC 96]
  - Delay-optimal buffered tree generation [Cong, DAC 00; Alpert, TCAD 02]
- No existing algorithm considers dual Vdd for buffer insertion or buffered tree generation

3 Major Contributions
- First in-depth study of dual-Vdd buffer insertion and buffered tree generation
  - Large power saving over single-Vdd buffering
- Efficient algorithms for power optimality
  - 17x faster than [Lillis, JSSC 96] when single Vdd is considered

4 Outline
- Dual-Vdd buffer insertion and sizing (DVB)
  - Problem formulation
  - Sampling for speedup
  - Experimental results
- Dual-Vdd buffered tree generation (D-Tree)
  - Problem formulation
  - Improved augmented orthogonal search tree
  - Experimental results

5 Delay, Slew and Power Modeling
- Elmore delay
  - Wire: [formula not captured], buffer: [formula not captured]
  - Bakoglu's slew metric (ln 9 · Elmore delay)
- Power = energy per switch
  - Wire: [formula not captured]
  - Lumped buffer dynamic/short-circuit power
  - Can be easily extended to leakage power
    - Low Vdd (VL) reduces leakage
    - Need to assume a clock rate and switching activity
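
The wire and buffer formulas on this slide were lost in extraction, so the sketch below falls back on the textbook Elmore model as an assumption rather than the authors' exact expressions. It treats a uniform wire with unit resistance `r` and unit capacitance `c`, a buffer with intrinsic delay and output resistance, and wire energy per switch as 0.5 · C · Vdd². All function and parameter names are hypothetical.

```python
import math


def wire_delay(r, c, length, c_load):
    """Elmore delay of a uniform wire driving c_load (assumed textbook form)."""
    return r * length * (c * length / 2.0 + c_load)


def buffer_delay(d_intrinsic, r_out, c_load):
    """Buffer delay as intrinsic delay plus output resistance times load."""
    return d_intrinsic + r_out * c_load


def bakoglu_slew(elmore_delay):
    """Bakoglu's slew metric: ln 9 times the Elmore delay."""
    return math.log(9.0) * elmore_delay


def wire_energy(c, length, vdd):
    """Energy per switch of a wire (assumed form: 0.5 * C * Vdd^2)."""
    return 0.5 * c * length * vdd ** 2
```

Units are left to the caller; the functions only require that they be consistent.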

6 Introducing Dual-Vdd Buffering
- Achieves power saving since power ∝ Vdd²
- Suffers no loss of delay optimality
- VL => VH requires a level converter (LC)
  - Restores the voltage level and reduces leakage
  - Ext-CVS for logic [Srivastava, ISLPED 04]
    - LC delay and power overhead amortized
[Figure: VL/VH gate schematic; labels: reduced noise margin, leakage]
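
As a rough illustration of the quadratic dependence, the snippet below computes the switching power of a load driven at VL relative to the same load driven at VH. The supply values are hypothetical and are not taken from the slides.

```python
def relative_power(v_low, v_high):
    """Switching power at v_low relative to v_high, assuming power ∝ Vdd^2."""
    return (v_low / v_high) ** 2


# Hypothetical supply levels, chosen only for illustration.
VH, VL = 1.2, 0.9
saving = 1.0 - relative_power(VL, VH)
print(f"saving on a VL-driven load: {saving:.1%}")
```

This is only the saving on the part of the tree that is actually switched to VL, which is why net-level savings are smaller.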

7 Key Observation in Dual-Vdd Buffering
- Disallowing VL => VH will not affect optimality
  - Optimality illustrated empirically (at 65nm):
    - (a) has an LC and VH drives C_l: power (a) > power (b)
    - Delay (b) > delay (a) only if C_l > 0.5pF (~9mm of wire)
[Figure: configurations (a) and (b), labelled VH and VL]

8 DVB Formulation
- Dual-Vdd Buffer Insertion (DVB)
  - Given an interconnect tree
  - Find buffer placement, Vdd assignment for buffers, and buffer sizes
    - VH buffers drive VL buffers within the tree
    - Level converters at VH sinks driven by VL buffers
  - Minimize power subject to
    - Arrival time requirement at the source (RAT)
    - Slew rate constraint at buffer inputs and sinks

9 DVB Algorithm
- Based on [Lillis, JSSC 96]
  - Dynamic programming with partial solution (option) pruning
  - Options must now record downstream Vdd levels for buffering
    - This prevents VL => VH, which removes unnecessary search of the solution space
  - Still quite slow for large nets
- Challenge
  - Considering power causes super-linear growth in the number of options (w.r.t. tree size)
  - Dual-Vdd buffers => 2x options at each node
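
A minimal sketch of option pruning with a recorded Vdd level, under stated assumptions: each option carries capacitance, power, RAT and a flag saying whether a VH buffer exists downstream, and two options are only compared when that flag matches. The `Option` fields, the dominance rule and the helper names are hypothetical, and the filter is a simple quadratic pass rather than the pruning structure used in [Lillis, JSSC 96].

```python
from typing import List, NamedTuple


class Option(NamedTuple):
    cap: float     # downstream capacitance seen at the node
    power: float   # power (energy per switch) of the partial solution
    rat: float     # required arrival time at the node
    has_vh: bool   # True if a VH buffer exists downstream (assumed encoding)


def dominates(a: Option, b: Option) -> bool:
    """a makes b redundant: same Vdd record, no worse in cap, power and RAT."""
    return (a.has_vh == b.has_vh and a.cap <= b.cap
            and a.power <= b.power and a.rat >= b.rat)


def prune(options: List[Option]) -> List[Option]:
    """Quadratic-time filter that keeps only non-dominated options."""
    kept = []
    for i, o in enumerate(options):
        redundant = any(
            # Among exact duplicates, only the first copy survives.
            dominates(other, o) and (other != o or j < i)
            for j, other in enumerate(options) if j != i
        )
        if not redundant:
            kept.append(o)
    return kept


def can_insert(buffer_is_vh: bool, downstream: Option) -> bool:
    """A VL buffer may not drive a subtree that already contains a VH buffer."""
    return buffer_is_vh or not downstream.has_vh
```

Rejecting VL buffers above VH subtrees at insertion time is what keeps VL => VH configurations out of the search.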

10 Speed-up Technique
- Approximate by power-delay sampling
- Sample under each distinct capacitance value
  - Uniformly pick options from the entire RAT-power trade-off curve
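
One way to picture the sampling step, as a sketch under assumptions: options that share a capacitance value are binned on a uniform grid over their power and RAT ranges, and one option is kept per occupied cell. Keeping the lowest-power option in each cell is an assumption made here, as are the tuple layout `(cap, power, rat)` and the function names.

```python
from collections import defaultdict


def cell_index(x, lo, hi, grid):
    """Map x in [lo, hi] to a cell index in [0, grid - 1]."""
    if hi == lo:
        return 0
    return min(grid - 1, int((x - lo) / (hi - lo) * grid))


def sample_options(options, grid=20):
    """Keep at most grid * grid options for each distinct capacitance value."""
    by_cap = defaultdict(list)
    for opt in options:
        by_cap[opt[0]].append(opt)

    sampled = []
    for group in by_cap.values():
        p_lo, p_hi = min(o[1] for o in group), max(o[1] for o in group)
        q_lo, q_hi = min(o[2] for o in group), max(o[2] for o in group)
        cells = {}
        for o in group:
            key = (cell_index(o[1], p_lo, p_hi, grid),
                   cell_index(o[2], q_lo, q_hi, grid))
            best = cells.get(key)
            # Assumed tie-break: lowest power first, then largest RAT.
            if best is None or (o[1], -o[2]) < (best[1], -best[2]):
                cells[key] = o
        sampled.extend(cells.values())
    return sampled
```

The grid size bounds the number of options carried upward per capacitance value, independent of how many were generated.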

11 Experimental Settings for DVB
- Testcases: randomly generated Steiner trees
  - 20 to 800 terminals in a 1cm x 1cm routing area
  - Buffer sizes: 16x, 32x, 64x
- Sampling grid set to 20x20
- Comparison
  - Exact power-optimal algorithm (PB) [Lillis, JSSC 96]
  - Our algorithm with single-Vdd (SVB) and dual-Vdd (DVB) buffers

12 Sampling Preserves Optimality
- Sampling has little impact on optimality
  - SVB follows PB closely
  - Delay remains optimal; power is 1.7% larger than PB

13 Dual Vdd Reduces Power
- Dual Vdd shifts the power-delay curve to the left

14 Experimental Results for DVB
- DVB saves 23% power over SVB
  - More power saving in larger nets
  - Power saving becomes larger with delay slack
    - e.g. relaxing delay by 5% raises the saving to 26%
Table: power at optimal RAT (fJ)
Net   # nodes   # sinks   SVB     DVB
S5    375       199       18699   13808 [-26%]
S6    515       299       23443   17239 [-26%]
S7    784       499       33552   23804 [-29%]
S8    1054      699       38351   25799 [-33%]
S9    1188      799       40228   26646 [-34%]
avg                               [-23%]

15 Runtime
- SVB scales much better to larger testcases
  - Achieved 17x speedup over PB [Lillis, JSSC 96]
  - DVB takes ~2.5x more runtime than SVB
Table: runtime (s)
Net   # nodes   # sinks   PB        SVB     DVB
S5    375       199       719       86      212
S6    515       299       2121      139     371
S7    784       499       33419     393     635
S8    1054      699       > 1 day   598     1072
S9    1188      799       > 1 day   853     1859
avg                       1x        1/17x   1/7x

16 Outline
- Dual-Vdd buffer insertion and sizing (DVB)
  - Problem formulation
  - "Sampling" speed-up technique
  - Experimental results
- Dual-Vdd buffered tree generation (D-Tree)
  - Problem formulation
  - Improved augmented orthogonal search tree
  - Experimental results

17 D-Tree Formulation
- Dual-Vdd Buffered Tree (D-Tree)
  - Given locations of terminals, buffer stations and blockages
  - Find a rectilinear Steiner tree (RST) and buffer placement/size/Vdd assignment
    - VH buffers drive VL buffers only
    - Level converters at VH sinks driven by VL buffers
  - Minimize power subject to
    - Arrival time requirement at the source (RAT)
    - Slew rate constraint at buffer inputs and sinks
- D-Tree is NP-hard
  - Finding a minimum RST alone is NP-complete

18 Buffered Tree Construction
- Delay optimization only [Cong, DAC 00], by
  1. Building a Hanan graph with buffer insertion nodes according to the locations of buffer stations
  2. Path search on the grid by option propagation
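
Step 1 can be sketched as follows, assuming the grid is formed by the horizontal and vertical lines through every terminal and buffer station. Blockages are ignored in this sketch, and the function name and return layout are hypothetical rather than taken from [Cong, DAC 00].

```python
def hanan_graph(terminals, buffer_stations=()):
    """Build a Hanan grid over the given points.

    Returns (nodes, edges, buffer_nodes). Blockages are not modelled here.
    """
    points = list(terminals) + list(buffer_stations)
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})

    # Every intersection of a vertical and a horizontal line is a node.
    nodes = [(x, y) for x in xs for y in ys]

    # Connect each node to its right and upper neighbour on the grid.
    edges = []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if i + 1 < len(xs):
                edges.append(((x, y), (xs[i + 1], y)))
            if j + 1 < len(ys):
                edges.append(((x, y), (x, ys[j + 1])))

    # Nodes that coincide with a buffer station may hold a buffer.
    buffer_nodes = set(buffer_stations)
    return nodes, edges, buffer_nodes
```

Step 2 would then propagate options along these edges, inserting buffers only at nodes in `buffer_nodes`.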

19 D-Tree Algorithm Overview
- Challenges
  - Growth of options is exponential
    - An artifact of D-Tree's NP-hardness
  - Considering power worsens option growth
- Solution: sampling + efficient prune tree

20 Prune Tree in [Lillis, JSSC 96]
- Options are inserted in sorted order of capacitance
  - Never need to clear options out of the tree
    - If a new option is checked against the tree, redundant options in the tree are avoided automatically
    - e.g. Ф_new = (c = 20, p = 100, q = 600)
- Not applicable to the D-Tree problem
  - Order of new options is not known a priori
[Figure: prune tree for P = 100 with nodes (c=20, q=600), (c=10, q=500), (c=8, q=400), (c=15, q=550), (c=12, q=520), (c=7, q=380)]

21 Our Improvement on Prune Tree
- Indexing by capacitance results in fewer trees
  - # of capacitance values < # of power values
- Efficient "tree cleaning"
  - Enables out-of-order option insertion
  - Guarantees no redundancy in the tree

22 Tree Cleaning
- To add an option Ф_new in O(|c| · log(|T|)) time:
  1. Check whether Ф_new is dominated by any option in the data structure
  2. If not, remove options in the tree dominated by Ф_new in two downward tree traversals
- e.g. Ф_new = (c = 10, p = 70, q = 410, …)
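
The two steps can be illustrated with a simplified store, under assumptions. It keeps one power-sorted list per capacitance value instead of the balanced search trees of the slides, so it does not reach the O(|c| · log(|T|)) bound. Dominance is assumed to mean no larger capacitance, no larger power and no smaller RAT. The class and method names are hypothetical.

```python
import bisect


class PruneStore:
    """Non-dominated (c, p, q) options, indexed by capacitance."""

    def __init__(self):
        # cap -> (powers, rats); powers ascending, rats ascending with them
        self.by_cap = {}

    def insert(self, c, p, q):
        """Insert an option in any order; return False if it is dominated."""
        # Step 1: look for a dominating option with cap <= c and power <= p.
        for cap, (ps, qs) in self.by_cap.items():
            if cap <= c:
                k = bisect.bisect_right(ps, p)
                # Among powers <= p the last entry has the largest RAT.
                if k > 0 and qs[k - 1] >= q:
                    return False
        # Step 2: clean out options that the new one dominates.
        for cap, (ps, qs) in self.by_cap.items():
            if cap >= c:
                lo = bisect.bisect_left(ps, p)
                hi = lo
                while hi < len(ps) and qs[hi] <= q:
                    hi += 1
                del ps[lo:hi]
                del qs[lo:hi]
        ps, qs = self.by_cap.setdefault(c, ([], []))
        k = bisect.bisect_left(ps, p)
        ps.insert(k, p)
        qs.insert(k, q)
        return True

    def options(self):
        """All stored options as sorted (c, p, q) tuples."""
        return sorted((c, p, q)
                      for c, (ps, qs) in self.by_cap.items()
                      for p, q in zip(ps, qs))
```

Because cleaning runs on every insertion, options can arrive in any capacitance order and the store never holds a redundant entry.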

23 Experimental Settings for D-Tree
- Random testcases
  - All based on a random floorplan of 1cm x 1cm
  - Blockages ~30%, buffer stations ~1mm apart
- Comparison
  - Delay-optimal tree (RMP) [Cong, DAC 00]
  - Ours with single-Vdd (S-Tree) and dual-Vdd (D-Tree) buffers

24 Experimental Results for D-Tree
- Significant power saving over RMP
  - S-Tree: 7%, D-Tree: 18%
  - Larger saving for large testcases (e.g. T4)
- Handles up to 6-sink nets (T5 takes 23 mins)
  - Similar capability compared with delay-optimal approaches [Cong, DAC 00; Chen, ASP-DAC 02]
Table: power at optimal RAT (pJ)
Net   # nodes   # sinks   RMP   S-Tree        D-Tree
T3    137       4         3.9   3.5 [-10%]    2.9 [-23%]
T4    261       5         4.9   4.4 [-13%]    3.1 [-37%]
T5    235       6         4.2   3.8 [-10%]    3.4 [-18%]
avg                             -7%           -18%

25 Conclusion
- Formulated dual-Vdd buffer insertion/tree generation without level converters
- Proposed 2 speedup techniques
  - "Sampling" with negligible loss of optimality
  - "Improved prune tree" for solution pruning
- Applied to single-Vdd buffer insertion, 17x faster than existing work
- Large power saving over single-Vdd buffering
  - 23% in buffer insertion: dual Vdd vs single Vdd
  - 18% in buffered tree: dual Vdd vs delay optimal

26 Future Work
- Speed up tree construction
- Slack allocation for more power reduction
  - Path-based buffer insertion [Sze, DAC 05]
    - Allocates slack along one interconnect path
    - Considers single-Vdd buffers only
  - Chip-level FPGA dual-Vdd assignment [Lin, DAC 05]
    - Fixed buffer locations, assigns Vdd levels
    - Considers multiple critical paths
    - Solved as a linear programming problem

