Download presentation
Presentation is loading. Please wait.
1
Directions in Low-Power CAD Dennis Sylvester University of Michigan dennis@eecs.umich.edu http://vlsida.eecs.umich.edu With acknowledgements to: Prof. David Blaauw, Dr. Sarvesh Kulkarni, Saumil Shah, Kavi Chopra
2
Topics A new dual-Vth assignment formulation Dual-Vdd power distribution Approaches to parametric yield optimization: statistical leakage + delay
3
Motivation We require high-performance yet low-power circuits Leakage power contributes significantly to total power All High- V th implementation too slow All Low-V th implementation too leaky Dual- V th processes popular Problem Definition Minimize Total Circuit Power Subject to Circuit Delay Constraint Sizing Constraints Optimization Variables Gate Sizes Gate Threshold Voltages Switching Subthreshold leakage S. Narendra et al [ICCAD ’03]
4
Gate Sizing + V th Assignment Problem Prior Work Traditionally a discrete problem Previous approaches Separate Sizing and V th Assignment Mixed Integer Non-Linear Programming Sensitivity-based methods (DUET, etc) Continuous formulation [Chen, ASP-DAC ‘05] Very reliant on discretization heuristic
5
Proposed Approach – Self- snapping formulation Continuous formulation – Use of large variety of algorithms/powerful non-linear optimizers possible Solution has almost all gates assigned to one of the two available threshold voltages Small fraction of gates with intermediate V th ’s, can be handled heuristically Discretization algorithm has negligible power impact and can be very simple
6
Proposed Approach – Mixed- V th Gates Consider each gate to be a parallel combination of high and low V th gates RC Delay Model HVt Gate LVt Gate Mixed Gate Linear Power Model HVtLVt
7
Complete Dual- V th Problem Formulation Similar to single-V th gate sizing problem, with simple gate delays replaced with High V th /Low V th parallel combinations Minimize Subject to:
8
Proof of Discretized Solution Conceptually separate optimization process into two distinct phases: D-Phase : Fix delays of all gates W-Phase : Find the minimum-power sizing solution that satisfies the chosen D vector Hypothetical separation for proof – Not implemented in actual optimization procedure
9
W-Phase Proof of discrete optimal solution under arbitrary D-vector sufficient W-Phase formulation Minimize Subject to:
10
W-Phase Linear programming problem n basic variables, n non-basic variables Therefore, only n non-zero variables Every gate snapped to either high-V th or low-V th Addition of upper and lower bounds on total size leads to some non-snapped gates Number extremely small – simple heuristic achieves good results
11
Practical Constraint – Fixed-Width Input Drivers Sequential elements driving the combinational circuit Delay of these elements affected by primary input widths Modeled as fixed-width drivers
12
Extension of Discretization Analysis m+n constraints in the optimization problem n+m basic variables, n-m non-basic variables Therefore, n+m positive variables Total number of non-snapped gates bounded by number of inputs Once again, small in number; can be handled heuristically In practice, number of non-snapped gates found to be much less than the number of inputs
13
Discretization Heuristics Iterative snapping Round gates to closer V th and re-optimize until non- snapped solution achieved Single-pass V th assignment Fix all gates to closer V th and re-optimize only for gate sizes Second heuristic faster with negligible power impact
14
Results Snapping properties of some circuits # of non-snapped gates is very small Dominated by gates at upper and lower size bounds Approach is easily extendable to multi-Vth AND multi- Lgate
15
Results Power and runtime comparisons between proposed approach and sensitivity-based algorithm at 2% timing backoff (results shown for larger circuits only) Average: 31% leakage reduction vs. previous approaches Ckt SBA Continuous Formulation % Improvement Runtime(s) StaticDyn.StaticDyn.TotalStaticTotalSBACont C35400.260.740.160.780.9438.146.462851 C53150.220.780.150.800.9530.535.1152133 C62880.350.650.260.650.9124.699.04136443 C75520.310.690.240.680.9123.938.8794171 i80.240.760.190.750.9421.575.872435 i90.200.800.160.770.9417.656.47921 i100.310.690.230.690.9224.87.69287373
16
Topics A new dual-Vth assignment formulation Dual-Vdd power distribution Approaches to parametric yield optimization: statistical leakage + delay
17
Multiple supply design Relies on applying a lower supply (VDDL) to gates along non-critical paths thus reducing power while meeting timing A flexible fine-grained VDD assignment scheme promises best power reduction Gate-level Extended Clustered Voltage Scaling However, physical design and power delivery are complicated VDDH IN VDDL VDDL Swing Need for Level Conversion DC Current FF
18
Implications of using multiple supplies Coupled issues Algorithms VDD assignment Circuits Level shifting Physical design VDD Granularity Power delivery Distribution Generation Critical Non-critical CVS ECVS Fine-grained Islanding IN OUT
19
Power delivery for dual-VDD circuits Power grid integrity vital for circuit performance Dual-VDD circuits require two supply voltages for operation Fine-grained dual-VDD can place VDDL/VDDH gates arbitrarily on the die Implications at the board, package and die level Fixed resources need to be split between VDDL and VDDH However, load on each supply is lower than on original single supply: Power supply current demanded by a dual-VDD circuit is significantly lower than the corresponding single-VDD circuit, allowing robust power delivery within available resources (decap, C4, wiring)
20
Reduced current load on VDDL/VDDH Gate level comparison Avg. 54% (33%) for VDDL = 0.8V (0.6V) Circuit level comparison Avg. 49% (51%) and 28% (14%) for VDDH and VDDL for 0.8V (0.6V) VDD ECVS
21
Package level results Two VRMs on board to supply VDDL and VDDH Ground path can be shared by VDDL and VDDH Decoupling capacitance divided in the ratio of current loads Similar power supply noise with same resources as single-VDD case (decoupling capacitance, C4s) Intel, “Intel Pentium 4 processor in the 432 pin/Intel 850 Chipset Platform,” 2002.
22
Dual-VDD physical design alternatives Segregated placement constrains placer leading to higher core area and wirelength Single-VDDDual-VDD Dual-VDD segregated VDDH + VDDL row VDDH VDDL GND Dual-VDD fine-grained C. Yeh, et al., “Layout techniques supporting the use of dual supply voltages for cell-based designs,” Proc. DAC, 1999. M. Igarashi, et al., “A low-power design method using multiple supply voltages,” Proc. ISLPED, 1997.
23
Dual-VDD power grid alternatives Routing the power supply rails Dual-VDD Dual-GND requires two separate grounds off-chip and complicates timing analysis and design of the board itself Multi-rail standard cells can be used to realize the Dual-VDD grids allows placer to operate with no constraints VDD GND VDDH GNDH VDDL GNDL VDDH VDDL GND (shared) Single-VDD Dual-VDD Shared-GND Dual-VDD Dual-GND VDDH GNDH VDDL GNDL 4-rail cell 3-rail cell VDDH VDDL GND (shared) Dual-VDD standard cells topologies
24
Dual-VDD on-chip power grid design Guidelines while designing the dual-VDD grid: Scale wires with respect to the single-VDD considering how the current demand has scaled VDDL gates more sensitive to grid noise important since ground is shared 120mV noise is 10% for a 1.2V gate, but 20% for a 0.6V gate Placement of VDDL and VDDH gates assign more wiring resources to VDDL grid in areas where there is more demand for VDDL current Consider effects that arise from the board and package level such as shared C4s Fewer C4s leads to higher effective package R, L
25
Proposed technique D-Place Let = I(VDDH)/I(VDD) and = I(VDDL)/I(VDD) Scale wires as follows VDDH VDDL GND Global Regional Local Partition the chip floorplan Obtain eff. and as follows Obtain current consumption of Single/Dual VDD designs (SPICE) Calculate local, regional, global & effective & for each wire segment Size each wire segment in each local area using effective , β & simulate grid Measure voltage droop/bounce Measure wire congestion Break down die into “local” & “regional” areas Placement database (Cadence) Original Single VDD design Single VDD Lib file Obtain Dual VDD design Dual VDD Lib file
26
Peak voltage drop comparisons D-Place grids better than single-VDD grids in AVG cases Inferior by < 2.6% (≈15mV) in some MAX cases 0.6V VDDL as robust as 0.8V 0.6V also provides higher power savings Proposed approach better by 2-7% (AVG) and 7-12% (MAX) compared to prior approaches VDDL = 0.6V VDDL = 0.8V
27
Voltage variation across die Voltage drop contours Wiring congestion similar for dual-Vdd vs. single Vdd grids Lower current demands can lead to smaller amounts of decoupling cap; lower leakage (or use same decap for better performance) Dual-VDD grid no less robust than single-VDD grid
28
Topics A new dual-Vth assignment formulation Dual-Vdd power distribution Approaches to parametric yield optimization: statistical leakage + delay
29
Introduction Optical Proximity Effects Variation Chemical Mechanical Polishing Variations L eff V th P Process Parameter-space Good Timing High Leakage Power Yield Loss Low Leakage PoorTiming Timing Yield Loss Power Delay P Chip Performance-space This Work: Optimize the timing and power yield using gate sizing
30
P const T const Problem Description Nonlinear Continuous Optimization Objective: Maximize Timing and Power Yield Yield: A utility function defined w.r.t the JPDF of leakage and timing Decision Variables: Gate Size Efficient implementation requires Computing yield as function of decision variables - gate size Fast and Accurate Gradient computation
31
Power and Timing Yield Analysis (see DAC05 for more detail) Timing Analysis [Sapatnekar03, Chandu05] ( d, d ) Delay Log(Leakage) Power Analysis ( l, l ) Correlation (1 parameter) Delay and Power Bivariate JPDF ( d, d, l, l, ) dd ll
32
Traditional Incremental Timing Size Up 7 Cut Set SSTA: Intuition 1 3 4 5 6 7 9 8 10 2 Consider Timing Graph Unperturbed Sub Graph Unperturbed Left Sub Graph Unperturbed Right Sub Graph Arrival Time (AT) Required Arrival Time (RT) Max Cut Edge Time (CT) Cut Edge Time(CT) If Forward SSTA Reverse SSTA then Cut Set SSTA will give exact same sensitivities as naïve approach that recomputes yield relating to all nodes, most being unchanged
33
Statistical Yield Optimization Results D < Dμ,initial, P < Pμ,initial CircuitYield without L (%)Yield with L (%) c43245.480.2 c49939.259.0 c88049.383.2 c190847.982.8 c267051.185.3 c354051.287.1 c531550.087.3 c628850.386.5 c755251.280.8 Initial yield ~0-2% due to inverse correlation Gate sizing alone provides good improvements Combined with Lgate biasing, provides outstanding results Chopra, et al., ICCAD05
34
Another approach to statistical optimization General statistical optimization Method relies on efficient deterministic formulations and variation space sampling to drive statistical optimization Applicable to many mainstream VLSI design problems: gate sizing, Vth assignment, Leff biasing as well as potential new levers
35
Statistically Optimized Body Bias Clustering for Post-Silicon Tuning Concept: Speed up critical gates using FBB and slow down non-critical gates using RBB to meet timing and power constraints Traditional view: Centralized body bias generator controlling different die regions Ineffective for compensating intra-die variations Highly suboptimal power Critical Non-critical BB controller
36
Coarse Body Bias Assignment Simplified assignment minimizing routing overheads Biasing dictated by placement instead of gate criticality Disregards complex dependence of gate criticality on: Circuit topology Correlations in process variations Effective in tightening delay but leads to high power ONE BIAS FOR ALL GATES DELAY POWER Critical Correlated Important to cluster gates to leverage ABB effectively
37
Proposed New Optimization Framework 4 5 1 2 3 7 6 Leff_4.1 Scenario ‘1’ Leff_5.1 Leff_1.1 Leff_3.1 Leff_2.1 Leff_6.1 Leff_7.1 Generate sample scenarios 4 5 1 2 3 7 6 Leff_4.2 Scenario ‘2’ Leff_5.2 Leff_1.2 Leff_3.2 Leff_2.2 Leff_6.2 Leff_7.2 4 5 1 2 3 7 6 Leff_4.x Scenario ‘x’ Leff_5.x Leff_1.x Leff_3.x Leff_2.x Leff_6.x Leff_7.x Generate PDFs of optimal actions Clustering BB-PDF ρ i,j Gate DETERMINISTICALLY optimize each scenario (i.e., tune each gate for each die scenario) Solve BB assignment for each scenario Post-silicon tuning
38
Results vs. Traditional Dual-Vth Leakage power Dual-Vth vs. 2-4 ABB clusters Avg. 28-38% (51-59%) lower μ (95 th ) Area Capo generates contiguous regions of similarly clustered cells while minimally displacing cells 5-8% increase in wirelength and area Delay 3-9X tighter σ
39
A few conclusions Parametric yield is a critical design objective going forward Requires accurate estimation and fast optimization approaches to this key metric Envision all tools in 4-6 years being yield-driven, rather than timing or power alone Lots of room for improvement in many ‘ well-studied ’ CAD problems today Recent examples; dual-Vth+sizing, placement (Cong, et al)
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.