ECE260B – CSE241A Winter 2005 Power Distribution


1 ECE260B – CSE241A Winter 2005 Power Distribution
Website:

2 Motivation
Power supply noise is a serious issue in deep-submicron (DSM) design. Noise is getting worse as technology scales. Noise margin decreases as supply voltage scales. Power supply noise may slow down circuit performance. Power supply noise may cause logic failures.

3 Power = … Routing resources, Pins, Battery cost, Performance
Routing resources: 20-40% of all metal tracks are used by Vcc and Vss. Increased power → denser power grid.
Pins: each Vcc or Vss pin carries 0.5-1 W of power. Pentium 4 uses 423 pins, 223 of them Vcc or Vss. More pins → more expensive package (+ package development, motherboard redesign, …).
Battery cost: a 1 kg NiCad battery powers a Pentium 4 alone for less than 1 hour.
Performance: high chip temperatures degrade circuit performance. Large across-chip temperature variations induce clock skew. High chip power limits use of high-performance circuits. Power transients determine minimum power supply voltage.

4 Power = Package
Pentium 4 die is about 1.5 g and less than 1 cm^3. Pentium 4 in package with interposer, heat sink, and fan can be 500 g and 150 cm^3.
(Figure labels: Integrated Heat Spreader, Heat Sink, Processor, Processor Pins, OLGA, Fan, Decoupling Capacitors, Interposer, Package Pins.)
Modern processor packaging is complex and adds significantly to product cost. Courtesy M. McDermott UT-Austin

5 Planning for Power
Early simulation of major power dissipation components.
Early quantification of chip power: total chip power; maximum power density; total chip power fluctuations (inherent, and added fluctuations due to clock gating).
Early power distribution analysis (dc, ac, and multi-cycle), i.e., average, maximum, and multi-cycle fluctuations.
Early allocation and coordination of chip resources: wiring tracks for power grid; low-Vt devices; dynamic circuits; clock gating; placement and quantity of added decoupling capacitors.

6 Power and Ground Routing
Floorplanning includes planning how the power, ground, and clock should route.
Power supply distribution as a tree: the trunk must supply current to all branches.
Resistance must be very small, since when a gate switches, its current flows through the supply lines. If the resistance of the supply lines is too large, the voltage supplied to gates will drop, which can cause a gate to malfunction. Usually, want at most 5-10% IR drop due to supply resistance.
Supply wiring is usually on the top layers of metal, then distributed to lower wiring layers.
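To make the 5-10% guideline concrete, here is a minimal Python sketch of IR drop along a single supply trunk with gates tapping off at successive points. The segment resistances, tap currents, and the 1.2 V supply are illustrative assumptions, not values from the lecture.

```python
def trunk_ir_drop(segment_res, tap_currents):
    """Voltage drop at each tap of a single supply trunk.

    segment_res[k]  -- resistance (ohms) of the trunk segment feeding tap k
    tap_currents[k] -- current (amps) drawn at tap k
    Segment k carries the current of tap k plus every tap further downstream.
    """
    drops = []
    total = 0.0
    for k, r in enumerate(segment_res):
        downstream = sum(tap_currents[k:])  # current flowing through segment k
        total += r * downstream             # V = I * R accumulates along the trunk
        drops.append(total)
    return drops


VDD = 1.2  # volts (assumed)
drops = trunk_ir_drop([0.05] * 4, [0.2] * 4)
worst_pct = 100.0 * max(drops) / VDD
print([round(d, 3) for d in drops], f"worst = {worst_pct:.1f}% of VDD")
```

The farthest tap sees the largest drop because every upstream segment carries its current, which is why the trunk near the pad is the first place to widen.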

7 Planar Power Distribution
Topology of VDD/VSS networks: interdigitated.
Design each macrocell such that all VDD and VSS terminals are on opposite sides. If the floorplan places all macrocells with VDD on the same side, then there is no crossing between VDD and VSS.
(Figure: macrocells A, B, C with VDD and VSS terminals; labels: cut line, no cut line, no connection.)
Courtesy K. Yang, UCLA

8 Gridded Power Distribution
With more metal layers, power is striped. Connection between the stripes allows a power grid, which minimizes series resistance. Connection of lower-layer layout/cells to the grid is through vias.
Note that planar supply routing is often still needed for a strong lower-layer connection. There may not be sufficient area to make a strong connection in the middle of a design (connect better at the periphery of the die).
Courtesy K. Yang, UCLA

9 Power Supply Drop/Noise
Supply noise = variations in power supply voltage that act as a noise source for logic gates.
Power supply wiring resistance → voltage variations with current surges. Current surges depend on the dynamic behavior of the circuit.
Solution approach: measure the maximum current required by each block; redesign the power/ground network to reduce resistance; worst case, move activity to another clock cycle to reduce peak current → scheduling problem.
Example: drive a 32-bit bus, total bus wire load = 2 pF, with delay 0.5 ns. R for each transistor needs to be < 0.25 kΩ to meet RC = 0.5 ns. Effective R of all bits together is 250/32 ≈ 7.8 Ω. For < 10% drop, power distribution R must be < 1 Ω.
Courtesy K. Yang, UCLA
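The bus example can be checked with a few lines of arithmetic. This sketch assumes the 2 pF load is what each driver sees (that is what makes R = 0.25 kΩ follow from RC = 0.5 ns) and models the supply as a simple resistive divider against the parallel drivers; both are assumptions made for illustration.

```python
def bus_supply_budget(n_bits, c_load, delay, max_drop_frac):
    """Return (per-driver R, parallel R of all drivers, max supply R)."""
    r_driver = delay / c_load        # from R * C = delay
    r_parallel = r_driver / n_bits   # all bits switching together
    # Divider: r_supply / (r_supply + r_parallel) < max_drop_frac
    r_supply_max = r_parallel * max_drop_frac / (1.0 - max_drop_frac)
    return r_driver, r_parallel, r_supply_max


r_driver, r_parallel, r_supply_max = bus_supply_budget(32, 2e-12, 0.5e-9, 0.10)
print(f"driver R   = {r_driver:.0f} ohm")
print(f"parallel R = {r_parallel:.2f} ohm")
print(f"supply R   < {r_supply_max:.2f} ohm")
```

The divider bound lands a little under 1 Ω, consistent with the rule of thumb quoted in the example.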

10 Electromigration
Physical migration of metal atoms due to “electron wind” can eventually create a break in a wire.
MTTF (mean time to failure) ∝ 1/J², where J = current density.
Current density must not exceed specification: for each wire, I_i/w_i < J_spec. Specified as mA per μm of wire width (e.g., 1 mA/μm) or mA per via cut.
EM occurs both in signal wires (AC = bidirectional) and in power wires (DC = unidirectional). It is much worse for DC than AC; DC occurs inside cells and in power buses.
May need more contacts on transistor sources and drains to meet EM limits.
Width of power buses must support both IR and EM requirements.
Issues in IR and EM constraint generation: topology is most likely not a tree; how do we determine current patterns?; effects of R, L.
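The two width requirements on a power bus (EM and IR) can be sketched as a small calculation. The sheet resistance, wire length, and drop budget below are hypothetical numbers chosen only to exercise the formulas; the 1 mA/μm limit is the example figure quoted above.

```python
def mttf_ratio(j_old, j_new):
    """Relative change in MTTF when current density changes, using MTTF ~ 1/J^2."""
    return (j_old / j_new) ** 2


def power_bus_width(current_ma, j_spec_ma_per_um, r_sheet, length_um, max_drop_v):
    """Smallest width (um) meeting both the EM limit and the IR-drop budget.

    EM: I / w <= J_spec                     ->  w >= I / J_spec
    IR: I * (r_sheet * L / w) <= max_drop   ->  w >= I * r_sheet * L / max_drop
    """
    w_em = current_ma / j_spec_ma_per_um
    w_ir = (current_ma * 1e-3) * r_sheet * length_um / max_drop_v
    return max(w_em, w_ir)


# Hypothetical bus: 20 mA, 0.05 ohm/square, 2000 um long, 50 mV drop budget.
width = power_bus_width(20.0, 1.0, 0.05, 2000.0, 0.05)
print(f"required width = {width:.1f} um")
print(f"doubling J scales MTTF by {mttf_ratio(1.0, 2.0):.2f}")
```

In this made-up case the IR constraint (40 μm) dominates the EM constraint (20 μm); on a short wire the order reverses.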

11 What Happens?
Example of an AlCu line seen under microscope.
Accelerated by higher temperature and high currents. Voids form on grain boundaries. Metal atoms move with current away from voids and collect at boundaries. Catastrophic failure.
Courtesy K. Yang, UCLA

12 Taken from Sverre Sjøthun, “Electromigration In-Depth,” from
Courtesy S. Sapatnekar, UMinn

13 Power Supply Rules of Thumb
Rules depend on technology. The tech file has rules for resistance and electromigration.
Examples:
Must have a contact for each 16λ of transistor width (more is better).
Wire must have less than 1 mA/μm of width.
Power/Gnd width = Length of wire * Sum(all transistors connected to wire) / (3×10^6 λ) (very approximate).
For small designs, power supply design is a non-issue.
Courtesy K. Yang, UCLA

14 Basic Methodology Concepts
Reliability (slotting, splitting). Alignment of hierarchical rings, stripes. Isolation of analog power.
Styles of power distribution: rings and trunks; uniform grid; bottom-up grid generation.
Depends on: package (flip-chip vs. wire-bond; I/O count, where fewer pads → denser grid); power budget; IR drop limits; floorplan constraints (hard macros, etc.).

15 Metal Slotting vs. Splitting
Required by metal layout rules for uniform CMP (planarization).
Split power wires: less data than traditional slotting; more accurate R/C analysis of power mesh; not supported by all tools; easy connections through standard via arrays.
Slotted power wires: difficult to connect (where should vias go?).
(Figure: split vs. slotted GND wires on M1.)
Metal slotting was originally introduced into technology design rules to address reliability problems associated with long-term thermal effects. Heating and cooling of wide power rails could cause metal lift or cracking. Slotting resolved this problem; however, the locations of the slots required circuit knowledge, because poorly positioned slots could cause current crowding and, in turn, electromigration failures. Metal splitting is an alternative to metal slotting that is effective, can be automated more easily, requires less data, and allows more accurate modeling for power analysis. Cadence supports metal splitting. In the example, the connections to the wide stripe are easily made in the split case. In the slotting case, the slots are typically added in GDS because the P&R tools would have a difficult time 1) representing the data and 2) finding an appropriate via landing.
Courtesy Cadence Design Systems, Inc.

16 Trunks and Rings Methodology
Each block has its own ring. Rings may be inside the blocks or part of the top level. Each block has trunks connecting the top level to the block.
(Figure: blocks 1-5 with V/G rings. Rings may be shared with abutted blocks; individual trunks connect blocks to the top level.)
If rings are part of the top level: sharing power between blocks is much easier, with no need to worry about overlapping structures; analysis is much easier since the main grid is visible all at the same time; splitting/slotting at the top level may make significant amounts of data.
If rings are part of the block: it is easier to make connections from the block-level grid into the ring (no alignment issues with followpins); creates spacing to alleviate many hierarchical extraction problems; splitting/slotting at block level reduces top-level data (confines it to the block).
Courtesy Cadence Design Systems, Inc.

17 Trunks and Rings
Advantages: power tailored to the demands of each block (flexible); more area-efficient since the demands of each block are uniquely met; simple implementation supported by many tools; rings can be shared between abutted blocks.
Disadvantages: limited redundancy, since the power grid is built to match needs; assumptions in design may change or be invalid; non-regular structure requires more detailed IR drop/EM analysis; missing vias/connections are fatal; rings will require slotting/splitting due to wide widths, which increases data volume.
Routed power approach, advantages: requires only a small portion of the routing resources available on chip; can optimize ring and stripe structure for each block; simple to implement; no dependency on the top-level grid.
Routed power approach, disadvantages: very limited redundancy, since only a few trunks carry power to blocks; a high-quality power network is difficult to design without careful analysis.
Courtesy Cadence Design Systems, Inc.

18 Uniform Chip Grid Methodology
Robust and redundant power network, used mainly in microprocessors and high-end large ASICs.
Implementation: primary distribution through upper metal layers; lower layers in blocks connect to the top through via stacks; typically pushed into blocks; blocks typically abut; requires block grids to align; rows/followpins should align with block pins; global buffer insertion.
(Figure: global grid on higher layers; fine or custom grid, or no grid, on lower layers; blocks with V/G stripes.)
Power grid: equal-width metal spaced at constant pitch; alternating horizontal and vertical power tracks form a finely meshed grid; coarse grid in upper layers, fine grid near delivery points, or via stacks from followpins to the global grid.
Make sure block pins match followpins/rows and that VDD and VSS line up to the top grid (i.e., the bottom row is either in normal or flipped orientation to match the top level).
Courtesy Cadence Design Systems, Inc.

19 Uniform Chip Grid
Advantages: easily implemented; lends itself to straightforward hand calculations; path redundancy allows less sensitivity to changes in current pattern; mesh of power/ground provides shielding (for capacitance) and current returns (for inductance); top-down propagation is easy to use on this style.
Disadvantages: takes up significant routing resources (20%-40% of all routing tracks if not already reserved for power/ground); fine grids may slow down P&R tools; imposes a grid structure on each block, which may be unnecessary; top and blocks are coupled closely if top-level routing is pushed into blocks; changes to block/top must be reflected in the other.
Courtesy Cadence Design Systems, Inc.

20 Bottom-Up Grid Generation Methodology
Design and optimize the power grid for each block, then merge at the top.
Advantages: able to tailor the grid for routing resource efficiency in each block; flexibility to choose the best grid for the block (i.e., ring and stripe, power plane, grid).
Disadvantages: designing the grid in the context of the “big picture” is more difficult; the block grid may present challenging connections to the top level; assumptions for the block grid’s connection to the top level must be analyzed and validated.
Courtesy Cadence Design Systems, Inc.

21 Power Routing in Area-Based P&R
Power routing approaches: (1) pre-route parts of the power grid during floorplanning; (2) build the grid (except connections to standard cells) before P&R; (3) build the entire grid before P&R.
N.B.: Area-based P&R tools respect pre-routes absolutely.
Cadence tools: power routing is done inside SE; all other tasks (clock, place, route, scan, …) are done by point tools.
Lab 5 tomorrow has a tiny bit of power routing (rings, stripes).
Miscellany: ECOs (what happens to rings and trunks if blocks change size?); layer choices (what is the cost of skipping layers to get from thick top-layer metal down to finer layers?); how wide should power wires be?; post-processing strategies.
Courtesy Cadence Design Systems, Inc.

22 Power Routing Wire Width Considerations
Slotting rules: choose the maximum width below the slotting width.
Halation (width-dependent spacing) rules: do as much as possible of the power routing below the wide-wire width to save routing space.
Choose power routing widths carefully to avoid blocking extra tracks (and use the space if blocking the track!).
(Figure: what is the better power width here? Label: blocked tracks.)
Courtesy Cadence Design Systems, Inc.

23 Power Routing Tool Usage
4-layer power grid example (HVHV):
1. Turn on via stacking.
2. Route metal2 vertically.
3. Route metal4 vertically (use same coordinates).
4. Route metal3 horizontally (make coincident with every N metal1 routes).
5. Turn off via stacking.
6. Route metal1 horizontally.
(Figure labels: metal2/metal4 coincident; metal1 inside cells; metal3 every n micron.)
Courtesy Cadence Design Systems, Inc.

24 Post-Processing Flows (DEF or Layout Editing)
(Figures: during PnR; after post-processing.)
Courtesy Cadence Design Systems, Inc.

25 (Tree) Supply Network Design
Tree topology assumption: not very useful in practice, but illustrates some basic ideas.
Assume R dominates, with L and C negligible (a marginally permissible assumption).
Current is drawn at various points in the tree (time-varying waveform). Current causes a V = IR drop: “Ground” is not at 0 V, and “Vdd” is not at its intended level.
(Figure legend: supply, sinks.)
Courtesy S. Sapatnekar, UMinn

26 IR Drop Constraints
Chowdhury and Breuer, TCAD 7/88.
Can write the V drop to each sink as Σ R_i I_i < V_spec, for all sink current patterns made available.
Tree structure: can compute I_i easily.
R_i = ρ l_i / w_i: change w_i to reduce IR drop.
Objective: minimize Σ a_i w_i.
Current density must never exceed a specification: for each wire, I_i/w_i < J_spec.
Courtesy S. Sapatnekar, UMinn
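The tree case is easy to evaluate directly: branch currents are sums of downstream sink currents, and the drop to a sink is the sum of R_i I_i along its path from the supply. The sketch below checks both constraints for a small tree; the node names, dimensions, sheet resistance, and limits are all made up for illustration and do not come from Chowdhury and Breuer.

```python
def analyze_supply_tree(parent, length, width, sink_current, rho):
    """Branch currents, IR drops, and current densities in a supply tree.

    parent[n] is the node feeding n (None for the supply root).
    Wire n is the wire from parent[n] to n, with R = rho * length / width.
    """
    def depth(n):
        d = 0
        while parent[n] is not None:
            n = parent[n]
            d += 1
        return d

    order = sorted(parent, key=depth)            # parents before children
    current = {n: sink_current.get(n, 0.0) for n in parent}
    for n in reversed(order):                    # accumulate leaves -> root
        if parent[n] is not None:
            current[parent[n]] += current[n]

    drop, density = {}, {}
    for n in order:                              # accumulate root -> leaves
        p = parent[n]
        if p is None:
            drop[n] = 0.0
            continue
        r = rho * length[n] / width[n]
        drop[n] = drop[p] + r * current[n]       # running sum of R_i * I_i
        density[n] = current[n] / width[n]       # I_i / w_i
    return current, drop, density


parent = {"pad": None, "a": "pad", "b": "a", "c": "a"}
length = {"a": 100.0, "b": 50.0, "c": 50.0}      # um
width = {"a": 4.0, "b": 2.0, "c": 3.0}           # um
sinks = {"b": 0.001, "c": 0.002}                 # A
current, drop, density = analyze_supply_tree(parent, length, width, sinks, rho=0.05)

V_SPEC, J_SPEC = 0.010, 0.001                    # 10 mV, 1 mA/um (assumed limits)
print({n: round(v * 1e3, 3) for n, v in drop.items()}, "mV")
print("IR ok:", all(drop[s] < V_SPEC for s in sinks))
print("EM ok:", all(j < J_SPEC for j in density.values()))
```

An optimizer in the spirit of the slide would then shrink the widths, weighted by a_i, until one of the two checks becomes tight.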

27 P/G Mesh Optimization (R only)
Dutta and Marek-Sadowska, DAC 89.
Cost function: Σ a_i l_i w_i = Σ a_i ρ c_i l_i² // = total wire area (since c_i = conductance = w_i/(ρ l_i))
Constraints:
EM: I_i ≤ e w_i // current density I/w less than upper bound
Substitute I_i = v_i (w_i/(ρ l_i)) // I = V/R
⇒ v_p − v_q ≤ e ρ l_i // divide by w_i, multiply by ρ l_i
Wire width constraints: W_min ≤ w_i ≤ W_max (translate to c_i)
Voltage drop constraints: v_a − v_b ≤ V_spec1 and/or v_i ≥ V_spec2
Circuit equations that determine the v’s
Variables: c_i’s (v_i’s depend on c_i’s)
Courtesy S. Sapatnekar, UMinn

28 Solution Technique
Method of feasible directions: find an initial feasible solution (one that satisfies all constraints); choose a direction that maintains feasibility; make a move in that direction to reduce the cost function.
Given a set of c_i’s, must find the corresponding v_i’s.
Feasible direction method: move from point c* to c+. c* and c+ must be close to each other (i.e., if you have the solution at c*, the solution at c+ corresponds to a minor change in conductances).
Solving for the v_i’s means solving a system of linear equations. The solution at c* is a good guess for the solution at c+, so an iterative solve converges in a few iterations.
Courtesy S. Sapatnekar, UMinn
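The benefit of reusing the solution at c* can be seen with a plain Gauss-Seidel solve of the nodal equations. This is a minimal sketch: the four-node ring, its conductances, and the sink currents are invented, and Gauss-Seidel is simply one convenient iterative solver, not necessarily the one used in the cited work.

```python
def solve_voltages(edges, cond, sink, vdd_node, vdd, v0=None,
                   tol=1e-9, max_iter=100000):
    """Gauss-Seidel solve of a resistive supply net.

    edges[k] = (a, b) is a wire with conductance cond[k]; sink[n] is the
    current drawn at node n. Returns (voltages, number of sweeps).
    """
    nodes = sorted({n for e in edges for n in e})
    nbrs = {n: [] for n in nodes}
    for (a, b), c in zip(edges, cond):
        nbrs[a].append((b, c))
        nbrs[b].append((a, c))
    v = dict(v0) if v0 is not None else {n: vdd for n in nodes}
    v[vdd_node] = vdd
    for sweep in range(1, max_iter + 1):
        worst = 0.0
        for n in nodes:
            if n == vdd_node:
                continue
            g_sum = sum(c for _, c in nbrs[n])
            # KCL at node n: sum_j c_nj * (v_j - v_n) = sink current at n
            new = (sum(c * v[m] for m, c in nbrs[n]) - sink.get(n, 0.0)) / g_sum
            worst = max(worst, abs(new - v[n]))
            v[n] = new
        if worst < tol:
            return v, sweep
    raise RuntimeError("Gauss-Seidel did not converge")


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # node 0 is the VDD pad
sink = {1: 0.1, 2: 0.2, 3: 0.1}            # amps
c_star = [10.0, 10.0, 10.0, 10.0]          # siemens
c_plus = [10.5, 10.0, 10.0, 10.0]          # small move in one conductance

v_star, _ = solve_voltages(edges, c_star, sink, 0, 1.0)
v_cold, cold_sweeps = solve_voltages(edges, c_plus, sink, 0, 1.0)
v_plus, warm_sweeps = solve_voltages(edges, c_plus, sink, 0, 1.0, v0=v_star)
print(f"cold start: {cold_sweeps} sweeps, warm start: {warm_sweeps} sweeps")
```

Both runs reach the same voltages; the warm start only has to correct the small change caused by the step from c* to c+.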

29 Modeling Gate Currents
Currents in the supply grid are caused by charging/discharging of capacitances by logic gates.
All analyses require generation of a “worst-case switching” scenario. Enumeration is infeasible → two basic approaches:
Simulation-based methods: the designer supplies “hot” vectors, or we try to generate these hot vectors automatically.
“Pattern-independent” methods: try to estimate the worst case (can be expensive, very inaccurate).
Once current patterns are available, apply them to the supply network to find out whether constraints are satisfied.
Courtesy S. Sapatnekar, UMinn

30 Complexity of Hot Vector Generation
Devadas et al., TCAD 3/92: assume zero gate delays for simplicity.
Find the maximum current drawn by a block of gates: using a current model for each gate, find a set of input patterns so that the total current is maximized.
Boolean assignment problem: equivalent to Weighted Max-Satisfiability.
Satisfiability: given a Boolean formula in conjunctive normal form (product of sums), is there an assignment of truth values to the variables such that the formula evaluates to True?
Checking for satisfiability (for k-SAT, k > 2) is NP-complete ⇒ difficult even under the zero gate delay assumption.
Courtesy S. Sapatnekar, UMinn
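For a very small block, the hot-vector search can simply enumerate every pair of input vectors, which also shows why enumeration does not scale: n inputs give 2^(2n) pairs. The three-gate block and its per-gate current weights below are hypothetical, and the zero-delay model counts a gate's current whenever its output changes.

```python
from itertools import product

# Hypothetical block: gate name -> (Boolean function of inputs a, b, c; current weight)
GATES = {
    "g_and": (lambda a, b, c: a and b, 1.0),
    "g_or":  (lambda a, b, c: b or c, 1.5),
    "g_xor": (lambda a, b, c: a != c, 2.0),
}


def outputs(vec):
    """Zero-delay evaluation of every gate for one input vector."""
    return {name: bool(fn(*vec)) for name, (fn, _) in GATES.items()}


def switching_current(before, after):
    """Total current of the gates whose output toggles between the two vectors."""
    o0, o1 = outputs(before), outputs(after)
    return sum(w for name, (_, w) in GATES.items() if o0[name] != o1[name])


def hottest_pair(n_inputs=3):
    """Exhaustive search over all (before, after) input vector pairs."""
    vectors = list(product([False, True], repeat=n_inputs))
    return max(((switching_current(u, v), u, v) for u in vectors for v in vectors),
               key=lambda t: t[0])


best_current, before, after = hottest_pair()
print(best_current, before, after)
```

With 3 inputs there are only 64 pairs to try; at 64 inputs the same loop would need 2^128 evaluations.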

31 Pattern-Independent Methods
iMAX approach: Kriplani et al., TCAD 8/95.
Current model for a single gate (figure labels: Ipeak, Delay).
Gates switch at different times. Total current drawn from Vdd (ignoring supply network C) is the sum of these time-shifted waveforms.
Objective: find the worst-case waveform.
Courtesy S. Sapatnekar, UMinn

32 Example
(Not to scale!)
Maximum current is not just a sum of individual maximum currents: temporal dependencies matter. [Using deliberate clock skews can reduce the peak current, as we saw in the Useful-Skew discussion.]
Courtesy S. Sapatnekar, UMinn
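A short sketch of why the peak of the summed waveform is lower than the sum of the individual peaks. It assumes a triangular per-gate current pulse with height i_peak and base width equal to the gate delay; the three gates and their switching times are invented.

```python
def triangle(t, start, delay, i_peak):
    """Assumed gate current model: rises to i_peak at mid-delay, then falls to 0."""
    x = (t - start) / delay
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return i_peak * (2.0 * x if x < 0.5 else 2.0 * (1.0 - x))


def total_current(gates, times):
    """Sum of the time-shifted gate waveforms at each sample time."""
    return [sum(triangle(t, s, d, p) for s, d, p in gates) for t in times]


# (switch time, delay, peak current) for three hypothetical gates
gates = [(0.0, 1.0, 1.0), (0.5, 1.0, 1.0), (2.0, 1.0, 2.0)]
times = [k * 0.05 for k in range(81)]       # 0 .. 4 time units
wave = total_current(gates, times)

peak_of_sum = max(wave)
sum_of_peaks = sum(p for _, _, p in gates)
print(f"peak of sum = {peak_of_sum:.2f}, sum of peaks = {sum_of_peaks:.2f}")
```

Shifting the third gate so that it overlaps the first two raises the peak, which is the mechanism that deliberate clock skew exploits in reverse.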

33 Maximum Envelope Current (MEC)
Find the time interval during which a gate may switch. Manufacturing process variations can cause changes; the actual switching event can cause changes.
Example: switching at the second gate can occur at t = 1 or at t = 2. In general, a large number of paths can go through a gate; assume (conservatively) that switching occurs in t ∈ [1,2] (unit gate delays).
Assume that all gate inputs can switch independently: this provides an upper bound on the switching current.
Courtesy S. Sapatnekar, UMinn
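The envelope idea can be sketched by sliding each gate's current pulse across its whole switching window and keeping the pointwise maximum, then summing the envelopes. The triangular pulse shape and the two-gate example are assumptions made for illustration.

```python
def triangle(t, start, delay, i_peak):
    """Assumed gate current model: triangular pulse of width `delay`."""
    x = (t - start) / delay
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return i_peak * (2.0 * x if x < 0.5 else 2.0 * (1.0 - x))


def envelope(t, window, delay, i_peak, steps=50):
    """Largest current the gate could draw at time t for any start in its window."""
    lo, hi = window
    starts = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(triangle(t, s, delay, i_peak) for s in starts)


def mec(t, gates):
    """Maximum envelope current: gates are treated as switching independently."""
    return sum(envelope(t, w, d, p) for w, d, p in gates)


# First gate switches at t = 0; second gate may switch anywhere in [1, 2].
gates = [((0.0, 0.0), 1.0, 1.0), ((1.0, 2.0), 1.0, 1.0)]
times = [k * 0.05 for k in range(81)]
mec_wave = [mec(t, gates) for t in times]

# Two concrete scenarios the envelope must cover.
early = [triangle(t, 0.0, 1.0, 1.0) + triangle(t, 1.0, 1.0, 1.0) for t in times]
late = [triangle(t, 0.0, 1.0, 1.0) + triangle(t, 2.0, 1.0, 1.0) for t in times]
print(f"MEC peak = {max(mec_wave):.2f}")
```

The envelope of the second gate is a plateau covering t in [1.5, 2.5], so it bounds both the early and the late scenario at the price of some pessimism.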

34 (Large) Errors in MEC Approach
Correlation problem: switching at G0, G1, G2, and G3 is not independent. G0 = 0 implies that G1, G2, G3 switch; G0 = 1 means that other inputs will determine gate activity. If the other inputs cannot make the gate switch in the same time window, then iMAX estimates are pessimistic.
Reconvergent fanout problem: signals that diverge at G0 reconverge at Gk ⇒ inputs to Gk are not independent. The assumption of independent switching is not valid.
Many heuristic refinements have been proposed, but guardbanding (error) of power estimation is still a huge issue.
(Figure labels: G0, G1, G2, G3, Gk.)
Courtesy S. Sapatnekar, UMinn

35 Outline
Motivation
Power Supply Noise Estimation
Decoupling Capacitance (decap) Budget
Allocation of Decoupling Capacitance
Experimental Results
Conclusion

36 Why Decoupling Capacitance
Frequency point of view: decaps form low-pass filters. They cancel anti- effects.
Physical point of view: decaps serve as charge reservoirs. They shortcut supply current paths and reduce voltage drop.
No effect on DC supply currents.

37 Power Supply Network: RLC Mesh
(Figure: RLC mesh with VDD pins. Legend: current source; VDD pin with Rp, Lp.)
Slide courtesy of S Zhao, K Roy & C.-K. Kok

38 Current Distribution in Power Supply Mesh Illustration
(Figure: power supply mesh with VDD pins, modules A, B, C, and numbered current flowing paths. Legend: connection point, current contribution, current flowing path, VDD pin.)
Slide courtesy of S Zhao, K Roy & C.-K. Kok

39 Current Distribution in Power Supply Network
Distribute the switching current for each module in the power supply mesh.
Observation: currents tend to flow along the least-impedance paths.
Approximation: consider only those paths with minimal impedance (shortest, second shortest, …).
Slide courtesy of S Zhao, K Roy & C.-K. Kok

40 Current Flowing Paths and Power Supply Noise Calculation
Power supply noise at a target module is the voltage difference between the VDD pin and the module.
Apply KVL along the path (figure labels: VDD, R1, L1, R2, L2, C1, C2, i1(t), i2(t), i3(t), module k).
Slide courtesy of S Zhao, K Roy & C.-K. Kok
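A KVL sum along one supply path can be sketched numerically as follows. The assumption here, which is not taken from the slide, is that each path segment j contributes R_j·i_j(t) + L_j·di_j/dt, with di/dt approximated by a backward difference; the segment values and current waveforms are made up.

```python
def path_noise(segments, dt):
    """Noise at the module = sum over path segments of R*i + L*di/dt.

    segments: list of (R in ohms, L in henries, list of current samples in amps),
    where the samples are the total current through that segment.
    """
    n_samples = len(segments[0][2])
    noise = []
    for n in range(n_samples):
        v = 0.0
        for r, l, i in segments:
            didt = (i[n] - i[n - 1]) / dt if n > 0 else 0.0
            v += r * i[n] + l * didt
        noise.append(v)
    return noise


dt = 1e-10                                                  # 0.1 ns per sample
i_module = [0.0, 0.02, 0.04, 0.06, 0.08, 0.1, 0.1, 0.1]     # target module ramps up
i_other = [0.05] * len(i_module)                            # neighbour, steady current

# Segment 1 (near the VDD pin) is shared by both modules; segment 2 feeds only the target.
seg1 = (0.2, 1e-10, [a + b for a, b in zip(i_module, i_other)])
seg2 = (0.3, 1e-10, i_module)
noise = path_noise([seg1, seg2], dt)
print([round(v, 3) for v in noise])
```

The noise peaks at the end of the ramp, where the resistive and inductive terms add, and then settles to the pure IR value once the current is steady.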

41 Why Decoupling Capacitance?
(Figure labels: VDD, R1, L1, R2, L2, C1, C2, i1(t), i2(t), module k.)
P/G network wire sizing won’t change the frequency spectrum of the voltage drop. Reducing Vdrop by a factor of k requires sizing up wires by a factor of k along the supply current path.
Decoupling caps act as a low-pass filter: an efficient way to remove the high-frequency elements of Vdrop.
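The low-pass behaviour can be illustrated with a one-node model: the supply feeds a node through resistance R, a decap C sits at the node, and the load draws a current pulse. The forward-Euler simulation below is a sketch with invented component values, not a model of any particular grid.

```python
def worst_droop(r, c, i_pulse, t_pulse, vdd=1.0, dt=1e-12, t_end=5e-9):
    """Largest drop below VDD at a node fed through r, with decap c at the node.

    Node equation: c * dv/dt = (vdd - v) / r - i_load(t),
    where i_load is i_pulse for t < t_pulse and 0 afterwards.
    """
    v = vdd
    worst = 0.0
    for n in range(int(round(t_end / dt))):
        t = n * dt
        i_load = i_pulse if t < t_pulse else 0.0
        v += dt * ((vdd - v) / r - i_load) / c   # forward Euler step
        worst = max(worst, vdd - v)
    return worst


small = worst_droop(1.0, 10e-12, 0.1, 0.2e-9)    # 10 pF: pulse is "slow" for the filter
large = worst_droop(1.0, 1e-9, 0.1, 0.2e-9)      # 1 nF: pulse is filtered out
dc = worst_droop(1.0, 1e-9, 0.1, 1.0, t_end=20e-9)   # sustained (DC) load
print(f"10 pF: {small*1e3:.1f} mV, 1 nF: {large*1e3:.1f} mV, DC with 1 nF: {dc*1e3:.1f} mV")
```

The larger decap cuts the droop from the short pulse several-fold, while a sustained load still settles at the full I·R drop, matching the point that decaps do not help DC supply currents.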

42 Decoupling Capacitance Budget
The decap budget for each module can be determined based on its noise level.
Initial budget can be estimated as follows:
Iterations are performed if necessary until the noise at each module in the floorplan is kept under a certain limit.
Slide courtesy of S Zhao, K Roy & C.-K. Kok

43 Allocation of Decoupling Capacitance
Decap needs to be placed in the vicinity of each target module.
Decap requires white space (WS) to be manufactured on; use MOS capacitors.
Decap allocation is reduced to WS allocation.
Two-phase approach: allocate the existing WS in the floorplan; insert additional WS into the floorplan if required.
Slide courtesy of S Zhao, K Roy & C.-K. Kok

44 Allocation of Existing White Space
(Figure labels: WS; modules B, C, D, E; w1, w2, w3.)
Slide courtesy of S Zhao, K Roy & C.-K. Kok

45 Allocation of Existing WS--Linear Programming (LP) Approach
Objective: Maximize the utilization of available WS Existing WS can be allocated to neighboring modules using LP Notation: LP Approach: Slide courtesy of S Zhao, K Roy & C.-K. Kok

46 Insert Additional WS into Floorplan If Necessary
Update the decap budget for each module after existing WS has been allocated.
If additional WS is required, insert WS into the floorplan by extending it horizontally and vertically.
Two-phase procedure: insert a WS band between rows based on the decap budgets of the modules in the row; insert a WS band between columns based on the decap budgets of the modules in the column.
Slide courtesy of S Zhao, K Roy & C.-K. Kok

47 Moving Modules to Insert WS
Slide courtesy of S Zhao, K Roy & C.-K. Kok

48 Experimental Results Comparison of Decap Budgets (Ours vs “Greedy Solution”)

49 Experimental Results for MCNC Benchmark Circuits

50 Floorplan of playout Before/After WS Insertion

51 Conclusion
A methodology for decoupling capacitance allocation at the floorplan level is proposed. A linear programming technique is used to allocate existing WS to maximize its utilization. A heuristic is proposed for additional WS insertion. Compared with the “Greedy” solution, our method produces significantly smaller decap budgets.

52 Thank you

