1 Modeling and Optimization of VLSI Interconnect 049031 Lecture 5: Single Net Optimization Avinoam Kolodny Konstantin Moiseev.


1 1 Modeling and Optimization of VLSI Interconnect 049031 Lecture 5: Single Net Optimization Avinoam Kolodny Konstantin Moiseev

2 2 Standard VLSI Design Flow  System Spec. → Architecture → Logic → Circuit → Layout [Flow diagram. Steps: Develop RTL Model, RTL Validation, Logic Synthesis or Manual Circuit Design, Verify Schematics, Layout Synthesis or Manual Layout Design, Verify Layout. Design representations: RTL, Schematics, Layout]

3 3 Are interconnect delays dominant?  Integrated circuit wires have resistance r per unit length and capacitance c per unit length. What is the wire delay for length L?  Wire delay grows as the square of the wire length: ~ rcL²  The total stage delay is: Rout (Cout + Cwire + Cload) + Rwire (½ Cwire + Cload)  Wire delays are significant and cannot be ignored in synthesis, but layout information is not readily available to the synthesizer  Design iterations are typically used [Circuit figure: driver (Rout, Cout), wire (Rwire, Cwire), load (Cload)]
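
The stage delay expression on this slide can be evaluated directly. The following is a minimal sketch; the function and argument names are illustrative assumptions, and the wire-only term uses the ½·Rwire·Cwire form that appears in the slide's formula.

```python
def stage_delay(r_out, c_out, r_wire, c_wire, c_load):
    """Stage delay from the slide: Rout(Cout+Cwire+Cload) + Rwire(Cwire/2+Cload)."""
    return r_out * (c_out + c_wire + c_load) + r_wire * (0.5 * c_wire + c_load)


def wire_only_delay(r_per_len, c_per_len, length):
    """Distributed-wire term alone: 0.5 * (r*L) * (c*L), i.e. proportional to r*c*L^2."""
    return 0.5 * (r_per_len * length) * (c_per_len * length)
```

Doubling `length` in `wire_only_delay` quadruples the result, which is the quadratic growth the slide refers to.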

4 4 Timing Convergence Problem  Interconnect-based timing analysis is in a feedback loop  Synthesis must predict the delays somehow  Wireload models are used for pre-layout RC estimation  With good prediction, the loop converges quickly  The problem in nanoscale circuits: wiring delays are large, estimates are inaccurate, and the iteration may not converge  It is necessary to do interconnect optimization! [Flow diagram: Logic Synthesis / Manual Circuit Design, Schematics (netlist), Layout Synthesis / Manual Layout Design, Layout, RC extract, Timing analysis, with a feedback path]

5 5 Net-by-Net vs. Multi-Net Optimization  In Net-by-Net optimization of a signal wire, make simple assumptions about neighbor wires, e.g.:  Neighbors are far away  Neighbors are grounded  Neighbors are unaffected by changes we make to the wire under optimization

6 6 6 Perspective on Models in Speed Optimization of Wires  Evolution:  1) “Ideal” interconnect (R = 0, C = 0, L = 0)  2) Capacitive interconnect (C ≠ 0)  3) Resistive interconnect (C ≠ 0, R ≠ 0)  4) Inductive interconnect (R ≠ 0, C ≠ 0, L ≠ 0) [Figure: wire models with C int and R int along the technology evolution axis]

7 7 Optimization Scenarios (from Point A to Point B) Wire line Logic path Tree-Net

8 8 8 1) Speed optimization of a single stage with a resistive wire line (Bakoglu; IEEE Trans. ED-32, 1985)  The problem (Bakoglu ’85): with technology scaling, interconnect resistance becomes dominant.  Solutions? [Circuit figure: driver R tr, wire R int and C int, load C L]

9 9 9 Repeaters: Bakoglu’s classical derivation  Delay of resistive line with k inverters  Find optimal k corresponding to:  Why is this optimal?  Does it depend on inverter size?  When to insert the first buffer (L crit)? [Circuit figure: k segments numbered 1, 2, 3, …, k, each with an inverter R inv driving a wire segment R line /k, C line /k]
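
The equations on this slide did not survive in the transcript. As a hedged sketch of the kind of derivation involved, assume a plain Elmore model in which each of the k inverters has output resistance R_inv and input capacitance C_inv (C_inv is an assumed symbol here; the inverter's own output capacitance is neglected):

```latex
\[
T(k) = k\left[ R_{inv}\left(\frac{C_{line}}{k} + C_{inv}\right)
       + \frac{R_{line}}{k}\left(\frac{C_{line}}{2k} + C_{inv}\right) \right]
     = R_{inv}C_{line} + k\,R_{inv}C_{inv} + \frac{R_{line}C_{line}}{2k} + R_{line}C_{inv}
\]
\[
\frac{\partial T}{\partial k} = R_{inv}C_{inv} - \frac{R_{line}C_{line}}{2k^{2}} = 0
\quad\Longrightarrow\quad
k_{opt} = \sqrt{\frac{R_{line}C_{line}}{2\,R_{inv}C_{inv}}}
\]
```

Bakoglu's published form uses fitted coefficients (0.7 and 0.4) in place of the Elmore coefficients 1 and ½, giving k_opt = sqrt(0.4 R_line C_line / (0.7 R_inv C_inv)). In either form, only the product R_inv·C_inv enters, and scaling the inverter by x turns R_inv into R_inv/x and C_inv into x·C_inv, so under this model k_opt does not depend on the inverter size.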

10 10 Upsizing the repeaters  Size up by a factor x: repeater parameters become R inv /x and C inv · x (in Bakoglu’s book this factor is called h)  Find optimal x by requiring  This optimal delay is linear with line length L  The delay of each segment is the same for all interconnect layers!  Tradeoff with area and power: use smaller x?
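
The claims on this slide (delay linear in L, equal segment delay on every layer) can be checked numerically. This is a sketch under assumptions, not Bakoglu's exact formulation: it uses a plain Elmore model with coefficients 1 and ½, neglects the repeater's output capacitance, and treats k as a continuous variable. All function names are illustrative.

```python
import math


def repeater_delay(k, x, r_line, c_line, r_inv, c_inv):
    """Elmore delay of a line cut into k segments, each driven by an inverter scaled by x."""
    seg_r = r_line / k
    seg_c = c_line / k
    stage = (r_inv / x) * (seg_c + x * c_inv) + seg_r * (0.5 * seg_c + x * c_inv)
    return k * stage


def optimal_repeaters(r_line, c_line, r_inv, c_inv):
    """Closed-form optimum of repeater_delay, from dT/dk = 0 and dT/dx = 0."""
    k_opt = math.sqrt(r_line * c_line / (2.0 * r_inv * c_inv))
    x_opt = math.sqrt(r_inv * c_line / (r_line * c_inv))
    return k_opt, x_opt
```

In this model the optimal delay works out to (2 + √2)·sqrt(R_inv·C_inv·R_line·C_line), which is proportional to L, and the delay per segment is (2 + 2√2)·R_inv·C_inv, independent of the wire's r and c.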

11 11 Better problem formulations  Minimum area or power under a “delay < Tmax” constraint  Minimize a weighted cost function  Example: min [ (Area) × (delay)³ ] gives about 50% of Bakoglu’s size  Sources: Sylvester & Keutzer, TCAD 2000; Cao et al., ICCAD 2000

12 12 Boosters vs. Repeaters (Nalamalpu, IEEE TCAD v.21 p.50 2002; Dobbelaere 95)  “Regeneration stations”: no “self delay”, bidirectional wires ok  But: power hungry, noise issues, process sensitivity [Figure: voltage V vs. time t waveform]

13 13 Repeater Insertion in RLC Lines  Ismail and Friedman, DAC 1999.  Optimum number of sections decreases as inductance effects increase  This behavior is due to the deviation from the quadratic dependence of the delay on interconnect length [Circuit figure: k sections numbered 1, 2, …, k, each of length l/k with R t /k, L t /k, C t /k, driven by repeaters of size h]

14 14 Driver Upsizing  D. Sylvester and K. Keutzer, “Getting to the bottom of deep submicron”, Proc. ICCAD, pp. 203-211, 1998 [Circuit figure: driver R tr, wire R int and C int, load C L]

15 15 Wire Sizing, Wire Tapering  Chen, C. P., Chen, Y. P., & Wong, D. F. (1996, June). Optimal wire-sizing formula under the Elmore delay model. In Proceedings of the 33rd annual Design Automation Conference (pp. 487-490).  Cong, J., & Leung, K. S. (1993, November). Optimal wiresizing under the distributed Elmore delay model. In Proceedings of the 1993 IEEE/ACM international conference on Computer-aided design (pp. 634-639). IEEE Computer Society Press.  Chen, C. P., & Wong, D. F. (1996). A fast algorithm for optimal wire-sizing under Elmore delay model. In 1996 IEEE International Symposium on Circuits and Systems (ISCAS '96), Vol. 4. IEEE.

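16 16 2) Speed optimization of a logic path with ideal wires (Logical effort method: first optimization)  Ideal = ignorable: delays depend only on gates!  Stage delay d = g·h + p, where g is the logical effort of the gate, h = C out / C in is its electrical effort, and p is its parasitic delay  Minimal delay when g*h is the same for every gate on the path [Figure: CMOS gate1 with delay d1 driving CMOS gate2 with delay d2; plot with curves for Inverter, Nand2, Nor2]  Source: “Logical Effort”, Sutherland, Sproull and Harris, 1999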
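
The equal-effort rule can be sketched for a single path without branching, using the standard logical effort relations d = g·h + p per stage and F = G·H for the path. The function name and argument layout are illustrative assumptions; capacitances are in arbitrary units and delays in units of the technology constant τ.

```python
def size_path(logical_efforts, parasitics, c_load, c_in):
    """Size a branch-free path so that g*h is equal in every stage.

    Returns (stage effort f, input capacitance of each gate, path delay).
    """
    n = len(logical_efforts)
    path_g = 1.0
    for g in logical_efforts:
        path_g *= g
    path_effort = path_g * (c_load / c_in)    # F = G * H (no branching assumed)
    stage_effort = path_effort ** (1.0 / n)   # equal g*h per stage: f = F^(1/N)

    # Work backward from the load: C_in(i) = g_i * C_out(i) / f
    caps = [0.0] * n
    c_out = c_load
    for i in reversed(range(n)):
        caps[i] = logical_efforts[i] * c_out / stage_effort
        c_out = caps[i]

    delay = n * stage_effort + sum(parasitics)
    return stage_effort, caps, delay
```

For example, three 2-input NAND gates (g = 4/3, p = 2 each) driving a load equal to the path's input capacitance give a stage effort of 4/3 and a path delay of 3·(4/3) + 6 = 10 τ.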

17 17 Speed optimization with ideal wires (Continued) (Logical effort method: second optimization)  Given fixed gates, use cascaded buffers (a series of amplifying inverters) to drive a large C load  Device widths grow by a fixed factor f to form an exponential “horn”  Neglecting p, the factor should be e (about 2.7)  Including p (and additional parasitic capacitance) the factor is about 4 [Figure: CMOS gate1 and CMOS gate2 followed by the buffer chain]
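
The optimal taper factor can be reproduced numerically. This sketch assumes an inverter chain with logical effort 1 per stage, so the delay of N = ln(X)/ln(f) stages is N·(f + p); minimizing over f leads to the condition ln f = 1 + p/f, solved here by bisection. The function names and the default search interval are assumptions.

```python
import math


def best_stage_factor(p, lo=1.0001, hi=20.0, tol=1e-9):
    """Solve ln(f) = 1 + p/f for the delay-optimal stage factor f by bisection."""
    def slope(f):
        # Sign of d/df [(f + p) / ln f]; increasing in f, zero at the optimum.
        return math.log(f) - 1.0 - p / f

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chain_delay(x, f, p):
    """Delay (in units of tau) of an inverter chain with total gain x = Cload/Cin."""
    n_stages = math.log(x) / math.log(f)   # treated as continuous for simplicity
    return n_stages * (f + p)
```

With p = 0 the solver returns e; with p = 1 (a commonly assumed inverter parasitic delay) it returns roughly 3.6, which is consistent with the slide's "about 4" once additional parasitics are included.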

18 18 Merging the two optimizations (Interconnect is still ideal!)  Given a logic path with some C load and some allowed C in, add inverters to optimize the number of stages, and choose sizes such that g*h is equal in every stage  Gate sizes will still form a “horn”, but those with higher g require lower h [Figure: logic path between the allowed C in and C load]

19 19 Practical notes  “Flat” functions, area tradeoff  Tolerance in optimization  Large area saving for a small price in delay, by increasing f and using fewer stages  Power tradeoff:  Each stage adds more capacitance  Rush-through current at each stage  Note: Low-power design considerations lead to higher f [Plot: delay d vs. stage factor f, with X = Cload/Cin]

20 20 Power Reduction by Downsizing  Logical Effort provides gate sizing for optimal speed  Downsizing the gates can save power (at lower speed)  Capacitance of the path: C = C1 + C2 + C3 + C4 + Cout  In order to reduce energy by 10%, we can reduce the path capacitance by 10%  The main question: how to do it properly?  Remove all the capacitance from the last stage? Remove 10% of each stage?  How should the gates be downsized to gain maximum power savings with minimum effect on timing?  * Y. Aizik and A. Kolodny, "Finding the Energy Efficient Curve: Gate Sizing for Minimum Power under Delay Constraints," VLSI Design, 2011.

21 21 Energy Efficient Curve  Energy efficient design: highest performance among all possible configurations dissipating the same power  Energy efficient curve: collection of all energy efficient designs plotted in the energy-delay space [Plot: energy vs. delay]

22 22 Typical Design Scenario  Initial circuit is close to the minimum delay point  Power is too high  Sync with the energy efficient curve  Relax delay in an optimal manner: stay on the energy efficient curve!  For each delay point, get the optimal sizing of the circuit that maximizes the energy reduction  Need to quantify the energy-performance tradeoff [Plot: energy vs. delay]

23 23 Problem Formulation  For a given circuit, with initial gate capacitances {C01, C02, …, C0N},  output load Cout,  input activity factors of each gate AF i j,  leakage power coefficient for each gate,  required delay increase rate d inc:  Find {k1 … kN} that maximizes the energy reduction rate [Figure: four-gate path with leakage terms Pleak 1 to Pleak 4]

24 24 Analytical Model  Use the logical effort method  to model the delay of the circuit  and extend it to model the energy dissipation of a given circuit  Advantages: easy to understand, known method, many available tools [Equation legend: logical effort of gate i; parasitic delay of gate i; load capacitance; dynamic capacitance (energy / Vcc²); electrical effort; path parasitic delay; downsize factor for gate i; input activity factor for gate i; input capacitance for gate i that achieves the initial delay]

25 25 Finding the Energy Efficient Curve  The optimization problem:  For a given circuit,  given a delay requirement that is d inc percent greater than the initial delay,  find the best sizing of logic gates in the path (a vector of downsizing factors k i)  that maximizes the energy reduction rate e dec  However, f 1 defined above is non-convex. We use geometric programming to solve the optimization problem.

26 26 Example Circuits and Results  The result of this work is a tool written in MATLAB that uses a convex optimizer (GGPLAB) as its infrastructure.  As an example, we ran numerical experiments that explore the EDG of several circuits  Results were verified against an Intel proprietary circuit simulation and optimization tool

27 27 Inverter Chain – EDG  Maximum energy reduction per unit of delay increase is achieved near the minimum delay point  The potential for energy savings decreases as the delay is relaxed further  For a 2.5% delay increase, get 12 × 2.5 = 30% energy reduction!  For a 20% delay increase, get 2 × 20 = 40% energy reduction

28 28 3) Speed optimization of a logic path with capacitive interconnect  “Equal division of effort” is not optimal any more!  Approximation for long wires (Cint >> Cin_of_gate2): drive Cint as if gate2 were not there; increase gate2 as necessary to speed up the rest of the path (as long as Cint remains dominant).  Approximation for short wires (Cint << Cin_of_gate2): assume Cint is proportional to the size of the driving gate. This means a larger (but constant) p* in the equation. This re-justifies equal g*h!  Intermediate wires: none of these approximations hold; optimize numerically or by trial-and-error simulations [Figure: CMOS gate1 with delay d1 driving CMOS gate2 with delay d2 through wire capacitance C int]
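
A two-gate illustration shows why equal effort breaks down. This is a sketch under assumptions: gate1 has a fixed input capacitance C_1, the wire adds a fixed capacitance C_int at its output, gate2 has input capacitance C_2 and drives C_L, and delays follow the logical effort form d = g·h + p.

```latex
\[
d(C_2) = g_1\,\frac{C_2 + C_{int}}{C_1} + p_1 + g_2\,\frac{C_L}{C_2} + p_2
\]
\[
\frac{\partial d}{\partial C_2} = \frac{g_1}{C_1} - \frac{g_2 C_L}{C_2^{2}} = 0
\quad\Longrightarrow\quad
C_2 = \sqrt{\frac{g_2\,C_L\,C_1}{g_1}}
\]
\[
\text{At the optimum: } g_2\,\frac{C_L}{C_2} = g_1\,\frac{C_2}{C_1}
\;<\; g_1\,\frac{C_2 + C_{int}}{C_1}
\]
```

The effort of the second stage matches only the gate-capacitance part of the first stage's effort, so the total stage efforts are no longer equal whenever C_int > 0.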

29 29 A heuristic for cascaded buffers with local interconnect capacitance (Cherkauer & Friedman, SC-30, 1995)  The principle: Keep equal rise/fall slopes at all nodes  This means: constant current-to-capacitance ratio  This is a healthy guideline also to control rush-through current and hot-electron effects

30 30 The Interconnect Wall [Diagram: logic without wires is handled by logic gate sizing (Logical Effort); long wires are handled by interconnect optimization (repeater insertion)]

31 31 4) What can we do in a logic path with resistive and capacitive interconnect? [Figure: logic path with gates numbered 1 to 5]

32 32 “Interconnect Effort” (Srinivasaraghavan & Burleson, IEEE VLSI Symp. 2003)  Eliminate buffer-cascade by logic sizing  Non-uniform repeater distances  Save some delay and power Venkat, Kumar. "Generalized delay optimization of resistive interconnections through an extension of logical effort." Circuits and Systems, 1993., ISCAS'93, 1993 IEEE International Symposium on. IEEE, 1993.

33 33 “Logic gates as repeaters” (Venkat 93, Amrutur&Horowitz 2001, Morgenshtein et al. 2003 )  The idea: distribute the logic gates over the line distance  No additional devices: save power and area  Non-uniform resistive wire segments, depending on gate efforts and sizes  Find optimal wire segmenting, optimal gate sizing  Combine with repeater insertion

34 34 Unified Logical Effort – ULE (Include wires in Logical Effort optimization) “What is the optimal size of the gates?” A. Morgenshtein, E.G. Friedman, R. Ginosar, A. Kolodny, "Unified Logical Effort - A Method for Delay Evaluation and Minimization in Logic Paths with RC Interconnect", to be published in IEEE Transactions on VLSI. A. Morgenshtein, E.G. Friedman, R. Ginosar, A. Kolodny, "Timing Optimization in Logic with Interconnect," Invited Paper, ACM International Workshop on System Level Interconnect Prediction (SLIP), UK, pp. 19-26, April 2008.

35 35 Extension of the Logical Effort Model to Logic Gates with Wires [Equation with labeled terms: capacitive interconnect effort, resistive interconnect effort]

36 36 ULE Optimum [Diagram: special cases of the ULE optimum. Repeater insertion and repeater scaling; logic without wires, where Logical Effort gives minimal delay with equal stage delays]

37 37 Intuition of ULE Optimum  At the optimal size, the delay caused by the gate capacitance equals the delay caused by the gate resistance

38 38 Optimal Gate Sizes [Equations for two cases: capacitive interconnect (short wires and branches); general RC interconnect (long wires)]

39 39 Examples (1)  Optimal ULE sizing x opt (normalized with respect to C0) in circuit E1 for several lengths of the wire at each stage.  For zero wire length, the solution converges to LE optimization.  For long wires, the solution converges to equal sizing.

40 40 ULE for Branches and Fanout General ULE condition for gate sizing Timing Optimization in Logic with Interconnect  ULE

41 41 5) Driving RC trees (rather than lines)  Each buffer forks-out a subtree, and its input capacitance loads the upstream path  Problem: too many options (building and buffering is NP complete)

42 42 Van Ginneken’s buffer insertion algorithm (ISCAS 1990, pp. 865-868)  Existing global wiring tree  “Legal positions”: candidate buffer locations  Assume they are at all branching points  Bottom-up generation of options (c, q), where c = connected capacitance and q = required time (latest is best)  Pruning property: delete (c’, q’) if c’ > c and q’ < q  Because a larger load can only worsen upstream delays  “Dynamic programming optimality principle”  Merging of options at point p: (c i + c r, min(q i, q r))  Merging and pruning yields a linear list of viable options, instead of exponential!  Improvements:  Combine with wire sizing, inverting/noninverting buffers, multiple buffer sizes, slew-rate dependent delay model, minimize area/power under delay constraint
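
The (c, q) bookkeeping can be sketched in a few functions. This is a simplified illustration under assumptions, not van Ginneken's published procedure: wires use an Elmore delay, a single buffer type is modeled by hypothetical parameters `r_buf`, `c_buf`, `d_buf`, and the merge forms all pairs before pruning, whereas the original walks the two sorted lists to keep the merge linear.

```python
def prune(options):
    """Keep non-dominated (c, q) pairs: drop (c', q') if another has c <= c' and q >= q'."""
    kept = []
    # Sort by capacitance ascending; for equal c, the largest q comes first.
    for c, q in sorted(set(options), key=lambda o: (o[0], -o[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept


def add_wire(options, r_w, c_w):
    """Move options upstream across a wire with resistance r_w and capacitance c_w."""
    return prune([(c + c_w, q - r_w * (0.5 * c_w + c)) for c, q in options])


def add_buffer(options, r_buf, c_buf, d_buf):
    """Offer one extra option: a buffer that hides the downstream load behind c_buf."""
    buffered = [(c_buf, q - d_buf - r_buf * c) for c, q in options]
    best = max(buffered, key=lambda o: o[1])   # all share c_buf, so keep the best q
    return prune(options + [best])


def merge(left, right):
    """Join two subtrees at a branch point: capacitances add, required time is the min."""
    return prune([(cl + cr, min(ql, qr)) for cl, ql in left for cr, qr in right])
```

Because `prune` leaves a list in which both c and q increase, the option count stays manageable as the lists move up the tree.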

43 43 Wire segmenting (Alpert & Devgan, DAC 1997, Adler & Friedman TCAS-I 2000)  Van Ginneken assumed one buffer per wire in the tree  Alpert & Devgan added a wire-segmenting preprocessing step to the algorithm  Adler & Friedman use a different approach with similar results

44 44 Summary  We covered delay optimization for:  Single stage with resistive wire (driver sizing, wire tapering, buffer insertion)  Logic path with ideal wires (Logical Effort)  Logic path with resistive/capacitive wires (Unified Logical Effort)  Tree-structured Nets (buffer insertion and wire segmenting)  This is actually part of a complex problem! Logic gate sizing + Interconnect topology generation (fanout tree, routing) + Wire sizing (and layer assignment) + Buffer insertion and sizing + Delay / area / power / noise optimization

45 45 Miller Coupling Factor (MCF)

46 46 Decoupling cross-coupled nets for Net-by-Net optimization: Switch factor (Miller factor) Cx 2Cx

47 47 SF can be between –1 and 3 if slopes are significantly different (Kahng et al., DAC 2000; Chen et al., ICCAD 2000)  This is a consequence of defining delay by the 50% point on the waveform  SF = –1 for in-phase transition  SF = 3 for out-of-phase (“unfriendly”) transition
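
The switch factor decouples a net from its neighbor by replacing the coupling capacitance Cx with an equivalent grounded capacitance SF·Cx. A minimal sketch follows; the function name and the range check are illustrative assumptions.

```python
def effective_capacitance(c_ground, c_couple, switch_factor):
    """Grounded-equivalent load of a net: Cground + SF * Cx.

    With equal slopes, SF is commonly taken as 0 (neighbor switches in phase),
    1 (neighbor quiet) or 2 (neighbor switches in the opposite direction).
    With very different slopes, SF can range from -1 to 3.
    """
    if not -1.0 <= switch_factor <= 3.0:
        raise ValueError("switch factor expected in the range [-1, 3]")
    return c_ground + switch_factor * c_couple
```

For net-by-net optimization, a fixed worst-case SF lets each wire be sized or buffered without tracking what its neighbors do.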

