Timing Optimization in Logic with Interconnect. Arkadiy Morgenshtein, Eby G. Friedman, Ran Ginosar, Avinoam Kolodny, Technion – Israel Institute of Technology.

1 Timing Optimization in Logic with Interconnect. Arkadiy Morgenshtein, Eby G. Friedman, Ran Ginosar, Avinoam Kolodny, Technion – Israel Institute of Technology. SLIP (System Level Interconnect Prediction) 2008.

2 Timing Optimization. A logic function between points A and B may, in special cases, contain only gates or only wires; typically, it is a mixture of both.

3 Logic with Wires: A Common Example. (Figure: UART design, with gates 1 to 5 marked along a path.)

4 The Interconnect Wall. Logic without wires is handled by gate sizing (Logical Effort); long wires are handled by interconnect optimization (Repeater Insertion).

5 Timing Optimization in Logic with Interconnect. Between the two extremes, logic without wires (A) and long wires (B), lies the general case of logic with interconnect.

6 Existing Techniques: A (Very) Short Tutorial

7 Logical Effort (only logic). Delay model: $Delay = \tau\,(g h + p)$, where $\tau = R_0 C_0$ is the delay of a minimal inverter (a technology constant), g is the logical effort (a gate-type factor, e.g., $g_{inv} = 1$), h is the electrical effort (load-driving capability, $h = C_{out}/C_{in}$), and p is the parasitic effort, due to the output capacitance. Optimal sizing: $Delay_i = Delay_{i+1}$, i.e., $g_i h_i = g_{i+1} h_{i+1}$. I. Sutherland, B. Sproull, and D. Harris, “Logical Effort: Designing Fast CMOS Circuits,” Morgan Kaufmann, 1999.
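
A minimal Python sketch of the Logical Effort relations above, assuming a path without branching (B = 1) and an inverter parasitic delay p_inv = 1 in units of τ. The function names `le_path_delay` and `le_sizes` are illustrative, not from the cited book. With optimal sizing every stage carries the same effort f = F^(1/N), where F = G·H is the path effort.

```python
import math

def le_path_delay(g, p, H, B=1.0):
    """Minimum delay (units of tau) of an N-stage path by Logical Effort.

    g, p: per-stage logical effort and parasitic delay.
    H: path electrical effort C_load / C_in; B: path branching effort.
    Returns (delay, optimal stage effort f).
    """
    n = len(g)
    F = math.prod(g) * B * H      # path effort F = G * B * H
    f = F ** (1.0 / n)            # equal stage effort g_i * h_i = f
    return n * f + sum(p), f

def le_sizes(g, C_in, C_load):
    """Input capacitance of each stage that gives equal stage effort (B = 1)."""
    n = len(g)
    f = (math.prod(g) * C_load / C_in) ** (1.0 / n)
    sizes = [C_in]
    for i in range(n - 1):
        # g_i * C_{i+1} / C_i = f  =>  C_{i+1} = f * C_i / g_i
        sizes.append(f * sizes[-1] / g[i])
    return sizes
```

For three inverters driving H = 64, each stage has effort 4, the sizes are 1, 4, 16 (× C_in), and the delay is 3·4 + 3 = 15τ.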

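8 Limitations of Logical Effort. Without wires, LE expresses every stage delay directly. LE breaks down for logic with wires and branches: the wire loads do not behave as fixed side branches, so the stage delays (marked “?” on the slide) cannot be written in LE form.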

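9 Repeater Insertion (only wires). Without repeaters, wire delay grows as Length²: a wire with R = 5 and C = 5 has $D = RC = 25$. Dividing it into five segments with rc = 1 each gives $D = \sum rc = 5$, so with repeaters the delay grows only as Length. The method yields the optimal repeater size and the optimal number of repeaters in terms of R0 (effective resistance of the minimal inverter), the wire resistance, C0 (gate capacitance of the minimal inverter), and the wire capacitance. H. B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194–219, 1990.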
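
The repeater trade-off can be sketched with Bakoglu's closed-form stage model, written out here as an assumption: k equal stages, each driven by a repeater h times the minimal inverter, with the usual 0.7 (lumped) and 0.4 (distributed) 50%-delay coefficients. `repeated_line_delay` and `bakoglu_optimum` are illustrative names. Since Rint and Cint both scale with length, the optimal delay grows linearly, while the unbuffered 0.4·Rint·Cint term grows with Length².

```python
import math

def repeated_line_delay(R0, C0, Rint, Cint, k, h):
    """50% delay of a line split into k equal stages, each driven by a
    repeater of size h (resistance R0/h, input capacitance h*C0)."""
    stage = (0.7 * (R0 / h) * (Cint / k + h * C0)          # repeater drives wire + next repeater
             + (Rint / k) * (0.4 * Cint / k + 0.7 * h * C0))  # distributed wire RC
    return k * stage

def bakoglu_optimum(R0, C0, Rint, Cint):
    """Continuous optimum: number of stages k and repeater size h."""
    k = math.sqrt(0.4 * Rint * Cint / (0.7 * R0 * C0))
    h = math.sqrt(R0 * Cint / (Rint * C0))
    return k, h
```

At the optimum the delay equals (1.4 + 2·sqrt(0.28))·sqrt(R0·C0·Rint·Cint), about 2.5·sqrt(R0·C0·Rint·Cint). Doubling the length doubles both this delay and k while h stays the same, consistent with a single optimal repeater size per metal layer.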

10 Properties of Repeater Insertion. Assumptions of basic repeater insertion (RI): equal repeater sizes, equal spacing, and terminal gates similar to the repeaters. Characteristics of RI: the number and the size of the repeaters are independent, and there is a single optimal (fixed) size x for a given process and metal layer.

11 So, What Are We Going To Do?

12 We Are Breaking the Wall. Logic without wires is solved by Logical Effort and long wires by Repeater Insertion. WANTED: a solution for the mixed case. Challenges: gate placement, gate sizes, and the number of gates and repeaters.

13 Our Approach to Timing Optimization. Logic Gates as Repeaters (LGR) addresses gate placement along the wire; Unified Logical Effort (ULE) addresses gate sizes; Gate-terminated Sized Repeater Insertion (GSRI) addresses the number of repeaters.

14 Logic Gates as Repeaters - LGR “Where should the gates be located (along the wire)?”

15 The Idea. Problem: delay reduction in logic with wires. A known solution: segmenting the wire with repeaters. Drawback: repeaters consume power and area without adding logical functionality, which is a waste. Proposed: Logic Gates as Repeaters (LGR), which distributes the logic gates over the interconnect so that they drive the partitioned wire without added repeaters. K. Venkat, “Generalized Delay Optimization of Resistive Interconnections through an Extension of Logical Effort,” ISCAS 1993.

16 LGR Delay Modeling: total path delay. M. Moreinis, A. Morgenshtein, I. Wagner, and A. Kolodny, “Logic Gates as Repeaters (LGR) for Area-Efficient Timing Optimization,” IEEE TVLSI, 2006.

17 Optimal Wire Segmenting. If the output resistance of driving gate i is below average, wire segment i is lengthened. If the input capacitance of successor gate i+1 is above average, wire segment i is shortened. If all gates are equal, the wire is partitioned equally. If a segment length comes out negative, the neighboring gates are merged.
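
The segmenting rules above follow from a Lagrange-multiplier argument on an Elmore delay model. A minimal sketch, assuming segment i is driven by gate i (output resistance R_i) and loaded by gate i+1 (input capacitance C_{i+1}), a uniform wire with resistance r and capacitance c per unit length, and no gate parasitics; `lgr_segments` and `chain_delay` are hypothetical helpers, not the authors' code. Setting ∂D/∂l_i equal for all segments gives l_i = L/n + (mean(a) - a_i)/(r·c), with a_i = R_i·c + r·C_{i+1}.

```python
def lgr_segments(R_drv, C_next, r, c, L):
    """Segment lengths (summing to L) that minimize the Elmore delay of a chain
    of gates distributed along a uniform wire.

    R_drv[i]: output resistance of gate i (drives segment i)
    C_next[i]: input capacitance of gate i+1 (load of segment i)
    """
    n = len(R_drv)
    # dD/dl_i = R_i*c + r*C_{i+1} + r*c*l_i must be equal for all i
    a = [R * c + r * C for R, C in zip(R_drv, C_next)]
    mean_a = sum(a) / n
    # A negative length means the neighboring gates should be merged;
    # this sketch only reports the lengths and leaves merging to the caller.
    return [L / n + (mean_a - ai) / (r * c) for ai in a]

def chain_delay(R_drv, C_next, r, c, lengths):
    """Elmore delay: gate resistance into wire + load, wire resistance into c*l/2 + load."""
    return sum(R * (c * l + C) + r * l * (c * l / 2 + C)
               for R, C, l in zip(R_drv, C_next, lengths))
```

With equal gates the wire is split equally; a weaker first driver (larger R_0) gets a shorter first segment, matching the rules on the slide.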

18 LGR Results. On the critical path of an 8-to-256 decoder circuit, “moving” the gates reduces delay by up to 27%. Further delay reduction is obtained by scaling and by combining LGR with RI. M. Moreinis et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS, 2004.

19 Optimal Gate Scaling. All gates are enlarged by a uniform factor S that minimizes the delay; scaling can be performed iteratively together with segmenting. For a chain of inverters, the segments are equal.
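
Under a simple Elmore model (an assumption: parasitics ignored, wire segments fixed), scaling every gate by S turns each R_i into R_i/S and each C_{i+1} into S·C_{i+1}. The delay then has the form A/S + B·S + const, so the optimum is S = sqrt(A/B). A sketch with hypothetical helper names; in an iterative flow this step would alternate with re-segmenting.

```python
import math

def optimal_scale(R_drv, C_next, r, c, lengths):
    """Uniform factor S (R_i -> R_i/S, C_i -> S*C_i) minimizing Elmore delay
    for fixed wire segments."""
    A = sum(R * c * l for R, l in zip(R_drv, lengths))     # terms proportional to 1/S
    B = sum(r * l * C for C, l in zip(C_next, lengths))    # terms proportional to S
    return math.sqrt(A / B)

def scaled_delay(R_drv, C_next, r, c, lengths, S):
    """Elmore delay of the chain after scaling all gates by S."""
    return sum((R / S) * (c * l + S * C) + r * l * (c * l / 2 + S * C)
               for R, C, l in zip(R_drv, C_next, lengths))
```

For example, with R_i = 10, C_{i+1} = 1, r = 0.1, c = 0.2 and two 100-unit segments, A = 400 and B = 20, so S = sqrt(20) ≈ 4.47.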

20 LGR Segmenting and Scaling. Uniform scaling is performed for all gates. For intermediate-length wires, LGR outperforms RI by up to 55%. For long wires RI is faster, BUT it requires 44 repeaters. Best for long wires: combined LGR and RI. M. Moreinis et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS, 2004.

21 LGR Summary. Logic gates serve as repeaters, so no logically redundant repeaters are needed. LGR reduces delay with lower area and power, and it can be combined with RI.

22 Unified Logical Effort - ULE “What is the optimal size of the gates?”

23 Unified Delay Model (including wires). The LE delay model is extended with two interconnect terms: a capacitive interconnect effort and a resistive interconnect effort.
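
One way to write such a model, assuming an Elmore (RC) approximation: gate i with input capacitance C_i has drive resistance τg_i/C_i (τ = R0·C0), wire i has resistance R_{int,i} and capacitance C_{int,i}, and C_{i+1} is the input capacitance of the next gate. This is a reconstruction for illustration; the exact normalization of the capacitive and resistive interconnect efforts on the slide may differ.

$$
D_i = \tau\left(g_i\,\frac{C_{int,i}+C_{i+1}}{C_i} + p_i\right) + R_{int,i}\left(\frac{C_{int,i}}{2} + C_{i+1}\right), \qquad D = \sum_i D_i
$$

The term $g_i C_{int,i}/C_i$ is the wire's capacitive load on the gate, and $R_{int,i}(C_{int,i}/2 + C_{i+1})$ is the wire's own resistive contribution. With $C_{int,i} = R_{int,i} = 0$ the model reduces to the Logical Effort delay $\tau(g_i h_i + p_i)$ with $h_i = C_{i+1}/C_i$.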

24 Minimal Delay Condition: the condition for minimal delay, compared with the condition of equal stage delays.

25 Minimal Delay for Capacitive Wires. Two cases: capacitive interconnect (short wires and branches) and general RC interconnect.

26 ULE Convergence to LE and RI. In special cases ULE reduces to known methods: for logic without wires, to Logical Effort; for wires with repeaters, to repeater insertion and repeater scaling.

27 Some Algebra…

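28 Intuition of the ULE Optimum. At the optimal size, the delay caused by the gate's input capacitance (charged through the upstream gate and wire resistance) equals the delay caused by the gate's output resistance (driving the downstream wire and gate capacitance).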

29 ULE Optimality. If the gate is too small, its high resistance dominates the delay; if it is too big, its high input capacitance dominates.

30 Optimal Gate Capacitance. The condition yields an expression for the size of a single gate; the gate sizes along a logic path are determined iteratively.
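
A minimal Python sketch of iterative sizing, assuming an Elmore model D_i = τ(g_i(C_{int,i} + C_{i+1})/C_i + p_i) + R_{int,i}(C_{int,i}/2 + C_{i+1}). This is a reconstruction, not the authors' implementation, and `ule_sizes` and `stage_delays` are hypothetical names. The total delay depends on C_i only through a·C_i + b/C_i, with a = τg_{i-1}/C_{i-1} + R_{int,i-1} and b = τg_i(C_{int,i} + C_{i+1}), so the best C_i is sqrt(b/a). This is the slide-28 intuition as an equation: (R_{i-1} + R_{int,i-1})·C_i = R_i·(C_{int,i} + C_{i+1}) with R_i = τg_i/C_i. Sweeping over the gates until the sizes stop changing gives the path sizing.

```python
import math

def stage_delays(C, g, p, Cint, Rint, tau=1.0):
    """Elmore-style stage delays; C has N+1 entries, C[N] is the fixed load."""
    return [tau * (g[i] * (Cint[i] + C[i + 1]) / C[i] + p[i])
            + Rint[i] * (Cint[i] / 2 + C[i + 1])
            for i in range(len(g))]

def ule_sizes(C_in, C_load, g, Cint, Rint, tau=1.0, tol=1e-12, max_iter=10000):
    """Size gates 2..N (C_in and C_load fixed) by repeatedly applying
    (R_{i-1} + Rint_{i-1}) * C_i = R_i * (Cint_i + C_{i+1}), R_i = tau*g_i/C_i."""
    n = len(g)
    C = [C_in] * n + [C_load]                  # initial guess: all gates = C_in
    for _ in range(max_iter):
        worst = 0.0
        for i in range(1, n):
            upstream_res = tau * g[i - 1] / C[i - 1] + Rint[i - 1]
            downstream_cap = Cint[i] + C[i + 1]
            new = math.sqrt(tau * g[i] * downstream_cap / upstream_res)
            worst = max(worst, abs(new - C[i]) / C[i])
            C[i] = new
        if worst < tol:                         # sizes stopped changing
            break
    return C
```

With no wires the iteration reproduces Logical Effort: three inverters driving C_load = 64 from C_in = 1 get sizes 4 and 16. With wires, each gate ends at the size where neither a smaller nor a larger value lowers the total delay.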

31 Examples (1): ULE Sizing. Equal wires, total electrical effort H = 10. For L = 0, the sizes converge to LE. For longer wires, ULE is faster. For long wires, the sizes approach a fixed value x_opt. (Figure: gate capacitance (× C0) versus gate number 1 to 9 for L = 0, 10 µm, 50 µm, 100 µm, 0.5 mm, and 1 mm, comparing ULE and LE sizing and marking x_opt.)

32 Examples (2): ULE Sizing. Total electrical effort H = 1. For L = 0, the sizes converge to LE (no scaling). For all wire lengths, ULE is faster. For long wires, the sizes approach the fixed value x_opt. (Figure: gate capacitance (× C0) versus gate number for L = 0, 10 µm, 50 µm, 100 µm, 0.5 mm, and 1 mm, comparing ULE and LE sizing and marking x_opt.)

33 So, What Is x_opt? The fixed gate size that ULE approaches for long wires.

34 Optimum Condition for Long Wires.

35 x_opt and Repeaters. For equal wires and an inverter (g = 1), x_opt coincides with the optimal sizing condition for a repeater. H. B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194–219, 1990.
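
A short derivation, assuming an Elmore model in which the optimal size satisfies $(R_{i-1} + R_{int})\,C_i = R_i\,(C_{int} + C_{i+1})$ with $R_i = \tau g_i / C_i$ and $\tau = R_0 C_0$ (a reconstruction of the slide-28 intuition). For long, equal wires the wire terms dominate ($R_{int} \gg R_{i-1}$, $C_{int} \gg C_{i+1}$):

$$
(R_{i-1}+R_{int})\,C_i = \frac{\tau g_i}{C_i}\,(C_{int}+C_{i+1})
\;\;\longrightarrow\;\;
C_i^2 \approx \frac{\tau g_i\, C_{int}}{R_{int}}
\;\;\Rightarrow\;\;
x_{opt} = \frac{C_i}{C_0} = \sqrt{\frac{g_i\, R_0\, C_{int}}{R_{int}\, C_0}}
$$

Because $R_{int}$ and $C_{int}$ both scale with the wire length, $x_{opt}$ depends only on their per-unit-length ratio. For an inverter ($g = 1$) it equals Bakoglu's optimal repeater size $h_{opt} = \sqrt{R_0 C_{int}/(R_{int} C_0)}$.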

36 Solving Design Problems with x_opt. Layout constraint: the optimal size of a repeater located between two wires.

37 Solving Design Problems with x_opt. Cell size constraint: the optimal wire length for a repeater of given size x_rep.
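
Both design problems can be sketched from one Elmore-based optimum condition for an inverter (g = 1) between an input wire and an output wire: (R_prev + R_w,in)·C_rep = (τ/C_rep)·(C_w,out + C_next). This formulation and both helper names are assumptions for illustration, not the slides' exact expressions. For the layout constraint the condition is solved for C_rep. For the cell-size constraint C_rep is fixed and the same condition, which is linear in L, is solved for the length of two equal wires with r and c per unit length.

```python
import math

def repeater_size_between_wires(R_prev, Rw_in, Cw_out, C_next, tau=1.0):
    """Hypothetical helper (layout constraint): optimal input capacitance of an
    inverter placed between an input wire (resistance Rw_in) and an output wire
    (capacitance Cw_out). R_prev: upstream drive resistance; C_next: downstream load."""
    return math.sqrt(tau * (Cw_out + C_next) / (R_prev + Rw_in))

def wire_length_for_repeater(C_rep, R_prev, C_next, r, c, tau=1.0):
    """Hypothetical helper (cell-size constraint): length L of equal input/output
    wires for which a repeater of fixed size C_rep satisfies
    (R_prev + r*L) * C_rep**2 == tau * (c*L + C_next). None if no positive L exists."""
    denom = r * C_rep ** 2 - tau * c
    if denom == 0:
        return None
    L = (tau * C_next - R_prev * C_rep ** 2) / denom
    return L if L > 0 else None
```

The two helpers are inverses of each other: sizing a repeater for 500-unit wires and then asking which length suits that size returns 500.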

38 Typical Design Example. Optimal ULE sizing for (a) similar gates and similar wires, (b) different gates and similar wires, and (c) similar gates and different wires. Gates with higher logical effort get a bigger size, and there is no single fixed x_opt in circuits with various gates and wires.

39 ULE Results: Simulation Setup. Critical path in a logic circuit (e.g., an adder), 65 nm CMOS. ULE is compared with the Cadence Virtuoso® Analog Optimizer, which uses numerical algorithms.

40 Delay Optimization. ULE achieves minimal delay, within 9% of the Analog Optimizer tool. The Analog Optimizer also reaches minimal delay (but sloooooow). Logical Effort yields a higher delay and becomes inaccurate as the wire lengths grow.

41 Run Time Comparison. The ULE run time is orders of magnitude shorter than that of the Analog Optimizer; ULE runs in less than 1 second. (Chart: run time [min].)

42 Power-Delay Optimization in ULE. Power is a function of the gate and wire capacitances, so the optimal gate size C_i changes when the power-delay product is minimized.
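
A sketch of power-delay sizing, assuming dynamic power is proportional to the total switched capacitance (gate plus wire; activity, supply voltage, and frequency omitted) and an Elmore delay model. Each gate is updated by exact one-dimensional minimization: for delay alone C_i = sqrt(b/a), and for P×D the derivative of (C_i + P0)(a·C_i + b/C_i + D0) is driven to zero by bisection. All names are hypothetical. At the P×D optimum ∂D/∂C_i < 0, so it uses less total capacitance than the delay optimum, matching the trend reported on the next slide.

```python
import math

def path_delay(C, g, p, Cint, Rint, tau=1.0):
    """Elmore-style path delay; C[0] is the fixed input, C[-1] the fixed load."""
    return sum(tau * (g[i] * (Cint[i] + C[i + 1]) / C[i] + p[i])
               + Rint[i] * (Cint[i] / 2 + C[i + 1]) for i in range(len(g)))

def switched_cap(C, Cint):
    """Proxy for dynamic power: total gate plus wire capacitance."""
    return sum(C) + sum(Cint)

def size_path(C_in, C_load, g, p, Cint, Rint, power_aware, tau=1.0, sweeps=500):
    """Coordinate descent on D (power_aware=False) or on P*D (power_aware=True)."""
    n = len(g)
    C = [C_in] * n + [C_load]
    for _ in range(sweeps):
        for i in range(1, n):
            a = tau * g[i - 1] / C[i - 1] + Rint[i - 1]   # D contains a*C_i
            b = tau * g[i] * (Cint[i] + C[i + 1])          # D contains b/C_i
            if not power_aware:
                C[i] = math.sqrt(b / a)
                continue
            D0 = path_delay(C, g, p, Cint, Rint, tau) - a * C[i] - b / C[i]
            P0 = switched_cap(C, Cint) - C[i]
            # d/dx[(x + P0)(a*x + b/x + D0)] = 2*a*x + D0 + a*P0 - b*P0/x**2 (increasing in x)
            slope = lambda x: 2 * a * x + D0 + a * P0 - b * P0 / x ** 2
            lo, hi = 1e-9, max(C[i], 1.0)
            while slope(hi) < 0:
                hi *= 2
            for _ in range(100):                          # bisection on the slope
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if slope(mid) < 0 else (lo, mid)
            C[i] = 0.5 * (lo + hi)
    return C
```

Comparing `size_path(..., power_aware=False)` with `size_path(..., power_aware=True)` on the same path shows the trade: a slightly higher delay for a lower total capacitance and a lower P×D.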

43 Sizing for Minimal P×D. A random logic path with 10 stages is assumed (gates x1 to x10 connected by wires L1 to L9). Four wire-length scenarios: S1: all wires L = 100 µm; S2: all wires L = 80 µm; S3: all wires L = 400 µm; S4: L = {900, 600, 150, 300, 800, 200, 400, 150, 250} µm. Power-delay optimization reduces the gate sizes compared with delay optimization. (Chart: gate size (× C0) for scenario S4, minimal delay vs. minimal power × delay.)

44 Reduced Energy, Low Delay Penalty. (Charts: energy [pJ] and delay [ps] for scenarios S1 to S4, comparing minimal power-delay sizing with minimal delay sizing.)

45 ULE for Branches and Fanout: the general ULE condition for gate sizing.

46 ULE Sizing in a Path with Branches. Four branch scenarios: S1: Lb = 400 µm, Cb = 1 for all branches; S2: Lb = 400 µm, Cb = 30 for all branches; S3: Lb = {400, 100, 400, 400} µm, Cb = {30, 1, 30, 1}; S4: Lb = {100, 100, 100, 400} µm, Cb = {1, 1, 1, 30}. Lw = 100 µm for all wires on the critical path. Branches change the sizing compared with ULE without branches.

47 ULE Delay Optimization with Branches. The extended ULE condition with branches yields additional delay reduction.

48 Unified Logical Effort Summary. ULE is useful over the entire range of problems: logic only, logic with wires, and wires only. It computes optimal gate sizes with low computational complexity.

49 One More Question: “When can I reduce delay by adding an inverter?”

50 Adding an Inverter to Reduce Delay: the condition for inverter insertion.

51 Inverter Addition vs. Gate Sizing. Wire length L = 1000 µm; the gate sizes X1 and X3 are variables. Whether an inverter should be inserted depends on the values and the ratio of X1 and X3. The size of the inverter, X2, is determined from ULE.

52 Inverter Addition: More Applications. No wires: beneficial when the electrical effort is higher than 4. Versus wire length (equal wires): beneficial when the wire is longer than a critical length Lcr. Power: beneficial when the expected delay reduction is more than ∆.
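
In Logical Effort terms, the “higher than 4” rule can be checked with a short sketch, assuming the added inverter's parasitic delay is neglected (p_inv = 0). One stage costs g·h, while two optimally sized stages cost 2·sqrt(g·h); the original gate's parasitic delay appears in both cases and cancels. With p_inv = 1 the break-even effort rises to (1 + √2)² ≈ 5.83. The helper names are illustrative, and a single added inverter also inverts the logic polarity, which has to be handled separately.

```python
import math

def inverter_helps(g, h, p_inv=0.0):
    """True if appending an inverter after a gate (logical effort g) that drives
    electrical effort h lowers the path delay (Logical Effort model, tau units)."""
    one_stage = g * h
    two_stage = 2 * math.sqrt(g * h) + p_inv   # effort g*h split evenly over 2 stages
    return two_stage < one_stage

def breakeven_effort(p_inv=0.0):
    """Stage effort F = g*h at which F == 2*sqrt(F) + p_inv."""
    s = 1 + math.sqrt(1 + p_inv)               # s = sqrt(F) solves s**2 - 2*s - p_inv = 0
    return s * s
```

For an inverter (g = 1) and p_inv = 0, adding a second inverter helps at h = 4.1 but not at h = 3.9.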

53 Example: Critical Wire Length. The critical length Lcr for inverter insertion depends on the minimal delay reduction factor ∆; the size of the inverter X2 is determined from ULE. (Chart: critical length Lcr (µm) vs. ∆.)

54 Gate-Terminated Sized Repeater Insertion - GSRI “What is the optimal number of gates/repeaters?”

55 Revisiting Standard Repeater Insertion. RI assumes fixed and equal repeater sizes, and terminal gates similar to the repeaters. BUT wires are usually located between different logic gates, and different repeater sizes may be chosen. Gate-Terminated Sized Repeater Insertion (GSRI) is proposed.

56 Delay Model of Logic with Repeaters

57 Delay Minimization by GSRI. The RI assumptions are long wires, terminal gates that are repeaters, and many repeaters (K ≫ 1). H. B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194–219, 1990.
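
A brute-force sketch of the GSRI idea, assuming an Elmore model with K equally spaced repeaters of a given size h, no parasitic delays, and terminal gates that differ from the repeaters (driver resistance R_drv, receiver input capacitance C_load). The optimal count is found by enumerating K rather than by the slide's closed-form expression, which is not reproduced here; `gsri_delay` and `best_repeater_count` are hypothetical names.

```python
def gsri_delay(R_drv, C_load, Rw, Cw, K, h, R0=1.0, C0=1.0):
    """Elmore delay of a wire (total Rw, Cw) between a driving gate and a
    receiving gate, with K equally spaced repeaters of size h
    (resistance R0/h, input capacitance h*C0)."""
    n = K + 1                          # number of wire segments
    rs, cs = Rw / n, Cw / n            # per-segment wire resistance and capacitance
    total = 0.0
    for s in range(n):
        drive = R_drv if s == 0 else R0 / h       # first segment: the actual driving gate
        load = C_load if s == n - 1 else h * C0   # last segment: the actual receiving gate
        total += drive * (cs + load) + rs * (cs / 2 + load)
    return total

def best_repeater_count(R_drv, C_load, Rw, Cw, h, K_max=100, **kw):
    """Exhaustive search over K; the terminal gates enter the delay explicitly."""
    return min(range(K_max + 1),
               key=lambda K: gsri_delay(R_drv, C_load, Rw, Cw, K, h, **kw))
```

Because the terminal gates appear explicitly, a weak driving gate or a small receiving gate shifts the best K, which the standard RI formula cannot capture.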

58 Example: Single Wire. How many repeaters? RI: 2; GSRI: 4. Why? The first gate is weaker than the repeater, so the RI assumption is inaccurate.

59 Number of Repeaters in a Logic Path. GSRI allows optimization of shorter wires than RI. In GSRI the number of repeaters per wire is not equal: a higher electrical effort leads to more repeaters. Setup: ALU critical path, 65 nm process, several wire-length scenarios, ULE sizing performed before GSRI.

60 Delay Reduction by GSRI. Flow: ULE sizing without repeaters → RI or GSRI → ULE sizing of the repeaters. GSRI results in up to 25% delay reduction compared with RI; ULE further reduces the delay by up to 27%, mostly in short wires.

61 GSRI Followed by ULE Sizing. Two alternatives for ULE sizing: sizing only the repeaters, without sizing the gates (power-efficient); or sizing the entire path, including the gates and the repeaters (lowest delay). (Chart: size (× C0).)

62 Using Smaller Repeaters. A smaller repeater size leads to more repeaters. Power may decrease with a higher number of smaller repeaters: many smaller repeaters reduce the transition time, which lowers the short-circuit currents. Result: 17% delay reduction and 15% power reduction. (Chart: delay [ps] and power [pW].)

63 Additional Perspective. GSRI may provide a smaller delay with smaller repeaters than RI. Power-aware RI will lead to a higher delay penalty than currently assumed.

64 Gate-terminated Sized Repeater Insertion Summary. Accurate number of repeaters: terminal gates ≠ repeaters. Supports smaller repeaters: an analytic expression, no more “rules of thumb”. Minimal delay: GSRI delay < standard RI delay.

65 Summary of Approaches: ULE, GSRI, LGR

66 Summary. LE handles only logic; RI handles only wires. We propose a general solution for logic with wires. Unified Logical Effort (ULE): fast sizing of gates in the presence of interconnect, with intuitive conditions for minimal delay. Gate-terminated Sized Repeater Insertion (GSRI): an accurate optimal number of repeaters, with enhanced design flexibility and smaller delay than RI. Logic Gates as Repeaters (LGR): distribution of logic gates over the interconnect, enabling delay optimization without logically redundant repeaters.

67 Future Work. Analyzing wire sizing, developing power-efficient heuristics, incorporating inductance, and integration into EDA tools.

68 Thank You!
