1 Arkadiy Morgenshtein, Eby G. Friedman, Ran Ginosar, Avinoam Kolodny Technion – Israel Institute of Technology Timing Optimization in Logic with Interconnect.

Presentation transcript:

1 Timing Optimization in Logic with Interconnect. Arkadiy Morgenshtein, Eby G. Friedman, Ran Ginosar, Avinoam Kolodny. Technion – Israel Institute of Technology. SLIP (System Level Interconnect Prediction) 2008.

2 [Intro] Timing Optimization. A function is implemented on a path from A to B. Special cases: only gates, or only wires. Typically, a mixture of both.

3 [Intro] Logic with Wires: Common Example. UART design.

4 [Intro] The Interconnect Wall. Logic without wires: logic gate sizing (Logical Effort). Long wires: interconnect optimization (Repeater Insertion).

5 [Intro] Timing Optimization in Logic with Interconnect: a path from A to B that lies between logic without wires and long wires.

6 Existing Techniques A (very) Short Tutorial

7 [Intro] Logical Effort (only logic). Delay model: $D = \tau\,(g \cdot h + p)$, where $\tau = R_0 C_0$ is the delay of a minimal inverter (a technology constant); $g$ is the logical effort, a gate-type factor (e.g. $g_{inv} = 1$); $h$ is the electrical effort, the load-driving capability; and $p$ is the parasitic effort, due to output capacitance. Optimal sizing: $Delay_i = Delay_{i+1}$, i.e. $g_i h_i = g_{i+1} h_{i+1}$. Reference: I. Sutherland, B. Sproull, and D. Harris, “Logical Effort - Designing Fast CMOS Circuits,” Morgan Kaufmann.
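The equal-stage-effort rule on this slide can be sketched in a few lines of Python. This is a minimal illustration of textbook Logical Effort for a single path with no wires and no side branches; the function name `le_path_sizing` and its arguments are hypothetical and do not come from the slides.

```python
import math

def le_path_sizing(g, c_in_first, c_load, p=None):
    """Size a chain of gates with plain Logical Effort (no wires, no branches).

    g          -- logical effort of each stage, first to last
    c_in_first -- input capacitance of the first gate, in units of C0
    c_load     -- load capacitance on the last gate, in units of C0
    p          -- parasitic delay of each stage (assumed 0 when omitted)
    Returns (input capacitance of each stage, path delay in units of tau).
    """
    n = len(g)
    p = p if p is not None else [0.0] * n
    path_effort = math.prod(g) * (c_load / c_in_first)  # F = G * H
    stage_effort = path_effort ** (1.0 / n)              # equal g_i * h_i per stage

    # Walk from the load back to the input: C_in,i = g_i * C_out,i / f
    caps = [0.0] * n
    c_out = c_load
    for i in reversed(range(n)):
        caps[i] = g[i] * c_out / stage_effort
        c_out = caps[i]

    delay = n * stage_effort + sum(p)
    return caps, delay
```

For three inverters driving a load of 64 C0 from an input of 1 C0, the stage effort is about 4 and the input capacitances come out close to 1, 4 and 16.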

8 [Intro] Limitations of Logical Effort. LE assumes no wires and no fixed side branches. In logic with wires and branches, LE breaks down.

9 [Intro] Repeater Insertion (only wires). Without repeaters, Delay ~ Length² (example: D = RC = 25). With repeaters, Delay ~ Length (example: D = Σrc = 5). The optimal number of repeaters and the optimal sizing are expressed through four quantities: the effective resistance of a minimal inverter, the wire resistance, the gate capacitance of a minimal inverter, and the wire capacitance. Reference: H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990.
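The two numbers on this slide (25 and 5) can be reproduced with a toy calculation. The sketch below assumes a wire made of five unit segments (r = c = 1) and ideal repeaters that add no delay of their own; both assumptions and the function names are illustrative, not taken from the slides.

```python
def unbuffered_wire_delay(n_segments, r_seg, c_seg):
    """Lumped delay of an undivided wire: total R times total C (grows as length squared)."""
    return (n_segments * r_seg) * (n_segments * c_seg)

def segmented_wire_delay(n_segments, r_seg, c_seg):
    """Sum of per-segment rc delays when ideal repeaters isolate the segments (grows linearly)."""
    return n_segments * (r_seg * c_seg)
```

With `n_segments=5` and unit `r_seg`, `c_seg`, the first function gives 25 and the second gives 5, which is the quadratic-versus-linear contrast the slide draws.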

10 [Intro] Properties of Repeater Insertion. Characteristics of RI: the number and the size of repeaters are independent; there is a single optimal (fixed) size for a given process and metal layer. Assumptions of basic repeater insertion (RI): equal size; equal spacing; terminal gates are similar to repeaters.

11 So, What Are We Going To Do?

12 [Intro] We Are Breaking The Wall. Logic without wires: Logical Effort. Long wires: Repeater Insertion. WANTED: a solution for the mixed case. Challenges: gate placement; gate sizes; number of gates and repeaters.

13 Our Approach to Timing Optimization. Logic Gates as Repeaters (LGR): gate placement (along the wire). Unified Logical Effort (ULE): gate sizes. Gate-terminated Sized Repeater Insertion (GSRI): number of repeaters.

14 Logic Gates as Repeaters - LGR “Where should the gates be located (along the wire)?”

15 [LGR] The Idea. Problem: delay reduction in logic with wires. A solution: wire segmenting by repeaters. Drawback: power and area spent without logical functionality are wasted. Proposed: logic gates as repeaters (LGR), i.e. distributing the logic gates over the interconnect and driving the partitioned wire without adding repeaters. Reference: K. Venkat, “Generalized Delay Optimization of Resistive Interconnections through an Extension of Logical Effort,” ISCAS 1993.

16 [LGR] LGR Delay Modeling: total delay. Reference: M. Moreinis, A. Morgenshtein, I. Wagner, and A. Kolodny, “Logic Gates as Repeaters (LGR) for Area-Efficient Timing Optimization,” IEEE TVLSI, 2006.

17 [LGR] Optimal Wire Segmenting. If the output resistance of driving gate i is below average, wire length i is increased. If the input capacitance of successor gate i+1 is above average, wire length i is decreased. If all gates are equal, the partitioning is equal. If a segment length comes out negative, the neighboring gates are merged.

18 [LGR] LGR Results. Critical path of a decoder circuit: delay reduction of up to 27% by “moving” the gates. Further delay reduction is obtained by scaling and by LGR+RI. Reference: M. Moreinis, et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS, 2004.

19 [LGR] Optimal Gate Scaling. All gates are enlarged by a uniform factor S to minimize delay; scaling can be performed iteratively with segmenting. (Figure: inverters, equal segments.)

20 [LGR] LGR Segmenting and Scaling. Uniform scaling is performed for all gates. For intermediate wires, LGR outperforms RI by up to 55%. For long wires, RI is faster, BUT it requires 44 repeaters. Best for long wires: combined LGR and RI. Reference: M. Moreinis, et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS.

21 [LGR] LGR Summary. Logic gates serve as repeaters, so there is no need for logically redundant repeaters. Delay reduction plus lower area/power. Can be combined with RI.

22 Unified Logical Effort - ULE “What is the optimal size of the gates?”

23 [ULE] Unified Delay Model (including wires). The model adds two terms for the wire: a capacitive interconnect effort and a resistive interconnect effort.

24 [ULE] Minimal Delay Condition: minimal delay, equal stage delays.

25 [ULE] Minimal Delay for Capacitive Wires. Two cases: capacitive interconnect (short wires and branches) and general RC interconnect.

26 [ULE] ULE Convergence to LE and RI. Special cases of ULE: logic without wires (Logical Effort); repeater insertion and repeater scaling.

27 [ULE] Some Algebra…

28 [ULE] Intuition of ULE Optimum. At the optimal size, the delay caused by the gate capacitance equals the delay caused by the gate resistance.

29 [ULE] ULE Optimality. Size too small: high resistance. Size too big: high capacitance.
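The balance between gate capacitance and gate resistance can be illustrated with a small Python sketch. It assumes a simple Elmore-style model that is not taken from the slides: a gate of size `x` presents an input capacitance `g * x * c0` to whatever resistance `r_up` sits upstream (driver plus wire), and drives the downstream capacitance `c_down` (wire plus next gate) through an output resistance `r0 / x`. All names are hypothetical.

```python
import math

def stage_delay_terms(x, r_up, c_down, g=1.0, r0=1.0, c0=1.0):
    """Return the two size-dependent delay terms for a gate of size x."""
    cap_term = r_up * g * x * c0   # upstream resistance charging the gate's input capacitance
    res_term = (r0 / x) * c_down   # gate's output resistance charging the downstream capacitance
    return cap_term, res_term

def optimal_size(r_up, c_down, g=1.0, r0=1.0, c0=1.0):
    """Size that minimizes cap_term + res_term; at this size the two terms are equal."""
    return math.sqrt(r0 * c_down / (r_up * g * c0))
```

With `r_up = 2` and `c_down = 50` the sketch gives a size of 5, where both terms equal 10. Halving or doubling the size raises the sum from 20 to 25: a smaller gate pays in resistance, a larger one in capacitance.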

30 [ULE] Optimal Gate Capacitance. An expression gives the size of a single gate; gate sizes along a logic path are determined iteratively.
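An iterative pass of this kind can be sketched as follows. This is an illustration under an assumed Elmore-style model, not the authors' expression: each interior gate is re-sized from its current neighbors until the sizes stop changing. The function `iterate_sizes` and its parameters are hypothetical.

```python
import math

def iterate_sizes(g, r_wire, c_wire, x_first, c_load,
                  r0=1.0, c0=1.0, sweeps=200, tol=1e-9):
    """Iteratively size gates 1..n-1 on a path; gate 0 and the final load are fixed.

    g[i]                 -- logical effort of gate i
    r_wire[i], c_wire[i] -- resistance and capacitance of the wire after gate i
                            (the last wire's resistance does not affect the sizes)
    x_first              -- fixed size of gate 0
    c_load               -- fixed load capacitance at the end of the path
    """
    n = len(g)
    x = [x_first] * n
    for _ in range(sweeps):
        biggest_change = 0.0
        for i in range(1, n):
            # Resistance seen upstream of gate i: previous gate plus previous wire
            r_up = r0 / x[i - 1] + r_wire[i - 1]
            # Capacitance gate i must drive: its wire plus the next gate (or the load)
            c_next = g[i + 1] * x[i + 1] * c0 if i + 1 < n else c_load
            c_down = c_wire[i] + c_next
            new_x = math.sqrt(r0 * c_down / (r_up * g[i] * c0))
            biggest_change = max(biggest_change, abs(new_x - x[i]))
            x[i] = new_x
        if biggest_change < tol:
            break
    return x
```

With all wire values set to zero, the update reduces to the geometric-mean rule of Logical Effort, so three inverters driving 64 C0 converge to sizes close to 1, 4 and 16.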

31 [ULE] Examples (1): ULE Sizing. Equal wires, total electrical effort H = 10. L = 0: sizes converge to LE. Longer wires: ULE is faster. Long wires: fixed sizing at $x_{opt}$. (Plot: gate capacitance (×C0) vs. gate number for L = 0, 10 µm, 50 µm, 100 µm, 0.5 mm and 1 mm, with the $x_{opt}$ and LE levels marked.)

32 [ULE] Examples (2): ULE Sizing. Total electrical effort H = 1. L = 0: converges to LE (no scaling). All wire lengths: ULE is faster. Long wires: fixed sizing at $x_{opt}$. (Plot: gate capacitance (×C0) vs. gate number for L = 0, 10 µm, 50 µm, 100 µm, 0.5 mm and L = 1, with the $x_{opt}$ and LE levels marked.)

33 [ULE] So, What is $x_{opt}$? (For long wires.)

34 [ULE] Optimum Condition for Long Wires.

35 [ULE] $x_{opt}$ and Repeaters. Equal wires, INV (g = 1): optimal sizing condition for a repeater. Reference: H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990.
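A short derivation shows why a long path of equal wires can settle to one fixed size. It assumes a simple Elmore-style model that is not taken from the slides: an inverter ($g = 1$) of size $x$ has input capacitance $C_0 x$ and output resistance $R_0 / x$, and every wire has resistance $R_w$ and capacitance $C_w$.

```latex
% Terms of the path delay that depend on the size x_i of gate i
D(x_i) = \left(\frac{R_0}{x_{i-1}} + R_w\right) C_0 x_i
       + \frac{R_0}{x_i}\left(C_w + C_0 x_{i+1}\right)

% Setting dD/dx_i = 0
\left(\frac{R_0}{x_{i-1}} + R_w\right) C_0
       = \frac{R_0}{x_i^2}\left(C_w + C_0 x_{i+1}\right)

% Far from the path ends, x_{i-1} = x_i = x_{i+1} = x_{opt};
% the R_0 C_0 / x_{opt} terms cancel on both sides
x_{opt} = \sqrt{\frac{R_0 C_w}{R_w C_0}}
```

Under these assumptions the result has the same form as the classic repeater size from the repeater insertion literature, which is consistent with the convergence the slide describes.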

36 [ULE] Solving Design Problems with $x_{opt}$: layout constraint. Optimal size of a repeater located between two wires.

37 [ULE] Solving Design Problems with $x_{opt}$: cell size constraint. Optimal wire length with a repeater of size $x_{rep}$.

38 [ULE] Typical Design Example. Optimal ULE sizing for (a) similar gates, similar wires; (b) different gates, similar wires; (c) similar gates, different wires. Gates with higher logical effort get a bigger size. There is no fixed $x_{opt}$ in circuits with various gates and wires.

39 [ULE] ULE Results. Simulation setup: critical path in a logic circuit (e.g. adder), 65 nm CMOS, compared to Cadence Virtuoso® Analog Optimizer (which uses numerical algorithms).

40 [ULE] Delay Optimization. LE becomes inaccurate as the wire length grows. ULE is within 9% of the Analog Optimizer tool. ULE: minimal delay. Analog Optimizer: minimal delay (but sloooooow). Logical Effort: higher delay.

41 [ULE] Run Time Comparison. ULE run time is orders of magnitude shorter than that of Analog Optimizer; ULE run time is under 1 second. (Plot: run time [min].)

42 [ULE] Power-Delay Optimization in ULE. Power is a function of gate and wire capacitances. Optimal gate size $C_i$.

43 [ULE] Sizing for minimal P×D. A random logic path with 10 stages is assumed (gates x1 to x10 separated by wires L1 to L9). Four wire-length scenarios: S1: all wires L = 100 µm; S2: all wires L = 80 µm; S3: all wires L = 400 µm; S4: L = {900, 600, 150, 300, 800, 200, 400, 150, 250}. Power-Delay optimization reduces gate sizes compared to Delay optimization. (Plot for S4: gate size (×C0), minimal Delay vs. minimal Power×Delay.)

44 [ULE] Reduced Energy, Low Delay Penalty. (Plots: energy [pJ] and delay [ps] for scenarios S1 to S4, minimal Power-Delay vs. minimal Delay.)

45 [ULE] ULE for Branches and Fanout. General ULE condition for gate sizing.

46 ULE Sizing in Path with Branches. Lw = 100 µm for all wires on the critical path. Four branch scenarios: S1: Lb = 400 µm, Cb = 1 for all branches; S2: Lb = 400 µm, Cb = 30 for all branches; S3: Lb = {400, 100, 400, 400} µm, Cb = {30, 1, 30, 1}; S4: Lb = {100, 100, 100, 400} µm, Cb = {1, 1, 1, 30}. Branches cause a change in sizing compared to ULE without branches.

47 ULE Delay Optimization with Branches Additional delay reduction is obtained using extended ULE condition with branches

48 [ULE] Unified Logical Effort Summary. Useful over the entire range of problems: logic only, logic and wires, wires only. Computes optimal gate sizes. Low computational complexity.

49 [ULE] One More Question: “When can I reduce delay by adding an inverter?”

50 [ULE] Adding an Inverter to Reduce Delay: condition for inverter insertion.

51 [ULE] Inverter Addition vs. Gate Sizing. L = 1000 µm; X1 and X3 are variables. Inverter insertion depends on the values and the ratio of the gate sizes X1 and X3. The size of the inverter X2 is determined from ULE.

52 [ULE] Inverter Addition: More Applications. No wires: beneficial when the electrical effort is higher than 4. Equal wires (vs. wire length): beneficial when the wire is longer than Lcr. Power: beneficial when the expected delay reduction is more than ∆.
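The no-wire threshold of 4 can be checked with plain Logical Effort. The sketch below is one reading of that bullet, not the authors' derivation: it assumes parasitic delay is ignored and that the added inverter is sized so that both stages carry equal effort. Delays are in units of τ, and the function names are hypothetical.

```python
import math

def delay_without_inverter(g, h):
    """Single stage: a gate with logical effort g driving electrical effort h."""
    return g * h

def delay_with_inverter(g, h):
    """Two stages (gate plus inverter, g_inv = 1) sharing the path effort g*h equally."""
    return 2.0 * math.sqrt(g * h)

def inverter_helps(g, h):
    """True when the two-stage path is strictly faster than the single stage."""
    return delay_with_inverter(g, h) < delay_without_inverter(g, h)
```

For an inverter driver (`g = 1`) the two delays coincide at `h = 4`; above that value the extra inverter wins, below it the single stage is faster. Including parasitic delay would move the crossover, so the exact threshold depends on that assumption.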

53 [ULE] Example: Critical Wire Length. The critical length Lcr for inverter insertion depends on the minimal delay reduction factor ∆. The size of the inverter X2 is determined from ULE. (Plot: critical length Lcr (µm) vs. ∆.)

54 Gate-Terminated Sized Repeater Insertion - GSRI “What is the optimal number of gates/repeaters?”

55 [GSRI] Revisiting Standard Repeater Insertion. RI assumptions: fixed and equal sizes; terminal gates are similar to repeaters. BUT: the wires are usually located between different logic gates, and different repeater sizes may be chosen. Gate-Terminated Sized Repeater Insertion (GSRI) is proposed.

56 [GSRI] Delay Model of Logic with Repeaters.

57 [GSRI] Delay Minimization by GSRI. RI assumptions: long wires; terminal gates are repeaters; many repeaters (K >> 1). Reference: H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990.

58 [GSRI] Example: Single Wire. How many repeaters? RI: 2. GSRI: 4. Why? The first gate is weaker than the repeater, so the RI assumption is inaccurate.
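The effect of the terminal gates on the repeater count can be explored numerically. The sketch below is a brute-force search under an assumed Elmore-style model, not the analytic GSRI expression from the talk: a driver of size `x_drv` feeds a wire that `k` repeaters of size `x_rep` split into `k + 1` equal segments, ending in a load `c_load`. All names and the 0.5 factor for the distributed wire are assumptions.

```python
def path_delay(k, x_rep, x_drv, c_load, r_w, c_w, r0=1.0, c0=1.0):
    """Elmore-style delay of driver -> wire with k repeaters -> load."""
    n_seg = k + 1
    r_seg = r_w / n_seg
    c_seg = c_w / n_seg
    total = 0.0
    for s in range(n_seg):
        x_src = x_drv if s == 0 else x_rep                   # first segment is driven by the logic gate
        c_sink = c_load if s == n_seg - 1 else x_rep * c0    # last segment ends in the load
        total += (r0 / x_src) * (c_seg + c_sink)             # source resistance term
        total += r_seg * (0.5 * c_seg + c_sink)              # wire resistance term
    return total

def best_repeater_count(x_rep, x_drv, c_load, r_w, c_w, k_max=200):
    """Repeater count in 0..k_max with the smallest modeled delay."""
    return min(range(k_max + 1),
               key=lambda k: path_delay(k, x_rep, x_drv, c_load, r_w, c_w))
```

Calling `best_repeater_count` with different `x_drv` values while keeping the wire and `x_rep` fixed gives a quick feel for how a terminal gate that differs from the repeaters can shift the best count.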

59 [GSRI] Number of Repeaters in Logic Path. Setup: ALU critical path, 65 nm process, several wire-length scenarios, ULE sizing performed before GSRI. GSRI allows optimization of shorter wires than RI. The number of repeaters per wire is not equal in GSRI: higher electrical effort means more repeaters.

60 [GSRI] Delay Reduction by GSRI. Flow: ULE sizing without repeaters, then RI/GSRI, then ULE sizing on repeaters. GSRI results in up to 25% delay reduction compared to RI. ULE further reduces the delay by up to 27%, mostly in short wires.

61 [GSRI] GSRI Followed by ULE Sizing. Two alternatives for ULE sizing: (1) sizing of the repeaters only, without sizing the gates: power-efficient; (2) sizing of the entire path, including the gates and the repeaters: lowest delay. (Plot: size (×C0).)

62 [GSRI] Using Smaller Repeaters. Smaller size means more repeaters. Power may decrease for a higher number of smaller repeaters: many smaller repeaters give reduced transition time and hence lower short-circuit currents. Result: 17% delay reduction and 15% power reduction. (Plots: delay [ps], power [pW].)

63 [GSRI] Additional Perspective. GSRI may provide smaller delay with smaller repeaters than RI. Power-aware RI will lead to a higher delay penalty than currently assumed.

64 [GSRI] Gate-terminated Sized Repeater Insertion Summary. Accurate number of repeaters: terminal gates ≠ repeaters. Supports smaller repeaters: analytic expression, no more “rules of thumb”. Minimal delay: GSRI delay < standard RI delay.

65 Summary of Approaches ULE GSRI LGR

66 Summary. LE: only logic. RI: only wires. We propose a general solution for logic with wires. Unified Logical Effort (ULE): fast sizing of gates in the presence of interconnect; intuitive conditions for minimal delay. Gate-terminated Sized Repeater Insertion (GSRI): accurate optimal number of repeaters; enhanced design flexibility and smaller delay than in RI. Logic Gates as Repeaters (LGR): distribution of logic gates over interconnect; delay optimization without logically redundant repeaters.

67 Future Work: analyzing wire sizing; developing power-efficient heuristics; incorporating inductance; integration in EDA tools.

68 Thank You!