Download presentation
Presentation is loading. Please wait.
1
EE4271 VLSI Design Interconnect Optimizations Buffer Insertion
2
Moore’s law Twice the number of transistors, approximately every two years, so double clock frequency accordingly
3
3 0.18 Source: Gordon Moore, Chairman Emeritus, Intel Corp. 0 50 100 150 200 250 300 Technology generation ( m ) Delay (psec) Transistor/Gate delay Interconnect delay 0.80.50.25 0.15 0.35 Interconnects Dominate This is why Moore’s law is not true anymore.
4
Objectives What have we learned? –Compute circuit delay on wires and gates –Gate delay optimization What are we going to learn? –Interconnect delay optimization: buffer insertion Why reducing delay How to perform it –This is the most important optimization in circuit design
5
5 0.18 Source: Gordon Moore, Chairman Emeritus, Intel Corp. 0 50 100 150 200 250 300 Technology generation ( m ) Delay (psec) Transistor/Gate delay Interconnect delay 0.80.50.25 0.15 0.35 Why is this trend?
6
A scaling primer Ideal process scaling: –Device geometries shrink by S = 0.7x) Device delay shrinks by s –Wire geometries shrink by Unit resistance R/ : /(ws.hs) = r/s 2 Unit coupling capacitance Cc/ : (hs)/(Ss) Resistance doubled, capacitance roughly unchanged for unit length How about the change in wire length? SGD h w l S ll hh SS ww
7
Technology scaling Global (long) interconnect lengths don’t shrink –Global interconnect link cells far apart Local (short) interconnect lengths shrink by s –Local interconnects link cells nearby
8
Interconnect delay scaling Delay of a wire of length l : int = (rl)(cl) = rcl 2 (a quadratic function of length) Local interconnects : int : (r/s 2 )(c)(ls) 2 = rcl 2 –Local interconnect delay unchanged Global interconnects : int : (r/s 2 )(c)(l) 2 = (rcl 2 )/s 2 –Global interconnect delay doubled – unsustainable! Interconnect delay increasingly more dominant
9
Buffer Insertion For Delay Reduction
10
Elmore Delay for Wire x C unit wire capacitance c unit wire resistance r
11
Elmore Delay for Buffer v C u Driving resistance Input capacitance
12
Elmore Delay for A Circuit Delay = all Ri all Cj downstream from Ri Ri*Cj Elmore delay to n1 R(B)*(C1+C2) Elmore delay to n2 R(B)*(C1+C2)+R(w)*C2 R(B) C1 R(w) C2 n1 B n2
13
R Buffers Reduce Wire Delay x/2 cx/4 rx/2 t_unbuf = R( cx + C ) + rx( cx/2 + C ) t_buf = 2R( cx/2 + C ) + rx( cx/4 + C ) t_buf – t_unbuf = RC – rcx 2 /4 x/2 cx/4 rx/2 C C R x ∆t∆t
14
Buffered global interconnects: Intuition Interconnect delay = r.c.l 2 /2 Interconnect delay = r.c.l i 2 /2 < r.c.l 2 /2 (where l = l j ) since (l j 2 ) < ( l j ) 2 (Of course, we need to consider buffer delay as well) l1l1 lnln l3l3 l2l2 l
15
Optimal Buffer Insertion on A Wire Delay before buffer insertion = rcL 2 /2 Assume N identical buffers with equal inter-buffer length l (L = Nl) For minimum delay, L R d – On resistance of inverter C g – Gate input capacitance r,c – unit resistance and capacitance … … l
16
Optimal interconnect delay Substituting l opt back into the interconnect delay expression: Delay grows linearly with L (instead of quadratically)
17
Total buffer count Ever-increasing fractions of total cell count will be buffers –70% in 32nm –25% is widely observed 0 10 20 30 40 50 60 70 80 90nm65nm45nm32nm % cells used to buffer nets clk-buf buf tot-buf
18
ITRS projections
19
Exercise 1 Given a wire of length 10 with r=2, c=2, what is its delay? Given a buffer with R d =10, C g =20, after optimally buffering the wire, what is the delay? What if wire length is 100? Any conclusion?
20
Exercise 2 Relationship with gate sizing –If we can size the buffer, what is the best buffer size? –Let R 0 denote the unit size buffer driving resistance, and C 0 denote the unit size buffer input capacitance. Thus, R d =R 0 /h and C g =C 0 h –What is best h leading to smallest delay?
21
Analogy
22
Advancing technology = period of city expansion, more transistors = larger city Interconnects = streets Buffers = gas stations Signal delay (timing) = time to cross the city Buffer insertion = gas station construction
23
Previous Result is Only Theoretical: Discrete Buffer Locations Candidate buffer locations
24
RAT: Required Arrival Time RAT = 100 Wire delay = 80 AT = 0 RAT = 100 Wire delay = 80 AT = 0 RAT = 20 AT = 80
25
Slack: RAT - AT RAT = 100 Wire delay = 80 AT = 0 RAT = 20 AT = 80 Slack = 20 Minimizing circuit delay = maximizing RAT at driver = maximizing slack at driver
26
Motivation for Problem Formulation RAT = 300 AT = 350 Slack = RAT-AT= -50 RAT = 700 AT = 600 Slack = 100 RAT = 300 AT = 250 Slack = 50 RAT = 700 AT = 400 Slack = 300 slack = -50 slack = 50 Decouple capacitive load from critical path RAT = Required Arrival Time Slack = RAT - AT We need to maximum slack or RAT at driver
27
Timing Driven Buffering Problem Formulation Given –A Steiner tree –RAT at each sink –A buffer type –RC parameters –Candidate buffer locations Find buffer insertion solution such that the slack (or RAT) at the driver is maximized
28
An Example for Buffer Insertion (v 1, 1, 20) 22 v1v1 v1v1 (v 2, 3, 16) r = 1, c = 1 R b = 1, C b = 1 R d = 1 (v 2, 1, 13) v1v1 (v 3, 5, 8) v1v1 (v 3, 3, 9) slack = 6slack = 3 Add wire Insert buffer Add wire Add driver CQ
29
Candidate Buffering Solution Definition Each candidate solution is associated with –v i : a node –c i : downstream capacitance –q i : RAT v i is a sink c i is sink capacitance v is an internal node
30
Van Ginneken’s Algorithm Candidate solutions are propagated toward the source
31
Solution Propagation: Add Wire c 2 = c 1 + cx q 2 = q 1 – rcx 2 /2 – rxc 1 r: wire resistance per unit length c: wire capacitance per unit length (v 1, c 1, q 1 ) (v 2, c 2, q 2 ) x
32
32 Solution Propagation: Insert Buffer c 1b = C b q 1b = q 1 – R b c 1 C b : buffer capacitance R b : buffer resistance (v 1, c 1, q 1 ) (v 1, c 1b, q 1b )
33
Solution Propagation: Add Driver q 0d = q 0 – R d c 0 R d : driver resistance Pick solution with max slack (v 0, c 0, q 0 ) (v 0, c 0d, q 0d )
34
Exercise (20,400) 2 2 Unit Wire Cap = 5 Unit Wire Res = 3 Buffer C=5, R=1 Perform buffer insertion to maximize the slack at driver 2
35
Exponential Runtime 2 solutions 4 solutions 8 solutions 16 solutions n candidate buffer locations lead to 2 n solutions
36
Solution Pruning Two candidate solutions –(v, c 1, q 1 ) –(v, c 2, q 2 ) Solution 1 is inferior if –c 1 > c 2 : larger load –and q 1 < q 2 : tighter timing
37
LOAD An Analogy - I Faster -> Smaller Delay -> Larger RAT (since RAT = RAT output - Delay) Larger Load -> Larger Capacitance
38
LOAD Faster & smaller load (larger RAT, smaller capacitance): Good Slower & larger load (smaller RAT, larger capacitance): Inferior END An Analogy - II
39
END Who will be the winner? Cannot tell at this moment, so keep both of them. An Analogy - III
40
END Who will be the winner? Cannot tell at this moment, so keep both of them. An Analogy - IV
41
Pruning When Insert Buffer They have the same load cap C b, only the one with max q is kept
42
42 Generating Candidates (1) (2) (3) From Dr. Charles Alpert
43
43 Pruning Candidates (3) (a) (b) Both (a) and (b) “look” the same to the source. Throw out the one with the worse slack (4)
44
44 Candidate Example Continued (4) (5)
45
45 Candidate Example Continued After pruning (5) At driver, compute which candidate maximizes slack. Result is optimal.
46
46 Example (20,400) (30,250) (5, 220) (20,400) (30,250) (5, 220) (40, 40) (5, 0) (15,160) (5, 145) Unit Wire Cap = 5 Unit Wire Res = 3 Buffer C=5, R=1 2 2 2
47
47 Example Cont’d (20,400) (30,250) (5, 220) (40, 40) (5, 0) (15,160) (5, 145) (5,0) is inferior to (5,145). (45,40) is inferior to (15,160) (20,400) (30,250) (5, 220) (15,160) (5, 145) (5,15) (5,70) Pick solution with largest slack, follow arrows to get solution
48
Exercise Without pruning, there will be exponential number of candidate solutions (i.e., given n candidate buffer locations, there will be 2 n solutions). With pruning, how many solutions will we have?
49
Exercise Unit Wire Cap = 1 Unit Wire Res = 1 Buffer C=1, R=1 2 2 (10,40) (8,50) (5,10) (15,40) (7,10) (9,30) (12,20) Continue the following buffer insertion process. Assume that all partial candidate buffering solutions are as shown.
50
Summary Interconnect delay increases with technology scaling Linear interconnect delay with buffer insertion Buffer insertion with candidate buffer locations Pruning for accelerating buffer insertion technique
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.