Performance-Driven Interconnect Optimization Charlie Chung-Ping Chen


1 Performance-Driven Interconnect Optimization Charlie Chung-Ping Chen

2 Publications
A Fast Algorithm for Optimal Wire-Sizing Under Elmore Delay Model, ISCAS, 1995.
Optimal Wire-Sizing Formula Under the Elmore Delay Model, DAC, 1996.
Optimal Wire-Sizing Formula Under the Elmore Delay Model, ACM Physical Design Workshop, 1996.
Performance-Driven Buffered Clock Tree Optimization Based on Lagrangian Relaxation, DAC, 1996.
Optimal Non-Uniform Wire-Sizing for Routing Trees, ICCAD, 1996.
Optimal Wire-Sizing Function with Fringing Capacitance Consideration, DAC, 1997.
Spec-Based Buffer Insertion and Wire-Sizing for RC Nets, DTTC, 1997.
Fast and Exact Simultaneous Transistor and Wire-Sizing by Lagrangian Relaxation, ICCAD, 1998.

3 Outline
Interconnect Optimization Thesis
Wire Sizing
Buffer Sizing
Buffer Insertion
Interconnect Simulation

4 Interconnect Delay Trend

5 Interconnect Delay Trend

6 While gate delay diminishes, wire delay grows (as wires become thinner and the isolation layers between them become thinner). Today, signals propagate over wires at about c/40 (in Al: 43mm per 7ps). In 15 years, even with Cu, we expect speeds of only c/80. Gate delay in this graph is not the logical gate delay. Rather, it is t = RC, where R is the on-resistance of a minimal transistor and C is the load of a minimal transistor gate.

7 Wire-Sizing
(figure: wire segments with widths x1, x2, x3, x4)

8 Buffer Sizing
(figure: buffers with sizes x1, x2)

9 Buffer Insertion

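10 Interconnect Model (area capacitance)
A wire segment of length $L$ and width $x$ is modeled as a π-section: series resistance $r_0 L / x$, with capacitance $c_a L x / 2$ at each end.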

11 Driver Model Rd

12 Elmore Delay Computation
(figure: RC ladder with driver resistance Rd, wire resistances R1 to R4, node capacitances C1 to C4, and load capacitance CL)
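
The Elmore delay of the ladder named on this slide (Rd, R1..R4, C1..C4, CL) can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name, the argument layout, and the example values are assumptions, and each Ci is assumed to sit at the node after Ri, with CL added at the last node.

```python
def elmore_delay(rd, resistances, capacitances, cl):
    """Elmore delay from the driver to the far-end load of an RC ladder.

    rd            driver resistance
    resistances   [R1, ..., Rn], series wire resistances, driver side first
    capacitances  [C1, ..., Cn], Ci is the grounded capacitor at the node after Ri
    cl            load capacitance at the far end
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    downstream = cl   # total capacitance past the current resistor
    delay = 0.0
    # Walk from the load back toward the driver: every resistor is charged
    # with all the capacitance that lies downstream of it.
    for r, c in zip(reversed(resistances), reversed(capacitances)):
        downstream += c
        delay += r * downstream
    # The driver resistance sees the whole net.
    return delay + rd * downstream


if __name__ == "__main__":
    # Hypothetical values: ohms and picofarads give picoseconds.
    print(elmore_delay(100.0, [10.0] * 4, [1.0] * 4, 2.0))
```

Walking from the load toward the driver keeps the computation linear in the number of segments, since the downstream capacitance is accumulated once rather than recomputed per resistor.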

13 Uniform Wire-Sizing x1 x2 x3 x4

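14 Non-uniform Wire-Sizing
(figure: wire whose width y = f(x) varies with position x, driven through Rd and loaded by CL)

15 Optimal Wire-Sizing Function: Exponential Tapering
(figure: exponentially tapered wire, width y versus position x, driven through Rd and loaded by CL)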
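
To illustrate why tapering helps, the sketch below compares the best uniform width with the best exponential taper of the form a*exp(-b*x) under the Elmore delay model with area capacitance only. It is a numerical grid search under stated assumptions, not the closed-form result from the talk: the wire is discretized into π-sections, and every parameter value (length, Rd, CL, r0, ca, the grids) is hypothetical.

```python
import math


def tapered_elmore_delay(width_fn, length, rd, cl, r0, ca, n=200):
    """Elmore delay of a wire of varying width, split into n pi-sections."""
    dx = length / n
    downstream = cl                      # capacitance past the current section
    delay = 0.0
    for i in reversed(range(n)):         # walk from the load to the driver
        w = width_fn((i + 0.5) * dx)     # width sampled at the section midpoint
        seg_r = r0 * dx / w
        seg_c = ca * w * dx
        # pi-model: half of the section capacitance sits behind its resistor
        delay += seg_r * (downstream + seg_c / 2.0)
        downstream += seg_c
    return delay + rd * downstream


def compare(length=1000.0, rd=25.0, cl=0.05, r0=0.1, ca=0.0002):
    """Return (best uniform delay, best tapered delay) over a coarse grid."""
    widths = [0.5 + 0.25 * k for k in range(39)]    # candidate a values
    rates = [0.1 * k / length for k in range(31)]   # candidate b values, b = 0 included
    uniform = min(
        tapered_elmore_delay(lambda x, a=a: a, length, rd, cl, r0, ca)
        for a in widths
    )
    tapered = min(
        tapered_elmore_delay(lambda x, a=a, b=b: a * math.exp(-b * x),
                             length, rd, cl, r0, ca)
        for a in widths
        for b in rates
    )
    return uniform, tapered


if __name__ == "__main__":
    u, t = compare()
    print(f"best uniform: {u:.3f}  best exponential taper: {t:.3f}")
```

Because b = 0 is on the grid, the tapered search can never do worse than the uniform one; with a nonzero driver resistance and load, a wider wire near the driver and a narrower wire near the load lowers the delay.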

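16 Interconnect Model (Fringing Capacitance)
A wire segment of length $L$ and width $x$ is modeled as a π-section: series resistance $r_0 L / x$, with capacitance $(c_a L x + c_f L)/2$ at each end.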

17 Optimal Wire-Sizing Function: W-function tapering
(figure: tapered wire of width y loaded by CL; curve labeled $f(x) = a e^{-bx}$)

18 Lambert’s W function
$w$ is defined by $w e^{w} = x$.
(plot: w versus x)
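
For readers who want to evaluate the function defined by w·e^w = x, here is a minimal Newton-iteration sketch for the principal branch (w ≥ -1). The function name, tolerance, and starting guess are assumptions made for this illustration; a production code would more likely call a library routine.

```python
import math


def lambert_w(x, tol=1e-12, max_iter=100):
    """Principal branch of Lambert's W: the w >= -1 with w * exp(w) = x."""
    if x < -1.0 / math.e:
        raise ValueError("principal branch needs x >= -1/e")
    # log(1 + x) lies to the right of the root, where w*exp(w) is convex and
    # increasing, so Newton's method approaches the root monotonically.
    w = math.log1p(x)
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - x
        if abs(f) <= tol * (1.0 + abs(x)):
            break
        w -= f / (ew * (w + 1.0))   # derivative of w*exp(w) is exp(w)*(w + 1)
    return w


if __name__ == "__main__":
    for x in (-0.2, 0.0, 0.4, 1.0, math.e):
        w = lambert_w(x)
        print(f"x = {x:+.4f}  w = {w:+.6f}  w*exp(w) = {w * math.exp(w):+.6f}")
```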

19 Optimal Wire-Sizing Function
Cf

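20 Constrained Wire-Sizing
(figure: wire-sizing functions built from pieces labeled A, B, and C)

21 Relations among the six types of functions
(figure: the six types ABC, AB, BC, A, B, C, related through L, Rd, and CL)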

22 Wire-Sizing for Routing Trees
(figure: routing tree with elements labeled D, w, and f)

23 Weighted Sink Delay Optimization

24 Optimally Resizing One Segment

25 Minimizing Area with Delay Constraints

26 Minimizing Area with Delay Constraints
Lagrangian Relaxation Subproblem

27 Minimizing Area with Delay Constraints

28 Minimize Weighted Sink Delays
Algorithm Framework: alternate between two steps.
  Adjust Lagrange Multipliers (Sink Weights)
  Minimize Weighted Sink Delays
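
The two-step loop can be illustrated on a toy problem. The sketch below is not the formulation from the talk: it assumes a hypothetical model with area = sum(x_i) and delay = sum(a_i / x_i), one delay constraint, and a simple multiplicative multiplier update chosen because it converges quickly on this toy case. All names and numbers are illustrative.

```python
import math


def lagrangian_sizing(a, target, iterations=60):
    """Minimize sum(x) subject to sum(a_i / x_i) <= target (toy model).

    Alternates between (1) solving the relaxed subproblem for a fixed
    multiplier and (2) adjusting the multiplier from the constraint violation.
    """
    lam = 1.0
    for _ in range(iterations):
        # Step 1: minimize sum(x_i) + lam * (sum(a_i / x_i) - target).
        # Setting the derivative to zero gives x_i = sqrt(lam * a_i).
        x = [math.sqrt(lam * ai) for ai in a]
        delay = sum(ai / xi for ai, xi in zip(a, x))
        # Step 2: raise lam when the delay constraint is violated, lower it
        # when there is slack (multiplicative heuristic, an assumption here).
        lam *= delay / target
    x = [math.sqrt(lam * ai) for ai in a]
    return x, lam


if __name__ == "__main__":
    sizes, lam = lagrangian_sizing([1.0, 4.0, 9.0], target=3.0)
    print(sizes, lam)
```

In the weighted-sink-delay setting, the multipliers play the role of sink weights: a sink that misses its bound gets a larger weight in the next round of delay minimization.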

29 Minimizing Maximum Delay
Lagrangian Relaxation Subproblem

30 Delay/Power/Area Minimization

31 Simultaneous Buffer and Wire-Sizing
(figure: routing tree with elements labeled D, w, and f)

32 Uniform Wire-Sizing
(figure: routing tree with elements labeled D, w, and f)

33 Upperbound, Lowerbound, Skew

34 Buffer Insertion 1

35 Spec-based Buffer Insertion
Given:
  A routing tree, possible buffer insertion locations, required arrival times at receivers, a maximum slope constraint, and a polarity requirement
  A library of buffers: B1, B2, ..., Bn
Insert buffers to satisfy the spec (maximum delay, delay bounds at each receiver, and maximum slope).
(figure: routing tree with potential buffer locations marked)

36 Problem formulation
Combinations of the following:
  Buffer Insertion
  Buffer-Sizing
  Wire-Sizing
Goals and Constraints:
  Minimize the maximum delay
  Satisfy delay constraints at each receiver
  Repeater insertion location constraints
  Maximum slope constraint
  Polarity constraints

37 Current Issues
Previous solutions:
  Exhaustive enumeration -> exponentially growing
  First van Ginneken, and then Lillis, suggested a dynamic programming approach that finds the optimal solution for delay under the Elmore delay model.
  Provides very useful information such as power-delay curves.
Problems: not accurate, does not consider reliability issues, runtime and storage already high.

38 Brief Algorithm Description
Traverse the circuit in a bottom-up manner.
Enumerate all possible solutions and prune sub-optimal solutions dynamically.
How do we know which solution to kill?
  It violates the constraints.
  Another solution costs less and achieves more in all aspects:
    Number of buffers
    Maximum delay
    Polarity
    Area, Power ...
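
The bottom-up enumeration with pruning can be sketched for the simplest case: a single driver-to-receiver line with one buffer type and Elmore delay. This is a simplified illustration in the spirit of the dynamic programming approach, not the spec-based algorithm of the talk. It prunes only on load capacitance and required arrival time, and it ignores slope, polarity, and the buffer library; all names and values are hypothetical.

```python
def prune(cands):
    """Drop dominated candidates; lower capacitance and higher slack are better."""
    cands = sorted(cands, key=lambda c: (c[0], -c[1]))
    kept, best_q = [], float("-inf")
    for cap, q, nbuf in cands:
        if q > best_q:               # not beaten by any candidate with cap <= this one
            kept.append((cap, q, nbuf))
            best_q = q
    return kept


def insert_buffers(n_seg, seg_r, seg_c, rd, cl, rat, buf_r, buf_c, buf_d):
    """Return (best required time at the driver input, number of buffers used).

    The line has n_seg equal segments; a buffer may be placed between any two
    adjacent segments. Candidates are (load cap, required time, buffer count).
    """
    cands = [(cl, rat, 0)]           # base case at the receiver
    for i in range(n_seg):
        # Add one wire segment (pi-model) on the upstream side.
        cands = [(c + seg_c, q - seg_r * (seg_c / 2.0 + c), k) for c, q, k in cands]
        if i < n_seg - 1:
            # Option: insert a buffer here, which hides the downstream load.
            buffered = [(buf_c, q - buf_d - buf_r * c, k + 1) for c, q, k in cands]
            cands = prune(cands + buffered)
    # Account for the driver and pick the candidate with the most slack.
    return max((q - rd * c, k) for c, q, k in cands)


if __name__ == "__main__":
    # Hypothetical values: ohms and picofarads give picoseconds.
    print(insert_buffers(10, 100.0, 0.1, 200.0, 0.05, 0.0, 200.0, 0.05, 20.0))
```

Pruning keeps the candidate list short: a candidate survives only if no other candidate presents a smaller or equal load with a better required time, which is why the search avoids the exponential growth of exhaustive enumeration.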

39 Example
(figure: node labels 1 and 2)

40 Example
(figure: node labels 1 and 2)

41 Problems
How to calculate the gate and interconnect delay accurately and efficiently?
  A naive approach: repeatedly calculate the delay of the subtree by calling AWE or SPICE (causing an O(N) penalty for each solution).
  However, the runtime is already high (proportional to $N^2$).
  An efficient hierarchical delay computation method is needed.
How to take slope into consideration?

42 Accurate Load Model (Effective Capacitance)
The total-net capacitance is no longer a valid load model; the second-order π-load driving-point admittance approximation is more accurate.
(figure: a 10000 μm line with a 175 μm/100 μm driver, drawn as an RC ladder of three 232.8 Ω resistors with capacitors of 0.178 pF, 0.356 pF, 0.356 pF, and 0.178 pF. Three load models are compared: total capacitance load model, second-order π-load model, effective capacitance load model. Values shown: 364.2 Ω, 1.07 pF, 0.76 pF, 0.226 pF, 0.884 pF. Annotations: "too pessimistic, up to 30% error"; "equal average currents".)

43 Accurate Repeater Model (Voltage Ramp)
Several timing analyzers model the gate by a single resistor.
Errors of up to 30% have been reported.
The proposed gate delay model is a fixed resistor driven by a ramp voltage source.
Voltage ramp parameters, t0 and tx, are determined from the gate characteristic equations.
(figure: a CMOS gate with input tin driving load ZL(s), and its model: a voltage ramp with parameters t0 and tx behind a fixed resistor Reff driving ZL(s))

44 Accuracy of Voltage Ramp Model
(plot: model voltage source, model output, and actual output versus t (ns), for t from 0.0 to 1.0 ns)

45 How to calculate π-load hierarchically
There exists a simple way to calculate the π-load hierarchically (in fact, it can handle approximations of arbitrarily high order).
Parallel branches add: $Y_{eq}(s) = Y_1(s) + Y_2(s)$
Through a series element $R(s)$: $Y_{new}(s) = \dfrac{1}{R(s) + \dfrac{1}{Y_{eq}(s)}}$
Taylor expansion: $Y_{eq}(s) = y_1 s + y_2 s^2 + y_3 s^3 + y_4 s^4 + y_5 s^5 + y_6 s^6$
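
As a worked illustration of the hierarchical idea, the sketch below propagates the first three admittance moments from the far end of an RC ladder to the driving point and then matches them to a π-load (near capacitor, resistor, far capacitor). The three-moment matching formulas are the standard ones for a π-model; the function names are assumptions, and the ladder values are the ones that appear on the effective-capacitance slide (three 232.8 Ω resistors; 0.178, 0.356, 0.356, 0.178 pF).

```python
def shunt_capacitor(y, c):
    """Add a grounded capacitor at the current node: Y' = Y + C*s."""
    y1, y2, y3 = y
    return (y1 + c, y2, y3)


def series_resistor(y, r):
    """Look through a series resistor: Y' = Y / (1 + R*Y), truncated after s^3."""
    y1, y2, y3 = y
    return (y1, y2 - r * y1 * y1, y3 - 2.0 * r * y1 * y2 + r * r * y1 ** 3)


def pi_load(y):
    """Match y1*s + y2*s^2 + y3*s^3 to a pi-load; returns (c_near, r, c_far)."""
    y1, y2, y3 = y
    c_far = y2 * y2 / y3
    r = -(y3 * y3) / (y2 ** 3)
    return y1 - c_far, r, c_far


def ladder_pi_load(resistances, capacitances):
    """capacitances has one more entry than resistances (driver side first)."""
    y = (0.0, 0.0, 0.0)
    y = shunt_capacitor(y, capacitances[-1])          # far-end capacitor
    for r, c in zip(reversed(resistances), reversed(capacitances[:-1])):
        y = series_resistor(y, r)
        y = shunt_capacitor(y, c)
    return pi_load(y)


if __name__ == "__main__":
    c_near, r, c_far = ladder_pi_load([232.8] * 3, [0.178, 0.356, 0.356, 0.178])
    print(f"C_near = {c_near:.3f} pF, R = {r:.1f} ohm, C_far = {c_far:.3f} pF")
```

With these inputs the sketch gives a resistance of about 364.2 Ω and a near-end capacitance of about 0.226 pF, which agree with two of the values printed on that slide. Each step touches only the moments of the subtree below it, so no subtree is ever re-analyzed from scratch.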

46 What about wires?
Transfer function computation: $V_2(s) = H(s)\,V_1(s)$, with $H(s) = \dfrac{Y_1(s)}{Y_1(s) + Y_2(s)}$
Taylor expansion: $H(s) = m_0 + m_1 s + m_2 s^2 + m_3 s^3 + m_4 s^4 + m_5 s^5 + m_6 s^6$

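47 Many Stages
(figure: two cascaded stages with transfer functions $H_1(s)$ and $H_2(s)$ and equivalent load $Y_{eq}(s)$)
Transfer function computation: $V_3(s) = H_1(s)\,H_2(s)\,V_1(s)$, where $H_1(s) = V_2(s)/V_1(s)$ has denominator $Y_1(s) + Y_{eq}(s)$ and $H_2(s) = V_3(s)/V_2(s)$ has denominator $Y_3(s) + Y_4(s)$.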

48 What about Trees?
(figure: a tree driven from $V_1(s)$ with sink voltage $V_2(s)$; branches carry transfer functions $H_1(s)$, $H_2(s)$ and admittances $Y_1(s)$, $Y_2(s)$, $Y_{eq}(s)$)
Keep track of the worst sink's transfer function.

49 Hierarchical moment computation -- REX
Assume in general:
  $H(s) = 1/[b_0 + b_1 s + b_2 s^2 + b_3 s^3 + b_4 s^4 + b_5 s^5]$
  $Y(s) = y_1 s + y_2 s^2 + y_3 s^3 + y_4 s^4 + y_5 s^5 + y_6 s^6$
Across a capacitor $C$:
  $H'(s) = H(s)$
  $Y'(s) = Y(s) + Cs$
Across a resistor $R$:
  $H'(s) = H(s)/[1 + R\,Y(s)]$
  $Y'(s) = Y(s)/[1 + R\,Y(s)]$
Base case at the receiver (load $C_L$):
  $H(s) = 1$
  $Y(s) = C_L s$
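
The update rules above can be sketched with truncated power series. The code below stores the denominator polynomial b(s) of H(s) = 1/b(s) together with the admittance series Y(s), and applies the capacitor, resistor, and receiver rules while walking from the receiver to the driver. The function names, the truncation length, and the example ladder are assumptions for illustration. As a sanity check, the coefficient b1 of the final result equals the Elmore delay.

```python
N = 7  # number of stored coefficients, s^0 through s^6


def series_mul(p, q):
    """Product of two power series, truncated to N coefficients."""
    out = [0.0] * N
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j < N:
                out[i + j] += pi * qj
    return out


def series_div(p, q):
    """Quotient p/q of two power series, truncated; requires q[0] != 0."""
    out = [0.0] * N
    for k in range(N):
        acc = p[k] - sum(q[j] * out[k - j] for j in range(1, k + 1))
        out[k] = acc / q[0]
    return out


def receiver(cl):
    """Base case: H(s) = 1 (so b(s) = 1) and Y(s) = CL*s."""
    return [1.0] + [0.0] * (N - 1), [0.0, cl] + [0.0] * (N - 2)


def across_capacitor(b, y, c):
    """H' = H, Y' = Y + C*s."""
    y = list(y)
    y[1] += c
    return b, y


def across_resistor(b, y, r):
    """H' = H / (1 + R*Y), i.e. b' = b * (1 + R*Y); Y' = Y / (1 + R*Y)."""
    one_plus_ry = [r * v for v in y]
    one_plus_ry[0] += 1.0
    return series_mul(b, one_plus_ry), series_div(y, one_plus_ry)


def ladder_moments(rd, resistances, capacitances, cl):
    """Walk an RC ladder from the receiver back through the driver resistance."""
    b, y = receiver(cl)
    for r, c in zip(reversed(resistances), reversed(capacitances)):
        b, y = across_capacitor(b, y, c)
        b, y = across_resistor(b, y, r)
    return across_resistor(b, y, rd)


if __name__ == "__main__":
    b, y = ladder_moments(100.0, [10.0] * 4, [1.0] * 4, 2.0)
    print("b1 (Elmore delay):", b[1], " y1 (total capacitance):", y[1])
```

Storing b(s) instead of H(s) turns the resistor rule for H into a single series multiplication; only the admittance update needs a series division.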

50 With buffer inserted and slope consideration?
(figure: buffered stages from $V_1(s)$ to $V_2(s)$ with loads $Y_1(s)$ and $Y_2(s)$; each stage is annotated with slope and delay, with example slopes 150 and 180)
Interpolate and extrapolate the delay at receivers.

51 Results
Sample net -- 10000 μm line on m1pm2 broken into 40 segments
Delay before optimization ps
Time for optimization seconds on RS6K
Stages            1     2     3     4
OUR      2467  1736  1388  1267  1218
SPICE    2405  1761  1404  1301  1274
% error   2.5  -1.4  -1.1  -2.6  -4.3
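
The % error row can be checked from the two delay rows. The sketch below assumes the error is defined as (OUR - SPICE) / SPICE, which is not stated on the slide; under that assumption the computed values agree with the reported row to within roughly 0.1 percentage point.

```python
ours_ps = [2467, 1736, 1388, 1267, 1218]    # OUR row from the slide
spice_ps = [2405, 1761, 1404, 1301, 1274]   # SPICE row from the slide


def percent_errors(model, reference):
    """Signed relative error of the model against the reference, in percent."""
    return [100.0 * (m - r) / r for m, r in zip(model, reference)]


errors = percent_errors(ours_ps, spice_ps)

if __name__ == "__main__":
    for o, s, e in zip(ours_ps, spice_ps, errors):
        print(f"{o:5d} {s:5d} {e:+.2f}%")
```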

52 Cost-Performance Curves
(plot: max delay (ps) versus # of repeaters, comparing OUR max delay with SPICE; delay axis from 500 to 2500 ps, repeater axis from 1 to 4)

53 Case Study
(figure: routing tree. Wire segment labels: 2.4/1000 1.2 m4; 2.4/3000 1.8 m5; 2.0/4000 1.2 m4 (twice); 2.4/500 1.2 m4; 1.2/1000 0.8 m3; 2.0/2500 0.8 m3; 2.0/1200 1.2 m4; 1.6/1300 1.2 m4; 2.0/1500 0.8 m3. Other labels: 22.4, slope:100; 16.5 (five times); mcf=1.5.)
Roses report: Delay=1529 ps
Cse report: Delay=1548 ps

54 Case Study: Manual Result (8 buffers)
(figure: the routing tree of slide 53 with the same wire segment labels. Additional labels: 102, 90, 93, 102, 27, 1.1, 32, 76. Other labels: 22.4, slope:100; 16.5 (five times); max slope:294.)
Roses report: Delay=1047 ps
Cse report: Delay=1053 ps

55 Case Study: Our Result (5 buffers)
(figure: the routing tree of slide 53 with the same wire segment labels. Additional labels: 70, 70, 60, 10, 2000, 2000, 40. Other labels: 22.4, slope:100; 16.5 (five times); max slope:260.)
Roses report: Delay=953 ps
SPICE report: Delay=1017 ps

56 Runtime Report

57 # of wire segments vs maximum delay
(plot; axis label: # of repeaters)

58 Conclusion
Buffer model is accurate to within about 5~7% of SPICE
The total net capacitance is no longer a valid load approximation
Using accurate models aids hierarchical moment computation of RC delays and slopes
New moment-matching methods provide efficient and accurate delay calculation for RC nets, especially for hierarchical moment computation
Dynamic programming approaches applied to buffer insertion
Hierarchical moment methods for efficient RC delay computation

