A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion
Shiyan Hu*, Zhuo Li**, Charles Alpert**
*Dept of Electrical and Computer Engineering, Michigan Technological University
**IBM Austin Research Lab, Austin, TX

2 Outline
Introduction
Previous Works
Timing-cost approximate dynamic programming
Double-$\epsilon$ geometric sequence based oracle search
The Algorithm
Experimental Results
Conclusion

3 Interconnect Delay Dominates
[Figure: delay (psec) versus technology generation ($\mu$m), comparing transistor/gate delay with interconnect delay]

4 Timing Driven Buffer Insertion

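Buffers Reduce RC Wire Delay
[Figure: a wire of length $x$ compared with the same wire split into two $x/2$ segments by a buffer; labels $R$, $C$, $rx/2$, $cx/4$; plot of $\Delta t$ versus $x$]
$\Delta t = t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$
Delay grows linearly with interconnect length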
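As a rough illustration of the trade-off on this slide, the sketch below evaluates $\Delta t = RC + t_b - rcx^2/4$ and the wire length at which inserting a buffer starts to pay off. The function names and all numeric values are hypothetical, chosen only for illustration.

```python
import math

def delay_change(R, C, t_b, r, c, x):
    """Delay change from buffering a wire of length x: RC + t_b - r*c*x^2/4."""
    return R * C + t_b - r * c * x ** 2 / 4

def break_even_length(R, C, t_b, r, c):
    """Wire length at which delay_change is zero; longer wires gain from a buffer."""
    return math.sqrt(4 * (R * C + t_b) / (r * c))

# Hypothetical parameter values in arbitrary units, not taken from the slides.
R, C, t_b, r, c = 2.0, 0.5, 3.0, 0.1, 0.2
x0 = break_even_length(R, C, t_b, r, c)
print("break-even length:", x0)
print("delay change at 2*x0:", delay_change(R, C, t_b, r, c, 2 * x0))
```

A negative value of `delay_change` means the buffered wire is faster than the unbuffered one.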

6 25% Gates are Buffers Saxena, et al. [TCAD 2004]

7 Problem Formulation
Given
1. Steiner tree
2. $n$ candidate buffer locations
Find the minimal cost (area/power) solution that meets the timing constraint $T$

8 Solution Characterization
To model its effect on the downstream subtree, a candidate solution is associated with
– $v$: a node
– $C$: downstream capacitance
– $Q$: required arrival time
– $W$: cumulative buffer cost

9 Dynamic Programming (DP)
Candidate solutions are propagated toward the source
Start from sinks
Candidate solutions are generated
Three operations
– Add Wire
– Insert Buffer
– Merge
Solution Pruning

10 Generating Candidates
[Figure: candidate generation, steps (1) to (3)]

11 Pruning Candidates
[Figure: candidate pruning, steps (3) and (4), with candidates (a) and (b)]
Both (a) and (b) look the same to the source. Remove the one with the worse slack and cost.

12 Merging Branches
[Figure: left candidates and right candidates combined at a branch point]
$O(n_1 n_2)$ solutions after each branch merge.
Worst-case $O((n/m)^m)$ solutions.

13 DP Properties
$(Q_1, C_1, W_1)$ is inferior to (dominated by) $(Q_2, C_2, W_2)$ if $C_1 \ge C_2$, $W_1 \ge W_2$ and $Q_1 \le Q_2$
Non-dominated solutions are maintained: for the same $Q$ and $W$, pick min $C$
# solutions depends on # of distinct $W$ and $Q$, but not their values
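The dominance rule on this slide can be sketched as a simple quadratic-time filter. This is only a minimal illustration: the `Candidate` type and the function names are hypothetical, and practical implementations typically keep candidates sorted so that pruning is cheaper.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    q: float  # required arrival time Q
    c: float  # downstream capacitance C
    w: float  # cumulative buffer cost W

def dominates(a, b):
    """True if a is at least as good as b in all three of C, W and Q."""
    return a.c <= b.c and a.w <= b.w and a.q >= b.q

def prune(candidates):
    """Keep only non-dominated candidates (exact duplicates are collapsed first)."""
    unique = sorted(set(candidates))
    return [s for s in unique
            if not any(o != s and dominates(o, s) for o in unique)]

cands = [Candidate(10, 2, 3), Candidate(8, 3, 4),
         Candidate(12, 5, 1), Candidate(10, 2, 3)]
print(prune(cands))
```

Here the candidate with Q = 8 is dropped because the candidate with Q = 10 has smaller C, smaller W and larger Q.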

14 Previous Works
[Timeline with marks at 1990, 1991, 1996, 2003, 2004, 2008, 2009: van Ginneken's algorithm, Lillis' algorithm, Shi and Li's algorithm, Chen and Zhou's algorithm, NP-hardness proof]

15 Bridging The Gap
We are bridging the gap!
A Fully Polynomial Time Approximation Scheme (FPTAS)
Provably good
Within $(1+\epsilon)$ optimal cost for any $\epsilon > 0$
Runs in time polynomial in $n$ (nodes), $b$ (buffer types) and $1/\epsilon$
Best solution for an NP-hard problem in theory
Highly practical

16 The Rough Picture
$W^*$: the cost of optimal solution
[Flowchart: make a guess on $W^*$; check it; if good (close to $W^*$), return the solution; if not good, guess again]
Key 1: Efficient checking
Key 2: Smart guess

17 Key 1: Efficient Checking
Benefit of guess
– Only maintain the solutions with cost no greater than the guessed cost
– Accelerate DP

18 The Oracle
Oracle($x$): the checker, able to decide whether $x > W^*$ or not
– Without knowing $W^*$
– Answer efficiently
[Flowchart: set up upper and lower bounds of cost $W^*$; guess $x$ within the bounds; query Oracle($x$); update the bounds]

19 Construction of Oracle($x$)
Only interested in whether there is a solution with cost up to $x$ satisfying timing constraint
Scale and round each buffer cost
Dynamic Programming: perform DP on the scaled problem with cost bound $n/\epsilon$
Runtime polynomial in $n/\epsilon$

20 Scaling and Rounding
[Figure: buffer cost axis with rounding grid $0$, $x\epsilon/n$, $2x\epsilon/n$, $3x\epsilon/n$, $4x\epsilon/n$, ...]
Buffer costs are integers due to rounding and are bounded by $n/\epsilon$.
Rounding error at each buffer $\le x\epsilon/n$, total rounding error $\le x\epsilon$.
Larger $x$: larger error, fewer distinct costs and faster
Smaller $x$: smaller error, more distinct costs and slower
Rounding is the reason for the acceleration
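A minimal sketch of the scaling step, assuming costs are rounded down to integer multiples of $x\epsilon/n$ (rounding down is an assumption here, chosen to be consistent with the error bounds on this slide). The function name and the sample values are hypothetical.

```python
import math

def scale_and_round(costs, x, eps, n):
    """Express each cost as an integer number of units of size x*eps/n, rounding down."""
    unit = x * eps / n
    return [math.floor(w / unit) for w in costs]

# Hypothetical values: guess x = 20, eps = 0.5, n = 4 candidate locations.
costs = [3.0, 7.5, 12.0]
x, eps, n = 20.0, 0.5, 4
unit = x * eps / n                      # rounding grid spacing
rounded = scale_and_round(costs, x, eps, n)
errors = [w - k * unit for w, k in zip(costs, rounded)]
print(rounded, errors)
```

Each entry of `errors` is below one grid step, so a solution with at most $n$ buffers accumulates a total error of at most $x\epsilon$.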

21 DP Results
DP result with all $w$ integers $\le n/\epsilon$
Yes, there is a solution satisfying timing constraint: with cost rounding back, the solution has cost at most $(n/\epsilon)(x\epsilon/n) + x\epsilon = (1+\epsilon)x > W^*$
No, no such solution: with cost rounding back, the solution has cost at least $(n/\epsilon)(x\epsilon/n) = x \le W^*$

22 Rounding on Q
# solutions bounded by # distinct $W$ and $Q$
# $W$ = $O(n/\epsilon_1)$
– Rounding before DP
# $Q$ = $O(m/\epsilon_2)$
– Round up $Q$ to nearest value in $\{0, \epsilon_2 T/m, 2\epsilon_2 T/m, 3\epsilon_2 T/m, \ldots, T\}$, in branch merge ($m$ is # sinks)
– Rounding during DP
# non-dominated solutions is $O(mn/(\epsilon_1 \epsilon_2))$
[Figure: $Q$ axis with grid $0$, $\epsilon_2 T/m$, $2\epsilon_2 T/m$, $3\epsilon_2 T/m$, $4\epsilon_2 T/m$, ...]
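The rounding of $Q$ described on this slide can be sketched as follows. This is an illustrative helper only: the name `round_up_q` is hypothetical, and it assumes $0 \le Q \le T$.

```python
import math

def round_up_q(q, T, eps2, m):
    """Round q up to the nearest multiple of eps2*T/m, capped at T."""
    step = eps2 * T / m
    return min(T, math.ceil(q / step) * step)

# Hypothetical values: T = 100, eps2 = 0.5, m = 5 sinks, so the grid step is 10.
print(round_up_q(23.0, 100.0, 0.5, 5))
```

With these values there are $m/\epsilon_2 = 10$ grid steps between 0 and $T$, which is where the $O(m/\epsilon_2)$ count of distinct $Q$ values comes from.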

Q-W Rounding Before Branch Merge
[Figure: grid over the $W$ axis (integers $0, 1, 2, 3, 4, \ldots, n/\epsilon_1$) and the $Q$ axis (multiples of $\epsilon_2 T/m$ up to $T$)]

24 Solution Propagation: Add Wire
[Figure: wire of length $x$ between $(v_1, c_1, w_1, q_1)$ and $(v_2, c_2, w_2, q_2)$]
$c_2 = c_1 + cx$
$q_2 = q_1 - (rcx^2/2 + rxc_1)$
$r$: wire resistance per unit length
$c$: wire capacitance per unit length
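The add-wire update can be written directly from the two formulas on this slide. The `Solution` type and the sample numbers below are hypothetical; the sketch also assumes that adding a wire leaves the cost $w$ unchanged, which the slide does not state explicitly.

```python
from typing import NamedTuple

class Solution(NamedTuple):
    c: float  # downstream capacitance
    w: float  # cumulative buffer cost
    q: float  # required arrival time

def add_wire(s, x, r, c):
    """Propagate solution s across a wire of length x (r, c are per unit length)."""
    return Solution(
        c=s.c + c * x,
        w=s.w,  # assumption: a wire adds no buffer cost
        q=s.q - (r * c * x ** 2 / 2 + r * x * s.c),
    )

# Hypothetical values in arbitrary units.
print(add_wire(Solution(c=2.0, w=3.0, q=100.0), x=4.0, r=0.5, c=0.25))
```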

25 Solution Propagation: Insert Buffer
[Figure: buffer inserted at $v_1$, turning $(v_1, c_1, w_1, q_1)$ into $(v_1, c_{1b}, w_{1b}, q_{1b})$]
$q_{1b} = q_1 - d(b)$
$c_{1b} = C(b)$
$w_{1b} = w_1 + w(b)$
$d(b)$: buffer delay
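A sketch of the insert-buffer update. The slide only names $d(b)$ as the buffer delay; the linear model used below (intrinsic delay plus drive resistance times load) is a common choice but is an assumption here, as are the `Buffer` fields and the sample values.

```python
from typing import NamedTuple

class Solution(NamedTuple):
    c: float  # downstream capacitance
    w: float  # cumulative buffer cost
    q: float  # required arrival time

class Buffer(NamedTuple):
    input_cap: float        # C(b)
    cost: float             # w(b)
    intrinsic_delay: float  # assumed delay-model parameter
    drive_res: float        # assumed delay-model parameter

def buffer_delay(b, load):
    """Assumed linear delay model standing in for d(b)."""
    return b.intrinsic_delay + b.drive_res * load

def insert_buffer(s, b):
    """Apply q_1b = q_1 - d(b), c_1b = C(b), w_1b = w_1 + w(b)."""
    return Solution(c=b.input_cap, w=s.w + b.cost, q=s.q - buffer_delay(b, s.c))

# Hypothetical values in arbitrary units.
print(insert_buffer(Solution(c=3.0, w=1.0, q=95.0), Buffer(0.5, 2.0, 1.0, 2.0)))
```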

Buffer Insertion Runtime

27 Solution Propagation: Merge
[Figure: left solution $(v, c_l, w_l, q_l)$ and right solution $(v, c_r, w_r, q_r)$ meeting at node $v$]
Round $q$ in both branches
$c_{merge} = c_l + c_r$
$w_{merge} = w_l + w_r$
$q_{merge} = \min(q_l, q_r)$
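The merge formulas can be sketched as an all-pairs combination of the two branch lists, which is also where the $n_1 n_2$ growth in candidates comes from. The names below are hypothetical, and the sketch assumes the $q$ values have already been rounded before the merge.

```python
from itertools import product
from typing import NamedTuple

class Solution(NamedTuple):
    c: float  # downstream capacitance
    w: float  # cumulative buffer cost
    q: float  # required arrival time (assumed already rounded)

def merge_pair(left, right):
    """Combine one left and one right solution at a branch node."""
    return Solution(c=left.c + right.c, w=left.w + right.w, q=min(left.q, right.q))

def merge_branches(lefts, rights):
    """All pairwise merges; pruning of dominated results would follow."""
    return [merge_pair(a, b) for a, b in product(lefts, rights)]

lefts = [Solution(1.0, 2.0, 10.0), Solution(2.0, 1.0, 12.0)]
rights = [Solution(3.0, 0.0, 11.0)]
print(merge_branches(lefts, rights))
```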

Branch Merge Runtime - 1
[Figure: target $Q = 0$]

Branch Merge Runtime - 2
[Figure: target $Q = \epsilon_2 T/m$]

Branch Merge Runtime - 3
[Figure: target $Q = 2\epsilon_2 T/m$]

Branch Merge Runtime - 4

32 Timing-Cost Approximate DP
Lemma: a buffering solution with cost at most $(1+\epsilon_1)W^*$ and with timing at most $(1+\epsilon_2)T$ can be computed in time [runtime bound not captured in this transcript]

33 Key 2: Geometric Sequence Based Guess
$U$ ($L$): upper (lower) bound on $W^*$
Naive binary search style approach
[Flowchart: set $U$ and $L$ on $W^*$; query Oracle($x$) with $x = (U+L)/2$; if $W^* < (1+\epsilon)x$, set $U = (1+\epsilon)x$; if $W^* \ge x$, set $L = x$]
Runtime (# iterations) depends on the initial bounds $U$ and $L$
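The naive binary search style loop can be sketched as below. The real oracle is the scaled DP; here a toy stand-in with a known hidden optimum is used purely to exercise the bound updates, and all names and values are hypothetical. The stopping ratio of 2 and the iteration cap are illustrative choices.

```python
def make_toy_oracle(w_star, eps):
    """Toy stand-in for Oracle(x): answers yes when W* < (1+eps)*x, so no implies W* >= x."""
    def oracle(x):
        return (1 + eps) * x > w_star
    return oracle

def bisection_bounds(oracle, lower, upper, eps, ratio=2.0, max_iter=1000):
    """Shrink [lower, upper] around W* with x = (U+L)/2 until U/L < ratio."""
    iterations = 0
    while upper / lower >= ratio and iterations < max_iter:
        x = (upper + lower) / 2
        if oracle(x):
            upper = (1 + eps) * x   # yes: W* < (1+eps)*x
        else:
            lower = x               # no: W* >= x
        iterations += 1
    return lower, upper, iterations

oracle = make_toy_oracle(w_star=37.0, eps=0.1)
print(bisection_bounds(oracle, 1.0, 1000.0, 0.1))
print(bisection_bounds(oracle, 1.0, 1000000.0, 0.1))
```

Starting from a looser upper bound takes more oracle calls, which is the dependence on the initial $U$ and $L$ that the slide points out.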

34 Adapt $\epsilon_1$
Rounding factor $x\epsilon_1/n$ for $W$
Larger $\epsilon_1$: faster with rough estimation
Smaller $\epsilon_1$: slower with accurate estimation
Adapt $\epsilon_1$ according to $U$ and $L$

35 U/L Related Scale and Round
[Figure: buffer cost axis starting at 0, with a rounding step $x\epsilon/n$ related to $U/L$]

36 Conceptually
Begin with large $\epsilon_1$ and progressively reduce it (towards $\epsilon$) according to $U/L$ as $x$ approaches $W^*$
Fix $\epsilon_2 = \epsilon$ in rounding $Q$ for limiting timing violation
Set $\epsilon_1$ as a geometric sequence of ..., 8, 4, 2, 1, 1/2, ..., $\epsilon$
One run of DP takes about $O(n/\epsilon_1)$ time.
Total runtime is bounded by the last run as $O(\ldots + n/8 + n/4 + n/2 + \ldots + n/\epsilon) = O(n/\epsilon)$, independent of # iterations
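The claim that the last run dominates follows from a geometric series. As a sketch, assume the values of $\epsilon_1$ used across the runs are $2^j \epsilon$ for $j = 0, 1, \ldots, k$ (an idealization of the sequence on this slide) and that a run with a given $\epsilon_1$ costs about $n/\epsilon_1$:

$$\sum_{j=0}^{k} \frac{n}{2^{j}\epsilon} = \frac{n}{\epsilon} \sum_{j=0}^{k} 2^{-j} < \frac{2n}{\epsilon} = O(n/\epsilon).$$

The bound does not depend on $k$, which is why the total is independent of the number of iterations.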

37 Oracle Query Till $U/L < 2$

38 Mathematically

39 The Algorithmic Flow
[Flowchart: set $U$ and $L$ of $W^*$; adapt $\epsilon_1 = [U/L - 1]^{1/2}$; set $x = [UL/(1+\epsilon_1)]^{1/2}$; query Oracle($x$); update $U$ or $L$; repeat until $U/L < 2$; compute final solution]

40 When $U/L < 2$
[Flowchart: scale and round each cost by $L\epsilon/n$; run DP with $W = 2n/\epsilon$; pick min cost solution satisfying timing at driver]
At least one feasible solution, otherwise no solution with cost $(2n/\epsilon)(L\epsilon/n) = 2L \ge U$
A single DP runtime

Main Theorem
Theorem: a $(1+\epsilon)$ approximation to the timing constrained minimum cost buffering problem can be computed in $O(m^2 n^2 b/\epsilon^3 + n^3 b^2/\epsilon)$ time for $0 < \epsilon < 1$ and in $O(m^2 n^2 b/\epsilon + mn^2 b + n^3 b)$ time for $\epsilon \ge 1$

42 Experiments
Experimental Setup
– 1000 industrial nets
– 48 buffer types including non-inverting buffers and inverting buffers
Compared to Dynamic Programming

43 Cost Ratio Compared to DP
[Figure: buffer cost ratio versus approximation ratio $\epsilon$]

44 Speedup Compared to DP
[Figure: speedup versus approximation ratio $\epsilon$]

45 Timing Violations (% nets)
[Figure: timing violations versus approximation ratio $\epsilon$]

46 Cost Ratio w/ Timing Recovery
[Figure: buffer cost ratio versus approximation ratio $\epsilon$]

47 Speedup w/ Timing Recovery
[Figure: speedup versus approximation ratio $\epsilon$]

48 Observations
Without timing recovery
– FPTAS always achieves the theoretical guarantee
– Larger $\epsilon$ leads to more speedup
– On average about 5x faster than dynamic programming
– Can run 4.6x faster with 0.57% solution degradation
– <5% nets with timing violations
With timing recovery
– FPTAS well approximates the optimal solutions
– Can still have >4x speedup

[Figure: "Our Bridge" connecting the NP-hardness complexity result and the exponential time algorithm]

50 Conclusion
Propose a $(1+\epsilon)$ approximation for timing constrained minimum cost buffering for any $\epsilon > 0$
– Runs in $O(m^2 n^2 b/\epsilon^3 + n^3 b^2/\epsilon)$ time
– Timing-cost approximate dynamic programming
– Double-$\epsilon$ geometric sequence based oracle search
– 5x speedup in experiments
– Few percent additional buffers as guaranteed theoretically
The first provably good approximation algorithm on this problem

51 Thanks
