F.F. Dragan (Kent State) A.B. Kahng (UCSD) I. Mandoiu (UCLA/UCSD) S. Muddu (Sanera Systems) A. Zelikovsky (Georgia State) Practical Approximation Algorithms.

1 F.F. Dragan (Kent State) A.B. Kahng (UCSD) I. Mandoiu (UCLA/UCSD) S. Muddu (Sanera Systems) A. Zelikovsky (Georgia State) Practical Approximation Algorithms for Separable Packing LPs

2 Outline
VLSI design motivation
– Global routing via buffer-blocks
– Separable packing ILP formulations
PTAS for separable packing LPs
Analysis
Experimental results

6 VLSI Global Routing

7 Buffered Buffer Blocks

8 Problem Formulation
Global Routing via Buffer-Blocks (GRBB) Problem
Given:
– BB locations and capacities
– List of multi-pin nets, with an upper bound on #buffers for each source-sink path
– L/U bounds on the wirelength between consecutive buffers/pins
Find:
– Buffered routing of a maximum number of nets subject to the given constraints

9 Integer Program Formulation

10 Enforcing Parity Constraints
Inverting buffers change the polarity of the signal
Each sink has a given polarity requirement
⇒ Parity constraints for the #buffers on each routed source-sink path
⇒ A path may use two buffers in the same buffer block
Integer program changes
– Split each BB vertex r of G into two copies, r′ and r″
– Impose capacity constraint on the sets of vertices {r′, r″}
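The vertex-splitting step can be sketched as follows. This is only an illustration under assumptions that are not on the slide: the function names `split_for_parity` and `group_usage` are hypothetical, the copies r′ and r″ are encoded as `(r, 0)` and `(r, 1)`, and the edge rules (parity flips on every buffer, plus an r′/r″ edge for two buffers in one block) are one plausible reading of the slide rather than the authors' stated construction.

```python
def split_for_parity(bbs, edges):
    """Build a parity-expanded copy of the buffer-block graph.

    (r, 0) plays the role of r' and (r, 1) the role of r''. The index is the
    parity of the number of inverting buffers used up to and including r.
    """
    split_edges = set()
    for u, v in edges:
        # Assumed rule: each additional inverting buffer flips the parity.
        split_edges.add(((u, 0), (v, 1)))
        split_edges.add(((u, 1), (v, 0)))
    for r in bbs:
        # Assumed rule: two buffers in the same block are an r' <-> r'' step.
        split_edges.add(((r, 0), (r, 1)))
        split_edges.add(((r, 1), (r, 0)))
    # Capacity is imposed on the pair {r', r''}, not on each copy separately.
    groups = {r: {(r, 0), (r, 1)} for r in bbs}
    return split_edges, groups


def group_usage(path, groups):
    """Count how many buffers a path takes from each original block."""
    return {r: sum(v in members for v in path) for r, members in groups.items()}


split_edges, groups = split_for_parity(["a", "b"], [("a", "b")])
# A path using two buffers in block a and one in block b:
usage = group_usage([("a", 1), ("a", 0), ("b", 1)], groups)
print(usage)  # {'a': 2, 'b': 1}
```

Counting usage per group rather than per copy is what lets a path that visits both r′ and r″ consume two units of the same block's capacity.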

13 Combining with compaction
Set capacity constraints: cap(BB1) + cap(BB2) ≤ const.

14 GRBB with Buffer Library
Discrete buffer library: different buffer sizes/driving strengths
⇒ Need to allocate BB capacity between different buffer types
Integer program changes
– Replace each BB vertex r of G by a set X(r) of vertices (one for each buffer type)
– Modify edge set of G to take into account non-uniform driving strengths
– Impose capacity constraint on the sets of vertices X(r): [constraint formula not preserved in transcript]

15 “Relax+Round” Approach to GRBB
1. Solve the fractional relaxation
– Exact linear programming algorithms are impractical for large instances
– KEY IDEA: use an approximation algorithm, which allows fine-tuning the tradeoff between runtime and solution quality
2. Round to integer solution
– Provably good rounding [RT87]
– Practical runtime (random-walk based)

17 Separable Packing LP

18 Previous Work
MCF and packing/covering LP approximation: [FGK73, SM90, PST91, G92, GK94, KPST94, LMPSTT95, R95, Y95, GK98, F00, …]
– Exponential length function to model flow congestion [SM90]
– Shortest-path augmentation + final scaling [Y95]
– Modified routing increment [GK98]
– Fewer shortest-path augmentations [F00]
We extend the speed-up idea of [F00] to separable packing LPs

19 Separable Packing LP Algorithm
w(X) ← δ, f ← 0, α = δ
For i = 1 to N do
    For k = 1, …, #nets do
        Find min weight feasible Steiner tree T for net k
        While weight(T) < min{ 1, (1+ε)α } do
            f(T) = f(T) + 1
            For every X do
                w(X) ← ( 1 + ε · [symbol lost](T,X) / cap(X) ) * w(X)
            End For
            Find min weight feasible Steiner tree T for net k
        End While
    End For
    α = (1+ε)α
End For
Output f/N
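The loop structure above can be tried on a toy instance. The following is a minimal sketch under simplifying assumptions that are not in the slides: every net has two pins, so the "Steiner tree" is a path found by Dijkstra; each capacitated set X is a single vertex that a path uses once; wirelength and buffer-count feasibility checks are omitted. The names `min_weight_path` and `packing_lp`, and the choice of N as the smallest integer with δ(1+ε)^N ≥ 1, are illustrative.

```python
import heapq
import math


def min_weight_path(adj, w, src, dst):
    """Dijkstra; a path's weight is the sum of w[v] over its weighted vertices."""
    dist = {src: w.get(src, 0.0)}
    prev = {}
    heap = [(dist[src], src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue
        if u == dst:
            break
        for v in adj.get(u, ()):
            nd = d + w.get(v, 0.0)  # vertices without a capacity cost nothing
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return math.inf, None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], tuple(reversed(path))


def packing_lp(adj, cap, nets, eps, delta):
    """Return fractional flows {path: value} for 2-pin nets (sketch only)."""
    w = {x: delta for x in cap}
    flow = {}
    alpha = delta
    n_iter = math.ceil(math.log(1 / delta) / math.log(1 + eps))
    for _ in range(n_iter):
        for s, t in nets:
            weight, path = min_weight_path(adj, w, s, t)
            while weight < min(1.0, (1 + eps) * alpha):
                if not any(x in cap for x in path):
                    raise ValueError("path uses no capacitated vertex")
                flow[path] = flow.get(path, 0) + 1
                for x in path:
                    if x in cap:
                        w[x] *= 1 + eps / cap[x]  # each vertex is used once
                weight, path = min_weight_path(adj, w, s, t)
        alpha *= 1 + eps
    return {p: f / n_iter for p, f in flow.items()}


# Two nets share block a (capacity 2); net 2 can also use block b.
adj = {"s1": ["a"], "s2": ["a", "b"], "a": ["t1", "t2"], "b": ["t2"]}
cap = {"a": 2, "b": 1, "s1": 1, "s2": 1}  # source capacity 1 bounds each net
flows = packing_lp(adj, cap, [("s1", "t1"), ("s2", "t2")], eps=0.1, delta=0.01)
print(sum(flows.values()))
```

Giving each source vertex capacity 1 is a modelling shortcut used here to keep a net's total fractional flow at most 1, so that the result can be read as routing probabilities.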

21 Runtime
Dual LP: [formula not preserved in transcript]
Choose #iterations N such that all feasible trees have weight ≥ 1 after N iterations (i.e., α ≥ 1)
Tree weight lower bound is δ initially, and is multiplied by (1+ε) in each iteration
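The iteration count implied by this slide can be worked out directly. The derivation below assumes the lower bound starts at δ and grows by a factor (1+ε) per iteration, as stated; the value δ = 10⁻³ in the numerical example is an arbitrary illustration, not a value from the talk.

```latex
\delta\,(1+\varepsilon)^{N} \ge 1
\iff N \ge \frac{\ln(1/\delta)}{\ln(1+\varepsilon)}
\quad\Longrightarrow\quad
N = \left\lceil \log_{1+\varepsilon}\frac{1}{\delta} \right\rceil .

% Illustration with the assumed value \delta = 10^{-3}:
% \varepsilon = 0.64:\; N = \lceil 6.908 / 0.4947 \rceil = 14
% \varepsilon = 0.04:\; N = \lceil 6.908 / 0.03922 \rceil = 177
```

For a fixed δ, moving from ε = .64 to ε = .04 multiplies the number of outer iterations by more than ten, which is consistent with the runtime gap between the two settings in the experimental tables.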

22 Approximation Guarantee
Theorem: For every ε < .15, the algorithm finds a factor 1/(1+4ε) approximation by choosing δ = [formula not preserved in transcript], where L is the maximum number of vertices in a feasible Steiner tree.
For this value of δ, the running time is [formula not preserved in transcript]

24 Implementation choices

| | 2-pin | 3,4-pin | Multi-pin |
|---|---|---|---|
| Decomposition | Star, minimum spanning tree | Matching, 3-restricted Steiner tree | Not needed |
| Min-weight DRST | Shortest path (exact) | Try all Steiner pts + shortest paths (exact) | Very hard! ⇒ heuristics |
| Rounding | Random-walk | Backward random-walks | Backward random-walks |

25 Provably Good Rounding
1. Store fractional flows f(T) for every feasible Steiner tree T
2. Scale down each f(T) by 1−ε for small ε
3. Each net k routed with prob. f(k) = Σ { f(T) | T feasible for k }
   ⇒ Number of routed nets ≥ (1−ε)OPT
4. To route net k, choose tree T with probability f(T) / f(k)
   ⇒ With high probability, no BB capacity is exceeded
Problem: Impractical to store all non-zero flow trees
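The four rounding steps can be sketched in a few lines. This assumes the fractional solution is available as a dictionary `{net: {tree: f(T)}}` with f(k) ≤ 1 for every net; the function name `round_nets` and the string tree identifiers are hypothetical.

```python
import random


def round_nets(frac, eps, rng):
    """Randomized rounding: returns {net: chosen tree} for the routed nets."""
    routed = {}
    for net, trees in frac.items():
        # Step 2: scale every f(T) down by (1 - eps).
        scaled = {t: (1 - eps) * f for t, f in trees.items()}
        # Step 3: route net k with probability f(k) = sum of its tree flows.
        f_k = sum(scaled.values())
        if f_k > 0 and rng.random() < f_k:
            # Step 4: pick tree T with probability f(T) / f(k).
            names = list(scaled)
            weights = [scaled[t] for t in names]
            routed[net] = rng.choices(names, weights=weights)[0]
    return routed


frac = {"n1": {"T1": 0.5, "T2": 0.5}, "n2": {"T3": 0.0}}
rng = random.Random(0)
print(round_nets(frac, eps=0.1, rng=rng))
```

The sketch stores one entry per tree with non-zero flow, which is exactly the storage cost the slide flags as impractical on large instances.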

26 Random-Walk 2-TMCF Rounding
1. Store fractional flows f(T) for every valid routing tree T
2. Scale down each f(T) by 1−ε for small ε
3. Each net k routed with prob. f(k) = Σ { f(T) | T routing for k }
   ⇒ Number of routed nets ≥ (1−ε)OPT
4. To route net k, choose tree T with probability f(T) / f(k): use random walk from source to sink
   ⇒ With high probability, no BB capacity is exceeded
Practical: random walk requires storing only flows on edges
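A random walk over per-edge flows might look like the following. It is a sketch assuming each net's flow is stored as `{(u, v): flow}` on directed edges and satisfies flow conservation; the name `random_walk_path` and the `max_steps` safety bound are illustrative additions.

```python
import random


def random_walk_path(edge_flow, source, sink, rng, max_steps=10_000):
    """Walk from source to sink, leaving each vertex along an edge chosen
    with probability proportional to the flow on that edge."""
    out = {}
    for (u, v), f in edge_flow.items():
        if f > 0:
            out.setdefault(u, []).append((v, f))
    path = [source]
    while path[-1] != sink:
        if len(path) > max_steps or path[-1] not in out:
            return None  # dead end or suspected cycle: give up on this net
        successors, weights = zip(*out[path[-1]])
        path.append(rng.choices(successors, weights=weights)[0])
    return path


edge_flow = {("s", "a"): 0.75, ("s", "b"): 0.25,
             ("a", "t"): 0.75, ("b", "t"): 0.25}
print(random_walk_path(edge_flow, "s", "t", random.Random(1)))
```

On an acyclic flow this picks each source-sink path with probability proportional to the flow it carries, while storing only one number per edge instead of one per path.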

27-28 Random-Walk MTMCF Rounding
[Figure: source S and sinks T1, T2, T3]

29 The MTMCF Rounding Heuristic
1. Round each net k with probability f(k), using backward random walks
   – No scaling-down, approximate MTMCF < OPT
2. Resolve capacity violations by greedily deleting routed paths
   – Few violations
3. Greedily route remaining nets using unused BB capacity
   – Further routing still possible
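Step 2 of the heuristic can be illustrated as below. The slide does not say which routed path to delete first, so the rule used here (drop the route that takes the most buffers from over-capacity blocks) is an assumption, as are the name `resolve_violations` and the `{net: [blocks used]}` input format.

```python
from collections import Counter


def resolve_violations(routes, cap):
    """Delete routed nets until no buffer block exceeds its capacity.

    routes: {net: list of buffer blocks, one entry per buffer used}.
    """
    kept = dict(routes)
    usage = Counter(bb for bbs in kept.values() for bb in bbs)
    while True:
        over = {bb for bb, used in usage.items() if used > cap[bb]}
        if not over:
            return kept
        # Assumed greedy rule: remove the route using most overloaded buffers.
        victim = max(kept, key=lambda n: sum(bb in over for bb in kept[n]))
        usage.subtract(kept.pop(victim))


routes = {"n1": ["a"], "n2": ["a", "b"], "n3": ["b"]}
print(resolve_violations(routes, {"a": 1, "b": 1}))  # {'n1': ['a'], 'n3': ['b']}
```

The nets removed here would then be candidates for step 3, the greedy re-routing pass over whatever capacity is left.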

30 Implemented Heuristics
Greedy buffered routing:
1. For each net, route sinks sequentially along shortest paths to source or node already connected to source
2. After routing a net, remove fully used BBs
Generalized MCF approximation + randomized rounding
– G2TMCF
– G3TMCF (3-pin decomposition)
– G4TMCF (4-pin decomposition)
– GMTMCF (no decomposition, approximate DRST)
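The greedy baseline can be sketched as follows, with several simplifications that are assumptions rather than details from the talk: path length is measured in hops, wirelength bounds and per-path buffer limits are ignored, every internal vertex of a path is a buffer block that gives up one buffer, and a net that cannot connect all of its sinks is dropped without consuming capacity. The names `bfs_path` and `greedy_route` are hypothetical.

```python
from collections import deque


def bfs_path(adj, start, targets, usable):
    """Fewest-hop path from start to any target, passing only through usable BBs."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u in targets:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]  # start ... target
        for v in adj.get(u, ()):
            if v not in prev and (v in targets or v in usable):
                prev[v] = u
                queue.append(v)
    return None


def greedy_route(adj, cap, nets):
    """nets: {name: (source, [sinks])}. Returns (routed trees, remaining caps)."""
    remaining = dict(cap)
    routed = {}
    for name, (source, sinks) in nets.items():
        trial = dict(remaining)  # tentative capacities for this net
        tree = {source}
        for sink in sinks:
            usable = {b for b, c in trial.items() if c > 0}
            path = bfs_path(adj, sink, tree, usable)
            if path is None:
                break
            for v in path[:-1]:  # last vertex is already in the tree
                if v in trial:
                    trial[v] -= 1
            tree.update(path)
        else:
            remaining = trial  # commit only if every sink was connected
            routed[name] = tree
    return routed, remaining


adj = {"s1": ["a"], "t1": ["a"], "s2": ["a"], "t2": ["a"],
       "a": ["s1", "t1", "s2", "t2"]}
nets = {"n1": ("s1", ["t1"]), "n2": ("s2", ["t2"])}
print(greedy_route(adj, {"a": 1}, nets))
```

In this toy instance the first net takes the only buffer in block a, so the second net cannot be routed; this ordering sensitivity is the usual weakness of a greedy pass.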

31 Experimental Setup
Test instances extracted from next-generation SGI microprocessor
Up to 5,000 nets, ~6,000 sinks
U = 4,000 µm, L = 500-2,000 µm
50 buffer blocks
200-400 buffers / BB

32 % Routed Nets vs. Runtime

33 Conclusions and Ongoing Work
Provably good algorithms and practical heuristics based on separable packing LP approximation
– Higher completion rates than previous algorithms
Extensions:
– Combine global buffering with BB planning
– Buffer “site” methodology → tile graph
– Routing congestion (channel capacity constraints)
– Simultaneous pin assignment

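35 % Sinks Connected

| #sinks/#nets | Greedy | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04 |
|---|---|---|---|---|---|---|---|---|---|
| 2958/2396 | 92.2 | 93.8 | 95.5 | 96.2 | 97.8 | 96.6 | 98.3 | 96.7 | 97.4 |
| 3077/2438 | 92.3 | 93.9 | 96.5 | 96.4 | 98.5 | 96.9 | 98.8 | 97.6 | 99.3 |
| 3099/2784 | 92.1 | 93.6 | 95.5 | 96.4 | 98.0 | 96.6 | 98.1 | 97.3 | 98.7 |
| 6038/4764 | 93.5 | 94.8 | 96.8 | 95.7 | 97.6 | 96.5 | 98.4 | 96.3 | 97.7 |
| 6296/4925 | 93.6 | 96.2 | 97.6 | 97.0 | 98.6 | 97.7 | 99.1 | 97.7 | 98.4 |
| 6321/4938 | 93.3 | 96.2 | 97.5 | 96.8 | 98.4 | 97.7 | 98.9 | 97.7 | 98.2 |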

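36 Runtime (sec.)

| #sinks/#nets | Greedy | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04 |
|---|---|---|---|---|---|---|---|---|---|
| 2958/2396 | .30 | 1.63 | 357 | 9.16 | 2,090 | 98.91 | 29,190 | 2.33 | 947 |
| 3077/2438 | .33 | 2.35 | 350 | 11.10 | 2,356 | 128.38 | 37,970 | 2.87 | 846 |
| 3099/2784 | .33 | 1.80 | 392 | 12.56 | 2,364 | 132.81 | 38,341 | 2.86 | 877 |
| 6038/4764 | .53 | 2.84 | 600 | 16.57 | 3,166 | 182.55 | 60,450 | 4.98 | 1,866 |
| 6296/4925 | .55 | 4.35 | 690 | 19.5 | 3,721 | 265.78 | 77,671 | 5.38 | 1,828 |
| 6321/4938 | .54 | 3.37 | 730 | 18.99 | 3,813 | 255.37 | 79,123 | 5.43 | 1,833 |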

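37 Resource Usage
#nets = 4,764, #sinks = 6,038, 400 buffers/BB

| | Greedy | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04 |
|---|---|---|---|---|---|---|---|---|---|
| # Conn. Sinks | 5,645 | 5,725 | 5,842 | 5,779 | 5,896 | 5,827 | 5,942 | 5,813 | 5,897 |
| % Conn. Sinks | 93.5 | 94.8 | 96.8 | 95.7 | 97.6 | 96.5 | 98.4 | 96.3 | 97.7 |
| WL (meters) | 42.22 | 45.18 | 47.80 | 44.48 | 47.66 | 44.18 | 47.49 | 45.33 | 47.51 |
| WL/sink (microns) | 7,479 | 7,891 | 8,182 | 7,697 | 8,083 | 7,582 | 7,992 | 7,798 | 8,057 |
| #Buff | 9,037 | 9,860 | 10,676 | 9,591 | 10,610 | 9,497 | 10,507 | 9,860 | 10,647 |
| #Buff/sink | 1.60 | 1.72 | 1.83 | 1.66 | 1.80 | 1.63 | 1.77 | 1.70 | 1.81 |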
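The derived rows of this table follow from the raw counts, which gives a quick consistency check. The snippet below recomputes them for the Greedy column; the variable names are illustrative, and the reading that per-sink figures are normalized by connected sinks (not all sinks) is an inference from the numbers.

```python
# Greedy column of the Resource Usage table (#sinks = 6,038).
connected_sinks, total_sinks = 5645, 6038
wirelength_m, buffers = 42.22, 9037

pct_connected = round(100 * connected_sinks / total_sinks, 1)
wl_per_sink_um = round(wirelength_m * 1e6 / connected_sinks)  # metres to microns
buffers_per_sink = round(buffers / connected_sinks, 2)

print(pct_connected, wl_per_sink_um, buffers_per_sink)  # 93.5 7479 1.6
```

The recomputed values match the table's 93.5%, 7,479 microns, and 1.60 buffers per sink.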

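38 Resource Usage for 100% Completion
#nets = 4,764, #sinks = 6,038

| | Greedy | 4TMCF, ε=.04 | 4TMCF, ε=.04 | 4TMCF, ε=.04 | 4TMCF, ε=.04 |
|---|---|---|---|---|---|
| #buffers/BB | 1,000 or INF | 500 | 600 | 1,000 | INF |
| WL (meters) | 47.89 | 49.46 | 49.58 | 49.98 | 51.40 |
| WL/sink (microns) | 7,931 | 8,191 | 8,212 | 8,278 | 8,513 |
| #Buff | 10,330 | 11,079 | 11,115 | 11,373 | 11,803 |
| #Buff/sink | 1.71 | 1.83 | 1.84 | 1.88 | 1.95 |

MTMCF wastes routing resources!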

