Algorithmic Techniques in VLSI CAD Shantanu Dutt University of Illinois at Chicago.

1 Algorithmic Techniques in VLSI CAD Shantanu Dutt University of Illinois at Chicago

2 Common Algorithmic Approaches in VLSI CAD
– Divide & Conquer (D&C) [e.g., merge-sort, partition-driven placement, tech. mapping of a fanout-free ckt for dynamic power min.]
– Reduce & Conquer (R&C) [e.g., multilevel techniques such as the hMetis partitioner]
– Dynamic programming [e.g., matrix-chain multiplication, optimal buffer insertion]
– Mathematical programming: linear, quadratic, 0/1 integer programming [e.g., floorplanning, global placement]

3 Common Algorithmic Approaches in VLSI CAD (contd)
Search methods:
– Depth-first search (DFS): mainly used to find any solution when cost is not an issue [e.g., FPGA detailed routing, where cost is generally determined at the global routing phase]
– Breadth-first search (BFS): mainly used to find a soln. at min. distance from the root of the search tree [e.g., maze routing when cost = dist. from root]
– Best-first search (BeFS): used to find optimal or provably sub-optimal (within a given factor of optimal) solutions w/ any cost function. It can be used when a provable lower bound of the cost can be determined for each branching choice from the “current partial soln node” [e.g., TSP, global routing]
Iterative improvement: deterministic, stochastic
Min-cost network flow

4 Divide & Conquer
Determine if the problem can be solved in a hierarchical or divide-&-conquer (D&C) manner:
– D&C approach: see if the problem can be “broken up” into 2 or more smaller subproblems whose solns. can be “stitched up” to give a soln. to the parent prob.
– Do this recursively for each large subprob. until the subprobs. are small enough for an “easy” solution technique (which could be exhaustive!)
– If the subprobs. are of a similar kind to the root prob., then the breakup and stitching will also be similar
– The final design may or may not be optimal (it will be optimal if the problem has the dynamic programming property; see later)
[Figure: root problem A is broken into subprobs. A1 and A2, which are in turn broken into A1,1, A1,2 and A2,1, A2,2. The solns. to A1 and A2 are stitched up to form the complete soln. to A. Recurse until the subprob. size is s.t. an exhaustive-search-based optimal design is doable.]
Example from CAD: min-total-sw.-prob. (or min-dynamic-power) tech. mapping of a fanout-free circuit.
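
As a minimal sketch of the break-up/stitch-up pattern described above, here is merge-sort (one of the D&C examples named on the earlier slide) in Python. The function names `merge_sort` and `stitch` are illustrative choices, not taken from the slides.

```python
def stitch(left, right):
    """Stitch-up step: merge two sorted lists into one sorted list."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Break the problem into two halves, solve each recursively, then stitch."""
    if len(items) <= 1:          # small enough for an "easy" solution
        return list(items)
    mid = len(items) // 2
    return stitch(merge_sort(items[:mid]), merge_sort(items[mid:]))
```

Here the two subproblems are fully independent, which is why the stitched result is optimal (fully sorted); the later slides show a CAD problem where this independence fails.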

5 Reduce-&-Conquer
– Reduce the problem size (coarsening)
– Solve the reduced problem
– Uncoarsen and refine the solution
Examples: multilevel graph/hypergraph partitioning (e.g., hMetis), multilevel routing

6 Dynamic Programming (DP)
Stitch-up function f: optimal soln. of root = f(optimal solns. of subproblems) = f(opt(A1), opt(A2), opt(A3), opt(A4))
[Figure: root problem A w/ subproblems A1, A2, A3, A4 combined by the stitch-up function; a subproblem soln. is reused.]
The above primary property of DP (optimal substructure: optimal solns. of subproblems are part of the optimal soln. of the parent problem) also means that every time we optimally solve a subproblem, we can store the soln. and reuse it whenever it is part of the formulation of a higher-level problem. The occurrence of a subproblem multiple times in different higher-level problems is called the overlapping subproblem property. It is, however, not a necessary feature of a DP problem.

7 Dynamic Programming (contd.)
A negative example: total sw. probability minimization in tech. mapping of a fanout-free circuit = SwP_Min(C, p(z)), where C is a fanout-free ckt w/ z as its output. The problem is to minimize p(z) + the sum of sw. probabilities (0→1 transition probabilities) at the o/ps of TM’ed gates in C excluding z (z’s sw. prob. is included in p(z)).
For a cut Ci w/ z at its o/p that can be TM’ed to a gate gi in the library, let x, y be its 2 i/ps. Let p(x,y) be the mapping of p(z), based on gi, in terms of only the 4 transition probs. at x and y. Then, since p(x,y) is inseparable in terms of the trans. probs. of x and y, the exact problem to be solved is SwP_Min(C – Ci, p(x,y)), where C – Ci has 2 o/ps x, y. Thus independent cuts have to be taken for x and y, and the combination of these 2 sets of cuts will come into play. This leads to a combinatorial explosion as we go further down the circuit to the inputs of each pair of cuts for x and y. The final formulation is
$\mathrm{SwP\_Min}(C, p(z)) = \min_{\text{feasible } C_i \text{ at } z} \mathrm{SwP\_Min}(C - C_i,\ p(X(C_i)))$, where X(Ci) is the set of i/ps generated by Ci.
The above is not a D&C approach. In a D&C approach, we can create two subproblems SwP_Min(T(x), p(x) = p(x, y_const)) and SwP_Min(T(y), p(y) = p(x_const, y)), where T(x) is the sub-circuit of C (a subtree) w/ x as its o/p, and p(x, y_const) is p(x,y) assuming some constant values for the 4 trans. probs. at y (or the subset of trans. probs. of y involved in p(x,y)). There is no guarantee, and in fact it is unlikely, that the assumed constant values for the trans. probs. at y will be the exact trans. probs. one obtains by optimally solving SwP_Min(C – Ci, p(x,y)) (which is the exact problem to solve). Hence an optimal soln. to SwP_Min(T(x), p(x, y_const)) is not guaranteed to be part of the optimal soln. to SwP_Min(C, p(z)). A similar argument holds for the optimal soln. to SwP_Min(T(y), p(x_const, y)).
[Figure: D&C approach for SwP_Min(C, P_{0→1}(z)): cut Ci at o/p z w/ i/ps x and y, and the subproblems SwP_Min(T(x), p(x, y_const)) and SwP_Min(T(y), p(x_const, y)); the sw. prob. at z is expressed in terms of the various trans. probs. at all fanins cut by subset Si(z).]
Another way to look at the reason for this is to see that the two subproblems are not independent: the trans. probs. implied at the o/p of each subproblem by its soln. are needed to solve the other subproblem, leading to a cyclic dependency. Since the above D&C seems to be the only way to break up SwP_Min(C, p(z)) into subproblems, this problem is not amenable to DP as it does not have the optimal substructure property.

8 Dynamic Programming (contd.)
A positive example: total wire minimization in tech. mapping of a fanout-free circuit = DP_TM(C), where C is a fanout-free ckt w/, say, z as its output. The problem is to minimize the sum of the number of gate outputs (each o/p contributes an “exposed” wire in the circuit that needs to be routed), i.e., the sum of wires at the o/ps of TM’ed gates.
For a cut Ci w/ z at its o/p that can be TM’ed to a gate gi in the library, let x, y be its 2 i/ps. Then minimizing the # of o/p wires in T(x) and in T(y) are clearly independent problems, and the optimal soln. to each is part of the optimal soln. to DP_TM(C) given the cut Ci. So the overall optimal formulation is to take the minimum soln. over all feasible cuts Ci w/ z at their o/p:
$\mathrm{DP\_TM}(C) = \min_{\text{feasible } C_i \text{ at } z\ (z = \text{o/p of } C)} \sum_{x_j \in X(C_i)} \mathrm{DP\_TM}(T(x_j))$, where X(Ci) is the set of i/ps generated by Ci.
Whichever cut Ck produces the min. soln., the optimal solns. to the subproblems at its i/ps are part of the optimal soln. for DP_TM(C). Thus, since the optimal substructure property holds, this problem is amenable to dynamic programming.
[Figure: cut Ci at o/p z w/ i/ps x and y; DP_TM(C) is built from DP_TM(T(x)) and the corresponding subproblem at y.]

9 Dynamic Programming (contd)
Matrix multiplication example: find the most computationally efficient way to perform the series of matrix mults. $M = M_1 \times M_2 \times \dots \times M_n$, where $M_i$ is of size $r_i \times c_i$ w/ $r_i = c_{i-1}$ for $i > 1$.
DP formulation: opt_seq(M) = (by defn) opt_seq(M(1,n)) = $\min_{i=1}^{n-1} \{ \text{opt\_seq}(M(1,i)) + \text{opt\_seq}(M(i+1,n)) + r_1 c_i c_n \}$
– Correctness rests on the property that the optimal ways of multiplying $M_1 \times \dots \times M_i$ and $M_{i+1} \times \dots \times M_n$ will be used in the “min” stitch-up function to determine the optimal soln. for M
– Thus if the optimal soln. involves a “cut” at $M_r$, then opt_seq(M(1,r)) and opt_seq(M(r+1,n)) will be part of opt_seq(M)
– Perform the computation bottom-up (smallest sequences first)
Complexity:
– Each subseq. M(j,k) appears in the above computation and is solved exactly once (irrespective of how many times it appears)
– The time to solve M(j,k), j < k, not counting the time to solve its subproblems (which are accounted for separately), is l – 1, where l = k – j + 1 is the length of the seq. (since the min of l – 1 different options is computed)
– The # of different M(j,k)’s of length l is n – l + 1, 2 <= l <= n
– Total complexity = $\sum_{l=2}^{n} (l-1)(n-l+1) = \Theta(n^3)$ (as opposed to exponential time, e.g., on the order of $2^n$, using exhaustive search)
[Figure: root problem A w/ subproblems A1, A2, A3, A4 combined by the stitch-up function.]
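
The bottom-up computation on this slide can be sketched in Python as follows. This is a minimal illustration, assuming the general recurrence opt_seq(M(j,k)) = min over i of {opt_seq(M(j,i)) + opt_seq(M(i+1,k)) + r_j·c_i·c_k}, which is the slide's formula applied to an arbitrary subsequence; the function name `opt_seq` follows the slide, while the `dims` input format and the `split` table are assumptions made for the example (indices are 0-based here).

```python
def opt_seq(dims):
    """dims[i] = (rows, cols) of the (i+1)-th matrix in the chain.

    Returns (min number of scalar multiplications, split table), where
    split[j][k] is the best cut position i for the subsequence j..k.
    """
    n = len(dims)
    cost = [[0] * n for _ in range(n)]        # cost[j][j] = 0: a single matrix
    split = [[None] * n for _ in range(n)]
    for length in range(2, n + 1):            # smallest sequences first
        for j in range(0, n - length + 1):
            k = j + length - 1
            best = None
            for i in range(j, k):             # l - 1 options for the cut
                c = (cost[j][i] + cost[i + 1][k]
                     + dims[j][0] * dims[i][1] * dims[k][1])
                if best is None or c < best:
                    best, split[j][k] = c, i
            cost[j][k] = best
    return cost[0][n - 1], split
```

For example, for three matrices of sizes 10×30, 30×5, and 5×60, `opt_seq([(10, 30), (30, 5), (5, 60)])` reports a minimum of 4500 scalar multiplications with the cut after the second matrix, versus 27000 for the other order.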

10 DP in VLSI CAD
– The earlier example has only an optimization objective: min-wire-cost tech. mapping of a fanout-free circuit, where the cost is the # of wires. Thus the best cost of a subproblem is easy to define and is a single value.
– However, in CAD, the problems are generally multi-parameter ones: one opt. objective (min. or max.) and upper-bound or lower-bound constraints on several metrics/parameters.
– It is now harder to determine which solution of a subproblem (i.e., which partial solution) is best among several at a particular node of the DP tree or dag (directed acyclic graph).
– The concept of domination is now important: a partial solution X, represented by a vector of opt. and constraint metrics (a1, a2, …, ak), is “best” if it is not dominated by any other partial soln. of the same subproblem, i.e., no other partial soln. is at least as good as X in all metrics (and better in at least one).
– So there are multiple “best” solutions of a subproblem, one or more of which can be part of the optimal/best solution(s) of the parent problem.
– So after solving a subproblem, we get multiple solutions (partial solns. of the parent problem); we need to keep only the non-dominated ones and combine them w/ the non-dominated solns. of sibling subproblems to determine solns. to the parent problem.
– Note that we need to get rid of all dominated partial solns., as they are guaranteed not to lead to the optimal soln. of the full problem or, more locally, to non-dominated/best solns. of the parent problem.
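
A minimal sketch of the domination filter described above, in Python. It assumes every metric in the vector is to be minimized (a metric to be maximized can be negated first); the names `dominates` and `prune_dominated` are illustrative, not from the slides.

```python
def dominates(x, y):
    """True if x is at least as good as y in every metric and better in one.

    Assumption for this sketch: all metrics are minimized.
    """
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def prune_dominated(solutions):
    """Keep only the non-dominated ("best") partial solutions."""
    unique = list(dict.fromkeys(solutions))   # drop exact duplicates, keep order
    return [s for s in unique
            if not any(dominates(other, s) for other in unique)]
```

With hypothetical (cost, delay, area) vectors, `prune_dominated([(3, 5, 2), (4, 5, 2), (2, 7, 2), (5, 1, 9)])` drops only `(4, 5, 2)`, which is no better than `(3, 5, 2)` in any metric; the other three each win in at least one metric and are all kept.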

11 A DP Example: Simple Buffer Insertion Problem
Given: source and sink locations, sink capacitances and RATs (reqd. arrival times), a buffer type, source delay rules, unit wire resistance and capacitance
[Figure: a net from source s0 to four sinks w/ RAT 1 to RAT 4, and a buffer. Courtesy: Chuck Alpert, IBM]

12 Simple Buffer Insertion Problem (contd)
Find: buffer locations and a routing tree such that the slack (i.e., RAT) at the source is maximized. This gives the greatest flexibility at the source in various ways, e.g., getting +ve RATs at fanin gates w/ fewer buffers at fanin nets, thus indirectly optimizing some other metrics such as total leakage power or total cell/gate area.
Possible buffer insertion points [nodes]: at and below branch nodes, and at intermediate points on a long branchless interconnect
[Figure: routing tree from source s0 to sinks w/ RAT 1 to RAT 4, w/ the RAT at the source marked. Courtesy: Chuck Alpert, IBM]

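13 Slack/RAT Example
– Case 1: sinks w/ RAT = 400, delay = 600 and RAT = 500, delay = 350; Slack/RAT = -200
– Case 2: sinks w/ RAT = 400, delay = 300 and RAT = 500, delay = 400; Slack/RAT = +100
The slack at the source is the min. over the sinks of (RAT – delay).
[Figure annotation: “Unsynthesizable!” Courtesy: Chuck Alpert, IBM]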

14 Elmore Delay
[Figure: RC chain through nodes A, B, C w/ resistances R1, R2 and capacitances C1, C2. Courtesy: Chuck Alpert, IBM]
Delay(A→C) = Delay(A→B) + Delay(B→C), i.e., the sum of the delays of the “branch-less” segments on the path from A to C.
Delay of a branchless seg.: Delay(A→B) = res(A→B) × (total cap seen by this res.) + wire delay (RwCw/2), where Rw (Cw) = wire res. (cap) [the wire delay is ignored above]
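
The segment rule above can be sketched in Python for a branchless RC chain. This assumes the figure's layout is R1 between A and B w/ C1 lumped at B, and R2 between B and C w/ C2 lumped at C, so that Delay(A→C) = R1·(C1 + C2) + R2·C2; that layout and the function name `elmore_delay_chain` are assumptions, and the distributed wire delay term is ignored as on the slide.

```python
def elmore_delay_chain(segments):
    """Elmore delay from the source to the far end of an RC chain.

    segments: list of (R, C) pairs ordered from source to sink, where C is
    the capacitance lumped at the downstream end of resistance R.
    Each resistance is multiplied by the total capacitance it "sees".
    """
    delay = 0.0
    downstream_cap = 0.0
    for r, c in reversed(segments):   # accumulate cap from the sink upward
        downstream_cap += c
        delay += r * downstream_cap
    return delay
```

For instance, w/ illustrative values R1 = 2, C1 = 3, R2 = 4, C2 = 5, the call `elmore_delay_chain([(2.0, 3.0), (4.0, 5.0)])` evaluates 2·(3 + 5) + 4·5.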

15 DP Example: Van Ginneken Buffer Insertion Algorithm [ISCAS’90]
– Associate each leaf node/sink with two metrics (Ct, Tt): the downstream loading capacitance (Ct) and the RAT (Tt). We want to min. Ct and max. Tt.
– Ct (cap seen) is useful as the upstream delay depends on Ct (how much depends on the upstream res., which is not known at this point since it depends on the buffer/no-buffer options taken later); thus the upstream RAT depends on both Ct and Tt.
– The DP-based algo. propagates potential solutions bottom-up [Van Ginneken, 90].
– At each intermediate node t (a branch node or an artificial node on a long branch/interconnect), for each downstream soln. (Cn, Tn) do:
a) Add a wire (w/ cap. Cw and res. Rw) [update equations on slide]
b) Subsequently add a buffer [update equations on slide]
c) Consider both buffer and no-buffer (i.e., wire-only) solns. among the set of solns. at t.
d) If t is a branch node, merge every pair of sub-solutions from the 2 sub-trees: for each pair of soln. vectors Zn = (Cn, Tn) and Zm = (Cm, Tm) in the 2 subtrees, create a soln. vector Zt = (Ct, Tt) [merge equations on slide] (note that the wire-only/buffer options at this node will be considered after merging).
Note: Ln in the slide’s equations is the same as Cn.
[Figures: a wire (Cw, Rw) from (Cn, Tn) up to (Ct, Tt); a buffer from (Cn, Tn) up to (Ct, Tt); a branch node merging (Cn, Tn) and (Cm, Tm) into (Ct, Tt). Courtesy: UCLA]

16 DP Example (contd)
d) (contd.) After merging:
i. Add a wire to each merged solution Zt (same cap. & delay change formulation as before)
ii. Add a buffer to each Zt as before
e) Delete all dominated solutions at t: Zt1 = (Ct1, Tt1) is dominated if there exists a Zt2 = (Ct2, Tt2) s.t. Ct1 >= Ct2 and Tt1 <= Tt2 (i.e., Zt1 is no better in either metric)
f) The remaining soln. vectors are all “optimal”/“best” solns. at t, and one of them will be part of the optimal solution at the root/driver of the net; this is the DP feature of this algorithm
[Figure: routing tree from source s0 to sinks w/ RAT 1 to RAT 4]
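
Steps a) through e) can be sketched in Python as below. The update equations did not survive in this transcript, so the sketch assumes the commonly used forms: a wire gives (Cn + Cw, Tn − Rw·(Cn + Cw/2)), a buffer gives (Cb, Tn − Tb − Rb·Cn), and a merge gives (Cn + Cm, min(Tn, Tm)). The buffer parameters `c_b` (input cap), `r_b` (output res.), and `t_b` (intrinsic delay), as well as all function names, are assumptions for illustration.

```python
def add_wire(sol, r_w, c_w):
    """Step a): propagate (C, T) upstream across a wire (assumed Elmore form)."""
    c, t = sol
    return (c + c_w, t - r_w * (c + c_w / 2))


def add_buffer(sol, c_b, r_b, t_b):
    """Step b): insert a buffer; upstream now sees only the buffer input cap."""
    c, t = sol
    return (c_b, t - t_b - r_b * c)


def merge(sols_n, sols_m):
    """Step d): combine every pair of solutions from the two subtrees."""
    return [(cn + cm, min(tn, tm)) for cn, tn in sols_n for cm, tm in sols_m]


def prune(sols):
    """Step e): drop every (C, T) dominated by one with C' <= C and T' >= T."""
    kept, best_t = [], float("-inf")
    for c, t in sorted(set(sols), key=lambda s: (s[0], -s[1])):
        if t > best_t:            # strictly better RAT than any smaller-cap soln
            kept.append((c, t))
            best_t = t
    return kept


def propagate(sols, r_w, c_w, c_b, r_b, t_b):
    """Steps a) to c) and e) at a non-branch node: wire-only and buffered options."""
    wired = [add_wire(s, r_w, c_w) for s in sols]
    buffered = [add_buffer(s, c_b, r_b, t_b) for s in wired]
    return prune(wired + buffered)
```

Sorting by capacitance lets `prune` find the non-dominated set in a single sweep, which is one common way to keep the solution lists short at every node.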

17 Van Ginneken Example
Each soln. is a pair (C, T); the intermediate nodes are the possible buffer locations.
– Sink soln.: (20,400)
– First node: a wire (C=10, d=150) gives (30,250); adding a buffer (C=5, d=30) gives (5,220)
– Second node: a wire w/ C=15 and d=200 for the 1st subsoln. gives (45,50), and w/ d=120 for the 2nd subsoln. gives (20,100); adding a buffer (C=5) w/ d=50 gives (5,0), and w/ d=30 gives (5,70)
Courtesy: Chuck Alpert, IBM

18 Van Ginneken Example Cont’d
– Candidate solns.: (45,50), (5,0), (20,100), (5,70). (5,0) is inferior to (5,70), and (45,50) is inferior to (20,100), so only (20,100) and (5,70) remain.
– Next wire: C=10 w/ d=90 for the 1st soln. gives (30,10), and w/ d=80 for the 2nd soln. gives (15,-10)
– Pick the solution with the largest slack (max RAT), then follow the arrows forward to get the final complete solution
Courtesy: Chuck Alpert, IBM

19 Mathematical Programming
– Linear programming (LP). E.g., Obj: Min $2x_1 - x_2 + x_3$ w/ constraints $x_1 + x_2 \le a$, $x_1 - x_3 \le b$; solvable in polynomial time
– Quadratic programming (QP). E.g., Min. $x_1^2 - x_2 x_3$ w/ linear constraints; solvable in polynomial (cubic) time w/ equality constraints
– Others:
  – Mixed integer linear prog. (ILP), where some vars. are integers: NP-hard
  – Mixed integer quad. prog. (IQP), where some vars. are integers: NP-hard
  – Mixed 0/1 integer linear prog. (0/1 ILP), where some vars. are in {0,1}: NP-hard
  – Mixed 0/1 integer quad. prog. (0/1 IQP), where some vars. are in {0,1}: NP-hard

20 0/1 ILP/IQP Examples
– Generally useful for “assignment” problems, where objects {O1, ..., On} are to be assigned (possibly exclusively) to bins {B1, ..., Bm}
– 0/1 variable $x_{i,j} = 1$ if object Oi is assigned to bin Bj
– Min-cut bi-partitioning for graphs G(V,E) can be modeled as a 0/1 IQP
IQP modeling of min-cut part.:
➢ $x_{i,1} = 1$ => $u_i$ in V1, else $u_i$ in V2 (the 2nd var. $x_{i,2}$ is not needed due to mutual exclusivity & implication by $x_{i,1}$).
➢ Edge $(u_i, u_j)$ is in the cutset if: $x_{i,1}(1 - x_{j,1}) + (1 - x_{i,1})x_{j,1} = 1$
➢ Objective function: $\min \sum_{(u_i,u_j) \in E} c(i,j)\,[x_{i,1}(1 - x_{j,1}) + (1 - x_{i,1})x_{j,1}]$
➢ Constraint: $\sum_i w(u_i)\,x_{i,1} \le$ max-size
[Figure: partitions V1 and V2 w/ an edge (ui, uj) crossing the cut]
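
To make the IQP model concrete, the sketch below evaluates the slide's quadratic objective in Python and, for tiny graphs only, finds its minimum by brute-force enumeration of the 0/1 vector (a real flow would hand the model to an IQP solver). As an assumption beyond the slide, the size bound is applied to V2 as well as V1, since otherwise putting every node in V2 gives a trivial zero cut. The names `cut_cost` and `min_cut_bipartition` are illustrative.

```python
from itertools import product


def cut_cost(x, edges):
    """Slide's objective: sum of c(i,j) * [x_i(1 - x_j) + (1 - x_i)x_j]."""
    return sum(c * (x[i] * (1 - x[j]) + (1 - x[i]) * x[j])
               for i, j, c in edges)


def min_cut_bipartition(weights, edges, max_size):
    """Enumerate all 0/1 assignments; x[i] = 1 puts node i in V1.

    edges: list of (i, j, cost). Exponential time, for illustration only.
    """
    total = sum(weights)
    best_x, best_cost = None, float("inf")
    for x in product((0, 1), repeat=len(weights)):
        size_v1 = sum(w * xi for w, xi in zip(weights, x))
        # Assumed two-sided size constraint (the slide bounds V1 only).
        if size_v1 > max_size or total - size_v1 > max_size:
            continue
        cost = cut_cost(x, edges)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost
```

On a 4-node path-like graph w/ edges (0,1) and (2,3) of cost 3 and a bridging edge (1,2) of cost 1, unit weights, and max-size 2, the minimum cut separates {0, 1} from {2, 3} at cost 1.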

21 Example 2 for ILP/IQP: HLS Resource Constraint Scheduling (EE VLSI Design Automation I)
Constrained scheduling
– General case NP-complete
– Minimize latency given constraints on area or the resources (ML-RCS)
– Minimize resources subject to a bound on latency (MR-LCS)
Exact solution methods
– ILP: Integer Linear Programming
– Hu’s heuristic algorithm for identical processors/ALUs
Heuristics
– List scheduling
– Force-directed scheduling

22 ILP Formulation of ML-RCS [Mic94] p.198
Use binary decision variables $x_{il}$:
– i = 0, 1, ..., n
– l = 1, 2, ..., $\bar{\lambda} + 1$, where $\bar{\lambda}$ is the given upper bound on latency
– $x_{il} = 1$ if operation i starts at step l, 0 otherwise
Set of linear inequalities (constraints), and an objective function (min latency)
Observations
– $t_i = \sum_l l \cdot x_{il}$ = start time of op i
– Is op $v_i$ (still) executing at step l? This holds iff $\sum_{m=l-d_i+1}^{l} x_{im} = 1$

23 Start Time vs. Execution Time
– For each operation $v_i$, there is only one start time
– If $d_i = 1$, then the following questions are the same:
  – Does operation $v_i$ start at step l?
  – Is operation $v_i$ running at step l?
– But if $d_i > 1$, then the two questions should be formulated as:
  – Does operation $v_i$ start at step l? Does $x_{il} = 1$ hold?
  – Is operation $v_i$ running at step l? Does $\sum_{m=l-d_i+1}^{l} x_{im} = 1$ hold?

24 Operation $v_i$ Still Running at Step l?
– Is $v_9$ running at step 6? Is $x_{9,6} + x_{9,5} + x_{9,4} = 1$?
– Note:
  – Only one (if any) of the above three cases can happen
  – To meet resource constraints, we have to ask the same question for ALL steps and ALL operations of that type
[Figure: the three cases in which $v_9$ occupies step 6: $x_{9,4} = 1$, $x_{9,5} = 1$, $x_{9,6} = 1$]

25 Operation $v_i$ Still Running at Step l?
– Is $v_i$ running at step l? Is $x_{i,l} + x_{i,l-1} + \dots + x_{i,l-d_i+1} = 1$?
[Figure: the cases in which $v_i$ occupies step l, from $x_{i,l-d_i+1} = 1$ through $x_{i,l-1} = 1$ to $x_{i,l} = 1$]
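
The “still running” sum can be checked on a given 0/1 assignment w/ a few lines of Python. This is only an evaluation sketch (not an ILP solver), and the data layout (`x[i]` as a dict from step to 0/1, `d[i]` as the delay of op i) and the names `is_running` and `resource_usage` are assumptions for the example. The per-step usage count is what a resource constraint would bound by the number of available units of that type.

```python
def is_running(x, i, l, d):
    """Sum of x[i][m] for m = l - d[i] + 1 .. l; equals 1 iff op i occupies step l."""
    return sum(x[i].get(m, 0) for m in range(l - d[i] + 1, l + 1))


def resource_usage(x, ops_of_type, l, d):
    """Number of operations of one resource type that are running at step l."""
    return sum(is_running(x, i, l, d) for i in ops_of_type)


# Hypothetical schedule: v9 (delay 3) starts at step 4, v8 (delay 1) at step 6.
x = {9: {4: 1}, 8: {6: 1}}
d = {9: 3, 8: 1}
```

With this schedule, `is_running(x, 9, 6, d)` is 1 (the $x_{9,4} = 1$ case of the previous slide), `is_running(x, 9, 7, d)` is 0, and `resource_usage(x, [8, 9], 6, d)` counts both operations at step 6.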

26 ILP Formulation of ML-RCS (cont.)
Constraints:
– Exactly one start time per operation i: for each i, $\sum_l x_{i,l} = 1$, l in $[t_i^S, t_i^L]$
– Sequencing (dependency) relations must be satisfied
– Resource constraints
Objective: min. latency

27 ILP Example
– Assume $\bar{\lambda}$ = 4
– First, perform ASAP and ALAP (we can write the ILP without ASAP and ALAP, but using them will simplify the inequalities)
[Figure: ASAP and ALAP schedules of a sequencing graph w/ operations v1 to v11 and the sink vn (NOP)]

28 ILP Example: Unique Start Times Constraint
– Without using ASAP and ALAP values: [equations on slide]
– Using ASAP and ALAP: [equations on slide]

29 ILP Example: Dependency Constraints
– Using ASAP and ALAP, the non-trivial inequalities are (assuming unit delay for + and *): [equations on slide]

30 ILP Example: Resource Constraints
– Resource constraints (assuming 2 adders and 2 multipliers): [equations on slide]
– Objective: since $\bar{\lambda}$ = 4 and the sink has no mobility, any feasible solution is optimum, but we can use the following anyway: [equation on slide]

31 ILP Formulation of MR-LCS
– Dual problem to ML-RCS
– Objective:
  – The goal is to optimize the total resource usage vector, a
  – The objective function is $c^T a$, where the entries in c are the respective area costs of the resources (the $a_k$ inequality constraint in ML-RCS is now an inequality with the variable $a_k$ (an element of a) on the RHS)
– Constraints:
  – Same as the ML-RCS constraints, plus a latency constraint: [equation on slide]
[©Gupta]

32 Search Techniques
[Figure: a graph on nodes A to G and its DFS and BFS traversals]
Algorithm Depth_First_Search
  for each v in V
    v.mark = 0;
  for each v in V
    if v.mark = 0 then
      if G has partial soln nodes then dfs(v);
      else soln_dfs(v);
dfs(v) /* for basic graph visit or for soln finding when nodes are partial solns */
  v.mark = 1;
  for each (v,u) in E
    if (u.mark != 1) then dfs(u)
soln_dfs(v) /* used when nodes are basic elts of the problem and not partial soln nodes */
  v.mark = 1;
  if path to v is a soln, then return(1);
  for each (v,u) in E
    if (u.mark != 1) then
      soln_found = soln_dfs(u)
      if (soln_found = 1) then return(soln_found)
  end for;
  v.mark = 0; /* can visit v again to form another soln on a different path */
  return(0)
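
A minimal Python counterpart of the basic `dfs(v)` routine above, returning the visit order. The adjacency-dict graph format and the name `dfs_order` are assumptions for this sketch; the `visited` set plays the role of `v.mark`.

```python
def dfs_order(graph, root):
    """Return the nodes reachable from root in depth-first visit order.

    graph: dict mapping a node to the list of its neighbors.
    """
    visited, order = set(), []

    def visit(v):
        visited.add(v)                 # v.mark = 1
        order.append(v)
        for u in graph.get(v, []):
            if u not in visited:
                visit(u)

    visit(root)
    return order
```

On a small tree such as `{"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}`, the left subtree under B is fully explored before C is visited.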

33 Search Techniques: Exhaustive DFS
[Figure: DFS traversal of a graph on nodes A to G]
Algorithm Depth_First_Search
  for each v in V
    v.mark = 0;
  best_cost = infinity; cost = 0;
  optimal_soln_dfs(root);
optimal_soln_dfs(v) /* used when nodes are basic elts of the problem and not partial soln nodes */
begin
  v.mark = 1;
  if path to v is a soln, then begin
    if cost < best_cost then begin best_soln = soln; best_cost = cost; end
    v.mark = 0; return;
  endif
  for each (v,u) in E
    if (u.mark != 1) then begin
      cost = cost + edge_cost(v,u); /* global var. */
      optimal_soln_dfs(u);
      cost = cost - edge_cost(v,u); /* undo the edge cost on backtrack */
    end
  end for;
  v.mark = 0; /* can visit v again to form another soln on a different path */
end
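
The exhaustive search above can be sketched in Python as follows. For concreteness, “path to v is a soln” is taken here to mean that v equals a given goal node, which is an assumption for the example (in TSP it would instead mean that all cities have been visited). The running cost is passed as an argument rather than kept in a global, so it is restored automatically on backtrack.

```python
import math


def optimal_path_dfs(graph, root, goal):
    """Exhaustive DFS w/ backtracking for a min-cost simple path root -> goal.

    graph: dict mapping a node to a list of (neighbor, edge_cost) pairs.
    Returns (best_cost, best_path); best_path is None if goal is unreachable.
    """
    best = {"cost": math.inf, "path": None}

    def visit(v, path, cost, on_path):
        if v == goal:                          # path to v is a soln
            if cost < best["cost"]:
                best["cost"], best["path"] = cost, list(path)
            return
        for u, w in graph.get(v, []):
            if u not in on_path:               # u.mark != 1
                on_path.add(u)
                path.append(u)
                visit(u, path, cost + w, on_path)
                path.pop()                     # unmark on backtrack
                on_path.remove(u)

    visit(root, [root], 0, {root})
    return best["cost"], best["path"]
```

Every simple path to the goal is enumerated, so the result is optimal but the running time can be exponential in the number of nodes.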

34 Best-First Search
BeFS(root)
begin
  open = {root}; /* open is the list of generated but not yet expanded nodes (partial solns) */
  best_soln_cost = infinity;
  while open != nullset do begin
    curr = first(open); /* take the min-cost node and remove it from open */
    if curr is a soln then return(curr) /* curr is an optimal soln */
    else children = Expand_&_est_cost(curr); /* generate all children of curr & estimate their costs; cost(u) should be a lower bound of the cost of the best soln reachable from u */
    for each child in children do begin
      if child is a soln then delete all nodes w in open s.t. cost(w) >= cost(child); endif
      store child in open in increasing order of cost;
    endfor
  endwhile
end /* BeFS */
Expand_&_est_cost(Y)
begin
  children = nullset;
  for each basic elt x of the problem “reachable” from Y that can be part of the current partial soln. Y do begin
    if x not in Y and Y U {x} is feasible then begin
      child = Y U {x};
      path_cost(child) = path_cost(Y) + cost(Y, x); /* cost(Y, x) is the cost of reaching x from Y */
      est(child) = lower-bound cost of the best soln reachable from child;
      cost(child) = path_cost(child) + est(child);
      children = children U {child};
    end
  endfor
  return(children);
end /* Expand_&_est_cost(Y) */
Y = partial soln. = a path from the root to the current “node” (a basic elt. of the problem, e.g., a city in TSP, a vertex in V0 or V1 in min-cut partitioning). We go from each such “node” u to the next one u’ that is “reachable” from u in the problem “graph” (which is part of what you have to formulate).
[Figure: search tree from the root w/ a node u whose children carry costs (1), (2), (3)]
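
A minimal Python sketch of BeFS using a priority queue (`heapq`) for the open list. The callback signatures `is_solution`, `expand`, and `lower_bound` are assumptions made for the example; `lower_bound` must never overestimate the remaining cost and should be 0 for solution nodes. The optional step of deleting open nodes that are no better than a generated solution is omitted here, since it affects memory use but not the returned result.

```python
import heapq
import itertools


def best_first_search(root, is_solution, expand, lower_bound):
    """Return (solution node, its cost), or (None, inf) if none exists.

    expand(node) yields (child, step_cost) pairs;
    lower_bound(node) is a lower bound on the cost still needed from node.
    """
    tie = itertools.count()            # tie-breaker so nodes are never compared
    open_list = [(lower_bound(root), next(tie), 0, root)]
    while open_list:
        _, _, path_cost, curr = heapq.heappop(open_list)   # first(open)
        if is_solution(curr):
            return curr, path_cost     # optimal when lower_bound is a true LB
        for child, step in expand(curr):
            g = path_cost + step
            heapq.heappush(open_list,
                           (g + lower_bound(child), next(tie), g, child))
    return None, float("inf")
```

As a usage example, partial solns. can be tuples of visited nodes, w/ `expand` appending each unvisited neighbor; a trivial bound of 0 is always valid, and a tighter bound prunes more of the search.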

35 Best-First Search
Proof of optimality when cost is a LB:
– The current set of nodes in “open” represents a complete front of generated nodes, i.e., the rest of the nodes in the search space are descendants of nodes in “open”
– Assuming the basic cost (the cost of adding an elt. to a partial soln. to construct another partial soln. that is closer to the soln.) is non-negative, the cost is monotonic, i.e., cost of child >= cost of parent
– If the first node curr in “open” is a soln., then cost(curr) <= cost(w) for each w in “open”
– The cost of any node in the search space that is not in “open” and not yet generated is >= the cost of its ancestor in “open” and thus >= cost(curr). Thus curr is the optimal (min-cost) soln.
[Figure: search tree from the root w/ a node u whose children carry costs (1), (2), (3); Y = partial soln.]

36 Search techs for a TSP example
[Figure: a TSP graph on cities A to F and the search tree of an exhaustive search using DFS (w/ backtrack) for finding an optimal solution; the solution nodes are marked.]

37 Search techs for a TSP example (contd)
[Figure: BeFS search tree for finding an optimal TSP solution on the TSP graph w/ cities A to F; nodes are labeled w/ path cost + lower-bound estimate (e.g., 22+9, 21+6), and pruned nodes are marked X.]
– Lower-bound cost estimate: MST({unvisited cities} U {current city} U {start city})
– This is a LB because the structure used (spanning tree) belongs to a superset of the reqd. soln. structure (the rest of the cycle, i.e., a Hamiltonian path over these cities): if S is the set of all spanning trees in a graph G and S’ is the set of all Hamiltonian paths (each visits every node exactly once) in G, then S’ is a subset of S, and min(metric M’s values in set S) <= min(M’s values in subset S’). Similarly for max?
– Example: MST for node (A, E, F) = MST{F, A, B, C, D}; cost = 16. Path cost for (A, E, F) = 8.
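
The MST-based lower bound can be sketched in Python w/ a simple version of Prim's algorithm. The nested-dict distance format (`dist[a][b]` for every pair of distinct cities) and the names `mst_cost` and `tsp_lower_bound` are assumptions for the example.

```python
def mst_cost(nodes, dist):
    """Cost of a minimum spanning tree over `nodes` (Prim's algorithm)."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0
    in_tree = {nodes[0]}
    total = 0
    while len(in_tree) < len(nodes):
        # Cheapest edge from the tree to a node that is not yet in the tree.
        w, u = min((dist[a][b], b)
                   for a in in_tree for b in nodes if b not in in_tree)
        total += w
        in_tree.add(u)
    return total


def tsp_lower_bound(path, cities, dist):
    """LB on the cost of completing the tour from the partial path.

    Uses MST({unvisited cities} U {current city} U {start city}).
    """
    unvisited = set(cities) - set(path)
    return mst_cost(unvisited | {path[-1], path[0]}, dist)
```

The estimate for a partial path is then path cost + `tsp_lower_bound(...)`, matching the “path cost + estimate” labels in the figure. When every city has been visited, the bound reduces to the exact cost of the closing edge back to the start city.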

38 BeFS for 0/1 ILP Solution
– X = {x1, …, xm} are 0/1 vars
– Choose vars xi = 0/1 as next nodes in some order (random or heuristic based)
Search tree:
– root (no vars expanded)
  – x2=0: solve LP w/ x2=0; cost = cost(LP) = C1
  – x2=1: solve LP w/ x2=1; cost = cost(LP) = C2
    – x4=0: solve LP w/ x2=1, x4=0; cost = cost(LP) = C3
    – x4=1: solve LP w/ x2=1, x4=1; cost = cost(LP) = C4
      – x5=0: solve LP w/ x2=1, x4=1, x5=0; cost = cost(LP) = C5 (optimal soln)
      – x5=1: solve LP w/ x2=1, x4=1, x5=1; cost = cost(LP) = C6
Cost relations: C5 < C3 < C1 < C6; C2 < C1; C4 < C3

39 Iterative Improvement Techniques
Iterative improvement
– Deterministic greedy
  – Locally/immediately greedy: make the move that is immediately (locally) best, until no further impr. (e.g., FM)
  – Non-locally greedy: make the move that is best according to some non-immediate (non-local) metric (e.g., probability-based lookahead as in PROP), until no further impr.
– Stochastic (non-greedy): make a combination of deterministic greedy moves and probabilistic moves that cause a deterioration (can help to jump out of local minima), until the stopping criteria are satisfied
Stopping criteria could be an upper bound on the total # of moves or iterations
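
The two families can be contrasted w/ a small Python sketch on 0/1 vectors, where a move flips one bit. This is a generic illustration under stated assumptions, not FM or PROP: the greedy routine takes the best improving flip until none exists, while the stochastic routine also accepts worsening flips w/ probability exp(-delta/temp), a simulated-annealing-style rule chosen here as one possible way to make “probabilistic moves that cause a deterioration”. All names and parameter defaults are assumptions.

```python
import math
import random


def greedy_improve(x, cost):
    """Locally greedy: repeatedly make the single best improving flip."""
    x = list(x)
    while True:
        current = cost(x)
        best_gain, best_i = 0, None
        for i in range(len(x)):
            x[i] ^= 1                      # try flipping bit i
            gain = current - cost(x)
            x[i] ^= 1                      # undo the trial flip
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:                 # no further improvement
            return x
        x[best_i] ^= 1


def stochastic_improve(x, cost, moves=1000, temp=1.0, cooling=0.995, seed=0):
    """Accept improving flips always and worsening flips w/ some probability."""
    rng = random.Random(seed)
    x = list(x)
    cur = cost(x)
    best, best_cost = list(x), cur
    for _ in range(moves):                 # stopping criterion: move budget
        i = rng.randrange(len(x))
        x[i] ^= 1
        delta = cost(x) - cur
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur += delta
            if cur < best_cost:
                best, best_cost = list(x), cur
        else:
            x[i] ^= 1                      # reject the move
        temp *= cooling
    return best
```

Because `stochastic_improve` tracks the best vector seen, its result is never worse than its starting point, whereas `greedy_improve` can stop at a local minimum from which every single flip is a deterioration.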






