1 Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu http://eda.ee.ucla.edu/

2 Outline
- Circuit Partitioning Formulation
- Importance of Circuit Partitioning
- Partitioning Algorithms
- Circuit Clustering Formulation
- Clustering Algorithms

3 Partitioning Formulation
Bi-partitioning formulation: minimize the interconnections between the two partitions X and X'.
- Minimum cut: min c(X, X')
- Minimum bisection: min c(X, X') with |X| = |X'|
- Minimum ratio-cut: min c(X, X') / (|X| |X'|)
[Figure: two blocks X and X' joined by the cut c(X, X').]
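
The following is a minimal Python sketch (not from the slides) of how the three objectives can be evaluated for a given bipartition; the graph representation, function names, and the small example graph are all illustrative.

```python
# Illustrative evaluation of the bi-partitioning objectives; names are hypothetical.

def cut_cost(edges, part_x):
    """c(X, X'): total weight of edges with one endpoint in X and one in X'."""
    return sum(w for (u, v, w) in edges if (u in part_x) != (v in part_x))

def ratio_cut(edges, nodes, part_x):
    """c(X, X') / (|X| * |X'|), the ratio-cut objective."""
    x_size = len(part_x)
    xp_size = len(nodes) - x_size
    return cut_cost(edges, part_x) / (x_size * xp_size)

# Example: weighted graph given as (u, v, weight) tuples (weights are made up).
nodes = {"a", "b", "c", "d", "e", "f"}
edges = [("a", "b", 100), ("b", "c", 100), ("a", "c", 100),
         ("d", "e", 100), ("e", "f", 100), ("d", "f", 100),
         ("c", "d", 9), ("b", "e", 10)]
X = {"a", "b", "c"}                      # a candidate partition (also a bisection)
print(cut_cost(edges, X))                # cut weight c(X, X') = 19
print(ratio_cut(edges, nodes, X))        # 19 / (3 * 3) ~ 2.11
```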

4 A Bi-Partitioning Example
- Min-cut size = 13
- Min-bisection size = 300
- Min-ratio-cut size = 19
[Figure: example graph on nodes a-f with edge weights 4, 9, 10, and 100, with the min-cut, min-bisection, and min-ratio-cut marked.]
Ratio-cut helps to identify natural clusters.

5 Circuit Partitioning Formulation (Cont'd)
General multi-way partitioning formulation: partition a network N into N1, N2, ..., Nk such that
- each partition satisfies an area constraint: Σ_{v ∈ Ni} a(v) ≤ Ai
- each partition satisfies an I/O constraint: c(Ni, N - Ni) ≤ Ii
- the total interconnection Σ_i c(Ni, N - Ni) is minimized.
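
A small, hypothetical Python sketch of checking these constraints and evaluating the objective for a k-way partition; the data layout (blocks as sets of cells, per-block limits as lists) is assumed for illustration.

```python
# Illustrative check of the multi-way partitioning constraints and objective.

def external_cost(edges, block):
    """c(N_i, N - N_i): total weight of edges leaving block N_i."""
    return sum(w for (u, v, w) in edges if (u in block) != (v in block))

def is_feasible(edges, blocks, area, area_limit, io_limit):
    """Every block must satisfy its area and I/O constraints."""
    for i, block in enumerate(blocks):
        if sum(area[v] for v in block) > area_limit[i]:
            return False                      # area constraint violated
        if external_cost(edges, block) > io_limit[i]:
            return False                      # I/O constraint violated
    return True

def total_interconnection(edges, blocks):
    """Objective: sum over all blocks of c(N_i, N - N_i)."""
    return sum(external_cost(edges, b) for b in blocks)
```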

6 Importance of Circuit Partitioning
- Divide-and-conquer methodology: the most effective way to solve problems of high complexity. E.g., min-cut based placement, partitioning-based test generation, ...
- System-level partitioning for multi-chip or 3D designs: inter-chip interconnection delay dominates system performance; inter-layer wire pitch is much larger.
- Circuit emulation/parallel simulation: partition a large circuit onto multiple FPGAs (e.g., Quickturn) or multiple special-purpose processors (e.g., Zycad).
- Parallel CAD development: task decomposition and load balancing.

7 Partitioning Algorithms
- Iterative partitioning algorithms
- Multi-way partitioning
- Multi-level partitioning (to be discussed after clustering)

8 Iterative Partitioning Algorithms
- Greedy iterative improvement methods: [Kernighan-Lin 1970], [Fiduccia-Mattheyses 1982], [Krishnamurthy 1984]
- Simulated annealing: [Kirkpatrick-Gelatt-Vecchi 1983], [Greene-Supowit 1984] (SA will be formally introduced in the Floorplan chapter)

9 Kernighan-Lin's Algorithm
- Pair-wise exchange of nodes to reduce the cut size
- Allows the cut size to increase temporarily within a pass (see the sketch below):
  Compute the gain of every swap
  Repeat
    Perform a feasible swap of maximum gain
    Mark the swapped nodes "locked"; update the swap gains
  Until no feasible swap remains
  Find the maximum prefix partial sum of the gain sequence g1, g2, ..., gm
  Make the corresponding swaps permanent
- Start another pass if the current pass reduces the cut size (usually converges after a few passes)
[Figure: a swap of nodes u and v across the cut; both become locked.]
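
Below is a compact, deliberately unoptimized Python sketch of one KL pass on an unweighted graph, following the loop above; it recomputes the cut from scratch for each candidate swap instead of maintaining D-values, and all names are illustrative.

```python
import itertools

def cut_size(edges, A):
    """Number of edges crossing the partition (A, complement of A)."""
    return sum(1 for (u, v) in edges if (u in A) != (v in A))

def kl_pass(edges, A, B):
    """One KL pass: returns the prefix of swaps that should be made permanent."""
    A, B = set(A), set(B)
    locked, gains, swaps = set(), [], []
    for _ in range(min(len(A), len(B))):
        best = None
        for u, v in itertools.product(A - locked, B - locked):
            newA = (A - {u}) | {v}                       # cut if u and v were swapped
            g = cut_size(edges, A) - cut_size(edges, newA)
            if best is None or g > best[0]:
                best = (g, u, v)
        g, u, v = best
        A, B = (A - {u}) | {v}, (B - {v}) | {u}          # tentatively perform the swap
        locked |= {u, v}                                 # lock both swapped nodes
        gains.append(g)
        swaps.append((u, v))
    # Keep only the prefix of swaps with the maximum cumulative gain.
    prefix = list(itertools.accumulate(gains))
    k = max(range(len(prefix)), key=lambda i: prefix[i]) + 1 if prefix else 0
    return swaps[:k] if (k and prefix[k - 1] > 0) else []
```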

10 Fiduccia-Mattheyses' Improvement
- Each pass of the KL algorithm takes O(n^3) or O(n^2 log n) time (n: # modules): choosing the swap with maximum gain and updating the swap gains take O(n^2) time.
- The FM algorithm takes O(p) time per pass (p: # pins).
- Key ideas in the FM algorithm (see the sketch below):
  - Each move affects only a few other moves, giving constant-time (amortized) gain updating per move.
  - Maintain a list of gain buckets, giving constant-time selection of the move with maximum gain.
- Further improvement by Krishnamurthy: look-ahead in gain computation.
[Figure: cells u1 and u2 moving between partitions V1 and V2; gain buckets indexed from g_max down to -g_max.]
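
A hypothetical Python sketch of the gain-bucket idea: free cells live in buckets indexed by gain, so the best move is retrieved and gains are adjusted in (near) constant time. A production FM implementation would use doubly linked lists per bucket; the class and method names here are mine.

```python
from collections import defaultdict, deque

class GainBuckets:
    def __init__(self, max_gain):
        self.max_gain = max_gain              # gains are bounded by the max #pins on a cell
        self.buckets = defaultdict(deque)     # gain -> free cells currently at that gain
        self.gain_of = {}                     # cell -> its current gain
        self.best = -max_gain                 # highest possibly non-empty bucket

    def insert(self, cell, gain):
        self.gain_of[cell] = gain
        self.buckets[gain].append(cell)
        self.best = max(self.best, gain)

    def update(self, cell, delta):
        """Adjust a cell's gain after a neighboring move."""
        old = self.gain_of[cell]
        self.buckets[old].remove(cell)        # deque.remove is O(bucket size);
        self.insert(cell, old + delta)        # real FM uses linked lists for O(1) removal

    def pop_best(self):
        """Return a free cell of maximum gain, or None if no free cells remain."""
        while self.best > -self.max_gain and not self.buckets[self.best]:
            self.best -= 1                    # lazily skip emptied buckets
        return self.buckets[self.best].popleft() if self.buckets[self.best] else None
```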

11 Simulated Annealing
[Figure: local search on a cost function over the solution space; greedy descent can get trapped in a local minimum.]

12 Statistical Mechanics vs. Combinatorial Optimization
- State {r_i}: a configuration, i.e., a set of atomic positions
- Weight: Boltzmann distribution, proportional to e^(-E({r_i}) / (k_B T))
  - E({r_i}): energy of the configuration; k_B: Boltzmann constant; T: temperature
- Low-temperature limit?

13 Analogy
Physical System         | Optimization Problem
State (configuration)   | Solution
Energy                  | Cost function
Ground state            | Optimal solution
Rapid quenching         | Iterative improvement
Careful annealing       | Simulated annealing

14 Generic Simulated Annealing Algorithm
1. Get an initial solution S
2. Get an initial temperature T > 0
3. While not yet "frozen", do the following:
   3.1 For 1 ≤ i ≤ L, do the following:
       3.1.1 Pick a random neighbor S' of S
       3.1.2 Let Δ = cost(S') - cost(S)
       3.1.3 If Δ ≤ 0 (downhill move), set S = S'
       3.1.4 If Δ > 0 (uphill move), set S = S' with probability e^(-Δ/T)
   3.2 Set T = rT (reduce the temperature)
4. Return S
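
A direct transcription of this generic loop into Python; the cost function, neighbor generator, and the parameter defaults (T0, r, L, the frozen threshold) are placeholders to be chosen per problem.

```python
import math
import random

def simulated_annealing(initial, cost, random_neighbor,
                        T0=10.0, r=0.9, L=100, T_min=1e-3):
    S, T = initial, T0
    while T > T_min:                           # "not yet frozen"
        for _ in range(L):                     # L attempts at each temperature
            S_new = random_neighbor(S)
            delta = cost(S_new) - cost(S)
            if delta <= 0:                     # downhill move: always accept
                S = S_new
            elif random.random() < math.exp(-delta / T):
                S = S_new                      # uphill move: accept with prob e^(-delta/T)
        T *= r                                 # geometric cooling, T <- r*T
    return S
```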

15 Basic Ingredients for S.A.  Solution space  Neighborhood Structure  Cost Function  Annealing Schedule

16 SA Partitioning
"Optimization by Simulated Annealing" - Kirkpatrick, Gelatt, Vecchi
- Solution space = the set of all partitions
- Neighborhood structure: randomly move one cell to the other side
[Figure: a solution {a, b, c} | {d, e, f} and a move that transfers one cell to the other side.]

17 SA Partitioning
- Cost function: f = C + λB
  - C is the partitioning cost (the cut) as used before
  - B is a measure of how balanced the partitioning is
  - λ is a constant
- Example of B: B = (|S1| - |S2|)^2
[Figure: cells a, b, ... in S1 and c, d, ... in S2.]
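
A small illustrative sketch of this cost function for a move-based SA partitioner, with nets given as iterables of cells and the partition as a cell-to-side map; a random move would simply flip side[c] for one randomly chosen cell. The function name and data layout are assumptions.

```python
def partition_cost(nets, side, lam=1.0):
    """f = C + lambda*B; side maps each cell to 0 or 1, nets are iterables of cells."""
    C = sum(1 for net in nets if len({side[c] for c in net}) > 1)   # nets crossing the cut
    n1 = sum(1 for s in side.values() if s == 1)
    n0 = len(side) - n1
    B = (n0 - n1) ** 2                                              # balance penalty (|S1|-|S2|)^2
    return C + lam * B
```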

18 SA Partitioning
- Annealing schedule: T_n = (T_1/T_0)^n * T_0, with ratio T_1/T_0 = 0.9
- Stay at each temperature until either
  1. there are 10 accepted moves on average, or
  2. the number of attempts exceeds 100 times the total number of cells
- The system is "frozen" if the acceptance rate is very low at 3 consecutive temperatures

19 Graph Partition Using Simulated Annealing Without Rejections
- Greene and Supowit, ICCD-88, pp. 658-663
- Motivation: at low temperature, most moves are rejected (e.g., a 1/100 acceptance rate for 1,000 vertices)

20 Graph Partition Using Simulated Annealing Without Rejections (Cont'd)
Key ideas:
(I) Biased selection: if move i would be accepted with probability α_i, generate move i with probability α_i / Σ_{j=1}^{N} α_j (N: size of the neighborhood). In general, α_i = min{1, e^(-Δ_i/T)}. In the conventional model, each move is generated with probability 1/N.
(II) If a move is generated, it is always accepted.

21 Graph Partition Using Simulated Annealing Without Rejections (Cont'd)
Main difficulties:
(1) α_i is dynamic (since Δ_i is dynamic), and it is too expensive to update the α_i's (Δ_i's) after every move.
(2) Weighted selection problem: how to select move i with probability α_i / Σ_{j=1}^{N} α_j?

22 Solution to the Weight Selection Problem (a general solution to several problems)
[Figure: a binary tree with leaves α_1, ..., α_7; each internal node stores the sum of the α values in its subtree, and the root stores α_1 + ... + α_7.]

23 Solution to the Weight Selection Problem (Cont'd)
Let W = α_1 + α_2 + ... + α_n. How do we select i with probability α_i / W?
Equivalent to choosing x in (0, W] such that α_1 + ... + α_{i-1} < x ≤ α_1 + ... + α_i.
Tree-walking algorithm, where σ(v) denotes the sum stored at node v:
  v ← root; x ← random(0, 1) * σ(v)
  while v is not a leaf do
    if x ≤ σ(left(v)) then v ← left(v)
    else x ← x - σ(left(v)); v ← right(v)
  end
Probability of ending up at leaf i = Prob(α_1 + ... + α_{i-1} < x ≤ α_1 + ... + α_i) = α_i / W.
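
A Python sketch of this tree algorithm using an implicit array-based tree (leaves at indices n..2n-1, internal node v storing the sum of its two children 2v and 2v+1); the class name and layout are assumptions, not from the paper.

```python
import random

class SelectionTree:
    def __init__(self, weights):
        self.n = len(weights)
        self.sums = [0.0] * (2 * self.n)             # sums[n:] hold the leaf weights
        for i, w in enumerate(weights):
            self.sums[self.n + i] = w
        for v in range(self.n - 1, 0, -1):           # internal node = sum of its children
            self.sums[v] = self.sums[2 * v] + self.sums[2 * v + 1]

    def update(self, i, w):
        """Change weight alpha_i and refresh the sums on its path to the root."""
        v = self.n + i
        self.sums[v] = w
        v //= 2
        while v:
            self.sums[v] = self.sums[2 * v] + self.sums[2 * v + 1]
            v //= 2

    def sample(self):
        """Pick leaf i with probability alpha_i / W by walking down from the root."""
        v, x = 1, random.random() * self.sums[1]
        while v < self.n:                            # while v is not a leaf
            if x <= self.sums[2 * v]:
                v = 2 * v                            # descend into the left child
            else:
                x -= self.sums[2 * v]                # skip the left subtree's weight
                v = 2 * v + 1
        return v - self.n
```

In the rejection-free SA setting, the α_i values live in such a tree; update() refreshes a changed weight in O(log n), and sample() generates the next move.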

24 Application to Partitioning: Special Solution to the First Problem
Given a partition (A, B), the cost is F(A, B) = F_C(A, B) + F_I(A, B), where
- F_C(A, B) = net-cut between A and B
- F_I(A, B) = C(|A|^2 + |B|^2) (minimized when |A| = |B| = n/2)
For move i, Δ_i = F(A', B') - F(A, B).
After a move, Δ_i^C = F_C(A', B') - F_C(A, B) changes for only a few moves, while Δ_i^I = F_I(A', B') - F_I(A, B) changes for all moves.

25 Application to Partitioning: Special Solution to the First Problem (Cont'd)
Solution: two-step biased selection, writing p_i = α_i^I(T) * α_i^C(T):
(i) choose side A or B based on α_i^I(T)
(ii) choose move i within A or B based on α_i^C(T)
Note that α_i^I is the same for every cell in A (and likewise for B), so we keep one copy of α_i^I for A and one copy for B, and choose the moves within A or B using the tree algorithm.

26 More Partitioning Techniques
- Spectral-based partitioning algorithms: [Hagen-Kahng 1991], [Cong-Hagen-Kahng 1992]
- Module replication in circuit partitioning: [Kring-Newton 1991; Hwang-ElGamal 1992; Liu et al. TCAD'95; Enos et al. TCAD'99]
- Generating uni-directional partitioning [Iman-Pedram-Fabian-Cong 1993] or acyclic partitioning [Cong-Li-Bagrodia DAC'94], [Cong-Lim ASPDAC 2000]
- Logic restructuring during partitioning: [Iman-Pedram-Fabian-Cong 1993]
- Communication-based partitioning: [Hwang-Owens-Irwin 1990; Beardslee-Lin-Sangiovanni 1992]

27 Multi-Way Partitioning
- Recursive bi-partitioning [Kernighan-Lin 1970]
- Generalization of Fiduccia-Mattheyses' and Krishnamurthy's algorithms [Sanchis 1989], [Cong-Lim ICCAD'98]
- Generalization of ratio-cut and spectral methods to multi-way partitioning [Chan-Schlag-Zien 1993]
  - generalized ratio-cut value = sum of the flux of each partition
  - generalized ratio-cut cost of a k-way partition ≥ sum of the k smallest eigenvalues of the Laplacian matrix

28 Circuit Clustering Formulation
- Motivation:
  - Reduce the size of flat netlists
  - Identify natural circuit hierarchy
- Objectives:
  - Maximize the connectivity of each cluster
  - Minimize the size, delay (or simply depth), and density of the clustered circuit

29 Lawler's Labeling Algorithm [Lawler-Levitt-Turner 1969]
- Assumptions: cluster size ≤ K; intra-cluster delay = 0; inter-cluster delay = 1
- Objective: find a clustering of minimum delay
- Algorithm (a sketch of Phase 1 follows this slide):
  - Phase 1: label all nodes in topological order
    - For each PI node v, L(v) = 0
    - For each non-PI node v:
      p = maximum label among the predecessors of v
      Xp = set of predecessors of v with label p
      if |Xp| < K then L(v) = p else L(v) = p + 1
  - Phase 2: form clusters, starting from the POs to generate only the necessary clusters; nodes with the same label form a cluster
[Figure: a node v with label p whose predecessors with label p form the set Xp.]
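
A Python sketch of Phase 1, assuming unit inter-cluster delay and taking "predecessors" to mean the full transitive fan-in of a node; the netlist is a DAG given as immediate-predecessor lists in topological order, and the simple fan-in accumulation is kept unoptimized for clarity.

```python
def lawler_labels(nodes_topo, preds, K):
    """nodes_topo: nodes in topological order; preds[v]: immediate predecessors of v."""
    label, fanin = {}, {}                        # fanin[v]: transitive predecessors of v
    for v in nodes_topo:
        fanin[v] = set()
        for u in preds[v]:
            fanin[v] |= fanin[u] | {u}
        if not preds[v]:                         # primary input
            label[v] = 0
            continue
        p = max(label[u] for u in fanin[v])      # maximum label over the fan-in cone
        Xp = [u for u in fanin[v] if label[u] == p]
        label[v] = p if len(Xp) < K else p + 1   # start a new level if Xp is already full
    return label
```

Phase 2 would then walk back from the POs and group nodes with equal labels into clusters, duplicating nodes where needed.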

30 Lawler's Labeling Algorithm (Cont'd)
- Performance of the algorithm:
  - Efficient run time
  - Minimum-delay clustering solution
  - Allows node duplication
  - No attempt to minimize the number of clusters
- Extension to allow arbitrary gate delays:
  - Heuristic solution [Murgai-Brayton-Sangiovanni 1991]
  - Optimal solution [Rajaraman-Wong 1993]

31 Maximum Fanout Free Cone (MFFC)
Definition: for a node v in a combinational circuit,
- cone of v (Cv): v and some of its predecessors, such that any path connecting a node in Cv and v lies entirely in Cv
- fanout-free cone at v (FFCv): a cone of v such that for any node w ≠ v in FFCv, every fanout of w is also in FFCv
- maximum FFC at v (MFFCv): the FFC of v such that for any non-PI node w, if every fanout of w is in MFFCv, then w is in MFFCv

32 Properties of MFFCs
- If w ∈ MFFCv, then MFFCw ⊆ MFFCv
- Two MFFCs are either disjoint or one contains the other [CoDi93]

33 Maximum Fanout Free Subgraph (MFFS)
- Definition: the MFFC concept generalized to a node v in a sequential circuit
[Figure: illustration contrasting the MFFCs of a circuit with its MFFS.]

34 MFFS Construction Algorithm
- For a single MFFS at node v (sketched below):
  - select root node v and cut all of its fanout edges
  - mark all nodes reachable backwards from all POs
  - MFFSv = {unmarked nodes}
- Complexity: O(|N| + |E|)
[Figure: the circuit with root v, its cut fanout edges, and the unmarked region forming MFFSv.]
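
An illustrative Python version of this single-MFFS construction: with v's fanout edges cut, everything still reachable backwards from the primary outputs is marked, and the unmarked nodes form MFFSv. The adjacency representation (a fanin dict keyed by every node) is an assumption.

```python
def mffs_at(v, fanin, primary_outputs):
    """fanin: dict mapping every node of the circuit to the list of its fanin nodes."""
    marked, stack = set(), [o for o in primary_outputs if o != v]
    while stack:                               # backward reachability from the POs,
        u = stack.pop()                        # with all of v's fanout edges cut
        if u in marked or u == v:              # never traverse through v
            continue
        marked.add(u)
        stack.extend(fanin[u])
    return set(fanin) - marked                 # MFFS_v = the unmarked nodes (includes v)
```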

38 MFFS Clustering Algorithm
- Clusters the entire netlist (sketched below):
  - construct the MFFS at a PO and remove it from the netlist
  - include its inputs as new POs
  - repeat until all nodes are clustered
- Complexity: O(|N| * (|N| + |E|))
[Figure: successive MFFS clusters peeled off the netlist starting from the POs.]
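
A sketch of the clustering loop, reusing the hypothetical mffs_at() from the construction sketch above; it peels off one MFFS per iteration, promotes the cluster's inputs to new POs, and repeats until the working netlist is empty.

```python
def mffs_clustering(fanin, primary_outputs):
    fanin = {u: list(ins) for u, ins in fanin.items()}     # working copy of the netlist
    pos, clusters = list(primary_outputs), []
    while fanin:
        v = next(o for o in pos if o in fanin)             # a remaining (possibly new) PO
        cluster = mffs_at(v, fanin, [o for o in pos if o in fanin])
        clusters.append(cluster)
        new_pos = {u for w in cluster for u in fanin[w] if u not in cluster}
        for w in cluster:                                  # remove the cluster's nodes
            del fanin[w]
        for u in fanin:                                    # drop references to removed nodes
            fanin[u] = [w for w in fanin[u] if w not in cluster]
        pos = [o for o in pos if o not in cluster] + [u for u in new_pos if u not in pos]
    return clusters
```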

42 Summary
- Partitioning is key to applying the divide-and-conquer methodology (for complexity management)
- Partitioning also defines global/local interconnects and greatly impacts circuit performance
- The growing importance of interconnect design has introduced many new partitioning formulations
- Clustering is effective in reducing circuit size and identifying natural circuit hierarchy
- Multi-level circuit clustering combined with iterative-improvement-based methods produces the best partitioning results

