Local Unidirectional Bias for Smooth Cutsize-delay Tradeoff in Performance-driven Partitioning Andrew B. Kahng and Xu Xu UCSD CSE and ECE Depts. Work supported.

1 Local Unidirectional Bias for Smooth Cutsize-delay Tradeoff in Performance-driven Partitioning Andrew B. Kahng and Xu Xu UCSD CSE and ECE Depts. Work supported in part by MARCO GSRC

2 Outline: → Motivation; Performance driven bipartition problem; New bipartitioning algorithm; Experimental results; Conclusion and future work

3 Partitioning and Performance The hypergraph partitioning problem is to divide the nodes of a hypergraph into roughly equal parts; the traditional objective is to minimize cutsize. In performance-driven partitioning, we also seek to minimize path delay on timing paths.
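
As a minimal sketch of the traditional objective, the cutsize of a bipartition can be counted as the number of nets with nodes in both parts. The netlist, node names, and the `cutsize` helper below are illustrative assumptions, not taken from the presentation.

```python
def cutsize(nets, part):
    """Count nets (hyperedges) that have at least one node in each part."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)


# Hypothetical toy hypergraph: three nets over five nodes.
nets = [("a", "b"), ("b", "c", "d"), ("d", "e")]
part = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}

print(cutsize(nets, part))  # 1: only net (b, c, d) spans both parts
```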

4 Previous Work (I) [Cong et al. ISPD-2002]
– Global clustering based algorithm with retiming
– Stages shown in the flow figure: min-delay clustering w/ retiming; de-clustering and refinement; min-cutsize clustering
– Reduces delay by 16% while increasing cutsize by 17%
– Requires substantial gate replication

5 Previous Work (II) [Ababei et al. ICCAD-2002]
– Reweighting based method (path based or net based)
– Stages shown in the flow figure: input; global timing analysis; find critical paths; reweighting; cutsize oriented partitioner, such as hMetis or MLPart
– 14% reduction of delay with 10% increase in cutsize
– 139% increase in runtime compared with hMetis

6 Motivating Questions
Can we avoid global timing analysis?
– Global timing analysis is extremely time-consuming
Can we improve path delay without significantly degrading cutsize?
– Need smooth tradeoff between delay and cutsize
Can we reduce implementation overheads?
– Previous methods store thousands of critical paths and continuously update them

7 Outline: Motivation; → Performance driven bipartition problem; New bipartitioning algorithm; Experimental results; Conclusion and future work

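8 Delay Model
Delay = hop_delay + node_delay, accumulated over the hops (cut crossings) and nodes on a path
[Cong et al. ISPD-2002]: hop_delay = 5, node_delay = 1; for the example path with 3 hops and 5 nodes, Delay = 3×5 + 5×1 = 20
[Ababei et al. ICCAD-2002]: hop_delay = Elmore delay, node_delay = constant
Figure: FF nodes and combinational nodes in Part 0 and Part 1; a hop is a crossing of the cut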
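
The slide's arithmetic can be written as a small sketch. The function name `path_delay` and the reading of "3x5 + 5x1" as 3 hops and 5 nodes are assumptions; the default constants are the hop_delay = 5 and node_delay = 1 values quoted for the Cong et al. model.

```python
def path_delay(num_hops, num_nodes, hop_delay=5, node_delay=1):
    """Path delay as (hops on the path) * hop_delay + (nodes on the path) * node_delay."""
    return num_hops * hop_delay + num_nodes * node_delay


# The slide's example: 3 hops and 5 nodes.
print(path_delay(3, 5))  # 20
```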

9 Performance Driven Bipartition Problem
Given:
– Hypergraph H=(V,E)
– Area balance tolerance s (0<s<1), a parameter to control allowable slack in the area constraint
– A given parameter, written here as α, which captures the tradeoff between cutsize and path delay (hopcount)
Find: a bipartition (V_0|V_1) which satisfies the area balance constraint set by s and minimizes α(cutsize) + (1-α)(Max_hopcount)
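
A minimal sketch of the weighted objective follows. The parameter's symbol does not survive in the transcript, so the name `alpha` is an assumption, as is the helper name `objective`; the area balance constraint is not modeled here.

```python
def objective(cut, max_hopcount, alpha):
    """Weighted sum: alpha * cutsize + (1 - alpha) * maximum hopcount."""
    return alpha * cut + (1 - alpha) * max_hopcount


# alpha = 1 reduces to pure cutsize; alpha = 0 reduces to pure hopcount.
print(objective(10, 4, 1.0))  # 10.0
print(objective(10, 4, 0.0))  # 4.0
print(objective(10, 4, 0.5))  # 7.0
```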

10 Outline: Motivation; Performance driven bipartition problem; → New bipartitioning algorithm; Experimental results; Conclusion and future work

11 Unidirectional Partition
Path delay is minimized, with hopcount = 1, if the partition is unidirectional (“acyclic”), that is, all cuts are in the same direction
Problems: high cutsize; there may be no unidirectional solution
Can we achieve a “locally unidirectional” partition?
Figure: example paths across Part 0 and Part 1 with max hopcount = 5 and max hopcount = 3
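
One way to read "all cuts are in the same direction" is that every directed edge crossing the cut goes from the same source part to the same sink part. The sketch below checks that condition on a hypothetical edge list; the helper name `is_unidirectional` and the toy graph are assumptions.

```python
def is_unidirectional(edges, part):
    """True if every cut edge (u -> w) crosses between the parts in the same direction."""
    directions = {(part[u], part[w]) for u, w in edges if part[u] != part[w]}
    return len(directions) <= 1


edges = [("a", "b"), ("b", "c")]  # hypothetical directed path a -> b -> c

print(is_unidirectional(edges, {"a": 0, "b": 1, "c": 1}))  # True: one crossing, 0 -> 1
print(is_unidirectional(edges, {"a": 0, "b": 1, "c": 0}))  # False: crossings 0 -> 1 and 1 -> 0
```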

12 V-Shaped Nodes
If a combinational node v satisfies: there exist v_j, v_t in the other part and a path from v_j to v_t that includes only v, then v is a V-shaped node
Figure: v in one part, with v_j and v_t in the other part
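
The definition can be sketched as a local test: a combinational node is V-shaped when it has at least one direct predecessor and at least one direct successor in the other part. The adjacency dictionaries, the `ff_nodes` set, and the function name are illustrative assumptions.

```python
def is_v_shaped(v, preds, succs, part, ff_nodes=frozenset()):
    """True if combinational node v has a fan-in and a fan-out node in the other part."""
    if v in ff_nodes:
        return False  # the definition applies to combinational nodes only
    other = 1 - part[v]
    has_entry = any(part[u] == other for u in preds.get(v, ()))
    has_exit = any(part[w] == other for w in succs.get(v, ()))
    return has_entry and has_exit


# Hypothetical path x -> v -> y.
preds = {"v": ["x"], "y": ["v"]}
succs = {"x": ["v"], "v": ["y"]}

print(is_v_shaped("v", preds, succs, {"x": 1, "v": 0, "y": 1}))  # True
print(is_v_shaped("v", preds, succs, {"x": 0, "v": 0, "y": 1}))  # False
```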

13 V-Shaped Nodes in Critical Paths
Empirical observations from study of partitioning solutions:
– there are V-shaped nodes in the partitioning solutions
– every V-shaped node is included in many critical paths
– every critical path contains several V-shaped nodes
For testcase 1:
– Number of nets: 16377
– Number of critical paths: 26772
– On average, one critical path contains 27.6 nodes
– On average, one critical path contains 3.4 V-nodes
– On average, one V-node belongs to 233.7 critical paths

14 Key Idea: V-Shaped Nodes Elimination
Move V-shaped node “b” to reduce path hopcount
Before the move: PATH a → b → c, hopcount=2; PATH d → b → c, hopcount=1; PATH e → b → c, hopcount=1
After the move: PATH a → b → c, hopcount=0; PATH d → b → c, hopcount=1; PATH e → b → c, hopcount=1
Figure: nodes a to f in Part 0 and Part 1, before and after moving b
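
The hopcounts listed on this slide can be reproduced with a small sketch. The part assignment below (a and c in one part; b, d, and e in the other before the move) is an assumption chosen to be consistent with the listed hopcounts, since the figure itself is not in the transcript.

```python
def hopcount(path, part):
    """Number of consecutive node pairs on the path that lie in different parts."""
    return sum(1 for u, w in zip(path, path[1:]) if part[u] != part[w])


paths = [("a", "b", "c"), ("d", "b", "c"), ("e", "b", "c")]
before = {"a": 0, "c": 0, "b": 1, "d": 1, "e": 1}
after = dict(before, b=0)  # move V-shaped node b to the other part

hops_before = [hopcount(p, before) for p in paths]
hops_after = [hopcount(p, after) for p in paths]
print(hops_before, hops_after)  # [2, 1, 1] [0, 1, 1]
```

In this toy instance the maximum hopcount drops from 2 to 1 while the other two paths are unchanged.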

15 Distance-k V-Shaped Nodes Elimination
k = 2: moving the V_2 nodes “b, c” reduces path hopcount from 2 to 0
Problems with large k:
– Cutsize may be greatly increased
– Delay of one path is reduced while the delay of other paths is increased
Figure: nodes a, b, c, d in Part 0 and Part 1, before and after moving b and c

16 New Gain Function
g(v): traditional FM gain
r_j(v): reduction of V_j nodes after moving v
Gain(v) = δ(0)·g(v) + δ(1)·r_1(v)
Figure: node v before and after the move
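
Read as a weighted sum of the FM gain g(v) and the V-node reductions r_j(v), the biased gain can be sketched as below. This reading of the slide's formula, the function name, and the sample g and r values are assumptions; the weights are the δ settings quoted in the experimental slides.

```python
def biased_gain(g, r, delta):
    """delta[0] * g + sum over j >= 1 of delta[j] * r[j - 1].

    g     : traditional FM gain of the move
    r     : list [r_1, ..., r_k] of V_j-node reductions for the move
    delta : list [delta(0), delta(1), ..., delta(k)] of bias weights
    """
    return delta[0] * g + sum(delta[j] * r[j - 1] for j in range(1, len(delta)))


# delta(0)=1, delta(1)=10: a cut-neutral move that removes one V-shaped node.
print(biased_gain(0, [1], [1, 10]))  # 10

# delta(0)=1, delta(1)=30, delta(2)=3: a move that worsens the cut by 2
# but removes one V_1 node.
print(biased_gain(-2, [1, 0], [1, 30, 3]))  # 28
```

With large δ(1), a move that removes a V-shaped node can outrank moves with better pure cutsize gain, which is how the bias enters the move ordering.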

17 Distance-k Unidirectional Algorithm
Calculate initial gains for all nodes and store the gains
Select the node v with maximum gain /* CLIP-like method: move the cluster that v belongs to */
Reset the gains of all nodes to zero
Move v and update the gains of v and its neighbors
While (∃ a node not yet moved)
    Select one node v with the maximum updated gain
    Move v and update the related gains
Find the point in the move sequence at which the sum of gains is maximum; undo all moves after this point
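
The move-then-roll-back structure of the pass can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: gains are recomputed from scratch rather than updated incrementally, the CLIP-like gain reset is omitted, balance is a plain minimum part size (`min_size`), and the default `cut_gain` is pure cutsize gain. A biased gain could be passed as `gain_fn` instead.

```python
def cut(nets, part):
    """Number of nets with nodes in both parts."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)


def cut_gain(v, nets, part):
    """Reduction in cutsize if v alone is moved to the other part."""
    moved = dict(part)
    moved[v] = 1 - moved[v]
    return cut(nets, part) - cut(nets, moved)


def fm_pass(nodes, nets, part, gain_fn=cut_gain, min_size=1):
    """Move each node at most once in max-gain order, then keep the best prefix."""
    part = dict(part)
    unlocked = set(nodes)
    moves, gains = [], []
    while unlocked:
        sizes = [sum(1 for p in part.values() if p == side) for side in (0, 1)]
        # A move is legal only if the source part keeps at least min_size nodes.
        legal = [v for v in sorted(unlocked) if sizes[part[v]] - 1 >= min_size]
        if not legal:
            break
        v = max(legal, key=lambda u: gain_fn(u, nets, part))
        gains.append(gain_fn(v, nets, part))
        part[v] = 1 - part[v]
        unlocked.remove(v)  # lock v for the rest of the pass
        moves.append(v)
    # Find the prefix of the move sequence with the maximum gain sum.
    best_len, best_sum, running = 0, 0, 0
    for i, g in enumerate(gains, start=1):
        running += g
        if running > best_sum:
            best_len, best_sum = i, running
    # Undo every move after that point.
    for v in moves[best_len:]:
        part[v] = 1 - part[v]
    return part, best_sum


nodes = ["a", "b", "c", "d"]
nets = [("a", "b"), ("b", "c"), ("c", "d")]
start = {"a": 0, "b": 1, "c": 0, "d": 1}  # cutsize 3

result, total_gain = fm_pass(nodes, nets, start)
print(result, total_gain)  # {'a': 0, 'b': 0, 'c': 0, 'd': 1} 2
```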

18 Outline: Motivation; New bipartitioning algorithm; → Experimental results; Conclusion and future work

19 Experimental Setup
Four industry testcases obtained as LEF/DEF
Model of Ababei et al. (ICCAD-2002) used to calculate delay
Partitioning solutions compared to results of MLPart
– strongest multilevel netlist partitioning code
– website: http://nexus6.cs.ucla.edu/GSRC/bookshelf/Slots/Partitioning/MLPart
All tests on 600MHz Intel Pentium-III Xeon

20 Biasing against V_1 Nodes vs. MLPart
δ(0)=1, δ(1)=10
Testcase   MLPart                               MLPart + V-shaped nodes removal
           cutsize   h     delay   time(s)      cutsize   h     delay   time(s)
1          820.7     5.3   352.8   11.79        856.1     3.3   266.8   12.58
2          169.9     3.5   220.7   13.45        189.8     2.5   211.2   15.32
3          141.3     3     291.6   16.67        152.3     2.3   283.6   18.27
4          408.7     5.3   302.6   12.43        421.2     3.6   252.7   14.03
Reduction of delay: 4.5%-24.4%, average: 15.1%
Increase of cutsize: 3.0%-10.0%, average: 4.9%
Increase of runtime: 6.3%-11.4%, average: 9.7%
Using the delay model in Cong et al. ISPD-2002, reduction of delay: 4.3%-21.2%, average: 14.7%

21 Biasing against V_2 Nodes vs. MLPart
δ(0)=1, δ(1)=30, δ(2)=3
Testcase   MLPart                               MLPart + V k=2 nodes removal
           cutsize   h     delay   time(s)      cutsize   h     delay   time(s)
1          820.7     5.3   352.8   11.79        847.5     3     262.1   13.16
2          169.9     3.5   220.7   13.45        183.2     2     202.5   15.67
3          141.3     3     291.6   16.67        149.2     2     275.6   18.92
4          408.7     5.3   302.6   12.43        416.7     3.4   243.5   14.79
Reduction of delay: 8.9%-30.0%, average: 18.7%
Increase of cutsize: 3.1%-7.2%, average: 3.5%
Increase of runtime: 11.9%-15.9%, average: 13.1%
Using the delay model in Cong et al. ISPD-2002, reduction of delay: 8.3%-28.7%, average: 17.3%

22 Outline: Motivation; Performance driven bipartition problem; New bipartitioning algorithm; Experimental results; → Conclusions and future work

23 Conclusions
Simple yet efficient timing-driven partitioning that does not require global timing analysis
Negligible implementation and runtime overhead
Significantly reduces path delay, with cutsize and runtime almost the same as leading-edge MLPart
Similar improvements observed with different path delay metrics
Future work:
– Impact of new partitioner on placement
– Efficient methods for biasing δ(k), k>2

24 Thank you!

25 Future Work Impact of new partitioner on placement Efficient methods for biasing δ(k) k>2

26 Why Performance Driven Partitioning?
Achieving timing closure becomes increasingly difficult in deep-submicron technologies due to non-ideal scaling of interconnect delay
Routing alone can no longer solve the timing problem, even with aggressive optimizations (buffer insertion, buffer/wire sizing, …) ⇒ timing needs to be addressed at all design stages
Partitioning is a critical step in defining interconnect timing properties, but is traditionally driven by the cutsize objective

27 Previous Work (I) With Logic Replication –Retiming –Replication graph Without Logic Replication –Net based reweighting –Path based reweighting

28 FM Partitioning and Gain Function
Gain(v) = reduction of cutsize after moving v (in the figure's example, Gain(v) = -1)
Start with a random partition
Move the node with the max gain and lock it
Keep moving until all nodes are locked
Find the best point in the move sequence
Figure: node v in Part 0 and Part 1, before and after the move
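
For hypergraph nets, the cutsize reduction of a single move can be computed net by net: a net becomes uncut when the moved node is the only one on its side, and becomes cut when the whole net currently sits on the node's side. The sketch below uses a hypothetical netlist chosen so that Gain(v) = -1, matching the value shown on the slide; the slide's actual figure is not in the transcript.

```python
def fm_gain(v, nets, part):
    """Cutsize reduction obtained by moving v to the other part."""
    gain = 0
    for net in nets:
        if v not in net or len(net) < 2:
            continue  # nets without v, and single-pin nets, cannot change
        same_side = sum(1 for u in net if part[u] == part[v])
        if same_side == len(net):
            gain -= 1  # net is uncut now and becomes cut after the move
        elif same_side == 1:
            gain += 1  # v is alone on its side, so the net becomes uncut
    return gain


nets = [("v", "x"), ("v", "y", "z")]
part = {"v": 0, "x": 0, "y": 0, "z": 1}

print(fm_gain("v", nets, part))  # -1: net (v, x) becomes cut, net (v, y, z) stays cut
print(fm_gain("z", nets, part))  # 1: moving z uncuts net (v, y, z)
```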

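29 Procedure to Calculate r_j(v)
Delete all FF nodes and their related edges
In the remaining graph, BFS from v
For each level j from 1 to k:
    If v is a V_j node before moving, r_j' = 1 (otherwise 0)
    If v is a V_j node after moving, r_j'' = 1 (otherwise 0)
    r_j = r_j' - r_j'', so that r_j is positive when the move removes a V_j node, consistent with r_j(v) being a reduction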
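
A rough sketch of this procedure is given below, under several assumptions: r_j is taken as "before minus after" so that a positive value is a reduction; "v is a V_j node" is interpreted as v lying on a same-part chain of exactly j combinational nodes whose entry predecessor and exit successor are in the other part; and a depth-bounded recursive search stands in for the BFS. All function names are hypothetical.

```python
from collections import defaultdict


def build_comb_graph(edges, ff_nodes):
    """Adjacency lists after deleting FF nodes and their related edges."""
    preds, succs = defaultdict(list), defaultdict(list)
    for u, w in edges:
        if u in ff_nodes or w in ff_nodes:
            continue
        succs[u].append(w)
        preds[w].append(u)
    return preds, succs


def chain_sizes(v, k, preds, succs, part):
    """Sizes j <= k of same-part chains through v bounded by other-part nodes."""
    side = part[v]

    def reach(u, nbrs, budget):
        # Chain-node counts from u (inclusive) out to a node that has a
        # neighbour in the other part, walking through same-side nodes only.
        found = set()
        if any(part[w] != side for w in nbrs.get(u, ())):
            found.add(1)
        if budget > 1:
            for w in nbrs.get(u, ()):
                if part[w] == side:
                    found |= {n + 1 for n in reach(w, nbrs, budget - 1)}
        return found

    back = reach(v, preds, k)
    fwd = reach(v, succs, k)
    return {b + f - 1 for b in back for f in fwd if b + f - 1 <= k}


def reductions(v, k, preds, succs, part):
    """[r_1, ..., r_k] with r_j = (v is V_j before the move) - (v is V_j after)."""
    before = chain_sizes(v, k, preds, succs, part)
    moved = dict(part)
    moved[v] = 1 - moved[v]
    after = chain_sizes(v, k, preds, succs, moved)
    return [int(j in before) - int(j in after) for j in range(1, k + 1)]


# Hypothetical chain a -> b -> c -> d with b and c on the opposite side of a and d.
preds, succs = build_comb_graph([("a", "b"), ("b", "c"), ("c", "d")], ff_nodes=set())
part = {"a": 0, "b": 1, "c": 1, "d": 0}
r_b = reductions("b", 2, preds, succs, part)
print(r_b)  # [0, 1]: b stops being part of a V_2 chain when it moves
```

Note that this only tracks the status of the moved node itself, as in the slide's procedure; in the toy chain, moving b alone leaves c as a V_1 node, which this vector does not capture.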

30 CLIP Algorithm
Reminiscent of CLIP (Deng et al. DAC 1996) in how it induces movement of clusters across the cutline.
Figure: node v before and after CLIP

31 Distance-k V-Shaped Nodes
Distance-k V-shaped nodes (V_k-nodes): if k combinational nodes v_{i,1} … v_{i,k} satisfy:
– v_{i,1} … v_{i,k} are in the same part
– ∃ v_j, v_t in the other part
– ∃ a path from v_j to v_t that passes only through v_{i,1} … v_{i,k}
then v_{i,1} … v_{i,k} are distance-k V-shaped nodes
Figure: v_j and v_t in one part, the chain v_{i,1} … v_{i,k} in the other part

32 Notation
H(V,E) = circuit hypergraph
V = set of nodes representing components of the circuit
E = set of signal nets
A bipartition (V_0|V_1) of H(V,E) divides V into two disjoint subsets s.t. V = V_0 ∪ V_1, which are called Part 0 and Part 1
A = the total area of all the nodes in V
A_0 = the area of all the nodes in V_0

