Published by Erik Caldwell. Modified over 8 years ago.
1
Global Clustering-Based Performance-Driven Circuit Partitioning Jason Cong University of California Los Angeles cong@cs.ucla.edu Chang Wu Aplus Design Technologies Los Angeles changwu@aplus-dt.com
2
Problem Definition Problem: k-way circuit partitioning and retiming with balanced area for delay minimization Delay minimization with consideration of cutsize Retiming is performed simultaneously with partitioning for best possible delay reduction Generic delay model: node delay $d_v$, intra-block delay $d$, inter-block delay $D$, with $D > d$
3
Existing Approaches Clustering-based approaches PRIME: group nodes into clusters with given area bound Quasi-optimal delay solution with node duplication Huge cutsize (3X) Partitioning-based approaches Partition circuits into k-blocks and then iteratively move nodes to further improve Cut-size minimization: hMetis Multi-level partitioning, very fast, excellent cutsize, fair circuit delay Delay minimization: HPM Performance-driven clustering + cutsize-driven partitioning, tradeoff between delay and cutsize
4
Existing Approaches (cont) Clustering-based approaches Delay optimization with node duplication is solved optimally Duplication-free clustering is NP-complete, but resolving duplications heuristically gives fairly good results Huge cutsize Partitioning-based approaches Very good cutsize Delay minimization is difficult: updating the delay after each node move is too costly (linear time) hMetis: does not consider delay directly; gradual coarsening is difficult to target at delay HPM: clustering and partitioning are separate; clustering does not know its impact on cutsize, and partitioning has little control over delay
5
HPM: Combination of Clustering and Partitioning HPM by Cong, et al. [DAC99] Clustering followed by partitioning Good balance between delay and cutsize Clustering and partitioning are two completely separate steps Clustering with a very small, fixed area bound (10) on each cluster: much less than A/K, where A is circuit area and K is the number of blocks Delay is inferior to clustering with a cluster area bound of A/K (~23% larger) Cutsize is larger than hMetis because clustering constraints reduce the cutsize reduction capability of partitioning A better solution is needed
6
Multi-Level Partitioning for Cutsize hMetis by Karypis, et al. [DAC97] Gradual coarsening to group tightly connected nodes together Uncoarsening gradually and reducing cutsize by moving clusters Fast algorithm: reduced solution space at each level as many nodes are grouped and moved together Smaller cutsize: more thorough search is possible in reduced solution space Hyperedge-based coarsening is very suitable for cutsize Delay is completely ignored
7
Existing Multi-level Optimization Engine V-shape multi-level optimization used in hMetis Not very suitable for delay minimization Gradual coarsening has difficulty predicting its impact on delay
8
MLPR: Performance-Driven Multi-Level Partitioning and Retiming K-way partitioning algorithm for performance optimization Retiming is performed during partitioning for best possible circuit delay Cutsize reduction is also considered MLPR: (1) Clustering with area bound of A/K, where A is circuit area (2) Partitioning of clusters into K blocks (3) For level from 1 to log(A/K): clustering with area bound of $A/(K \cdot 2^{\mathrm{level}})$, where each cluster is bounded by the block it belongs to; then moving clusters to reduce cutsize while preserving circuit delay (4) Final movement of individual nodes for best solution
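The area-bound schedule in the MLPR loop can be sketched as follows. This assumes the logarithm is base 2 (the slide only says "log") and treats the initial clustering with bound A/K as level 0; the function name `area_bounds` is hypothetical.

```python
import math

def area_bounds(total_area, k):
    """Cluster area bound per level: A/K at level 0, then A/(K * 2**level).

    Assumes log base 2, so the bound halves at each level until it
    reaches roughly one unit of area (individual nodes).
    """
    levels = int(math.log2(total_area / k))
    return [total_area / (k * 2 ** level) for level in range(levels + 1)]

# Hypothetical circuit of area 64 split into 4 blocks.
bounds = area_bounds(64, 4)
```

For A = 64 and K = 4 the bounds are 16, 8, 4, 2, 1: each level halves the cluster size, which gives refinement progressively more freedom while clusters stay inside their assigned block.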
9
Our Contribution: Global Clustering Based Multi-Level Optimization Engine Start directly from the coarsest level with global clustering for best possible delay Clustering-based gradual declustering to increase the freedom for refinement Retiming is considered simultaneously during clustering and partitioning for smaller delay
10
Global Clustering for Delay Minimization Clustering: to group nodes into clusters with area no more than a given bound CLUS by Pan, et al. [TCAD98] PRIME by Cong, et al. [DAC99] Quasi-optimal clustering with retiming for delay minimization By setting the area bound to A/K, clustering can compute a partitioning solution with quasi-optimal delay Existing coarsening algorithms that consider only local node connectivity cannot predict circuit delay Theorem: Let c be the circuit delay of a clustering solution. For any partitioning solution P on the clusters, its delay is less than or equal to c Clustering therefore gives an upper bound on circuit delay after partitioning
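One way to see why the theorem is plausible, as a sketch under the generic delay model and ignoring retiming: assume each block of P is a union of whole clusters, that edges inside a cluster cost $d$, and that edges between clusters cost $D$. The counts $n_C(p)$, $n_P(p)$ and the notation $\mathrm{delay}_C$, $\mathrm{delay}_P$ are introduced here for illustration and do not appear in the slides.

```latex
% Sketch only: path delay under the generic delay model, retiming ignored.
% For a path p with |p| edges, n_C(p) is the number of edges crossing
% clusters and n_P(p) the number crossing blocks of P. Since P groups
% whole clusters, every block-crossing edge is also cluster-crossing,
% so n_P(p) <= n_C(p).
\begin{align*}
\mathrm{delay}_P(p)
  &= \sum_{v \in p} d_v + \bigl(|p| - n_P(p)\bigr)\, d + n_P(p)\, D \\
  &= \sum_{v \in p} d_v + |p|\, d + n_P(p)\,(D - d) \\
  &\le \sum_{v \in p} d_v + |p|\, d + n_C(p)\,(D - d)
   = \mathrm{delay}_C(p),
\end{align*}
% using D > d. Taking the maximum over all paths gives delay(P) <= c.
```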
11
Global Clustering-Based Optimization Engine Start from the coarsest level with clustering to define a good circuit delay Comparison: coarsening with gradually increased cluster size has difficulty predicting circuit delay after partitioning on clusters Clustering with gradually reduced area bound to decluster at each level Nodes on a critical path will be grouped together and will NOT be partitioned into different partitions Avoid delay increase by partitioning refinement as much as possible Partition-bounded clustering to guarantee consistent solution improvement and algorithm convergence Guarantees a better solution at a finer level than at a coarser level
12
Partitioning with Retiming Retiming is considered during clustering and partitioning at each level for best possible circuit delay Sequential arrival time: $a_v = \max_{p} \sum_{e \in p} l(e)$ over paths $p$ from primary inputs to $v$, where $l(e) = d_v + d_e - \phi \cdot w_e$ for a given target clock period $\phi$, $d_v$ is the node delay of $v$, $d_e$ is the edge delay, and $w_e$ is the number of FFs on edge $e$ from $u$ to $v$. Theorem [Pan98]: if $\max(a_{po}) \le \phi$, the minimum circuit delay after retiming is no more than $\phi + D$. Timing analysis in both clustering and partitioning is based on sequential arrival time Binary search to get the minimum clock period after retiming
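The sequential arrival time computation and the binary search can be sketched as below. Several things here are assumptions rather than the authors' implementation: the target clock period is written `phi`, the arrival time of a node is taken as the maximum sum of l(e) over paths from primary inputs (with primary inputs starting at 0), a Bellman-Ford style relaxation is used, and a positive cycle is treated as "phi is infeasible". All function names and the example circuit are hypothetical.

```python
def sequential_arrival(nodes, edges, pis, node_delay, phi):
    """Longest-path arrival times with edge weight l(e) = d_v + d_e - phi * w_e.

    edges is a list of (u, v, d_e, w_e). Returns a dict of arrival times,
    or None if values keep growing (positive cycle, so phi is too small).
    """
    neg_inf = float("-inf")
    a = {v: neg_inf for v in nodes}
    for p in pis:
        a[p] = 0.0  # assumption: primary inputs arrive at time 0
    for _ in range(len(nodes)):
        changed = False
        for u, v, d_e, w_e in edges:
            if a[u] == neg_inf:
                continue
            cand = a[u] + node_delay[v] + d_e - phi * w_e
            if cand > a[v] + 1e-12:
                a[v] = cand
                changed = True
        if not changed:
            return a
    return None  # still changing after |V| passes

def feasible(nodes, edges, pis, pos, node_delay, phi):
    """Check the condition max(a_po) <= phi for target clock period phi."""
    a = sequential_arrival(nodes, edges, pis, node_delay, phi)
    return a is not None and max(a[p] for p in pos) <= phi + 1e-9

def min_clock_period(nodes, edges, pis, pos, node_delay, lo, hi, tol=1e-3):
    """Binary search for the smallest feasible phi; hi must be feasible."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(nodes, edges, pis, pos, node_delay, mid):
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical circuit: pi -> a -> b -> po, with feedback b -> a through one FF.
nodes = ["pi", "a", "b", "po"]
edges = [("pi", "a", 0, 0), ("a", "b", 0, 0), ("b", "a", 0, 1), ("b", "po", 0, 0)]
node_delay = {"pi": 0, "a": 2, "b": 3, "po": 0}

phi_min = min_clock_period(nodes, edges, ["pi"], ["po"], node_delay, 0.0, 10.0)
```

In this example the loop a -> b -> a has total delay 5 and one flip-flop, so no clock period below 5 is feasible, and the search converges to 5. The search relies on feasibility being monotone in `phi`: a larger `phi` lowers every l(e) and raises the bound being compared against.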
13
Test Results [Figure: results for bi-partitioning and 16-way partitioning]
14
Conclusion Global clustering is more suitable for delay minimization Global clustering-based multi-level optimization engine achieves good delay and cutsize Retiming further helps delay reduction Simultaneous retiming with partitioning achieves better results than partitioning followed by separate retiming Retiming is not required by the main algorithm and can be disabled