
Constraint-Driven Large Scale Circuit Placement Algorithms Advisor: Prof. Jason Cong Student: Min Xie September, 2006.


1 Constraint-Driven Large Scale Circuit Placement Algorithms Advisor: Prof. Jason Cong Student: Min Xie September, 2006

2 UCLA VLSICAD LAB Outline
• Chapter 1. Introduction
• Chapter 2. Optimality and scalability study of existing placement algorithms
• Chapter 3. Routability-driven multilevel global placement and white space allocation
• Chapter 4. A robust legalization scheme for mixed-size placement
• Chapter 5. Applications of mixed-size placement legalization
• Chapter 6. “Global” localized preprocessing for detailed placement
• Chapter 7. Heterogeneous placement for FPGAs
• Chapter 8. Conclusions and future work

3 Publication List
• Cong J., Xie M., and Zhang Y., “An Enhanced Multilevel Routing System,” Proceedings of ICCAD, pp. 51-58, 2002.
• Chang C., Cong J., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” Proceedings of ASPDAC, pp. 621-627, 2003.
• Cong J., Romesis M., and Xie M., “Optimality, Scalability and Stability Study of Existing Partitioning and Placement Algorithms,” Proceedings of ISPD, pp. 88-94, 2003.
• Cong J., Romesis M., and Xie M., “Optimality and Stability Study of Timing-driven Placement Algorithms,” Proceedings of ICCAD, pp. 472-478, 2003.
• Cong J., Kong T., Shinnerl J., Xie M., and Yuan X., “Large-Scale Circuit Placement: Gap and Promise,” Proceedings of ICCAD, pp. 883-890, 2003.
• Chang C., Cong J., Romesis M., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” IEEE TCAD, vol. 23, no. 4, pp. 537-549, 2004.

4 Publication List
• Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” Proceedings of ICCAD, pp. 883-890, 2004.
• Cong J., Fang J., Xie M., and Zhang Y., “MARS - A Multilevel Full-Chip Gridless Routing System,” IEEE TCAD, vol. 24, no. 3, pp. 382-394, March 2005.
• Cong J., Kong T., Shinnerl J., Xie M., and Yuan X., “Large Scale Circuit Placement,” ACM TODAES, vol. 10, no. 2, pp. 389-430, April 2005.
• Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” IEEE TCAD, to appear.
• Chan T., Cong J., Romesis M., Shinnerl J., Sze K., and Xie M., “mPL6: A Robust Multilevel Mixed-size Placement Engine,” Proceedings of ISPD, pp. 227-229, April 2005.
• Cong J. and Xie M., “A Robust Detailed Placement Algorithm for Mixed-size IC Designs,” Proceedings of ASPDAC, pp. 188-194, 2006.
• Cong J., Chan T., Shinnerl J., Sze K., and Xie M., “mPL6: Enhanced Multilevel Mixed-size Placement,” Proceedings of ISPD, pp. 212-214, April 2006.

5 A Brief History of mPL
[Figure: relative wirelength of successive mPL releases plotted against year, 2000 to 2004, with the releases divided into a uniform-cell-size group and a non-uniform-cell-size group]
• mPL 1.0 [ICCAD00]
  - Recursive ESC clustering
  - NLP at coarsest level
  - Goto discrete relaxation
  - Slot Assignment legalization
  - Domino detailed placement
• mPL 1.1
  - FC-Clustering
  - Added partitioning to legalization
• mPL 2.0
  - RDFL relaxation
  - Primal-dual netlist pruning
• mPL 3.0 [ICCAD 03]
  - QRS relaxation
  - AMG interpolation
  - Multiple V-cycles
  - Cell-area fragmentation
• mPL 4.0
  - Improved DP
  - Better coarsening
  - Backtracking V-cycle
• mPL5, mPL6
  - Multilevel force-directed

6 Multiscale Optimization Framework
[Figure: V-cycle starting from the given problem; coarsening (clustering) decreases the problem size, then interpolation and relaxation (optimization) lead back to the original problem]
• Explores different scales of the solution space at different levels
• Supports VERY FAST and SCALABLE methods
• Supports inclusion of complicated objectives and constraints
• Successful across MANY DIVERSE applications

7 mPL6 – Generalized Force-Directed Refinement
• Log-sum wirelength
• Average bin density
• Equality constraint
  - Average bin density = utilization ratio
[Figure: cells $v_1$ to $v_7$ overlapping a grid of bins; $a_{13}(v_7)$ = fractional area of cell $v_7$ in bin $B_{13}$]
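As an illustration of the log-sum wirelength term, here is a minimal sketch of the standard log-sum-exp smoothing of half-perimeter wirelength along one axis. The function names and the smoothing parameter `gamma` are assumptions made for this example; the slide does not give mPL6's exact formulation.

```python
import math

def logsum_wirelength(coords, gamma=1.0):
    """Smooth upper bound on max(coords) - min(coords) along one axis.

    Evaluates gamma * (log(sum(exp(c / gamma))) + log(sum(exp(-c / gamma)))),
    shifted by the extremes so that exp() cannot overflow.
    `gamma` is an assumed smoothing parameter: smaller means tighter.
    """
    hi = max(coords)
    lo = min(coords)
    smooth_max = hi + gamma * math.log(sum(math.exp((c - hi) / gamma) for c in coords))
    smooth_min = lo - gamma * math.log(sum(math.exp((lo - c) / gamma) for c in coords))
    return smooth_max - smooth_min

def hpwl_1d(coords):
    """Exact one-dimensional half-perimeter wirelength of a net."""
    return max(coords) - min(coords)

net_x = [1.0, 4.0, 9.0]  # x-coordinates of the pins of one hypothetical net
exact = hpwl_1d(net_x)
loose = logsum_wirelength(net_x, gamma=1.0)
tight = logsum_wirelength(net_x, gamma=0.1)
```

The smooth value never falls below the exact span and approaches it as `gamma` shrinks, which is what makes the term usable in a gradient-based refinement.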

8 mPL6 – Iterative Flow
[Figure: V-cycles across Level 1, Level 2, and Level 3, with steps labeled C, I, and C+I]
• Best-choice clustering [Alpert et al, ISPD05]
• AMG declustering [Chen et al, DAC03; Chan et al, ICCAD03]
• Multiple V-cycles with distance-based reclustering [Chan et al, ICCAD03]

9 Outline
• Chapter 1. Introduction
• Chapter 2. Optimality and scalability study of existing placement algorithms
• Chapter 3. Routability-driven multilevel global placement and white space allocation
  - Motivation and previous work
  - Routability-driven multilevel placement
  - Experiment results
  - Conclusions and future work
• Chapter 4. A robust legalization scheme for mixed-size placement
• Chapter 5. Applications of mixed-size placement legalization
• Chapter 6. “Global” localized preprocessing for detailed placement
• Chapter 7. Heterogeneous placement for FPGAs
• Chapter 8. Conclusions and future work

10 Motivation
• mPL does not consider routing congestion
  - Aggressive HPWL minimization != routability
• Routability-driven placement
  - Routability modeling
  - Routability optimization

11 Previous Work -- Routability Modeling
• Topology-free methods
  - Dragon [Yang et al., TCAD03]
  - Sparse [Hu et al., ICCAD02]
  - BonnPlace [Brenner & Rohe, ISPD02]
• Topology-based methods
  - [Mayrhofer & Lauther, ICCAD90]
  - mPG [Chang et al., ISPD02]

12 Previous Work -- Routability Optimization
• Cell weighting
  - Cell inflation based on congestion
  - Constructive and iterative methods
    Dragon [Yang et al, TCAD03]
    BonnPlace [Brenner & Rohe, ISPD02]
• Net weighting
  - Translate into bin weights and optimize weighted wirelength
  - Iterative methods
    Sparse [Hu & Sadowska, ICCAD02]
    mPG [Chang et al, ISPD02]

13 Routability-Driven Multilevel Placement
• Global placement
  - Congestion estimation by a fast LZ router
  - Congestion-driven cell re-placement based on weighted wirelength
• Hierarchical top-down white space allocation
  - Geometric-based slicing tree
  - Congestion estimation on tree
  - Cutline adjustment

14 mPL-R Congestion Estimation with LZ Router
• Use LZ-Router [Chang et al., ISPD02] for fast congestion analysis on each level
• Binary search on V-stem (or H-stem)
• Initialize left region and right region to cover bounding box
• Repeat
  - Query wire usage on both regions
  - Select region with less congestion
[Figure: bounding box split into a left region and a right region, one less congested and one more congested, with HVH and VHV route shapes]

15 mPL-R Congestion-Driven Re-Placement
• Pick cells whose incident nets cross congested regions to move
• Start from the optimal location for HPWL
• Search adjacent bins within a certain window
• Choose the bin based on weighted WL
[Figure: bins with congestion weights 0.5, 1.2, and 2.0; two candidate locations with WLc = 15.5 and WLc = 9.2]
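The window search can be sketched as follows. This is only an illustration: the slide does not define how weighted wirelength is computed, so the sketch assumes that the weighted length of a connection is the sum of bin congestion weights along an L-shaped path (horizontal first, then vertical). The names `path_cost` and `best_bin` and the grid values are hypothetical.

```python
def path_cost(weights, src, dst):
    """Sum of bin weights on an L-shaped path from src to dst (assumed cost model).

    Bins are (row, col) indices; the path runs horizontally along the source
    row, then vertically along the destination column.
    """
    (r0, c0), (r1, c1) = src, dst
    cost = 0.0
    step = 1 if c1 >= c0 else -1
    for c in range(c0, c1 + step, step):
        cost += weights[r0][c]
    step = 1 if r1 >= r0 else -1
    for r in range(r0 + step, r1 + step, step):
        cost += weights[r][c1]
    return cost

def best_bin(weights, start, pins, window=1):
    """Search the bins within `window` of `start` for the lowest weighted WL."""
    rows, cols = len(weights), len(weights[0])
    r0, c0 = start
    best, best_cost = None, float("inf")
    for r in range(max(0, r0 - window), min(rows, r0 + window + 1)):
        for c in range(max(0, c0 - window), min(cols, c0 + window + 1)):
            cost = sum(path_cost(weights, (r, c), p) for p in pins)
            if cost < best_cost:
                best, best_cost = (r, c), cost
    return best, best_cost

# Hypothetical 3x3 grid with one congested bin in the middle.
weights = [
    [1.0, 1.0, 1.0],
    [1.0, 5.0, 1.0],
    [1.0, 1.0, 1.0],
]
pins = [(0, 0), (2, 2)]   # bins of the pins the cell connects to
start = (1, 1)            # HPWL-optimal bin, which happens to be congested
chosen, chosen_cost = best_bin(weights, start, pins)
start_cost = sum(path_cost(weights, start, p) for p in pins)
```

In this toy grid the HPWL-optimal bin is the congested one, so the search moves the cell to a neighboring bin with a lower weighted cost.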

16 White Space Allocation -- Slicing Tree Construction
• Recursively bipartition chip region from top to bottom.
• Group cells into children nodes according to location relative to cutline.
• Estimate congestion on leaf nodes. Congestion on other nodes can be computed from bottom to top.
• Each node stores: cut direction, cut location, node area, congestion
[Figure: chip region partitioned into leaf regions A to H, with the corresponding slicing tree under a root node]

17 White Space Allocation – Cutline Adjustment
• Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their overflow.
[Figure: slicing tree over leaf regions A to H, nodes labeled cell area/congestion; root 240/88, children 116/28 and 124/60]
Assuming chip area of root = 300:
  Total WS area = 300 – 240 = 60
  WS area for left child = 60*28/(28+60) = 19.1
  WS area for right child = 40.9
  Chip area for left child = 116 + 19.1 = 135.1
  Chip area for right child = 124 + 40.9 = 164.9
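The arithmetic on this slide can be reproduced with a short sketch. The function name and the even-split fallback for two uncongested children are assumptions added for the example, not part of the slide.

```python
def allocate_white_space(region_area, left, right):
    """Split a region's white space between two children by congestion.

    `left` and `right` are (cell_area, congestion) pairs, as in the tree
    labels on the slide. Returns the chip area given to each child.
    """
    left_cells, left_cong = left
    right_cells, right_cong = right
    white_space = region_area - (left_cells + right_cells)
    total_cong = left_cong + right_cong
    # Assumption: split evenly when neither child reports congestion.
    left_share = left_cong / total_cong if total_cong > 0 else 0.5
    left_ws = white_space * left_share
    right_ws = white_space - left_ws
    return left_cells + left_ws, right_cells + right_ws

# Root node from the slide: chip area 300, children 116/28 and 124/60.
left_area, right_area = allocate_white_space(300, (116, 28), (124, 60))
print(round(left_area, 1), round(right_area, 1))  # 135.1 164.9
```

Applying the same function to each child in turn, with the child's new chip area as `region_area`, gives the top-down cutline adjustment described above.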

18 White Space Allocation – Cutline Adjustment
• Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion.
[Figure: slicing tree over leaf regions A to H, nodes labeled cell area/congestion; root 240/88; children 116/28 and 124/60; grandchildren 54/9, 62/19, 58/34, and 66/26]

19 White Space Allocation – Cutline Adjustment
• Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion.
[Figure: slicing tree over leaf regions A to H, nodes labeled cell area/congestion; root 240/88, children 116/28 and 124/60]

20 Experiment Setup
• 16 IBM version 2 examples
  - 5% to 15% white space
• Three state-of-the-art routability-driven placers
  - Dragon-fd 3.01 [Yang et al, TCAD03]
    Simulated annealing with bin swapping
    Two-step white space allocation
  - Capo 10.0 [Roy et al, ISPD06]
    Fast Steiner tree approximation
    Congestion-based cutline shifting
  - Fengshui 5.1 [Agnihotri et al, ISPD05]
    Recursive bisection
    Similar white space allocation method incorporated
• Magma router for evaluation

21 Routability-Driven Placement Tools Comparison
• mPL-R+WSA is the only flow whose placements are all routed successfully
• mPL-R+WSA produces the shortest wirelength

22 Routability Optimization Techniques Comparison
• mPL
  - Latest pure WL-driven version
  - No consideration of routing congestion
• mPL-R
• mPL-I
  - Cell inflation + dummy density assignment
  - Highest quality in ISPD06 contest [Nam ISPD06]
  - Density target set as utilization
• mPL+WSA
• mPL-R+WSA

23 Routability Optimization Techniques Comparison
• mPL-I with heuristic penalty term does not perform very well
• Both mPL-R and WSA improve routability significantly
• Combined workflow gives the highest completion rate

24 Outline
• Chapter 1. Introduction
• Chapter 2. Optimality and scalability study of existing placement algorithms
• Chapter 3. Routability-driven multilevel global placement and white space allocation
• Chapter 4. A robust legalization scheme for mixed-size placement
• Chapter 5. Applications of mixed-size placement legalization
  - Enhancement for macro legalization algorithm
  - Additional experiment results
• Chapter 6. “Global” localized preprocessing for detailed placement
• Chapter 7. Heterogeneous placement for FPGAs
• Chapter 8. Conclusions and future work

25 Enhancement for Macro Legalization
• Constraint graph reduction
  - Original constraint graph
    One edge for each pair of macros
    $O(n^2)$ edges in total
  - Reduced constraint graph
    Edge inserted only when the constraint is not already implied transitively by existing edges
    Significant reduction of memory consumption
[Figure: macros A, B, and C, with the redundant edge marked "?"]
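A minimal sketch of the reduction for a horizontal constraint graph follows. It assumes macros are given as hypothetical `(x_lo, x_hi)` intervals and that an edge means "left of"; the slide does not specify the actual data structures or how the reduced graph is built incrementally.

```python
def left_of(a, b):
    """True when macro a lies entirely to the left of macro b."""
    return a[1] <= b[0]

def full_edges(macros):
    """Original constraint graph: one edge per ordered pair of macros."""
    return {
        (i, j)
        for i in macros
        for j in macros
        if i != j and left_of(macros[i], macros[j])
    }

def reduced_edges(macros):
    """Drop every edge (i, j) already implied by a two-edge path i -> k -> j."""
    edges = full_edges(macros)
    return {
        (i, j)
        for (i, j) in edges
        if not any((i, k) in edges and (k, j) in edges for k in macros)
    }

# Three hypothetical macros in a row, given as (x_lo, x_hi).
macros = {"A": (0, 2), "B": (3, 5), "C": (6, 8)}
print(sorted(full_edges(macros)))     # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(sorted(reduced_edges(macros)))  # [('A', 'B'), ('B', 'C')]
```

The edge A to C is dropped because A to B and B to C already force A to stay left of C. This quadratic-memory sketch only shows which edges are redundant; a memory-saving implementation would avoid materializing the full edge set first.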

26 Experiment Result with ICCAD04-MS
• 84% reduction of constraint edges
• No degradation of solution quality

27 Enhancement for Macro Legalization
[Figure labels: $f_{ij}$, $x$, $H_{ij}$]
• Used in ISPD 2006 placement contest

28 ISPD05 Examples
• Bigger problem size
• Suitable to test scalability

29 Scalability Comparison on ISPD05 -- Global Placements by APlace
• XDP produces 1% longer WL, but is 10X faster

30 Scalability Comparison on ISPD05 -- Global Placements by mPL
• XDP can be 10X faster with comparable quality

31 Impact of Gradual Macro Legalization – ISPD05
• 12% WL reduction possible with macros movable

32 Outline
• Chapter 1. Introduction
• Chapter 2. Optimality and scalability study of existing placement algorithms
• Chapter 3. Routability-driven multilevel global placement and white space allocation
• Chapter 4. A robust legalization scheme for mixed-size placement
• Chapter 5. Applications of mixed-size placement legalization
• Chapter 6. “Global” localized preprocessing for detailed placement
• Chapter 7. Heterogeneous placement for FPGAs
  - Motivation and previous work
  - Multilevel heterogeneous placement – mPL-H
  - Experiment results
  - Conclusions and future work
• Chapter 8. Conclusions and future work

33 Motivation
• Popularity of FPGAs
  - Ease of use
  - Low cost for small to medium production
• Modern FPGA placement imposes heterogeneous constraints
  - Memory blocks of different capacities, DSP blocks
  - Each block should only be placed on sites of the same type

34 UCLA VLSICAD LAB Example FPGA Chip Figure taken from Altera Stratix Handbook

35 Previous Works -- Academia
• Simulated annealing
  - VPR [Betz & Rose, FPL97; Marquardt et al, FPGA00]
  - PATH [Kong, ICCAD02]
  - SPCD [Chen & Cong, FPL04, FPGA05]
• Partitioning
  - PPFF [Maidee et al, DAC03]
• Graph embedding
  - CAPRI [Gopalakrishnan et al, DAC06]
• Multilevel
  - Ultrafast-VPR [Sankar & Rose, FPGA99]
  - mPG-ms [Cong & Yuan, ASPDAC03]
• None of them handles heterogeneous constraints

36 Previous Works -- Industry
• Quartus II by Altera Corporation
  - Stratix, Stratix II, etc.
• ISE by Xilinx Corporation
  - Virtex II, Virtex II Pro, etc.
• Do have heterogeneous capability
  - Only for proprietary chip architecture
  - Algorithms and techniques not publicly documented

37 Multilevel Heterogeneous Placement – mPL-H
• Based on multilevel generalized force-directed placement
• Multi-layered placement to handle heterogeneous placement
• Filler cells to enhance quality and stability
• Gradual carry chain legalization

38 Limitations of mPL for Heterogeneous Placement
• Does not consider heterogeneous constraints
  - Any block can be placed anywhere
• Requires density to be uniform everywhere
  - Penalizes wirelength when utilization is low

39 mPL-H -- Global Placement (I)
• Multiple layers, one for each resource type
  - DSP layer
  - M-RAM layer
  - LAB layer
  - M4K layer
  - M512 layer
• Forbidden regions blocked by obstacles
• Uniform wirelength computation
[Figure: stacked DSP, M-RAM, and LAB layers]

40 mPL-H -- Global Placement (II)
• Filler cell
  - Occupy the residual capacity
  - Transform inequality into equality
  - Density computed independently on each layer
  - Granularity may not be fine enough
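One way to read the filler-cell idea: on each layer, the capacity constraint "used area <= capacity" becomes "used area + filler area = capacity" once fillers absorb the residual. The sketch below computes that residual per layer. The layer names come from the previous slide, while the numbers and the function name are hypothetical.

```python
def filler_area_per_layer(capacity, used):
    """Return the filler area that makes each layer exactly full.

    `capacity` and `used` map a layer name to an area in site units.
    """
    fillers = {}
    for layer, cap in capacity.items():
        residual = cap - used.get(layer, 0)
        if residual < 0:
            raise ValueError(f"layer {layer} is over capacity")
        fillers[layer] = residual
    return fillers

# Hypothetical per-layer capacities and demands.
capacity = {"LAB": 100, "DSP": 8, "M4K": 20}
used = {"LAB": 70, "DSP": 8, "M4K": 5}
fillers = filler_area_per_layer(capacity, used)
print(fillers)  # {'LAB': 30, 'DSP': 0, 'M4K': 15}
```

Each layer is handled on its own, matching the note that density is computed independently on each layer.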

41 mPL-H -- Legalization (I)
• DSP and memory blocks
  - Domains do not overlap
    Legalized independently
  - Uniform size for the same type
    Linear assignment, $O(n^3)$
    Cost as distance
[Figure: bipartite matching between cells and sites]
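To make the assignment objective concrete, the sketch below finds the minimum-total-distance mapping of same-type blocks to sites by exhaustive search. This brute-force search is a stand-in chosen so the example stays dependency-free; it is not the $O(n^3)$ method named on the slide (a Hungarian-style solver would be used at scale). Manhattan distance and all coordinates are assumptions.

```python
from itertools import permutations

def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def assign_blocks(blocks, sites):
    """Exhaustively find the block-to-site mapping with least total distance.

    Requires len(sites) >= len(blocks). Returns (total_cost, mapping), where
    mapping[i] is the index of the site chosen for block i.
    """
    best_cost, best_choice = float("inf"), None
    for choice in permutations(range(len(sites)), len(blocks)):
        cost = sum(manhattan(b, sites[s]) for b, s in zip(blocks, choice))
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, list(best_choice)

# Hypothetical global-placement locations of three DSP blocks and four DSP sites.
blocks = [(1, 1), (2, 1), (8, 7)]
sites = [(0, 0), (0, 8), (8, 8), (8, 0)]
cost, mapping = assign_blocks(blocks, sites)
print(cost, mapping)  # 10 [0, 3, 2]
```

The first two blocks both prefer site 0; the assignment gives it to the block that would suffer more from moving, which a greedy nearest-site pass would not guarantee.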

42 mPL-H -- Legalization (II)
• Carry chains
  - Vary in length
  - Legalized in descending order of length
  - Partition each column into same size
  - Assign chains of same length using linear assignment

43 mPL-H -- Legalization (III)
• Column-wise rearrangement of carry chains
  - $P(n,m)$ is the minimum perturbation of assigning $(v_1, \ldots, v_n)$ to sites $(s_1, s_2, \ldots, s_m)$
  - $P(1,j) = d(1,j)$, where $d(1,j)$ is the perturbation of assigning $v_1$ to site $s_j$
  - $P(i,j) = \min\{P(i-1, j-h_i),\ P(i, j-1)\}$
  - Can be solved more efficiently for some special cases
    Quadratic distance
    No site constraint
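A sketch of a dynamic program in the spirit of this recurrence follows. It rests on several assumptions the slide does not state: $h_i$ is taken to be the number of sites chain $v_i$ occupies, the chains keep their order within the column, the cost $d(i,j)$ is added whenever chain $i$ is placed with its last site at $s_j$, and the base case is $P(0,j) = 0$. The quadratic cost and the target positions are hypothetical.

```python
def min_perturbation(heights, cost, m):
    """Minimum total perturbation of placing chains, in order, on sites 1..m.

    heights[i] is the number of sites chain i occupies (assumed meaning of h_i).
    cost(i, j) is the perturbation when chain i ends at site j (1-based).
    Returns float('inf') when the chains do not fit.
    """
    n = len(heights)
    inf = float("inf")
    # P[i][j]: best cost of placing the first i chains within sites 1..j.
    P = [[inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        P[0][j] = 0.0
    for i in range(1, n + 1):
        h = heights[i - 1]
        for j in range(1, m + 1):
            skip = P[i][j - 1]  # site j is not the last site of chain i
            place = P[i - 1][j - h] + cost(i - 1, j) if j >= h else inf
            P[i][j] = min(skip, place)
    return P[n][m]

# Two hypothetical chains of lengths 2 and 3 in a column of 6 sites.
heights = [2, 3]
targets = [3, 5]  # preferred last-site index of each chain

def quadratic(i, j):
    return (j - targets[i]) ** 2

print(min_perturbation(heights, quadratic, 6))  # 1.0
```

Both chains cannot sit at their preferred positions at once (they would overlap), so one of them shifts by a single site, for a total quadratic perturbation of 1.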

44 Experiment Setting
[Flow diagram, labels only: Verilog netlist; Quartus_map; clustered .vqm netlist; chip type; architecture description XML; Quartus_fitter and mPL-H; .qsf placement; Quartus_router]

45 UCLA VLSICAD LAB QUIP Suite

46 UCLA VLSICAD LAB Wirelength Comparison mPL-H is 3% better in HPWL, and 2% better in routed WL than Quartus II v5.0

47 UCLA VLSICAD LAB Runtime Comparison mPL-H can be 2X faster than Quartus II v5.0 when the circuit becomes sufficiently large

48 Optimality Study of mPL-H
• PEKO-H construction
  - Populate all sites with corresponding resource type
  - Generate each net with optimal wirelength
  - Extract the netlist in the end

49 UCLA VLSICAD LAB Experiment Results with PEKO-H mPL-H produces HPWL 34% longer than the optima

50 UCLA VLSICAD LAB Displacement of PEKO-H13

51 UCLA VLSICAD LAB Displacement of PEKO-H16 Swirls are difficult for local refinement to recover

52 Conclusions
• First analytical work for heterogeneous placement
• Compared to leading-edge Quartus II v5.0 for Stratix
  - 3% shorter HPWL, 2% shorter routed WL
  - Can be 2X faster when example becomes sufficiently large
• Optimality study with PEKO-H
  - Displacement observed from the optima
  - 34% longer HPWL than the optima

53 Future Work
• Accurate timing analysis
  - Only point-to-point delay table released
    OK for overlap-free intermediate results
    Not accurate enough for analytical placer
  - Guide timing-driven placement
• Routing congestion
  - Proprietary routing resource information not publicly available

54 The End Thank You!

