APLACE: A General and Extensible Large-Scale Placer

Andrew B. Kahng*, Sherief Reda, Qinke Wang
VLSI CAD Lab, UCSD CSE and ECE Departments
*Currently on leave of absence at Blaze DFM, Inc.

Goals and Plan
Goals:
  Build a new placer to win the competition
  Scalable, robust, high-quality implementation
  Leave no stone unturned / QOR on the table
Plan and Schedule:
  Work within most promising framework: APlace
  30 days for coding + 30 days for tuning

Philosophy
Respect the competition:
  Well-funded groups with decades of experience
  ABKGroup’s Capo, MLPart, APlace = all unfunded side projects
  No placement-related industry interactions
  QOR target: % better than Capo v9r6 on all known benchmarks
  Nearly pulled out 10 days before competition
Work smart:
  Solve scalability and speed basics first
  Slimmed-down data structure, -msse compiler options, etc.
  Ordered list of ~15 QOR ideas to implement
  Daily regressions on all known benchmarks
  Synthetic testcases to predict bb3, bb4, etc.

Implementation Framework
New APlace Flow
APlace weaknesses:
  Weak clustering
  Poor legalization / detailed placement
New APlace:
  New clustering
  Adaptive parameter setting for scalability
  New legalization + iterative detailed placement
Flow:
  Global Phase: Clustering, Adaptive APlace engine, Unclustering
  Detailed Phase: Legalization, WS arrangement, Cell order polishing, Global moving

Clustering/Unclustering
A multi-level paradigm with clustering ratio ≈ 10
Top-level clusters ≈ 2000
Similar in spirit to [HuM04] and [AlpertKNRV05]
Algorithm Sketch
For each clustering level:
  Calculate the clustering score of each node to its neighbors based on the number of connections
  Sort all scores and process nodes in order as long as cluster size upper bounds are not violated
  If a node’s score needs updating, then update the score and reinsert the node in order
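The algorithm sketch above can be illustrated with a small Python example for one clustering level. This is a minimal sketch under stated assumptions, not the APlace code: the score formula (shared-net count divided by merged area), the function name `cluster_one_level`, and its parameters are hypothetical, and node ids are assumed to be mutually comparable (for example, integers). Stale scores are handled lazily: when a node reaches the top of the queue, its score is recomputed and, if it changed, the node is reinserted in order.

```python
import heapq
from collections import defaultdict


def cluster_one_level(nets, area, max_area, target_count):
    """Greedy score-ordered clustering with lazy score updates.

    nets: list of node-id lists (hyperedges); area: dict node -> area.
    Returns a dict mapping every original node to its cluster representative.
    """
    # Number of shared nets between each pair of nodes (assumed score basis).
    conn = defaultdict(lambda: defaultdict(int))
    for net in nets:
        pins = sorted(set(net))
        for i, u in enumerate(pins):
            for v in pins[i + 1:]:
                conn[u][v] += 1
                conn[v][u] += 1

    area = dict(area)
    parent = {u: u for u in area}
    alive = set(area)

    def best_partner(u):
        # Hypothetical score: connections / merged area, subject to the size bound.
        cands = [(-c / (area[u] + area[v]), v)
                 for v, c in conn[u].items()
                 if v in alive and area[u] + area[v] <= max_area]
        return min(cands) if cands else None

    heap = []
    for u in alive:
        b = best_partner(u)
        if b is not None:
            heapq.heappush(heap, (b[0], u, b[1]))

    while heap and len(alive) > target_count:
        neg_score, u, v = heapq.heappop(heap)
        if u not in alive:
            continue
        b = best_partner(u)
        if b is None:
            continue
        if b != (neg_score, v):
            # Score is stale: update it and reinsert the node in order.
            heapq.heappush(heap, (b[0], u, b[1]))
            continue
        # Merge v into u and fold v's connections into u.
        alive.discard(v)
        parent[v] = u
        area[u] += area[v]
        for w, c in conn.pop(v).items():
            if w != u:
                conn[u][w] += c
                conn[w][u] += c
            conn[w].pop(v, None)
        b = best_partner(u)
        if b is not None:
            heapq.heappush(heap, (b[0], u, b[1]))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    return {u: find(u) for u in parent}


# Tiny example: a chain of five unit-area cells with two doubled connections.
example_nets = [[0, 1], [0, 1], [1, 2], [2, 3], [2, 3], [3, 4]]
example_area = {i: 1 for i in range(5)}
example_clusters = cluster_one_level(example_nets, example_area,
                                     max_area=2, target_count=2)
```

With a clustering ratio of about 10, one would presumably call such a routine per level with `target_count` near one tenth of the current node count, stopping once roughly 2000 top-level clusters remain.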

Adaptive Tuning / Legalization
Adaptive Parameterization:
  Automatically decide the initial weight for the wirelength objective according to the gradients
  Decrease wirelength weight based on the current placement process
Legalization:
  (1) Sort all cells from left to right: move each cell in order (or a group of cells) to the closest legal position(s)
  (2) Sort all cells from right to left: move each cell in order (or a group of cells) to the closest legal position(s)
  (3) Pick the best of (1) and (2)
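The two-pass legalization above can be sketched for a single row as follows. This is an illustrative simplification, not the APlace implementation: it handles one row only, moves cells one at a time (no cell groups), does no site snapping, and assumes that "best" means smallest total displacement, which the slides do not specify. The names `legalize_row`, `total_displacement`, and `legalize` are hypothetical.

```python
def legalize_row(cells, row_start, row_end, reverse=False):
    """Place cells without overlap inside [row_start, row_end].

    cells: list of (name, x, width), where x is the desired left edge.
    reverse=False sweeps left to right; reverse=True sweeps right to left.
    Returns {name: legal left edge}.
    """
    order = sorted(cells, key=lambda c: c[1], reverse=reverse)
    remaining = sum(w for _, _, w in order)  # width still to be placed
    if remaining > row_end - row_start:
        raise ValueError("cells do not fit in the row")

    placed = {}
    if not reverse:
        cursor = row_start  # leftmost free coordinate
        for name, x, w in order:
            # Closest spot at or right of the cursor that still leaves
            # room for all cells not yet placed.
            pos = min(max(x, cursor), row_end - remaining)
            placed[name] = pos
            cursor = pos + w
            remaining -= w
    else:
        cursor = row_end  # rightmost free coordinate
        for name, x, w in order:
            pos = max(min(x, cursor - w), row_start + remaining - w)
            placed[name] = pos
            cursor = pos
            remaining -= w
    return placed


def total_displacement(cells, placed):
    return sum(abs(placed[name] - x) for name, x, _ in cells)


def legalize(cells, row_start, row_end):
    """Run both sweeps and keep the one with less total displacement."""
    candidates = [legalize_row(cells, row_start, row_end, reverse=r)
                  for r in (False, True)]
    return min(candidates, key=lambda p: total_displacement(cells, p))


# Example: cells b and c overlap near the right end of a 12-unit row.
example_cells = [("a", 2, 3), ("b", 6, 3), ("c", 7, 3)]
example_result = legalize(example_cells, 0, 12)
```

In this example the left-to-right sweep moves only cell c, while the right-to-left sweep has to shift both a and b, so the left-to-right result is kept.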

Detailed Placement
Whitespace Compaction:
  For each layout row: optimally arrange whitespace to minimize wirelength while maintaining relative cell order. [KahngTZ99], [KahngRM04]
Cell Order Polishing:
  For a window of neighboring cells: optimally arrange cell orders and whitespace to minimize wirelength
Global Moving:
  Optimally move a cell to a better available position to minimize wirelength
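Cell order polishing can be illustrated by exhaustively trying every ordering of a small window. The sketch below is a simplification under stated assumptions: it measures only x-direction HPWL, uses cell centers as pin locations, and packs the window's cells from its left edge rather than also optimizing the whitespace between them. The function names and arguments are hypothetical, and exhaustive enumeration is only practical for windows of a few cells.

```python
from itertools import permutations


def hpwl_x(nets, pos):
    """Sum over nets of the horizontal span of their pins."""
    return sum(max(pos[p] for p in net) - min(pos[p] for p in net)
               for net in nets)


def polish_window(window, widths, left, fixed_pos, nets):
    """Try all orderings of the window cells and return the best one.

    window: cell names in the window; widths: dict name -> width;
    left: x coordinate of the window's left edge;
    fixed_pos: x coordinates of pins outside the window;
    nets: list of pin-name lists. Returns (best order, its x-HPWL).
    """
    best_order, best_cost = None, float("inf")
    for order in permutations(window):
        pos = dict(fixed_pos)
        x = left
        for cell in order:
            pos[cell] = x + widths[cell] / 2  # pin at cell center (assumption)
            x += widths[cell]
        cost = hpwl_x(nets, pos)
        if cost < best_cost:
            best_order, best_cost = list(order), cost
    return best_order, best_cost


# Example: cell c connects to the left, cell a to the right, yet the current
# order is a, b, c.
example_order, example_cost = polish_window(
    window=["a", "b", "c"],
    widths={"a": 2, "b": 2, "c": 2},
    left=10,
    fixed_pos={"L": 0, "R": 30},
    nets=[["L", "c"], ["a", "R"], ["a", "b"]],
)
```

Here the original order a, b, c has an x-HPWL of 36, and the search reverses it to c, b, a with an x-HPWL of 28.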

Parameterization and Parallelizing
Tuning Knobs:
  Clustering ratio, # top-level clusters, cluster area constraints
  Initial wirelength weight, wirelength weight reduction ratio
  Max # CG iterations for each wirelength weight
  Target placement discrepancy
  Detailed placement parameters, etc.
Resources:
  SDSC ROCKS Cluster: 8 Xeon CPUs at 2.8GHz
  Michigan Prof. Sylvester’s Group: 8 various CPUs
  UCSD FWGrid: 60 Opteron CPUs at 1.6GHz
  UCSD VLSICAD Group: 8 Xeon CPUs at 2.4GHz
Wirelength Improvement after Tuning: 2-3%

Artificial Benchmark Synthesis
Synthetic benchmarks to test code scalability and performance
Rapid response to broadcast of s00-nam.pdf:
  Created “synthetic” versions of bigblue3 and bigblue4 within 48 hours
  Mimicked fixed-block layout diagrams in the artificial benchmark creation
This process was useful: we identified (and solved) a problem with clustering in the presence of many small fixed blocks

Results
Circuit GP HPWL Leg HPWL DP HPWL CPU (h)
adaptec1 80.20 81.80 79.50 3
adaptec2 84.70 92.18 87.31
adaptec3 218.00 230.00 10
adaptec4 182.90 194.75 187.71 13
bigblue1 93.67 97.85 94.64 5
bigblue2 140.68 147.85 143.80 12
bigblue3 357.28 407.09 357.89 22
bigblue4 813.91 868.07 833.21 50
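As a reading aid for these numbers, the short sketch below computes how much legalization inflates HPWL relative to global placement and how much detailed placement then recovers. It uses only circuits for which all three HPWL values appear above, and it assumes the values are listed in the order GP, Leg, DP; the helper name `pct` and the `summary` structure are illustrative.

```python
# (GP HPWL, Leg HPWL, DP HPWL) for circuits with all three values listed.
rows = {
    "adaptec1": (80.20, 81.80, 79.50),
    "adaptec2": (84.70, 92.18, 87.31),
    "adaptec4": (182.90, 194.75, 187.71),
    "bigblue1": (93.67, 97.85, 94.64),
    "bigblue2": (140.68, 147.85, 143.80),
    "bigblue3": (357.28, 407.09, 357.89),
    "bigblue4": (813.91, 868.07, 833.21),
}


def pct(new, old):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# circuit -> (HPWL increase from legalization, HPWL change from detailed placement)
summary = {name: (pct(leg, gp), pct(dp, leg))
           for name, (gp, leg, dp) in rows.items()}

for name, (leg_up, dp_change) in summary.items():
    print(f"{name}: legalization {leg_up:+.2f}%, detailed placement {dp_change:+.2f}%")
```

For bigblue3, for instance, legalization adds roughly 14% to the global-placement HPWL and detailed placement removes roughly 12% of the legalized HPWL, ending close to the global-placement value.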

Bigblue4 Placement HPWL =

Conclusions
ISPD05 = an exercise in process and philosophy
  At end, we were still 4% short of where we wanted
  Not happy with how we handled 5-day time frame
  Auto-tuning → first results ~ best results
  During competition, wrote but then left out “annealing” DP improvements that gained another 0.5%
  Students and IBM ARL did a really, really great job
Currently restoring capabilities (congestion, timing-driven, etc.) and cleaning (antecedents in Naylor patent)
