Placement Feedback: A Concept and Method for Better Min-Cut Placements Andrew B. Kahng, Sherief Reda CSE & ECE Departments University of CA, San Diego La.

Presentation transcript:

Placement Feedback: A Concept and Method for Better Min-Cut Placements
Andrew B. Kahng, CSE & ECE Departments, University of CA, San Diego, La Jolla, CA
Sherief Reda, CSE Department, University of CA, San Diego, La Jolla, CA
VLSI CAD Laboratory at UCSD

Outline
- Min-cut Placement and Terminal Propagation
- Ambiguous Terminal Propagation
- Placement Feedback
- Iterated Controlled Feedback
- Accelerated Feedback
- Experimental Results
- Conclusions

Min-Cut Placement: Objective
- A Steiner tree represents the minimum wirelength needed to connect a number of cells
- Total wirelength is the sum of the lengths of the Steiner trees
- Routed wirelength is typically larger than total wirelength due to detours arising from contention on routing resources
- Half-Perimeter Wirelength (HPWL) correlates well with the routed wirelength, is a lower bound on the net length, and is fast to calculate
- Min-cut placement objective: total wirelength minimization
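The HPWL estimate above can be sketched in a few lines of Python. This is a minimal illustration, not code from Capo: the function names, the cell names and the coordinates are all hypothetical, and every net is assumed to have at least one pin.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]

def net_hpwl(pins: Iterable[Point]) -> float:
    """Half-perimeter of the bounding box that encloses all pins of one net."""
    xs, ys = zip(*pins)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(nets: List[List[str]], locations: Dict[str, Point]) -> float:
    """Sum of the per-net HPWL over all nets of the netlist."""
    return sum(net_hpwl(locations[cell] for cell in net) for net in nets)

# Hypothetical toy placement: four cells, two nets.
locations = {"A": (0, 0), "B": (2, 1), "C": (1, 3), "X": (4, 0)}
nets = [["A", "B", "C"], ["X", "A", "B"]]
print(total_hpwl(nets, locations))  # 5 + 5 = 10
```

Each net costs one pass over its pins, which is why HPWL is cheap enough to evaluate at every placement level.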

Min-Cut Placement: Method
- Min-cut placement method: sequential min-cut partitioning
- Key issues:
  - How to partition a hypergraph? Multilevel hypergraph partitioning using the Fiduccia/Mattheyses heuristic
  - How to propagate net connectivity information from one block to another?
[Figure: the input netlist (hypergraph) in a single block, bisected into smaller blocks at Level 1 and Level 2]
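The quantity that each bisection step tries to minimize is the number of cut hyperedges. The sketch below only evaluates that objective for a given bisection; it does not implement the Fiduccia/Mattheyses heuristic or multilevel clustering. The netlist and the side assignments are hypothetical.

```python
from typing import Dict, List

def cut_size(nets: List[List[str]], side: Dict[str, int]) -> int:
    """Number of hyperedges that have cells on both sides of the bisection."""
    return sum(1 for net in nets if len({side[cell] for cell in net}) > 1)

# Hypothetical netlist with three two-pin nets.
nets = [["A", "B"], ["C", "D"], ["B", "C"]]

good = {"A": 0, "B": 0, "C": 1, "D": 1}  # only net {B, C} is cut
bad = {"A": 0, "C": 0, "B": 1, "D": 1}   # every net is cut

print(cut_size(nets, good), cut_size(nets, bad))  # 1 3
```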

Terminal Propagation
- Case I: blocks are partitioned in isolation → optimal local partitioning results but far from optimal global results
- Case II: information about cells in one block is accounted for in the other block → local partitioning results are translated to global wirelength results
- Well-studied problem:
  - Terminal propagation (Dunlop/Kernighan85)
  - Global objectives/cycling (HuangK97, Zheng/Dutt00, Yildiz/Madden01)
[Figure: a simple hypergraph on cells A, B, C, D, and the placements of blocks 1 and 2 after the first placement level for Case I and Case II]

Terminal Propagation Mechanism
- B_1 has been partitioned; B_2 is to be partitioned
- u is propagated as a fixed vertex u_f to the subblock that is closer
- u_f biases the partitioner to move v upward
[Figure: blocks B_1 and B_2 with cells u and v and the propagated terminal u_f]
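A minimal sketch of the "propagate to the closer subblock" rule follows. Measuring closeness as Manhattan distance to the subblock centers is an assumption made for illustration, as are the function names and the `tol` parameter; the slides do not state which distance measure Capo uses.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]

def manhattan(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def propagate_terminal(u: Point, center_top: Point, center_bottom: Point,
                       tol: float = 0.0) -> Optional[str]:
    """Return the subblock that receives the fixed terminal u_f.

    Returns None when u is equally close to both subblocks (within tol),
    which is the ambiguous case.
    """
    d_top = manhattan(u, center_top)
    d_bottom = manhattan(u, center_bottom)
    if abs(d_top - d_bottom) <= tol:
        return None
    return "top" if d_top < d_bottom else "bottom"

# Hypothetical geometry: u sits to the left of a block with two subblocks.
print(propagate_terminal((0, 5), (3, 4), (3, 0)))  # top
print(propagate_terminal((0, 2), (3, 4), (3, 0)))  # None (equidistant)
```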

Ambiguous Terminal Propagation
- Ambiguous propagation occurs when terminals, e.g. Y_4, are equally close to the two subblocks of a block under partitioning
- Traditional solution: either propagate to both subblocks or do not propagate at all
[Figure: terminals Y_1 to Y_4 and propagated terminals f_1 to f_3 around a block under partitioning, with the region of "partition fuzziness"]

Effect of Ambiguous Terminal Propagations
Given an edge e with a set of cells I, each cell outside the block falls into one of three classes (color-coded on the slide):
- L-cells: closer to L than R
- R-cells: closer to R than L
- Ambiguous cells: equally proximate to both L and R
Terminal propagation decisions (without ambiguous cells):
1. Only L-cells → L
2. Only R-cells → R
3. L-cells and R-cells → neither
Terminal propagation decisions (with ambiguous cells):
1. L-cells and ambiguous cells → L or neither
2. R-cells and ambiguous cells → R or neither
3. L-cells, R-cells and ambiguous cells → neither
4. Only ambiguous cells → neither or L or R
Conclusion: ambiguous propagations lead to indeterminism in propagation decisions → wirelength increase
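The decision cases can be written as a small function that returns the set of propagation outcomes a placer could legitimately pick. The slide distinguishes the three cell classes by color, which the transcript does not preserve, so the mapping of cases to classes here is an assumption; the function name is hypothetical.

```python
from typing import Set

def propagation_options(n_left: int, n_right: int, n_ambiguous: int) -> Set[str]:
    """Possible propagation decisions for one net.

    n_left / n_right / n_ambiguous count the net's external cells that are
    closer to L, closer to R, or equally close to both.
    """
    if n_left and n_right:
        return {"neither"}
    if n_left:
        return {"L", "neither"} if n_ambiguous else {"L"}
    if n_right:
        return {"R", "neither"} if n_ambiguous else {"R"}
    if n_ambiguous:
        return {"L", "R", "neither"}
    return set()  # no external cells, so nothing to propagate

# More than one option means the decision is indeterminate.
print(len(propagation_options(2, 0, 0)))  # 1
print(len(propagation_options(2, 0, 1)))  # 2
print(len(propagation_options(0, 0, 1)))  # 3
```

Every case with more than one element is a point where two runs of the placer may diverge.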

Min-Cut Placement Flow
Level 1 Partitioning → Terminal Propagation → Level 2 Partitioning → Terminal Propagation → ... → Level m Partitioning → Terminal Propagation
- The input to the flow is the I/O pad locations and the circuit netlist, with all cells initially collapsed at the center of the chip
- The output of the flow is a global placement, where groups of cells are assigned portions of the chip's rows
- A detailed placer then determines the exact locations of all cells

Mitigating Ambiguous Terminal Propagation
Two hyperedges: {A, B, C} and {X, A, B}. B_1 is partitioned before B_2.
1. C is ambiguously propagated; further partitioning yields cuts = 3, wirelength = 6
2. Undo and repartition: C is now propagated to the top
3. Further partitioning yields cuts = 2, wirelength = 5
[Figure: placements of cells A, B, C, X in blocks B_1 and B_2 before and after the undo/repartition step]

Placement Feedback
Traditional placement flow: Level 1 Partitioning → Terminal Propagation → Level 2 Partitioning → Terminal Propagation → ... → Level m Partitioning → Terminal Propagation
Placement flow with feedback, for each placement level:
- Undo all partitioning/block bisecting results, but retain the new cell locations for terminal propagations
- Use the new cell locations to re-do the level's placement
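The control flow of feedback at a single placement level can be sketched as below. This is a schematic under strong assumptions, not Capo's implementation: `place_level` is a hypothetical callback that always bisects the same blocks and uses the locations it is given only for terminal propagation, so calling it again with the previous output models "undo the partitioning, keep the new locations". The toy placer is a stand-in that just shifts cells.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Locations = Dict[str, Point]
LevelPlacer = Callable[[Locations], Locations]

def place_level_with_feedback(place_level: LevelPlacer, locations: Locations,
                              feedback_iterations: int = 1) -> Locations:
    """Place one level, then redo it using the fed-back cell locations."""
    result = place_level(dict(locations))   # traditional pass
    for _ in range(feedback_iterations):
        # Partitioning results are discarded; only the locations are reused.
        result = place_level(dict(result))
    return result

# Toy stand-in for a real level placer: records its input, shifts cells by 1.
calls: List[Locations] = []

def toy_level(locs: Locations) -> Locations:
    calls.append(dict(locs))
    return {cell: (x + 1.0, y) for cell, (x, y) in locs.items()}

final = place_level_with_feedback(toy_level, {"A": (0.0, 0.0)}, feedback_iterations=2)
print(len(calls), final)  # 3 {'A': (3.0, 0.0)}
```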

Placement Feedback Assessment
Metrics:
- Reduction in ambiguous terminal propagations
- Associated reduction in HPWL
Experimental setup:
- We implement feedback in Capo (version 8.7)
- For each placement level:
  - Measure the number of ambiguous terminal propagations before and after feedback
  - Measure the HPWL estimate before and after feedback (assuming all previous placement levels had feedback)

Feedback Effects
- Reductions in ambiguous terminals and HPWL per level are strongly correlated
[Plots: percentage reduction in ambiguous propagations vs. placement level; percentage reduction in HPWL vs. placement level]

Iterative Placement Feedback
- Since the feedback loop produces new outputs → iterate over the feedback loop a number of times
- If the feedback response is not desirable → insert a feedback controller to enhance the response
Feedback controller should:
- Evaluate and optimize some placement quality or objective
- Decide when to stop iterating
[Figure: placement flow with feedback controllers, one controller around each of Level 1 to Level m Partitioning / Terminal Propagation]

Feedback Controller Objectives
Two possible objectives (placement qualities) to optimize:
- Cut partitioning objective: Q_P = c_1 + c_2
- HPWL objective: Q_H = c_1 × d_1 + c_2 × d_2
Q_P and Q_H are not correlated! Example: assume d_1 = 6 and d_2 = 8
- c_1 = c_2 = 100 → Q_P = 200 and Q_H = 1400
- c_1 = 85, c_2 = 112 → Q_P = 197 and Q_H = 1406
[Figure: blocks B_1 and B_2 labeled with c_1, c_2, d_1, d_2]
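The example can be checked directly. The helper names `q_p` and `q_h` are hypothetical; the formulas and numbers are the ones on the slide.

```python
from typing import Sequence

def q_p(cuts: Sequence[int]) -> int:
    """Cut partitioning objective: sum of the per-block cut sizes."""
    return sum(cuts)

def q_h(cuts: Sequence[int], dims: Sequence[int]) -> int:
    """HPWL objective: each block's cut size weighted by its d_i."""
    return sum(c * d for c, d in zip(cuts, dims))

dims = (6, 8)  # d_1, d_2
for cuts in [(100, 100), (85, 112)]:
    print(cuts, q_p(cuts), q_h(cuts, dims))
# (100, 100) 200 1400
# (85, 112) 197 1406
```

The second solution has the smaller total cut (197 < 200) but the larger HPWL estimate (1406 > 1400), so a controller optimizing Q_P and one optimizing Q_H would rank these two solutions in opposite order.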

Feedback Controller Stopping Criteria
A. Monotonic Improvement Criterion: iterate per placement level until there is no further improvement in Q_P (or Q_H)
B. Best Improvement Criterion: iterate per placement level a fixed number of times, but pass on the best results seen, as measured by Q_P (or Q_H)
C. Unconstrained Criterion: iterate per placement level a fixed number of times and pass on the last results
[Plots, one per criterion: Q_P or Q_H vs. iteration]
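The three stopping criteria differ only in which result they hand to the next placement level. A minimal sketch, assuming a hypothetical `redo` callback that performs one feedback iteration and a `quality` function (Q_P or Q_H, lower is better); the `max_iters` cap on the monotonic criterion is an added safety assumption, not something the slides specify.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def iterate_feedback(initial: T, redo: Callable[[T], T],
                     quality: Callable[[T], float],
                     criterion: str, max_iters: int = 3) -> T:
    """Return the placement result passed on to the next level."""
    current = best = initial
    for _ in range(max_iters):
        candidate = redo(current)
        if criterion == "monotonic":
            if quality(candidate) >= quality(current):
                return current          # no further improvement: stop
            current = candidate
        elif criterion == "best":
            current = candidate         # keep iterating from the latest result
            if quality(candidate) < quality(best):
                best = candidate        # but remember the best one seen
        elif criterion == "unconstrained":
            current = candidate
        else:
            raise ValueError(f"unknown criterion: {criterion}")
    return best if criterion == "best" else current

# Toy trajectory of quality values: 10 -> 8 -> 9 -> 7 -> 7.
nxt = {10: 8, 8: 9, 9: 7, 7: 7}
step = lambda q: nxt[q]
same = lambda q: q

print(iterate_feedback(10, step, same, "monotonic", max_iters=10))  # 8
print(iterate_feedback(10, step, same, "best", max_iters=3))        # 7
print(iterate_feedback(10, step, same, "unconstrained", max_iters=2))  # 9
```

On this toy trajectory the monotonic criterion stops at the first non-improving iteration, while the best-improvement criterion keeps going and finds the later, better result.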

Controller Type Comparison

3 Stopping Criteria      2 Objectives
Monotonic Improvement    Total Cut (Q_P), HPWL Estimate (Q_H)
Best Improvement         Total Cut (Q_P), HPWL Estimate (Q_H)
Unconstrained            -

- Combinations of the 3 stopping criteria and 2 objectives yield 5 controllers
- We study the aggregate impact of the different controllers on the final HPWL

Effect of Controller on Final Wirelength
- Q_P (based on partitioning) controllers dominate Q_H (based on HPWL) controllers
- Best Improvement controllers outperform Monotonic Improvement controllers
- The Best Improvement Q_P controller slightly outperforms the Unconstrained controller
[Plot: final HPWL versus number of iterations for different controllers (Monotonic Q_H, Best Q_H, Monotonic Q_P, Best Q_P, Unconstrained)]

Asymptotic Controller Behavior
- Results are the average of 6 seeds for up to 12 iterations using the Best Improvement Q_P controller
- The final value oscillates slightly around a fixed value, with an 8-9% improvement in HPWL in comparison to the traditional placement flow
[Plot: final HPWL versus number of iterations, Best Q_P controller]

Accelerated Feedback
- Typically, placers call the multilevel partitioner a number of times and use the best cluster-tree partitioning results
- In iterated feedback, only the last feedback iteration determines the partitioning results; the other loops determine accurate terminal propagation
- Feedback runtime ∝ number of feedback iterations
To speed up our feedback implementation:
→ Call the multilevel partitioner once (1 V-cycle) for each feedback loop
→ Restore the default placer settings (2 V-cycles) for the last feedback iteration
[Figure: V-cycle with coarsening and uncoarsening phases]
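The acceleration amounts to a per-pass schedule of V-cycle counts. The sketch below assumes a hypothetical `vcycle_schedule` helper and counts "passes" over one placement level; how passes map to the feedback iteration counts reported later is not stated in the slides, so the pass count of 4 is only an example.

```python
from typing import List

def vcycle_schedule(n_passes: int, default_vcycles: int = 2) -> List[int]:
    """V-cycles to run in each pass over a placement level.

    Every pass except the last uses a single V-cycle, because its partitioning
    result is discarded and only the cell locations are kept. The last pass
    uses the placer's default setting.
    """
    if n_passes < 1:
        raise ValueError("need at least one pass")
    return [1] * (n_passes - 1) + [default_vcycles]

accelerated = vcycle_schedule(4)
plain = [2] * 4  # default settings in every pass
print(accelerated, sum(accelerated), sum(plain))  # [1, 1, 1, 2] 5 8
```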

Experimental Setup
- We test our methodology in Capo version 8.7
- Placement results are the average of 6 seeds
- Cadence's WarpRoute is used for routed wirelength evaluation
- All experiments were conducted on a 2.4 GHz Xeon Linux workstation with 2 GB RAM
- Code implementation took 130 lines of C++ code
- We evaluate feedback on the IBM version 1, IBM version 2, and PEKO benchmarks

HPWL Results (IBM Version 1)
- We use 3 feedback iterations with the Best Improvement Q_P feedback controller
[Plot: percentage improvement in HPWL (Half-Perimeter Wirelength) in comparison to Capo, for feedback (FB) and accelerated feedback (AFB)]

HPWL Results (IBM Version 1)
- Feedback: max improvement 13.73% and average improvement 5.43%, at 4.10x the original Capo runtime
- Accelerated feedback: max improvement 13.43% and average improvement 4.70%, at 2.43x the original Capo runtime
- PEKO benchmarks: max improvement 10% and average improvement 5% for feedback, at the expense of a 2-3x increase in Capo runtime

Routed Wirelength Results (IBM Version 2 - Hard)
[Plot: percentage improvement in routed wirelength in comparison to Capo]
[Table: number of routing violations per benchmark, Capo vs. Feedback]

Routed Wirelength Results (IBM Version 2 - Easy)
[Plot: percentage improvement in routed wirelength in comparison to Capo]
[Table: number of routing violations per benchmark, Capo vs. Feedback]

Conclusions
- New understanding of how ambiguous terminal propagation leads to indeterminism in propagation results and degraded placer performance
- Idea: reduce indeterminism by undoing placement results, but still using them to guide future partitioning. Flavors of this approach have been proposed before, but in different contexts
- Our approach is captured as feedback, which we tune using controllers
- Detailed study of variant objectives that can be optimized by the controllers, as well as iterating criteria
- Accelerated feedback: efficient implementations to reduce runtime impact
- IBMv1 HPWL results: up to 14% (best) and 6% (avg) improvement over Capo
- IBMv2 routed WL results: up to 10% improvement over Capo, with improved routability and reduced via count
- Accelerated feedback is now the default mode in Capo

Acknowledgments
We thank Igor Markov (University of Michigan) for helpful discussions.

Thank You

Block Ordering
- Regular ordering
- Random ordering
- Alternate ordering
Results are inconclusive!