Seeing the Forest and the Trees: Steiner Wirelength Optimization in Placement Jarrod A. Roy, James F. Lu and Igor L. Markov University of Michigan Ann.

Slides:



Advertisements
Similar presentations
MIP-based Detailed Placer for Mixed-size Circuits Shuai Li, Cheng-Kok Koh ECE, Purdue University {li263,
Advertisements

Capo: Robust and Scalable Open-Source Min-cut Floorplacer Jarrod A. Roy, David A. Papa,Saurabh N. Adya, Hayward H. Chan, James F. Lu, Aaron N. Ng, Igor.
Constraint Driven I/O Planning and Placement for Chip-package Co-design Jinjun Xiong, Yiuchung Wong, Egino Sarto, Lei He University of California, Los.
Optimization of Placement Solutions for Routability Wen-Hao Liu, Cheng-Kok Koh, and Yih-Lang Li DAC’13.
Wen-Hao Liu1, Yih-Lang Li, and Cheng-Kok Koh Department of Computer Science, National Chiao-Tung University School of Electrical and Computer Engineering,
Natarajan Viswanathan Min Pan Chris Chu Iowa State University International Symposium on Physical Design April 6, 2005 FastPlace: An Analytical Placer.
X-Architecture Placement Based on Effective Wire Models Tung-Chieh Chen, Yi-Lin Chuang, and Yao-Wen Chang Graduate Institute of Electronics Engineering.
Meng-Kai Hsu, Sheng Chou, Tzu-Hen Lin, and Yao-Wen Chang Electronics Engineering, National Taiwan University Routability Driven Analytical Placement for.
Shuai Li and Cheng-Kok Koh School of Electrical and Computer Engineering, Purdue University West Lafayette, IN, Mixed Integer Programming Models.
Ripple: An Effective Routability-Driven Placer by Iterative Cell Movement Xu He, Tao Huang, Linfu Xiao, Haitong Tian, Guxin Cui and Evangeline F.Y. Young.
SimPL: An Effective Placement Algorithm Myung-Chul Kim, Dong-Jin Lee and Igor L. Markov Dept. of EECS, University of Michigan 1ICCAD 2010, Myung-Chul Kim,
Consistent Placement of Macro-Blocks Using Floorplanning and Standard-Cell Placement Saurabh Adya Igor Markov (University of Michigan)
FastPlace: Efficient Analytical Placement using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model FastPlace: Efficient Analytical Placement.
Placer Suboptimality Evaluation Using Zero-Change Transformations Andrew B. Kahng Sherief Reda VLSI CAD lab UCSD ECE and CSE Departments.
Routability-Driven Blockage-Aware Macro Placement Yi-Fang Chen, Chau-Chin Huang, Chien-Hsiung Chiou, Yao-Wen Chang, Chang-Jen Wang.
ISQED’2015: D. Seemuth, A. Davoodi, K. Morrow 1 Automatic Die Placement and Flexible I/O Assignment in 2.5D IC Design Daniel P. Seemuth Prof. Azadeh Davoodi.
Boosting: Min-Cut Placement with Improved Signal Delay Andrew B. KahngSherief Reda CSE & ECE Departments University of CA, San Diego La Jolla, CA
38 th Design Automation Conference, Las Vegas, June 19, 2001 Creating and Exploiting Flexibility in Steiner Trees Elaheh Bozorgzadeh, Ryan Kastner, Majid.
Constructive Benchmarking for Placement David A. Papa EECS Department University of Michigan Ann Arbor, MI Igor L. Markov EECS.
Placement Feedback: A Concept and Method for Better Min-Cut Placements Andrew B. KahngSherief Reda CSE & ECE Departments University of CA, San Diego La.
On Legalization of Row-Based Placements Andrew B. KahngSherief Reda CSE & ECE Departments University of CA, San Diego La Jolla, CA 92093
1 A Tale of Two Nets: Studies in Wirelength Progression in Physical Design Andrew B. Kahng Sherief Reda CSE Department University of CA, San Diego.
Can Recursive Bisection Alone Produce Routable Placements? Andrew E. Caldwell Andrew B. Kahng Igor L. Markov Supported by Cadence.
DUSD(Labs) GSRC bX update March 2003 Aaron Ng, Marius Eriksen and Igor Markov University of Michigan.
Accurate Pseudo-Constructive Wirelength and Congestion Estimation Andrew B. Kahng, UCSD CSE and ECE Depts., La Jolla Xu Xu, UCSD CSE Dept., La Jolla Supported.
Circuit Simulation Based Obstacle-Aware Steiner Routing Yiyu Shi, Paul Mesa, Hao Yu and Lei He EE Department, UCLA Partially supported by NSF Career Award.
ISPD 2000, San DiegoApr 10, Requirements for Models of Achievable Routing Andrew B. Kahng, UCLA Stefanus Mantik, UCLA Dirk Stroobandt, Ghent.
A Resource-level Parallel Approach for Global-routing-based Routing Congestion Estimation and a Method to Quantify Estimation Accuracy Wen-Hao Liu, Zhen-Yu.
POLAR 2.0: An Effective Routability-Driven Placer Chris Chu Tao Lin.
VLSI Physical Design Automation Prof. David Pan Office: ACES Lecture 18. Global Routing (II)
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 5: Global Routing © KLMH Lienig 1 FLUTE: Fast Lookup Table Based RSMT Algorithm.
MGR: Multi-Level Global Router Yue Xu and Chris Chu Department of Electrical and Computer Engineering Iowa State University ICCAD
Primal-Dual Meets Local Search: Approximating MST’s with Non-uniform Degree Bounds Author: Jochen Könemann R. Ravi From CMU CS 3150 Presentation by Dan.
CRISP: Congestion Reduction by Iterated Spreading during Placement Jarrod A. Roy†‡, Natarajan Viswanathan‡, Gi-Joon Nam‡, Charles J. Alpert‡ and Igor L.
Global Routing.
Horizontal Benchmark Extension for Improved Assessment of Physical CAD Research Andrew B. Kahng, Hyein Lee and Jiajia Li UC San Diego VLSI CAD Laboratory.
TSV-Aware Analytical Placement for 3D IC Designs Meng-Kai Hsu, Yao-Wen Chang, and Valerity Balabanov GIEE and EE department of NTU DAC 2011.
March 20, 2007 ISPD An Effective Clustering Algorithm for Mixed-size Placement Jianhua Li, Laleh Behjat, and Jie Huang Jianhua Li, Laleh Behjat,
Archer: A History-Driven Global Routing Algorithm Mustafa Ozdal Intel Corporation Martin D. F. Wong Univ. of Illinois at Urbana-Champaign Mustafa Ozdal.
Improved Cut Sequences for Partitioning Based Placement Mehmet Can YILDIZ and Patrick H. Madden State University of New York at BinghamtonComputer Science.
Multilevel Generalized Force-directed Method for Circuit Placement Tony Chan 1, Jason Cong 2, Kenton Sze 1 1 UCLA Mathematics Department 2 UCLA Computer.
Massachusetts Institute of Technology 1 L14 – Physical Design Spring 2007 Ajay Joshi.
1/24/20071 ECO-system: Embracing the Change in Placement Jarrod A. Roy and Igor L. Markov University of Michigan at Ann Arbor.
Placement. Physical Design Cycle Partitioning Placement/ Floorplanning Placement/ Floorplanning Routing Break the circuit up into smaller segments Place.
Jason Cong‡†, Guojie Luo*†, Kalliopi Tsota‡, and Bingjun Xiao‡ ‡Computer Science Department, University of California, Los Angeles, USA *School of Electrical.
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 6: Detailed Routing © KLMH Lienig 1 What Makes a Design Difficult to Route Charles.
GLARE: Global and Local Wiring Aware Routability Evaluation Yaoguang Wei1, Cliff Sze, Natarajan Viswanathan, Zhuo Li, Charles J. Alpert, Lakshmi Reddy,
Recursive Bisection Placement*: feng shui 5.0 Ameya R. Agnihotri Satoshi Ono Patrick H. Madden SUNY Binghamton CSD, FAIS, University of Kitakyushu (with.
Congestion Estimation and Localization in FPGAs: A Visual Tool for Interconnect Prediction David Yeager Darius Chiu Guy Lemieux The University of British.
Pattern Sensitive Placement For Manufacturability Shiyan Hu, Jiang Hu Department of Electrical and Computer Engineering Texas A&M University College Station,
Pattern Sensitive Placement For Manufacturability Shiyan Hu, Jiang Hu Department of Electrical and Computer Engineering Texas A&M University College Station,
Quadratic VLSI Placement Manolis Pantelias. General Various types of VLSI placement  Simulated-Annealing  Quadratic or Force-Directed  Min-Cut  Nonlinear.
Physical Synthesis Comes of Age Chuck Alpert, IBM Corp. Chris Chu, Iowa State University Paul Villarrubia, IBM Corp.
Analytical Minimization of Signal Delay in VLSI Placement Andrew B. Kahng and Igor L. Markov UCSD, Univ. of Michigan
Chris Chu Iowa State University Yiu-Chung Wong Rio Design Automation
International Workshop on System-Level Interconnection Prediction, Sonoma County, CA March 2001ER UCLA UCLA 1 Wirelength Estimation based on Rent Exponents.
1 NTUplace: A Partitioning Based Placement Algorithm for Large-Scale Designs Tung-Chieh Chen 1, Tien-Chang Hsu 1, Zhe-Wei Jiang 1, and Yao-Wen Chang 1,2.
Routing Topology Algorithms Mustafa Ozdal 1. Introduction How to connect nets with multiple terminals? Net topologies needed before point-to-point routing.
A Design Flow for Optimal Circuit Design Using Resource and Timing Estimation Farnaz Gharibian and Kenneth B. Kent {f.gharibian, unb.ca Faculty.
International Symposium on Physical Design San Diego, CA April 2002ER UCLA UCLA 1 Routability Driven White Space Allocation for Fixed-Die Standard-Cell.
Design Automation Conference (DAC), June 6 th, Taming the Complexity of Coordinated Place and Route Jin Hu †, Myung-Chul Kim †† and Igor L. Markov.
An Exact Algorithm for Difficult Detailed Routing Problems Kolja Sulimma Wolfgang Kunz J. W.-Goethe Universität Frankfurt.
Effective Linear Programming-Based Placement Techniques Sherief Reda UC San Diego Amit Chowdhary Intel Corporation.
Hypergraph Partitioning With Fixed Vertices Andrew E. Caldwell, Andrew B. Kahng and Igor L. Markov UCLA Computer Science Department
6/19/ VLSI Physical Design Automation Prof. David Pan Office: ACES Placement (3)
HeAP: Heterogeneous Analytical Placement for FPGAs
Revisiting and Bounding the Benefit From 3D Integration
APLACE: A General and Extensible Large-Scale Placer
2 University of California, Los Angeles
FLUTE: Fast Lookup Table Based RSMT Algorithm for VLSI Design
Presentation transcript:

Seeing the Forest and the Trees: Steiner Wirelength Optimization in Placement Jarrod A. Roy, James F. Lu and Igor L. Markov University of Michigan Ann Arbor April 10, 2006

Motivation Place-and-route  Single step for designers ?  Implemented as separate point tools  Very little interaction/communication  Use different optimization objectives Our goal: reduce the gap between placement and routing  HPWL is the wrong objective  Must optimize something else ! Empirical results: consistent improvement over all published P&R results  Routability, routed wirelength, via counts

HPWL vs. Steiner Tree WL HPWL ≤ Steiner Tree WL ( = for 2- and 3-pin nets) Computing HPWL takes linear time, but Steiner trees are NP-hard Steiner Tree tools we evaluate:  Batched Iterated 1-Steiner (BI1ST) [Kahng,Robins 1992] Slow (n 3 ) Very accurate, even for 20+ pins  FastSteiner [Kahng,Mandoiu,Zelikovsky 2003] Faster but less accurate than BI1ST  FLUTE [Chu 2004, 2005] Very fast Optimal lookup tables for ≤9 pins, less accurate for 10+ pins Half-perimeter wirelength Steiner (tree) wirelength

Existing Placement Framework Consider placement bins Partition them  Use min-cut bisection  Place end-cases optimally Propagate terminals before partitioning  Terminals: fixed cells or cells outside current bin  Assigned to one of partitions Save runtime: a 20-pin may “propagate” into 3-pin net  “Inessential nets”: fixed terminals in both partitions (can be entirely ignored) Traditional min-cut placement tracks HPWL Placement bins End-case placement pins of one net propagated

Introduced in Theto placer [Selvakkumaran 2004] Refined in [Chen 2005]  Shown to accurately track HPWL Use 1 or 2 hyper-edges to represent each net for partitioning  Weights given by costs w left, w right, w cut  w left : HPWL when all cells on left side (a)  w right : HPWL when all cells on the right (b)  w cut : HPWL when cells on both sides (c) Better Modeling of HPWL by Net Weights In Min-cut Figure from [Chen,Chang,Lin 2005]

Key Observation For bisection, cost of each net is characterized by 3 cases  Cost of net when cut w cut  Cost of net when entirely in left partition: w left  Cost of net when entirely in right partition: w right In our work, we compute these costs for a different placement objective  Real difficulty in data structures!

Our Contributions Optimization of Steiner WL  In global placement (runtime penalty ~30%)  In detail placement Whitespace allocation to tame congestion Empirical evaluation of ROOSTER  No violations on 16 IBMv2 benchmarks (easy + hard)  Consistent improvements of published results  4-10% by routed wirelength  10-15% by via counts

Optimizing Steiner WL During Global Placement Recall – each net can be modeled by 3 numbers  This has only been applied to HPWL optimization We calculate w left, w right, w cut using Steiner evaluator  For each net, before partitioning starts  The bottleneck is still in partitioning  can afford a fast Steiner-tree evaluator Pitfall : cannot propagate terminals !  Nets that were inessential are now essential  Must consider all pins of each net  More accurate modeling, but potentially much slower

Pointsets with multiplicities: two per net Unique locations of fixed & movable pins  At top placement layers, very few unique pin positions (except for fixed I/O pins) Maintain the number of pins at each location  Fast maintenance when pins get reassigned to partitions (or move) Allows to efficiently compute the 3 costs New Data Structure for Global Placement

Results depend on the Steiner tree evaluator  We choose FastSteiner (vs BI1ST and FLUTE)  See Appendix B for detailed comparison Impact of changes to global placement  Results consistent across IBMv2 benchmarks  Steiner WL reduction: 2.9%  HPWL grows by 1.3%  Runtime grows by 27% Improvement in Global Placement

Optimizing Steiner WL in Detail Placement We leverage the speed of FLUTE with two sliding-window optimizers  Exhaustive enumeration for 4-5 cells in a single row  Interleaving by dynamic programming (5-8 cells) Fast but not always optimal Using both reduces Steiner WL by 0.69%, routed WL by 0.72% and consumes 11.83% of [global + detail] placement runtime Much faster than single-trunk tree optimization from [Jariwala,Lillis 2004]  Our optimization seems stronger, not restricted to FPGAs

Congestion-based Cutline Shifting To reduce congestion ROOSTER allocates whitespace non-uniformly Based on the WSA technique [Li 2004]  WSA is applied after detail placement (our technique is used during global placement)  Identifies congested regions  Injects whitespace, causing cell overlap  Legalization and re-placement is required  Detail placement recovers HPWL

Congestion-based Cutline Shifting Our technique is applied pro-actively during mincut  No need for “re-placement” and legalization  This improves via counts Periodically, build up-to-date congestion maps  Use congestion maps from [Westra 2004]  Estimate congestion for each existing placement bin Cutlines shifted to equalize congestion in bins 15% WS Cong: 100 Cong: % WS 20% WS Cong: 150

Empirical Results: IBMv2 ROOSTER /16 mPL-R+WSA /16 APlace /8 Capo Not published0/16 Dragon Not published1/16 FengShui Not published7/16 mPL-R+WSA /16 APlace /16 FengShui /16 ROOSTER: Rigorous Optimization Of Steiner Trees Eases Routing Routed WL RatioVia Ratio Routes with Violation Published results: Most recent results:

ROOSTER with several detail placers: IBMv2 ROOSTER /16 ROOSTER+WSA /16 ROOSTER+ Dragon 4.0 DP /16 ROOSTER+ FengShui 5.1 DP /16 Routed WL Ratio Via Ratio Routes with Violation

Improvement Breakdown: IBMv2 easy V = Violations

Improvement Breakdown: IBMv2 hard

Congestion with and without Capo -uniformWS 5 hours to route; 120 violations ROOSTER 22 mins to route; 0 violations

Conclusions Steiner WL should be optimized in global and detail placement  Improves routability and routed WL  10-15% improvement in via counts  Better Steiner evaluators may further reduce routed WL Congestion-driven cutline shifting in global placement is competitive with WSA  Better via counts  May be improved if better congestion maps available ROOSTER freely available for all uses

Questions?

Huang & Kahng, ISPD1997 Quadrisection can bias min-cut objective to Minimum Spanning Tree [Huang,Kahng 1997]  Loses accuracy by gridding terminals  2x2 MST equivalent to 2x2 Steiner  Need much larger grid to truly optimize Steiner WL Compared to our work, Huang & Kahng…  Did not handle Steiner trees, only MSTs (handling Steiner trees may require 4x4 geometric partitioner)  Did not handle terminals very accurately (which seems to be the key)  Never evaluated the results with a router (!)