Benchmarking for Large-Scale Placement and Beyond S. N. Adya, M. C. Yildiz, I. L. Markov, P. G. Villarrubia, P. N. Parakh, P. H. Madden.

Outline
- Motivation
  - Why does the industry need benchmarking?
- Available benchmarks and placement tools
- Performance results
- Unresolved issues
  - Benchmarking for routability
  - Benchmarking for timing-driven placement
- Public placement utilities
- Lessons learned + beyond placement

A True Story About Benchmarking
- An undergraduate student implements an optimal B&B (branch-and-bound) block packer,
- finds the minimum areas possible for apte & xerox,
- compares to published results,
- finds an ISPD 2001 paper that reports:
  - Floorplan areas smaller than optimal
  - In two cases, areas smaller than the sum of block areas
- More true stories in our ISPD 2003 paper
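The sanity check behind this story is easy to automate: no legal floorplan can have an area below the total area of its blocks. The sketch below shows that check only, not the student's B&B packer. The function names and the block dimensions are illustrative assumptions, not data from apte or xerox.

```python
def total_block_area(blocks):
    """Sum of width * height over all blocks: a lower bound on any legal floorplan area."""
    return sum(width * height for width, height in blocks)


def is_plausible_area(reported_area, blocks):
    """Return False when a reported floorplan area is below the sum of block areas."""
    return reported_area >= total_block_area(blocks)


# Hypothetical blocks as (width, height) pairs; not taken from any benchmark.
example_blocks = [(4, 3), (2, 5), (6, 1)]
```

A reported area that fails this check cannot come from a legal floorplan, whatever packing algorithm produced it.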

Industrial Benchmarking
- Growing size & complexity of VLSI chips
- Design objectives
  - Wirelength / congestion / timing / power / yield
- Design constraints
  - Fixed die / routability / FP constraints / fixed IPs / cell orientations / pin access / signal integrity / …
- Can the same algorithm excel in all contexts?
- Layout sophistication motivates open benchmarking for placement

Whitespace Handling
- Modern ASICs are laid out in a fixed-die context
  - Layout area, routing tracks, power lines, etc. are fixed before placement
  - Area minimization is irrelevant (area is fixed)
- New phenomenon: whitespace
  - Row utilization % = density % = 100% - whitespace %
- How does one distribute whitespace?
  - Pack all cells to the left [Feng Shui, mPL]
    - All whitespace is on the right
    - Typical for variable-die placers
  - Distribute uniformly [Capo, Kraftwerk]
  - Allocate whitespace to congested regions [Dragon]
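The utilization identity on this slide can be checked numerically. The sketch below assumes every cell is exactly one row tall, so area ratios reduce to width ratios. The function names and the numbers in the usage note are illustrative.

```python
def row_utilization(cell_widths, row_lengths):
    """Fraction of total row length occupied by cells (single-row-height cells assumed)."""
    total_row_length = sum(row_lengths)
    if total_row_length <= 0:
        raise ValueError("total row length must be positive")
    return sum(cell_widths) / total_row_length


def whitespace_fraction(cell_widths, row_lengths):
    """Whitespace is whatever the cells do not occupy: 1 - utilization."""
    return 1.0 - row_utilization(cell_widths, row_lengths)
```

For example, cells of widths 10, 20 and 30 placed in two rows of length 50 give 60% utilization and 40% whitespace.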

Design Types
- ASICs
  - Lots of fixed I/Os, few macros, millions of standard cells
  - Placement densities: 40-80% (IBM)
  - Flat and hierarchical designs
- SoCs
  - Many more macro blocks, cores
  - Datapaths + control logic
  - Can have very low placement densities: < 20%
- Micro-Processor (μP) Random Logic Macros (RLMs)
  - Hierarchical partitions are placement instances (5-30K)
  - High placement densities: 80%-98% (low whitespace)
  - Many fixed I/Os, relatively few standard cells
  - Recall “Partitioning w Terminals” DAC `99, ISPD `99, ASPDAC `00

ASICs
- Many fixed ports: perimeter- and area-array
- A handful (1-20) of large, fixed macros
- 100's to 1000's of fixed smaller cells
- Some designs are hierarchical
  - E.g., have floorplan constraints
  - Functional / logic hierarchy vs. physical hierarchy
- Many designs are flat
  - Up to 2M placeable objects (for flat designs)

Cores
- Computational and DSP cores are commonly included in SoCs
- Mix of standard-cell and semi-custom style
  - Datapaths, structured components
  - Some control logic

μP RLMs
- Manual floorplanning, hierarchies
- Standard-cell Place & Route instances are small (5K to 30K placeable objects)
  - Std. cells sometimes occupy a single row
- Almost no whitespace
- Large ratio of fixed ports to movable cells (relative to ASIC parts)
- Most cells are movable, but not always
- Recall “Partitioning w Terminals” DAC `99, ASPDAC `00

IBM PowerPC 601 chip

Intel Centrino chip

Requirements for Placers (1)
- Must handle 4-10M cells, 1000s of macros
  - 64 bits + near-linear asymptotic complexity
  - Scalable/compact design database (OpenAccess)
- Accept fixed ports/pads/pins + fixed cells
- Place macros, esp. with variable aspect ratios
  - Non-trivial heights and widths (e.g., height = 2 rows)
- Honor targets and limits for net length
- Respect floorplan constraints
- Handle a wide range of placement densities (from <25% to 100% occupied), ICCAD `02

Requirements for Placers (2)
- Add / delete filler cells and Nwell contacts
- Ignore clock connections
- ECO placement
  - Fix overlaps after logic restructuring
  - Place a small number of unplaced blocks
- Datapath planning services
  - E.g., for cores
- Provide placement dialog services to enable cooperation across tools
  - E.g., between placement and synthesis

Why Worry About Benchmarking?
- Variety of conflicting objectives
- Multitude of layout features / constraints
- No single algorithm finds best placements for all design problems (yet?)
- Need independent evaluation
- Need a set of common placement BM’s with features of interest (e.g., IBM-Floorplacement)
- Need to know / understand how algorithms behave over the entire design space

Available Placement BM’s
- MCNC
  - Small and outdated (routing channels between rows, etc.)
- IBM-Place / IBM-Dragon (suite 1 & 2) - UCLA (ICCAD `00)
  - Derived from ISPD98-IBM partitioning suite. Macros removed.
- IBM Floor-placement – Michigan (ISPD ‘02)
  - Derived from same IBM circuits. Nothing removed.
- PEKO – UCLA (DAC ‘95, ASPDAC ‘03, ISPD ‘03)
  - Artificial netlists with known optimal wirelength; up to 2M cells
  - No global wires
- Standardized grids – Michigan
  - Created to model data-paths during placement
  - Easy to visualize, optimal placements are obvious
- Vertical benchmarks - CMU
  - Multiple representations (PicoJava, Piperench, CMUDSP)
  - Have some timing info, but not enough to evaluate timing

Academic Placers We Used
- Kraftwerk Nov 2002 (no major changes since DAC98)
  - Eisenmann and Johannes (TU Munich)
  - Force-directed (analytical) placer
- Capo 8.5 / 8.6 (Apr / Nov 2002)
  - Adya, Caldwell, Kahng and Markov (UCLA and Michigan)
  - Recursive min-cut bisection (built-in partitioner MLPart)
- Dragon 2.20 / 2.23 (Sept / Feb 2003)
  - Choi, Sarrafzadeh, Yang and Wang (Northwestern and UCLA)
  - Min-cut multi-way partitioning (hMetis) & simulated annealing
- FengShui 1.2 / 1.6 / 2.0 (Fall 2000 / Feb 2003)
  - Madden and Yildiz (SUNY Binghamton)
  - Recursive min-cut multi-way partitioning (hMetis + built-in)
- mPL 1.2 / 1.2b (Nov 2002 / Feb 2003)
  - Chan, Cong, Shinnerl and Sze (UCLA)
  - Multi-level enumeration-based placer

Features Supported by Placers

Performance on Available BM’s
- Our objectives and goals
  - Perform first-ever comprehensive evaluation
  - Seek trends and anomalies
  - Evaluate robustness of different placers
  - One does not expect a clear winner
- Minor obstacles and potential pitfalls
  - Not all placers are open-source / public
  - Not all placers support the Bookshelf format
    - Most do
  - Must be careful with converters (!)

PEKO BMs (ASPDAC 03)

Cadence-Capo BMs (DAC 2000)
- I – failure to read input; a – abort
- oc – out-of-core cells; / – in variable-die mode
- Feng Shui – similar to Dragon, better on test1

Results: Grids
- Unique optimal solution

Relative Performance
- Feng Shui 1.6 / 2.0 improves upon FS 1.2

Placers Do Well on Benchmarks Published By the Same Group
- Observe that
  - Capo does well on Cadence-Capo
  - Dragon does well on IBM-Place (IBM-Dragon)
  - Not in the table: FengShui does well on MCNC
  - mPL does well on PEKO
- This is hardly a coincidence
- Motivation for more / better benchmarks

Benchmarking for Routability of Placements
- Placer tuning also explains routability results
  - Dragon performs well on the IBM-Dragon suite
  - Capo performs well on the Cadence-Capo suite
  - Routability on one set does not guarantee much
- Need accurate / common routability metrics
  - … and shared implementations (binaries, source code)
- Related benchmarking issues
  - No good public benchmarks for routing!
  - Routability may conflict with timing / power optimizations

Simple Congestion Metrics
- Horizontal vs. vertical wirelength
  - HPWL = WL_H + WL_V
  - Two placements with the same HPWL may have very different WL_H and WL_V
  - Think of preferred-direction routing & odd #layers
- Probabilistic congestion maps
  - Bhatia et al. – DAC 02
  - Lou et al. – ISPD 00, TCAD 01
  - Carothers & Kusnadi – ISPD 99
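The decomposition HPWL = WL_H + WL_V is straightforward to compute per net from bounding boxes. The sketch below assumes pins sit at the cell coordinates (pin offsets are ignored) and uses made-up cell names and positions. It is not the wirelength calculator from the placement utilities.

```python
def hpwl_components(nets, positions):
    """Return (WL_H, WL_V): summed bounding-box widths and heights over all nets.

    nets      -- list of nets, each a list of cell names
    positions -- dict mapping cell name to an (x, y) pair; pin offsets are ignored
    """
    wl_h = 0.0
    wl_v = 0.0
    for pins in nets:
        xs = [positions[name][0] for name in pins]
        ys = [positions[name][1] for name in pins]
        wl_h += max(xs) - min(xs)
        wl_v += max(ys) - min(ys)
    return wl_h, wl_v


# Two hypothetical placements of the same two-pin net with equal HPWL.
example_nets = [["a", "b"]]
placement_flat = {"a": (0, 0), "b": (10, 0)}   # all wirelength is horizontal
placement_diag = {"a": (0, 0), "b": (5, 5)}    # wirelength split evenly
```

Both placements have HPWL 10, but the first puts all of the demand on horizontal routing resources while the second splits it evenly, which is the effect the slide warns about.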

Horizontal vs. Vertical WL

Probabilistic Congestion Maps

Metric: Run a Router
- Global or global + detail?
  - Local effects (design rules, cell libraries) may affect results too much
  - “Noise” in global placement (for 2M cells)?
- Open-source or industrial?
- Tunable? Easy to integrate?
- Saves global routing information?
- Publicly available routers
  - Labyrinth from UCLA
  - Force-directed router from UCB

Placement Utilities
- http://vlsicad.eecs.umich.edu/BK/PlaceUtils/
- Accept input in the GSRC Bookshelf format
- Format converters
  - LEF/DEF and Bookshelf
  - Bookshelf and Kraftwerk
  - BLIF (SIS) and Bookshelf
- Evaluators, checkers, postprocessors and plotters
- Contributions in these categories are esp. welcome

Placement Utilities (cont’d)
- Wirelength Calculator (HPWL)
  - Independent evaluation of placement results
- Placement Plotter
  - Saves gnuplot scripts (.eps, .gif, …)
  - Multiple views (cells only, cells+nets, rows, …)
  - Used earlier in this presentation
- Probabilistic Congestion Maps (Lou et al.)
  - Gnuplot scripts
  - Matlab scripts
    - Better graphics, including 3-d fly-by views
  - .xpm files (.gif, .jpg, .eps, …)

Placement Utilities (cont’d)
- Legality checker
- Simple legalizer
- Layout Generator
  - Given a netlist, creates a row structure
  - Tunable %whitespace, aspect ratio, etc.
- All available in binaries/PERL at http://vlsicad.eecs.umich.edu/BK/PlaceUtils/
- Most source code is shipped with Capo
- Your contributions are welcome
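To make the idea of a legality checker concrete, here is a minimal sketch under simplifying assumptions: every cell is one row tall, all rows span the same x-range, and site alignment and orientation are not checked. The data layout and names are hypothetical. This is not the checker distributed with the utilities above.

```python
def find_violations(cells, row_x_min, row_x_max, num_rows):
    """Report out-of-core cells and overlaps within rows.

    cells -- dict mapping cell name to (x, width, row_index)
    Returns a list of ("out_of_core", name) and ("overlap", earlier, later) tuples.
    At least one overlap is reported for every row that contains overlapping
    cells; not every overlapping pair is listed.
    """
    violations = []
    cells_by_row = {}
    for name, (x, width, row) in cells.items():
        inside = 0 <= row < num_rows and x >= row_x_min and x + width <= row_x_max
        if not inside:
            violations.append(("out_of_core", name))
            continue
        cells_by_row.setdefault(row, []).append((x, width, name))

    for row_cells in cells_by_row.values():
        row_cells.sort()  # left to right by x
        max_right, owner = None, None
        for x, width, name in row_cells:
            # A cell starting before the furthest right edge seen so far overlaps.
            if max_right is not None and x < max_right:
                violations.append(("overlap", owner, name))
            if max_right is None or x + width > max_right:
                max_right, owner = x + width, name
    return violations


# Hypothetical placement: "a" overlaps "b" in row 0, "d" sticks out of row 1.
example_cells = {"a": (0, 4, 0), "b": (3, 2, 0), "c": (6, 2, 0), "d": (9, 3, 1)}
```

An empty result means the placement is legal under these simplified rules. A checker like this lets a reported wirelength be trusted only when it is paired with a legal placement.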

Challenges for Evaluating Timing-Driven Optimizations
- QOR not defined clearly
  - Max path-length? Worst set-up slack?
  - With false paths or without? ...
- Evaluation methods are not replicable (often shady)
  - Questionable delay models, technology params
  - Net topology generators (MST, single-trunk Steiner trees)
  - Inconsistent results: path delays < sum of gate delays
- Public benchmarks? ...
  - Anecdote: TD-place benchmarks in Verilog (ISPD `01)
  - Companies guard netlists, technology parameters
  - Cell libraries; area constraints

Metrics for Timing + Reporting
- STA non-trivial: use PrimeTime or PKS
- Distinguish between optimization and evaluation
  - Evaluate setup-slack using commercial tools
  - Optimize individual nets and/or paths
    - E.g., net-length versus allocated budgets
- Report all relevant data
  - How was the total wirelength affected?
  - Were per-net and per-path optimizations successful?
  - Did that improve worst slack or did something else?
- Huge slack improvements reported in some 1990s papers, but wire delays were much smaller than gate delays

Impact of Physical Synthesis
- Local circuit tweaks improve worst slack
- How do global placement changes affect slack, when followed by sizing, buffering…?
- Slack (TNS) per design:
    Design   # Inst    Initial           Sized            Buffered
    D1       22253     -2.75 (-508)      -2.17 (-512)     -0.72 (-21)
    D2       89689     -5.87 (-10223)    -5.08 (-9955)    -3.14 (-5497)
    D3       99652     -6.35 (-8086)     -5.26 (-5287)    -4.68 (-2370)
    D4       147955    -7.06 (-7126)     -5.16 (-1568)    -4.14 (-1266)
    D5       687946    -8.95 (-4049)     -8.80 (-3910)    -6.40 (-3684)
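The table pairs a slack value with TNS in parentheses. Assuming these carry their usual meanings (worst slack is the minimum endpoint slack, TNS is the sum of all negative endpoint slacks), both can be computed from a list of endpoint slacks as sketched below. The slack values in the example are invented, and the helper names are illustrative.

```python
def worst_slack(endpoint_slacks):
    """Worst (most negative) slack over all timing endpoints."""
    return min(endpoint_slacks)


def total_negative_slack(endpoint_slacks):
    """TNS: sum of the slacks of all violating endpoints (those with slack < 0)."""
    return sum(slack for slack in endpoint_slacks if slack < 0)


# Hypothetical endpoint slacks; not taken from designs D1 to D5.
example_slacks = [-2.0, 0.5, -1.0, 3.0]
```

The two metrics can move independently: fixing the single worst path improves worst slack but may barely change TNS, which is one reason to report both.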

Correlated Non-timing Metrics?
- If you cannot solve a hard problem, reduce it to a simpler problem
- Validate your reduction!
  - E.g., show that slack correlates with ??...
- Delay budgeting and net-length limits
  - Before placement, for the whole chip
  - Or in the context of incremental re-placement
- Do some placement algorithms lead to smaller circuit delays? (w/o timing info!)
  - Recall: quadratic net lengths versus linear

Benchmarking Needs for Timing Opt.
- A common, reusable STA methodology
  - PrimeTime or PKS
  - High-quality, open-source infrastructure (funding?)
- Metrics validated against phys. synthesis
  - The simpler the better, but must be good predictors
- Benchmarks with sufficient info
  - Flat gate-level netlists
  - Library information ( < 250nm )
  - Realistic timing & area constraints

Beyond Placement (Lessons)
- Evaluation methods for BMs must be explicit
  - Prevent user errors (no TD-place BMs in Verilog)
  - Try to use open-source evaluators to verify results
- Visualization is important (sanity checks)
- Regression-testing after bugfixes is important
- Need more open-source tools
  - Complete descriptions of algos lower barriers to entry
- Need benchmarks with more information
  - Use artificial benchmarks with care
  - Huge gaps in benchmarking for routers

Beyond Placement (cont’d)
- Need common evaluators of delay / power
  - To avoid inconsistent results
- Relevant initiatives from Si2
  - OLA (Open Library Architecture)
  - OpenAccess
  - For more info, see http://www.si2.org
- Still: no reliable public STA tool
- Sought: OA-based utilities for timing/layout

Acknowledgements
- Funding: GSRC (MARCO, SIA, DARPA)
- Funding: IBM (2x)
- Equipment grants: Intel (2x) and IBM
- Thanks for help and comments
  - Frank Johannes (TU Munich)
  - Jason Cong, Joe Shinnerl, Min Xie (UCLA)
  - Andrew Kahng (UCSD)
  - Xiaojian Yang (Synplicity)
