Valuation and Values in Application-Driven Algorithmics: Case Studies from VLSI CAD Andrew B. Kahng, UCLA Computer Science Dept. June 2, 2000

1 Valuation and Values in Application-Driven Algorithmics: Case Studies from VLSI CAD Andrew B. Kahng, UCLA Computer Science Dept. June 2, 2000 abk@cs.ucla.edu, http://vlsicad.cs.ucla.edu

2 My Research l Applied algorithmics –demonstrably useful solutions for real problems –“best known” solutions –“classic” (well-studied) : Steiner, partition, placement, TSP,... –toolkits: discrete algorithms, global optimization, mathematical programming, approximation frameworks, new-age metaheuristics, engineering l “Ground truths” –anatomies –limits

3 Anatomies l Technologies –semiconductor process roadmap, design-manufacturing I/F –design technology: methodology, flows, design process –interconnect modeling/analysis: delay/noise est, compact models l Problems –structural theory of large-scale global optimizations l Heuristics –hypergraph partitioning and clustering –wirelength- and timing-driven placement –single/multiple topology synthesis (length, delay, skew, buffering,...) –TSP,..., IP protection,..., combinatorial exchange/auction,... l Cultures –contexts and infrastructure for research and technology transfer

4 Bounds l Exact methods l Provable approximations l Technology extrapolation –achievable envelope of system implementation w.r.t. cost, speed, power, reliability,... –ideally, should drive and be driven by system architectures, design and implementation methodologies

5 Today’s Talk l “Demonstrably useful solutions for real problems” l “Valuation”: What problems require attention ? –technology extrapolation –automatic layout of phase-shifting masks l “Values”: How do we advance the leading edge ? –anatomy of FM-based hypergraph partitioning heuristics –culture change: restoring time-to-market and QOR in applied algorithmics via “IP reuse”

6 Today’s Talk l “Demonstrably useful solutions for real problems” l “Valuation”: What problems require attention ? –technology extrapolation –automatic layout of phase-shifting masks l “Values”: How do we advance the leading edge ? –anatomy of FM-based hypergraph partitioning heuristics –culture change: restoring time-to-market and QOR in applied algorithmics via “IP reuse”

7 Technology Extrapolation l Evaluates impact of –design technology –process technology l Evaluates impact on –achievable design –associated design problems l What matters, when ? l Sets new requirements for CAD tools and methodologies, capital and R&D investment,... right tech at the right time l Roadmaps (SIA ITRS): familiar and influential example How and when do L, SOI, SER, etc. matter? What is the most power-efficient noise management strategy? Will layout tools need to perform process simulation to effectively address cross-die and cross-wafer manufacturing variation?

8 GTX: GSRC Technology Extrapolation System l GTX is a framework for technology extrapolation [Figure: architecture: Parameters (data), Rules (models), Rule chain (study), Knowledge Engine (derivation), GUI (presentation); components split between user inputs, pre-packaged GTX, and implementation]

9 Graphical User Interface (GUI) l Provides user interaction l Visualization (plotting, printing, saving to file) l 4 views: –Parameters –Rules –Rule chain –Values in chain

10 GTX: Open, “Living Roadmap” l Openness in grammar, parameters and rules –easy sharing of data, models in research environment –contributions of best known models from anywhere l Allows development of proprietary models –separation between supplied (shared) and user-defined parameters / rules –usability behind firewalls –functionality for sharing results instead of data l Multi-platform (SUN Solaris, Windows, Linux) l http://vlsicad.cs.ucla.edu/GSRC/GTX/

11 GTX Activity l Models implemented –Cycle-time models of SUSPENS (with extension by Takahashi), BACPAC (Sylvester, Berkeley), Fisher (ITRS) –Currently adding –GENESYS (with help from Georgia Tech) –RIPE (with help from RPI) –New device and power modules (Synopsys / Berkeley) –New SOI device model (Synopsys / Berkeley) –Inductance extraction (Silicon Graphics / Berkeley / Synopsys) l Studies performed in GTX –Modeling and parameter sensitivity analyses –Design optimization studies: global interconnects, layer stack –Routability estimation, via impact models,...

12 Today’s Talk l “Demonstrably useful solutions for real problems” l “Valuation”: What problems require attention ? –technology extrapolation –automatic layout of phase-shifting masks l “Values”: How do we advance the leading edge ? –anatomy of FM-based hypergraph partitioning heuristics –culture change: restoring time-to-market and QOR in applied algorithmics via “IP reuse”

13 Subwavelength Optical Lithography l subwavelength gap since .35 µm l EUV, X-rays, E-beams all > 10 years out l huge investment in > 30 years of optical litho infrastructure

14 Mask Types [Figure: clear areas vs. opaque (chrome) areas] l Bright Field –opaque features –transparent background l Dark Field –transparent features –opaque background

15 Phase Shifting Masks [Figure: conventional mask (glass + chrome) vs. phase shifting mask (added phase shifter); compares E at mask, E at wafer, and I at wafer for 0 and 180 phases]

16 Impact of PSM l PSM enables smaller transistor gate lengths Leff –“critical” polysilicon features only (gate Leff) –faster device switching → faster circuits –better critical dimension (CD) control → improved parametric yield –all features on polysilicon layer, local interconnect layers –smaller die area → more $/wafer (“full-chip PSM” == BIG win) l Alternative: build a $10B fab with equipment that won’t exist for 5+ years l Data points –exponential increase in price of CAD technology for PSM –Numerical Technologies market cap 3x that of Avant! –25 nm gates (!!!) manufactured with 248nm DUV steppers (NTI + MIT Lincoln Labs, announced 2 days ago); 90nm gates in production at Motorola, Lucent (since late 1999)

17 Double-Exposure Bright-Field PSM [Figure: 0 and 180 phase exposures combine (+ =) into the final pattern]

18 The Phase Assignment Problem l Assign 0, 180 phase regions such that critical features with width (separation) < B are induced by adjacent phase regions with opposite phases [Figure: Bright Field (Dark Field) example with 0 / 180 phase regions]

19 Key: Global 2-Colorability ? l If there is an odd cycle of “phase implications” → layout cannot be manufactured –layout verification becomes a global, not local, issue [Figure: three features with 0 / 180 phase implications]
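The global 2-colorability question above is exactly a bipartiteness check. A minimal sketch, assuming the conflict edges between phase regions have already been extracted (this is an illustration, not the production algorithm from the talk):

```python
from collections import deque

def phase_assignable(num_regions, conflict_edges):
    """Try to 2-color phase regions so every conflict edge joins
    opposite phases (0 / 180).  Returns the coloring, or None if an
    odd cycle of phase implications makes the layout unmanufacturable."""
    adj = [[] for _ in range(num_regions)]
    for u, v in conflict_edges:
        adj[u].append(v)
        adj[v].append(u)
    phase = [None] * num_regions
    for start in range(num_regions):
        if phase[start] is not None:
            continue
        phase[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if phase[v] is None:
                    phase[v] = 180 - phase[u]   # force opposite phase
                    queue.append(v)
                elif phase[v] == phase[u]:      # odd cycle of implications
                    return None
    return phase

# Even cycle: assignable.  Odd cycle (triangle): not.
print(phase_assignable(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # [0, 180, 0, 180]
print(phase_assignable(3, [(0, 1), (1, 2), (2, 0)]))          # None
```

The check is local per connected component, but a single odd cycle anywhere kills the whole layout, which is why verification becomes a global issue.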

20 F4 F2 F3 F1 Critical features: F1,F2,F3,F4

21 F4 F2 F3 F1 Opposite- Phase Shifters (0,180)

22 F4 F2 F3 F1 S1 S2 S3 S5 S4 S6 S7 S8 Shifters: S1-S8 PROPER Phase Assignment: –Opposite phases for opposite shifters –Same phase for overlapping shifters

23 F4 F2 F3 F1 S1 S2 S3 S5 S4 S6 S7 S8 Phase Conflict Proper Phase Assignment is IMPOSSIBLE

24 Phase Conflict Resolution [Figure: features F1-F4, shifters S1-S8, phase conflict] l feature shifting to remove overlap

25 Phase Conflict Resolution [Figure: features F1-F4, shifters, phase conflict] l feature widening to turn conflict into non-conflict

26 How will VLSI CAD deal with PSM ? l UCLA: first comprehensive methodology for PSM-aware layout design –currently being integrated by Cadence, Numerical Technologies l Approach: partition responsibility for phase-assignability –good layout practices (local geometry) –(open) problem: is there a set of “design rules” that guarantees phase-assignability of layout ? (no T’s, no doglegs, even fingers...) –automatic phase conflict resolution / bipartization (global colorability) –enabling reuse of layout (free composability) –problem: how can we guarantee reusability of phase-assigned layouts, such that no odd cycles can occur when the layouts are composed together in a larger layout ?

27 Automatic Conflict Resolution

28 Compaction-Oriented Approach l Analyze input layout l Find min-cost set of perturbations needed to eliminate all “odd cycles” l Induce constraints for output layout –i.e., PSM-induced (shape, spacing) constraints l Compact to get phase-assignable layout l Key: Minimize the set of new constraints, i.e., break all odd cycles in conflict graph by deleting a minimum number of edges.

29 Conflict Graph l Dark Field: build graph over feature regions –edge between two features whose separation is < B l Bright Field: build graph over shifter regions –shifters for features whose width is < B –two edge types –adjacency edge between overlapping phase regions : endpoints must have same phase –conflict edge between shifters on opposite side of critical feature: endpoints must have opposite phase

30 Conflict Graph G [Figures: Dark Field conflict graph (green = feature, pink = conflict); Bright Field conflict graph with conflict edges and adjacency edges]
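The Dark Field construction can be sketched directly: an edge joins any two features closer than B. Rectangle features and a Euclidean separation test are simplifying assumptions here; real design rules measure spacing differently:

```python
# Features as axis-aligned rectangles (xmin, ymin, xmax, ymax).
def separation(a, b):
    """Edge-to-edge distance between two rectangles (0 if they touch/overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5

def dark_field_conflict_edges(features, B):
    """Dark field: conflict edge between every pair of features with separation < B."""
    edges = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if separation(features[i], features[j]) < B:
                edges.append((i, j))
    return edges

# Two narrow features 1 unit apart, one far away; B = 5.
feats = [(0, 0, 1, 10), (2, 0, 3, 10), (30, 0, 31, 10)]
print(dark_field_conflict_edges(feats, B=5))   # [(0, 1)]
```

The quadratic pair loop is for clarity only; layout tools would use a spatial index, and the Bright Field variant builds the graph over shifter regions with two edge types as described above.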

31 Optimal Odd Cycle Elimination [Figures: conflict graph G; dual graph D; T-join of odd-degree nodes in D; dark green = feature, pink = conflict]

32 Optimal Odd Cycle Elimination l T-join of odd-degree nodes in D corresponds to broken edges in original conflict graph –assign phases: dark green and purple –remaining pink conflicts correctly handled [Figure: dark green = feature; pink = conflict]

33 The T-join Problem l How to delete minimum-cost set of edges from conflict graph G to eliminate odd cycles? l Construct geometric dual graph D = dual(G) l Find odd-degree vertices T in D l Solve the T-join problem in D: –find min-weight edge set J in D such that –all T-vertices have odd degree –all other vertices have even degree l Solution J corresponds to desired min-cost edge set in conflict graph G
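To make the T-join condition concrete, here is a brute-force reference solver. It is exponential and only a sanity check for tiny dual graphs; the talk's approach reduces to min-cost perfect matching instead:

```python
from itertools import combinations

def min_t_join(n, edges, T):
    """Exhaustive min-weight T-join: cheapest edge set J in which exactly
    the T-vertices have odd degree (all others even).  edges = (u, v, weight)."""
    best_cost, best = float("inf"), None
    for k in range(len(edges) + 1):
        for J in combinations(range(len(edges)), k):
            deg = [0] * n
            cost = 0
            for idx in J:
                u, v, w = edges[idx]
                deg[u] += 1
                deg[v] += 1
                cost += w
            ok = all((deg[v] % 2 == 1) == (v in T) for v in range(n))
            if ok and cost < best_cost:
                best_cost, best = cost, J
    return best_cost, best

# Triangle 0-1-2; T = {0, 2}: the path through vertex 1 (cost 2+3) beats
# the direct edge (cost 7).
edges = [(0, 1, 2), (1, 2, 3), (0, 2, 7)]
print(min_t_join(3, edges, {0, 2}))   # (5, (0, 1))
```

The solution's edge indices map back to the min-cost set of conflict-graph edges to delete, breaking all odd cycles.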

34 Solving T-join in Sparse Graphs l Reduction to matching –construct a complete graph T(G) –vertices = T-vertices –edge costs = shortest-path cost –find minimum-cost perfect matching l Typical example = sparse (not always planar) graph –note that conflict graphs are sparse –#vertices = 1,000,000 –#edges ≈ 5 × #vertices –#T-vertices ≈ 10% of #vertices = 100,000 l Drawback: finding APSP too slow, memory-consuming –#vertices = 100,000 → #edges in T(G) = 5,000,000,000

35 Solving T-join: Reduction to Matching l Desirable properties of reduction to matching: –exact (i.e., optimal) –not much memory (say, 2-3X more) –leads to very fast solution l Solution: gadgets! –replace each edge/vertex with gadgets s.t. matching all vertices in gadgeted graph ⇔ T-join in original graph

36 T-join Problem: Reduction to Matching l replace each vertex with a chain of triangles l one more edge for T-vertices l in graph D: m = #edges, n = #vertices, t = #T l in gadgeted graph: 4m-2n-t vertices, 7m-5n-t edges l cost of red edges = original dual edge costs; cost of (black) edges in triangles = 0 [Figure: gadgets for vertex ∈ T and vertex ∉ T]
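Plugging slide 34's sparse-instance sizes into these counts shows how compact the gadgeted graph stays compared with the 5-billion-edge complete graph T(G):

```python
def gadgeted_size(m, n, t):
    """Size of the gadgeted graph per slide 36: each dual vertex becomes a
    chain of triangles, plus one extra edge per T-vertex."""
    return 4 * m - 2 * n - t, 7 * m - 5 * n - t

# Slide 34's instance: 1,000,000 vertices, ~5x edges, 10% T-vertices.
print(gadgeted_size(m=5_000_000, n=1_000_000, t=100_000))
# (17900000, 29900000): tens of millions of edges, not billions.
```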

37 Example of Gadgeted Graph [Figure: dual graph and its gadgeted graph; black + red edges = min-cost perfect matching]

38 Results [Table: runtimes in CPU seconds on Sun Ultra-10] l Greedy = breadth-first-search bicoloring l GW = Goemans/Williamson95 heuristic l Cook/Rohe98 used for perfect matching l Integration w/compactor: saves 9+% layout area vs. GW

39 F4 F2 F3 F1 S1 S2 S3 S5 S4 S6 S7 S8 Can distinguish between use of shifting, widening DOFs

40 Bipartization Problem: delete min # of nodes (or edges) to make graph bipartite [Figure: black points = features; blue = shifter overlap; red = extra nodes to distinguish opposite shifters] –blue nodes: shifting –red nodes: widening l Bipartization by node deletion is NP-hard (GW98: 9/4-approx)

41 Summary l New fast, optimal algorithms for edge-deletion bipartization –Fast T-join using gadgets –applicable to any AltPSM phase conflict graphs l Approximate solution for node-deletion bipartization –Goemans-Williamson98 9/4-approximation –If node-deletion cost < 1.5× edge-deletion cost, GW is better than edge deletion l Comprehensive integration w/NTI, Cadence tools

42 Today’s Talk l “Demonstrably useful solutions for real problems” l “Valuation”: What problems require attention ? –technology extrapolation –automatic layout of phase-shifting masks l “Values”: How do we advance the leading edge ? –anatomy of FM-based hypergraph partitioning heuristics –culture change: restoring time-to-market and QOR in applied algorithmics via “IP reuse”

43 Applied Algorithmics R&D l Heuristics for hard problems l Problems have practical context l Choices dominated by engineering tradeoffs –QOR vs. resource usage, accessibility, adoptability l How do you know/show that your approach is good?

44 Hypergraphs in VLSI CAD l Circuit netlist represented by hypergraph

45 Hypergraph Partitioning in VLSI l Variants –directed/undirected hypergraphs –weighted/unweighted vertices, edges –constraints, objectives, … l Human-designed instances l Benchmarks – up to 4,000,000 vertices –sparse (vertex degree ≈ 4, hyperedge size ≈ 4) –small number of very large hyperedges l Efficiency, flexibility: KL-FM style preferred

46 Context: Top-Down VLSI Placement [Figure: recursive bisection of the layout region, etc.]

47 Context: Top-Down Placement l Speed –6,000 cells/minute to final detailed placement –partitioning used only in top-down global placement –implied partitioning runtime: 1 second for 25,000 cells, < 30 seconds for 750,000 cells l Structure –tight balance constraint on total cell areas in partitions –widely varying cell areas –fixed terminals (pads, terminal propagation, etc.)

48 Fiduccia-Mattheyses (FM) Approach l Pass: –start with all vertices free to move (unlocked) –label each possible move with immediate change in cost that it causes (gain) –iteratively select and execute a move with highest gain, lock the moving vertex (i.e., cannot move again during the pass), and update affected gains –best solution seen during the pass is adopted as starting solution for next pass l FM: –start with some initial solution –perform passes until a pass fails to improve solution quality
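The pass structure above can be sketched in a few lines. This is a deliberately naive version (2-way, unit areas, no balance constraint, and O(n) gain recomputation instead of the gain-bucket structure) meant only to show the move/lock/best-prefix mechanics:

```python
def cutsize(nets, side):
    """Number of nets spanning both partitions (side maps vertex -> 0/1)."""
    return sum(1 for net in nets if len({side[v] for v in net}) > 1)

def fm_pass(nets, side):
    """One simplified Fiduccia-Mattheyses pass: move every vertex exactly
    once, greedily by gain, and return the best solution seen during the pass."""
    side = dict(side)
    locked = set()
    best = (cutsize(nets, side), dict(side))
    while len(locked) < len(side):
        def gain(v):                      # gain = cut before - cut after move
            before = cutsize(nets, side)
            side[v] ^= 1
            after = cutsize(nets, side)
            side[v] ^= 1
            return before - after
        v = max((u for u in side if u not in locked), key=gain)
        side[v] ^= 1                      # execute highest-gain move
        locked.add(v)                     # vertex cannot move again this pass
        cut = cutsize(nets, side)
        if cut < best[0]:
            best = (cut, dict(side))
    return best

# A 4-cycle plus a chord, started from a poor bipartition.
nets = [[0, 1], [1, 2], [2, 3], [0, 3], [0, 2]]
cut, part = fm_pass(nets, {0: 0, 1: 1, 2: 0, 3: 1})
print(cut)  # 0
```

Real FM gets its linear-time-per-pass behavior by updating only the gains a move actually affects; this sketch recomputes everything, which is quadratic but easy to read.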

49 Cut During One Pass (Bipartitioning) [Plot: cutsize vs. number of moves during one pass]

50 Multilevel Partitioning [Figure: clustering (coarsening) down, refinement back up]

51 Key Elements of FM l Three main operations –computation of initial gain values at beginning of pass –retrieval of the best-gain (feasible) move –update of all affected gain values after a move is made l Contribution of Fiduccia and Mattheyses: –circuit hypergraphs are sparse –move gain is bounded between +2× and -2× max vertex degree –hash moves by gains (gain bucket structure) –each gain affected by a move is updated in constant time –linear time complexity per pass
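A sketch of the gain-bucket container, with a hypothetical simplified interface (classic FM uses doubly linked lists indexed by gain with a moving max-gain pointer; deques stand in here). LIFO insertion is used, which slide 55 flags as one of the implicit decisions:

```python
from collections import defaultdict, deque

class GainBuckets:
    """Moves hashed by gain into buckets, so the best-gain move is retrieved
    and an affected gain is updated in (near) constant time."""
    def __init__(self):
        self.buckets = defaultdict(deque)
        self.max_gain = None

    def insert(self, vertex, gain):
        self.buckets[gain].appendleft(vertex)          # LIFO within a bucket
        if self.max_gain is None or gain > self.max_gain:
            self.max_gain = gain

    def update(self, vertex, old_gain, new_gain):
        self.buckets[old_gain].remove(vertex)          # constant for short buckets
        self.insert(vertex, new_gain)

    def pop_best(self):
        # Lazily drop emptied buckets and recompute the max-gain pointer.
        while self.max_gain is not None:
            if self.buckets[self.max_gain]:
                return self.buckets[self.max_gain].popleft()
            del self.buckets[self.max_gain]
            self.max_gain = max(self.buckets, default=None)
        return None

b = GainBuckets()
for v, g in [("a", 2), ("b", -1), ("c", 2)]:
    b.insert(v, g)
print(b.pop_best())   # c  (LIFO within the gain-2 bucket)
```

Where a new vertex attaches within its bucket (LIFO vs. FIFO vs. random) and how ties between equal-gain buckets break are exactly the implicit decisions the later slides show can swing results dramatically.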

52 Taxonomy of Algorithm and Implementation Improvements l Modifications of the algorithm l Implicit decisions l Tuning that can change the result l Tuning that cannot change the result

53 Modifications of the Algorithm l Important changes to flow, new steps/features –lookahead tie-breaking –CLIP –instead of actual gain, maintain “updated gain” = actual gain minus initial gain (at start of pass) –WHY ??? –cut-line refinement –insert nodes into gain structure only if incident to cut nets –multiple unlocking

54 Modifications of the Algorithm l Important changes to flow, new steps/features –lookahead tie-breaking –CLIP –instead of actual gain, maintain “updated gain” = actual gain minus initial gain –promotes “clustered moves” (similar to “LIFO gain buckets”) –cut-line refinement –insert nodes into gain structure only if incident to cut nets –multiple unlocking

55 Implicit Decisions l Tie-breaking in choosing highest gain bucket l Tie-breaking in where to attach new element in gain bucket –LIFO vs. FIFO vs. random... (known issue: HK 95) l Whether to update, or skip updating, when “delta gain” of a move is zero l Tie-breaking when selecting the best solution seen during pass –first encountered, last encountered, best-balance,...

56 Tuning That Can Change the Result l Threshold large nets to reduce runtime l Skip gain update for large nets l Skip zero delta gain updates –changes resolution of hash collisions in gain container l Loose/stable net removal –perform gain updates for only selected nets l Allow illegal solutions during pass

57 Tuning That Can’t Change the Result l Skip updates for nets that cannot have non-zero delta gain l netcut-specific optimizations l 2-way specific optimizations l optimizations for nets of small degree l ... l 30 years since KL70, 18 years since FM82, 100’s of papers in literature

58 Zero Delta Gain Update l When vertex x is moved, gains for all vertices y on nets incident to x must potentially be updated l In all FM implementations, this is done by going through incident nets one at a time, computing changes in gain for vertices y on these nets l Implicit decision: –reinsert a vertex y when it experiences a zero delta gain move (will shift position of y within the same gain bucket) –skip the gain update (leave position of y unchanged)

59 Tie-Breaking Between Highest-Gain Buckets l Gain container typically implemented such that available moves are segregated, e.g., by source or destination partition l There can be more than one highest-gain bucket l When balance constraint is anything other than “exact bisection”, moves at multiple highest-gain buckets can be legal l Implicit decision: –choose the move that is from the same partition as the last vertex moved (“toward”) –choose the move that is not from the same partition as the last vertex moved (“away”) –choose the move in partition 0 (“part0”)

60 How Much Can This Matter ? l 5% ? l 10% ? l 20% ? l more ? l 50% ? l more ?

61 Implicit Decision Effects: IBM01

62 Effect of Implicit Decisions l Stunning average cutsize difference for flat partitioner with worst vs. best combination –far outweighs “new improvements” l One wrong decision can lead to misleading conclusions w.r.t. other decisions –“part0” is worse than “toward” with zero delta gain updates –better or same without zero delta gain updates l Stronger optimization engines mask flaws –ML CLIP > ML LIFO > Flat CLIP > Flat LIFO –less dynamic range → ML masks bad flat implementation

63 Tuning Effects l Comparison of two CLIP-FM implementations l Min and Ave cutsizes from 100 single-start trials l Another quiz: Why did this happen ? –N.B.: original inventor of CLIP-FM couldn’t figure it out

64 Tuning Effects l Comparison of two CLIP-FM implementations l Min and Ave cutsizes from 100 single-start trials l Another quiz: Why did this happen ? –Hint: some modern IBM benchmarks have large macro-cells

65 Sheer Nightmare Stuff... l Comparison of two LIFO-FM implementations l Min and Ave cut sizes from 100 single-start trials l Papers 1, 2 both published since mid-1998

66 In Case You Are Wondering... No, VLSI CAD Researchers Are Not Stupid.

67 How Much Can This Matter ? l 5% ? l 10% ? l 20% ? l more ? l 50% ? l more ? l Answer: 400+%, even 2000+%, w.r.t. recent literature and STANDARD, “WELL-UNDERSTOOD” heuristics l + lots more + N years = leading partitioner, placer

68 Today’s Talk l “Demonstrably useful solutions for real problems” l “Valuation”: What problems require attention ? –technology extrapolation –automatic layout of phase-shifting masks l “Values”: How do we advance the leading edge ? –anatomy of FM-based hypergraph partitioning heuristics –culture change: restoring time-to-market and QOR in applied algorithmics via “IP reuse”

69 “Barriers to Entry” for Researchers l Code development barrier –bare-bones self-contained partitioner: 800 lines –not leading-edge (Dutt/Deng LIFO-FM) –modern partitioner requires much more code l Expertise barrier –very small details can have stunning impact –must not only know what to do, but also what not to do –impossible to estimate knowledge/expertise required to do research at leading edge l Need reference implementations ! –reference prose (6 pp. 9pt double-column) insufficient

70 “Barriers to Relevance” for Researchers l All heuristic engines/algorithms tuned to test cases l Test case usage must capture real use models, driving applications –e.g., recall bipartitioning is driven by top-down placement –until CKM99: no one considered effect of fixed vertices !!! l Test case usage can be fatally flawed by “details” –hidden or previously unrealized –previously believed insignificant –results of algorithm research will be flawed as a result

71 Challenges for Applied Algorithmics l Research in mature areas can stall –incremental research - difficult and risky –implementations not available → duplicated effort –too much trust → which approach is really the best? –some results may not be replicable –‘not novel’ is common reason for paper rejection –exploratory research - paradoxically, lower-risk –novelty for the sake of novelty –yet, novel approaches must be well-substantiated l Pitfalls: questionable value, roadblocks, obsolete contexts

72 l Difficult to be relevant (time-to-market, QOR issues) –time to market: 5-7 year delay from publishing to first industrial use (cf. market lifetimes, tech extrapolation...) –quality of results: unmeasurable, unpredictable, basically unknown l Good news: barriers to entry and barriers to relevance are self-inflicted, and possibly curable –mature domains require mature R&D methodologies –a possible solution: cultivate flexibility and reuse –low cost “update” of previous work to support reuse –future tool/algorithm development biased towards reuse

73 Analogy: Hardware Design :: Tool Design l Hardware design is difficult –complex electrical engineering and optimization problems –mistakes are costly –verification and test not trivial –few can afford to truly exploit the limits of technology –A Winning Approach: Hardware IP reuse l CAD tool design is difficult –complex software engineering and optimization problems –mistakes can be showstoppers –verification and test not trivial –few can manage complexity of leading-edge approaches –A “Surprising Idea”: CAD-IP reuse

74 What is CAD-IP? l Data models and benchmarks –context descriptions and use models –testcases and good solutions l Algorithms and algorithm analyses –mathematical formulations –comparison and evaluation methodologies for algorithms –executables and source code of implementations –leading-edge performance results l Traditional (paper-based) publications

75 Bookshelf: A Repository for CAD-IP l “Community memory” for CAD-IP –data models –algorithms –implementations l Publication medium that enables efficient applied-algorithmics research –benchmarks, performance results –algorithm descriptions and analyses –quality implementations (e.g., open-source Capo, MLPart) l Simplified comparisons to identify best approaches l Easier for industry to communicate new use models

76 Summary: Addressing Inefficiencies l Inefficiencies –lack of openness and standards → huge duplication of effort –incomparable reporting → improvement difficult –lack of standard comparison/latest use models → best approach not clear –industry doesn’t bother w/feedback → outdated use models l Proposed solutions –widely available, up-to-date, extensible benchmarks –standardized performance reporting for leading-edge approaches –available detailed descriptions of algorithms –peer review of executables (and source code?) –credit for quality implementations l Better research, faster adoption, more impact l http://vlsicad.cs.ucla.edu/GSRC/bookshelf/

77 Today’s Talk l “Demonstrably useful solutions for real problems” l “Valuation”: What problems require attention ? –technology extrapolation –automatic layout of phase-shifting masks l “Values”: How do we advance the leading edge ? –anatomy of FM-based hypergraph partitioning heuristics –culture change: restoring time-to-market and QOR in applied algorithmics via “IP reuse” l Thank you for your attention !!!

78 Spare Slides

79 Parameters l Description of technology, circuit and design attributes l Importance of consistent naming cannot be overstated –Naming conventions for parameters [ ] _ _ {[qualifier] _ } _ { } _ [ ] _ [ ] _ [ ] –Example: r_int_tot_lyr_pu_dl –Benefits: –Relatively easy to understand parameter from its name –Distinguishable (no two parameters should have the same name) –r_int (interconnect resistance) = r_int (interconnect resistivity) ? –Unique (no two names for the same parameter) –R_int = R_wire ? –Sortable (important literals come first) –Software to automatically check parameter naming
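The slide mentions software to automatically check parameter naming; the exact grammar did not survive transcription, so the token vocabulary below is purely illustrative. A checker enforcing lowercase underscore-separated tokens from a known set, plus uniqueness, might look like:

```python
import re

# Hypothetical token vocabulary; the real GTX grammar defines its own literals.
TOKENS = {"r", "c", "int", "tot", "lyr", "pu", "dl", "wire"}

def check_name(name, registry):
    """Flag malformed names, unknown tokens, and duplicates (two parameters
    must never share a name, and one parameter must not get two names)."""
    if not re.fullmatch(r"[a-z]+(_[a-z]+)*", name):
        return "malformed"
    if any(tok not in TOKENS for tok in name.split("_")):
        return "unknown token"
    if name in registry:
        return "duplicate"
    registry.add(name)
    return "ok"

reg = set()
print(check_name("r_int_tot_lyr_pu_dl", reg))  # ok
print(check_name("r_int_tot_lyr_pu_dl", reg))  # duplicate
print(check_name("R_wire", reg))               # malformed (uppercase)
```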

80 Rules l Methods to derive unknown parameters from known ones l ASCII rules –Laws of physics, models of electrical behavior –Statistical models (e.g., Rent's rule) –Include closed-form expressions, vector operations, tables –Storing of calibration data (e.g., “technology files”) for known process, design points in lookup tables l Constraints –Simulated by rules that compute boolean values –Used to limit range during “sweeping” –Optimization over a collection of rules –Example: buffer insertion for minimal delay with area constraints

81 Rules (Cont.) l “External executable” rules –Assume a callable executable (e.g., PERL script) –Example: optimization of number and size of repeaters for global wires –Use command-line interface and transfer through files –Allow complex semantics of a rule –Example: placers, IPEM executable [Cong, UCLA] l “Code” rules –Implemented in C++ and linked into the inference engine –Useful if execution speed is an issue

82 Engine l Contains no domain-specific knowledge l Evaluates rules in topological order l Performs studies (multiple evaluations → tradeoffs/sweeping, optimization) [Figure: GTX architecture, as on slide 8]

83 Knowledge Representation l Rules and parameters are specified separately from the derivation engine l Human-readable ASCII grammar l Benefits : –Easy creation/sharing of parameters/rules by multiple users –D. Sylvester and C. Cao: device and power, SOI modules that “drop in” to GTX –P.K. Nag: Yield modeling –Extensible to models of arbitrary complexity (specialized prediction methods, technology data sets, optimization engines) –Avant! Apollo or Cadence SE P&R tool: just another wirelength estimator –Applies to any domain of work in semiconductors, VLSI CAD –Transistor sizing, single wire optimizations, system-level wiring predictions,…

84 Corking Effect in CLIP l CLIP begins by placing all moves into the 0-gain buckets –CLIP chooses moves by cumulative delta gain (“updated gain”) –initially, every move has cumulative delta gain = 0 l Historical legacy (and for speed): FM partitioners typically look only at the first move in a bucket –if it is illegal, skip the rest of the bucket (possibly skip all buckets for that partition) l If the move at the head of each bucket at the beginning of a CLIP pass is illegal, pass terminates without making any moves –even if first move is legal, an illegal move soon afterward will “cork” l New test cases (IBM) have large cells –large cells have large degree, and often large initial gain –CLIP inventor couldn’t understand bad performance on IBM cases

85 Tuning to Uncork CLIP l Don’t place nodes with area > balance constraint in gain container at pass initialization –actually useful for all FM variants –zero CPU overhead l Look beyond the first move in a bucket –extremely expensive –hurts quality (partitioner doesn’t operate well near balance tolerance) –not worth it, in our experience l Simply do a LIFO pass before starting CLIP –spreads out nodes in gain buckets –reduces likelihood that large node has largest total gain

86 Effect of Fixed Terminals [Charts: normalized cost for IBM01; runtime for IBM01]

87 Enabling Reuse: Free Composability

88 Conflict in Cell (Macro) Based Layouts l Consider connected components of conflict graphs within each cell master –each component independently phase-assignable (2^k versions) –each is a single “vertex” in coarse-grain conflict graph –problem: assure free composability (reusability) of cell masters, such that no odd cycles can arise in coarse-grain conflict graph [Figure: cell master A, cell master B; connected components; edge in coarse-grain conflict graph]

89 Case I: Creating CAD IP of Questionable Value l Recent hypergraph partitioning papers report FM implementations 20x worse than leading-edge FM –previous lack of openness caused wrong conclusions, wasted effort –some “improvements” may only apply to weak implementations –duplicated effort re-implementing (incorrectly?) well-known algorithms –difficult to find the leading edge –no standard comparison methodology –how do you know if an implementation is poor? l To make leading-edge apparent and reproducible –publish performance results on standard benchmarks –peer review (executables, source code?) –similar to common publication standards !

90 Case II: Roadblocks to Creating Needed CAD-IP l “Best approach” to global placement? –recursive bisection (1970s) –force-directed (1980s) –simulated annealing (1980s) –analytical (1990s) –hybrids, others l Why is this question difficult? –latest public placement benchmarks are from 1980s –data formats are bulky (hard to mix and match components) –no public implementations since early 1990s –new ideas are not compared to old l To match approaches to new contexts –agree on common up-to-date data model –publish good format descriptions, benchmarks, performance results –publish implementations

91 Case III: Developing CAD-IP for Obsolete Contexts l Global placement example –much of academia studies variable-die placement –row length and spacing not fixed –explicit feedthroughs –majority of industrial use is fixed-die –pre-defined layout dimensions –HPWL-driven vs. routability- or timing-driven –runtimes are often not even reported –this affects benchmarks and algorithms l Solution: perform sanity checks and request feedback –explicitly define use model and QOR measures –establish a repository for up-to-date formats, benchmarks etc. –peer review (executables, source code?)

92 Implicit Decision Effects: IBM02

93 Reference Implementations l Documentation does not allow replication of results –amazingly, true even for "classic" algorithms –true for vendor R&D, true for academic R&D l Published reference implementations will raise quality –minimum standard for algorithm implementation quality –reduce barrier to entry for new R&D

94 Conclusions l Work with mature heuristics requires mature methodologies l Identified research methodology risks l Identified reporting methodology risks l Community needs to adopt standards for both –reference “benchmark” implementations –vigilant awareness of use-model and context –reporting method that facilitates comparison

95 Application-Driven Research l Well-studied areas have complex, "tuned" metaheuristics l Risks of poor research methodologies –irreproducible results or descriptions –no enabling account of key insights underlying the contribution –experimental evidence not useful to others –inconsistent with driving use model –missing comparisons with leading-edge approaches –Let’s look at some requirements this induces...

96 The GSRC Bookshelf for CAD-IP l Bookshelf consists of slots –slots represent active research areas with “enough customers” –collectively, the slots cover the field l Who maintains slots? –experts in each topic collaborate to produce them - anyone can submit l Currently, 10 active slots –SAT (U. Michigan, Sakallah) –Graph Coloring (UCLA, Potkonjak) –Hypergraph Partitioning (UCLA, Kahng) –Block Packing (UCSC, Dai) –Placement (UCLA, Kahng) –Global Routing (SUNY Binghamton, Madden) –Single Interconnect Tree Synthesis (UIC, Lillis and UCLA, Cong) –Commitments for more: BDDs, NLP, Test and Verification

97 What’s in a Slot? l Introduction –why this area is important and recent progress –pointers to other resources (links, publications) l Data formats used for benchmarks –SAT, graph formats etc. –new XML-based formats l Benchmarks, solutions, performance results –including experimental methodology (e.g., runtime-quality Pareto curve) l Binary utilities –format converters, instance generators, solution evaluators, legality checkers –optimizers and solvers –executables l Implementation source code l Other info relevant to algorithm research and implementations –detailed algorithm descriptions –algorithm comparisons

98 Current Progress on the CAD-IP Bookshelf l Bookshelf@gigascale.org –33 members (17 developers) l Main policies and mechanisms published l 10 active slots –inc. executables, performance results for leading-edge partitioners, placers l First Bookshelf Workshop, Nov. 1999 –attendance: UCSC, UCB, NWU, UIC, SUNY Binghamton, UCLA –agreed on abstract syntax and semantics for initial slots –committed to XML for common data formats –peer review of slot webpages l Ongoing research uses components in the Bookshelf

