Lou Scheffer Cadence San Jose, CA

1 Timing Closure Today
Lou Scheffer, Cadence, San Jose, CA (Lou@cadence.com). Hangzhou, April 2002.

2 Timing Closure Today
(Figure: Design Entry -> Synthesis -> Timing -> Place -> Timing -> Route -> Timing.)
Timing estimates become more accurate as the flow progresses. Sometimes an earlier stage thinks timing is OK, but the design fails at a later stage, so one or more steps must be repeated with tighter constraints. We have a timing closure problem when this process fails. Symptoms include non-convergence, too many iterations, or a solution that is achievable but that this flow cannot find.

3 The Timing Closure Problem

4 Examples of Problems
Design  Worst slack / # misses   Worst slack / # misses   Cycle time  Tech
        (after synthesis)        (after placement)
C1      -1 / 2000                -12 / 38k                7.5 ns      .25 µm
V1      0 / 0                    -12 / 15k                            .18 µm
T1      -0.5 / 2000              -48 / 164k               ? ns
P1      -0.4 / 100               -97 / 43k                8 ns
V2      -0.5 / 500               -11 / 2000

5 The Quest for Synthesis and Layout Timing Closure: Agenda
8 June 2000, DAC, A.D. Drumm
Timing analysis overview; traditional design flows; summary of DSM problems; correction methods overview; hierarchy and timing closure; block-level timing closure; experimental results; summary.
First, talk about how designs are done without logical/physical interaction. Then, a brief discussion of timing analysis, a key to timing correction. Next, automatic transformations to correct timing. Then, an outline of some of the problems arising today, and how we can address those problems by doing some simple optimizations after placement. Finally, sum up.

6 Timing Analysis
Gives accurate time values on each pin/port of the network, and has to deal with design changes made by the optimization toolbox. Static timing analysis: simulation is far too slow in an optimization environment, and static accuracy is more than enough.
For use with optimization, we have some particular requirements of the timer. Design changes happen at a tremendous pace (thousands, hundreds of thousands, or millions of changes), and only static timing is fast enough.

7 Timing Analysis Requirements
Choose a combination of timing analyzer and delay calculator that is appropriate for the level of design and gives the best accuracy for the performance that can be tolerated. Timing analysis and delay calculation must be able to cope with logic design changes: incremental, highest performance possible, non-linear delay models. Absolute accuracy is not needed; precision without accuracy only adds run time.
Thousands to millions of changes may be made, so performance is very important. But we need enough accuracy to make the right decisions.

8 Timing Analysis Requirements
Must handle: differences between rising and falling delays; delay dependent on slew rate; slew and delay dependent on output load; non-linear delay equations.
We need pin-to-pin delay equations. Rising/falling delay differences may be substantial. Slews are very important for proper delay modeling and also affect noise.
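Slew- and load-dependent, non-linear delays are commonly handled with lookup tables. The sketch below assumes a hypothetical two-dimensional table indexed by input slew and output load, in the style of the non-linear delay model tables found in `.lib` files. It interpolates between table points. A separate table is kept for rising and for falling transitions. All numbers are illustrative only.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices (i, i + 1) of the two axis points around x, clamped to the table edges."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def table_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed as table[slew_index][load_index].

    Points outside the table are extrapolated linearly from the nearest edge cell.
    """
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    low = table[i0][j0] + (table[i0][j1] - table[i0][j0]) * tl   # along load, lower slew
    high = table[i1][j0] + (table[i1][j1] - table[i1][j0]) * tl  # along load, higher slew
    return low + (high - low) * ts


# Hypothetical pin-to-pin arc: delays and slews in ns, loads in pF (illustrative values).
SLEWS = [0.1, 0.5]
LOADS = [0.01, 0.05]
ARC = {
    "rise": [[0.10, 0.20], [0.15, 0.30]],
    "fall": [[0.08, 0.16], [0.12, 0.24]],
}

if __name__ == "__main__":
    for edge, tbl in ARC.items():
        print(edge, round(table_delay(SLEWS, LOADS, tbl, 0.3, 0.03), 4))
```

Keeping rise and fall tables separate captures the substantial rise/fall differences the slide mentions. A real library would also provide output-slew tables, so that slew can be propagated to the next stage.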

9 Late Mode Analysis Definitions
(Figure: inputs a, b, c; internal node y; output x.)
Constraints are assertions at the boundaries: arrival times AT_a and AT_b, and the required arrival time RAT_x. The delay from a to x is the longest time it takes to propagate a signal from a to x. Slack is required arrival time minus arrival time. See the example on the next slide.

10 Example
(Figure: the network a, b, c, y, x with simple delays of 1.)
AT propagation is shown, but not RAT. Slack at c is +1. This example uses extremely simple delays.
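The definitions above amount to a forward pass for arrival times, a backward pass for required times, and slack = RAT - AT. Here is a minimal sketch, not a production timer. It assumes a hypothetical netlist given as a list of pin-to-pin arcs, with one fixed delay per arc and no rise/fall distinction. It also assumes every primary input and output carries an asserted time. The small graph in the usage lines is made up and is not the one in the slide's figure.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def late_mode(arcs, input_at, output_rat):
    """Late-mode static timing on a DAG of (from_pin, to_pin, delay) arcs.

    Every source pin needs an asserted arrival time and every sink pin a
    required time. Returns (at, rat, slack) dicts keyed by pin.
    """
    fanin, fanout = defaultdict(list), defaultdict(list)
    preds = {}
    for u, v, d in arcs:
        fanin[v].append((u, d))
        fanout[u].append((v, d))
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    order = list(TopologicalSorter(preds).static_order())

    at = {}
    for pin in order:  # forward pass: the latest arrival dominates
        at[pin] = input_at[pin] if pin in input_at else max(at[u] + d for u, d in fanin[pin])

    rat = {}
    for pin in reversed(order):  # backward pass: the tightest requirement dominates
        rat[pin] = output_rat[pin] if pin in output_rat else min(rat[v] - d for v, d in fanout[pin])

    slack = {pin: rat[pin] - at[pin] for pin in order}  # slack = RAT - AT
    return at, rat, slack


arcs = [("a", "y", 1), ("b", "y", 1), ("y", "x", 1), ("c", "x", 1)]
at, rat, slack = late_mode(arcs, input_at={"a": 0, "b": 1, "c": 0}, output_rat={"x": 3})
print(at["x"], slack)  # the path through b has zero slack; c has 2 units to spare
```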

11 Early Mode Analysis
The definitions change as follows: longest becomes shortest, and slack = arrival - required. Not as important, since early violations are easier to fix. (Figure: similar to the earlier example, with nodes a, b, c, y, x.)
Early-mode violations occur when there is a hold time at a latch and/or overlap of the L1/L2 clocks (either by design or due to skew/uncertainties).
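Keep the same simplifying assumptions: a hypothetical arc list and one delay per arc. The early-mode variant then swaps the extremes and the sign of slack:

- Shortest arrivals propagate forward.
- The earliest allowed times propagate backward, taking the maximum.
- Slack is arrival minus required.

A negative result is the kind of violation that padding an early path fixes.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def early_mode(arcs, input_at, output_rat):
    """Early-mode (hold) timing: shortest paths, slack = AT - RAT.

    arcs: (from_pin, to_pin, delay); output_rat holds the earliest allowed
    arrival at each sink (e.g. a latch hold requirement).
    """
    fanin, fanout = defaultdict(list), defaultdict(list)
    preds = {}
    for u, v, d in arcs:
        fanin[v].append((u, d))
        fanout[u].append((v, d))
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    order = list(TopologicalSorter(preds).static_order())

    at = {}
    for pin in order:  # earliest arrival: the shortest path dominates
        at[pin] = input_at[pin] if pin in input_at else min(at[u] + d for u, d in fanin[pin])

    rat = {}
    for pin in reversed(order):  # earliest allowed arrival, pushed back through the arcs
        rat[pin] = output_rat[pin] if pin in output_rat else max(rat[v] - d for v, d in fanout[pin])

    return {pin: at[pin] - rat[pin] for pin in order}


arcs = [("a", "y", 1), ("b", "y", 1), ("y", "x", 1), ("c", "x", 1)]
slack = early_mode(arcs, input_at={"a": 0, "b": 1, "c": 0}, output_rat={"x": 1.5})
print(slack["x"], slack["c"])  # both negative: the short path c -> x arrives too early
```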

12 Delay Modeling
(Figure: timing model with pins cl, d, o, a, b, x, showing propagation arcs and a test arc.)
This shows pin-to-pin delay arcs and tests at a latch. The timing graph has nodes at pins and arcs between them wherever a timing path or test exists.

13 Agenda
Timing analysis overview; traditional design flows; summary of DSM problems; timing correction overview; approaches to fixing timing closure; experimental results; summary.

14 Traditional Design Flows
Mid 1980s: Design Entry -> Synthesis (tech-independent optimization, tech mapping, rudimentary timing correction) -> Timing -> Place -> Timing -> Route -> Timing.
This represents an older flow, prior to the integration of timing with synthesis. Timing correction used unit delays and optimized for arrival time only. It was rudimentary and not really useful for VLSI design.

15 Logic Synthesis
Technology-independent optimization: the general goal is to reduce connections, literals, redundancies, and area. Technology mapping: map the logic into the technology library. Timing correction, added next: find and fix critical timing paths, and fix electrical violations (load, slew).
The general logic synthesis flow has remained fairly stable for years. Tech-independent optimization: Boolean optimization, algebraic optimization, two-level optimization, global flow, testgen-based optimization, and redundancy removal. Tech mapping: forest of trees, wavefront mapping. Timing correction: late mode and early mode (see later slides).

16 Traditional Design Flows
1990s: integrate timing with synthesis and placement. Design Entry -> Synthesis w/Timing (tech-independent optimization, tech mapping, timing correction) -> Place w/Timing -> Route -> Timing.
Timing correction here could use real static timing and optimize slacks. There may be some timing sense in other operations, such as technology mapping.

17 Traditional Design Flows
2000s: integrate timing with synthesis and placement. Design Entry -> Synthesis/Placement w/Timing (tech-independent optimization, tech mapping, placement, timing correction) -> Global Route -> Detailed Route -> Timing.

18 Traditional Design Flows
2001: integrate timing with synthesis, placement, and global route. Design Entry -> Synthesis and Placement w/Timing and Global Route (tech-independent optimization, tech mapping, placement, timing correction, global route) -> Detailed Route -> Timing.

19 Agenda (as on slide 5)

20 The Wall
Logic designers concentrate on logic and timing (as understood by synthesis). Design work is done in an abstract world: it used to be gates and wire-load models, and now may include placement and global route. They throw the design over the wall when it is complete. Physical designers concentrate on layout and the ability to route. This was an effective method for many years.
Physical effects now demand more attention from logic designers.

21 General CMOS Problems
Low drive strengths / low power: capacitance (not intrinsic delay) plays a large role in performance. Huge variability: a wide range between the slowest possible and fastest possible delay. Noise affects delay: IR drop is a big percentage of the supply, and crosstalk can change delay by a factor of 2.
CMOS brought many problems along with its good properties. Variability is one of the toughest to cope with: it is not uncommon for the fastest possible delay to be half or less of the slowest possible delay. What other engineering discipline has this sort of range?

22 Additional DSM Problems
High density / huge designs; very thin and resistive wires; very high frequencies, so inductance becomes more important; smaller voltages, so IR drop is a bigger fraction of the signal swing; clock skew and latency; electromigration and noise.
Huge designs are too big for many design tools and too big for designers to consider as a whole. Even if signals can be planned into localized regions, clocks span the entire design. Clock logic is becoming a large percentage of area and accounts for most of the power use. The extent to which these problems affect a particular design depends on its characteristics: how hard it is pushing cycle time, how large it is, and how dense it is.

23 Clock Distribution Problems
The most common design approach requires close to zero skew, and the CMOS/DSM problems all affect clocks. The distribution problem is increasing: the number of latches/flip-flops is growing significantly, the power consumed in the clock tree is significant, and I and noise are also of concern.
Clocks consume area, a large amount of the available wiring, and power.

24 Process Designers Are Trying to Help
Many metal layers with different metal pitches: a small pitch for local interconnect, and a big pitch/thick metal for long, fast wires. Copper wires and thick metal to lower R. SOI (silicon on insulator). Low-k dielectrics. These help, but they are not enough.
Hierarchy can help us keep the problem size manageable. Technology improvements push some problems out in time: copper wires lessen the RC problem for now, and dielectric changes, SOI, etc. help. Timing analysis has to consider voltage/temperature/manufacturing conditions. Noise analysis needs 3D extraction.

25 Agenda (as on slide 5)

26 Timing Correction
Fix electrical violations (slew and load) first; this takes priority since it is needed for reliability. The tools are: resize cells, buffer nets, and copy (clone) cells. Then fix timing problems with local transforms (the "bag of tricks") and path-based transforms.
Load and slew are estimated using wire-load models. Correction usually proceeds right to left through the logic. Timing problems are continued on the next slide.

27 Local Transforms
Resize cells; buffer or clone to reduce the load on critical nets; decompose large cells; swap connections on commutative pins or among equivalent nets; move critical signals forward; pad early paths; area recovery.
A trial-and-error approach is commonly used: make the change, observe the result, and undo the change if it is not good. Various methods are used to traverse the logic design: focus on cells in critical paths, focus on all critical logic, try to find points that affect the most paths, and so on.
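The make/observe/undo loop can be sketched as follows. The design representation (a dict of cell sizes), the candidate list, and the `worst_slack` callback are hypothetical stand-ins. In a real optimizer the callback would be an incremental static timing update.

```python
def trial_and_error(sizes, trials, worst_slack):
    """Greedy local optimization: try each change, keep it only if worst slack improves.

    sizes: dict cell -> drive strength (modified in place)
    trials: iterable of (cell, new_size) candidate changes
    worst_slack: callable(sizes) -> float, higher is better
    """
    best = worst_slack(sizes)
    for cell, new_size in trials:
        old = sizes[cell]
        sizes[cell] = new_size      # make the change
        slack = worst_slack(sizes)  # observe the result
        if slack > best:
            best = slack            # keep improvements
        else:
            sizes[cell] = old       # undo the change if it is not good
    return best


# Toy timing model: worst slack peaks when U1 has size 4 (purely illustrative).
sizes = {"U1": 1}
best = trial_and_error(sizes, [("U1", 2), ("U1", 8), ("U1", 4)],
                       lambda s: -abs(s["U1"] - 4))
print(best, sizes)  # 0 {'U1': 4}
```

The traversal order of `trials` is where the strategies in the notes differ: critical cells first, all critical logic, or the points that affect the most paths.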

28 Transform Example: Double Inverter Removal
(Figure: delay = 4 before removing the inverter pair, delay = 2 after.)
Usually the simplest example of a logic transform. Used to reduce area and to improve delay.
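Here is a minimal sketch of double-inverter removal on a toy netlist. The `Cell` record and the `"INV"` cell name are hypothetical. The transform fires only when the net between the two inverters has no other loads and neither net is a primary output. When it fires, the second inverter's loads are reconnected to the first inverter's input.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    kind: str                                    # hypothetical library cell name, e.g. "INV"
    inputs: list = field(default_factory=list)   # input net names
    output: str = ""                             # output net name


def remove_double_inverters(cells, primary_outputs=()):
    """Return a new netlist with INV -> INV pairs removed where it is safe to do so."""
    cells = [Cell(c.name, c.kind, list(c.inputs), c.output) for c in cells]  # copy; don't mutate caller's cells
    changed = True
    while changed:
        changed = False
        driver = {c.output: c for c in cells}
        loads = {}
        for c in cells:
            for net in c.inputs:
                loads.setdefault(net, []).append(c)
        for inv2 in cells:
            inv1 = driver.get(inv2.inputs[0]) if inv2.kind == "INV" else None
            if inv1 is None or inv1.kind != "INV" or inv1 is inv2:
                continue
            mid, out = inv1.output, inv2.output
            if len(loads[mid]) != 1 or mid in primary_outputs or out in primary_outputs:
                continue
            src = inv1.inputs[0]
            for load in loads.get(out, []):  # reconnect loads of the pair to the original signal
                load.inputs = [src if n == out else n for n in load.inputs]
            cells = [c for c in cells if c is not inv1 and c is not inv2]
            changed = True
            break
    return cells


netlist = [Cell("U1", "INV", ["a"], "n1"), Cell("U2", "INV", ["n1"], "n2"),
           Cell("U3", "NAND2", ["n2", "b"], "z")]
print(remove_double_inverters(netlist, primary_outputs={"z"}))
```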

29 Resizing
(Figure: a gate with pins a, b driving loads d, e, f, annotated 0.2 and 0.3; alternative cells A (0.035) and C (0.026).)
The simplest and most important of the timing correction methods.
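Resizing is a trade-off, and a simple linear RC model shows why. This model is an assumption, not the library model behind the slide's numbers. Making a cell bigger lowers its drive resistance, which speeds up its own output. It also raises its input capacitance, which slows down the gate driving it. The hypothetical `best_size` below picks the size from a discrete list that minimizes the sum of both delays.

```python
def best_size(sizes, r_unit, c_in_unit, c_load, r_upstream, intrinsic=0.0):
    """Choose the drive strength minimizing upstream delay + own delay.

    Assumed linear model: a size-s cell has drive resistance r_unit / s and
    input capacitance c_in_unit * s; the upstream driver has resistance r_upstream.
    """
    def total_delay(s):
        upstream = r_upstream * c_in_unit * s      # bigger cell loads its driver more
        own = intrinsic + (r_unit / s) * c_load    # ...but drives its own load faster
        return upstream + own

    return min(sizes, key=total_delay)


print(best_size([1, 2, 4, 8], r_unit=1.0, c_in_unit=1.0, c_load=16.0, r_upstream=1.0))  # 4
print(best_size([1, 2, 4, 8], r_unit=1.0, c_in_unit=1.0, c_load=16.0, r_upstream=4.0))  # 2
```

With a weaker upstream driver (larger `r_upstream`), the best choice shifts to a smaller cell.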

30 Cloning
(Figure: a gate with pins a, b driving loads d through h, annotated 0.2; after cloning, copies A and B split the loads.)
Create a copy of a circuit to share the load. Cloning can also isolate critical sinks.

31 Buffering
(Figure: a gate with pins a, b driving loads d through h, before and after inserting buffer B; annotated delays 0.1 and 0.2.)
Create a new, non-inverting circuit to share the load. A buffer can be, and often is, built from a pair of inverters.
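A lumped-capacitance model shows the benefit of buffering. The model is an assumption that ignores wire resistance. Moving the non-critical sinks behind a buffer leaves the critical sink with only the buffer's input capacitance as extra load. All values below are hypothetical.

```python
def lumped_delay(r_drive, caps):
    """Delay of a driver with resistance r_drive into the total capacitance caps."""
    return r_drive * sum(caps)


def buffer_noncritical(r_drive, c_crit, c_noncrit, buf_r, buf_cin, buf_intrinsic):
    """Compare (critical, non-critical) sink delays before and after buffering.

    Before: the driver sees every sink. After: it sees the critical sink plus one
    buffer, and the buffer drives all non-critical sinks.
    """
    before = lumped_delay(r_drive, [c_crit, *c_noncrit])
    crit_after = lumped_delay(r_drive, [c_crit, buf_cin])
    noncrit_after = crit_after + buf_intrinsic + lumped_delay(buf_r, c_noncrit)
    return (before, before), (crit_after, noncrit_after)


before, after = buffer_noncritical(r_drive=1.0, c_crit=1.0, c_noncrit=[1.0] * 6,
                                   buf_r=0.5, buf_cin=1.0, buf_intrinsic=0.5)
print(before, after)  # (7.0, 7.0) (2.0, 5.5)
```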

32 Redesign Fan-in Tree
(Figure: signals a, b, c, d with Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0 combine into output e, with gate delay 1; the original tree gives Arr(e)=6, the restructured tree Arr(e)=5.)
Note that this transform requires more logic knowledge in the tool. It is a true restructuring method.
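The fan-in restructuring can be sketched as a greedy, Huffman-style construction. It repeatedly combines the two earliest-arriving signals with a two-input gate of an associative function (e.g. AND), so the latest signals end up nearest the output. With a unit gate delay (an assumption) and the slide's arrival times, this sketch reaches Arr(e) = 5. It illustrates the idea and is not necessarily the tool's algorithm.

```python
import heapq
from itertools import count


def fanin_tree(arrivals, gate_delay=1):
    """Greedily build a tree of 2-input gates for an associative function.

    arrivals: dict signal -> arrival time. Returns (output arrival, tree), where
    the tree is nested tuples of signal names.
    """
    tie = count()  # tie-breaker so heap entries never compare names with tuples
    heap = [(t, next(tie), name) for name, t in arrivals.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        t1, _, left = heapq.heappop(heap)   # earliest signal
        t2, _, right = heapq.heappop(heap)  # next earliest
        heapq.heappush(heap, (max(t1, t2) + gate_delay, next(tie), (left, right)))
    t, _, tree = heap[0]
    return t, tree


print(fanin_tree({"a": 4, "b": 3, "c": 1, "d": 0}))  # (5, ('a', (('d', 'c'), 'b')))
```

The latest signal, a, enters at the last gate, matching the restructured tree in the figure.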

33 Redesign Fan-out Tree
(Figure: original fan-out tree, longest path = 5; redesigned tree, longest path = 4; annotated stage delays 1, 3, and 2, with slowdown of the buffer due to its load.)
Balancing between load and delay.

34 Decomposition
Another true restructuring method, balancing between delay and area. Larger cells (e.g. AND-OR) are often smaller in area than their decomposition, since the first stage can be wimpy.

35 Swap Commutative Pins
(Figure: signals a, b, c on commutative pins; annotated values 2, 1, 5 before swapping and 2, 1, 3 after.)
Rearrange the signals arriving at commutative pins to match timing better. Simple sorting on arrival times and pin delays works.
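Here is a sketch of the "simple sorting" rule. The latest-arriving signal gets the fastest pin, the next-latest gets the next-fastest, and so on. Pairing largest with smallest minimizes the largest sum of arrival time plus pin delay. Pin names and delays are hypothetical.

```python
def assign_commutative_pins(arrivals, pin_delays):
    """Map signals to interchangeable input pins: latest arrival -> fastest pin.

    Returns (assignment, output arrival time).
    """
    signals = sorted(arrivals, key=arrivals.get, reverse=True)  # latest first
    pins = sorted(pin_delays, key=pin_delays.get)               # fastest first
    assignment = dict(zip(signals, pins))
    out = max(arrivals[s] + pin_delays[p] for s, p in assignment.items())
    return assignment, out


assignment, out = assign_commutative_pins({"a": 2, "b": 1, "c": 5},
                                          {"A1": 1, "A2": 2, "A3": 3})
print(assignment, out)  # {'c': 'A1', 'a': 'A2', 'b': 'A3'} 6
```

Connecting a, b, c to A1, A2, A3 in order would instead give an output arrival of 8.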

36 Move Critical Signals Forward
(Figure: network with signals a, b, c, d, e, before and after the critical signal is moved forward.)
Based on ATPG: linear in circuit size, detects redundancies efficiently, and efficiently finds wires to be added and removed. Based on mandatory assignments.
The Boolean reasoning may use ATPG, BDDs, global flow, transduction, and so on. Simple structurally based methods may also be used.

37 Path-based Transforms
Path-based resizing; unmap/remap a path or cone; slack stealing; retiming.
Resizing can be solved optimally for a single path with no fan-outs, and it does pretty well on real paths. We may wish to remap the logic in a critical path/cone/region using better timing information (i.e., all the logic around the region has already been mapped to the technology). Do this by extracting the logic and working on it apart from the rest of the circuit. Slack stealing and retiming are addressed next.
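For the single-path, no-fan-out case, one well-known closed-form solution is the method of logical effort. It is shown here as an example of optimal path sizing under its own normalized delay model, not as the method used in the original tool. The path effort is F = G * H, with no branching. Delay is minimized when every stage bears the same effort, f = F^(1/N). Stage input capacitances then follow by walking back from the load.

```python
import math


def size_chain(g, p, c_in, c_out):
    """Logical-effort sizing of a gate chain with no side branches.

    g, p: logical effort and parasitic delay of each stage (normalized units).
    Returns (minimum path delay, input capacitance of each stage).
    """
    n = len(g)
    path_effort = math.prod(g) * (c_out / c_in)  # F = G * H, branching effort = 1
    f = path_effort ** (1 / n)                   # equal effort per stage is optimal
    caps, load = [], c_out
    for gi in reversed(g):                       # walk back from the load
        load = gi * load / f                     # C_in,i = g_i * C_out,i / f
        caps.append(load)
    caps.reverse()
    return n * f + sum(p), caps


delay, caps = size_chain(g=[1, 1, 1], p=[1, 1, 1], c_in=1.0, c_out=64.0)
print(round(delay, 6), [round(c, 6) for c in caps])  # 15.0 [1.0, 4.0, 16.0]
```

The first stage's computed input capacitance comes back equal to `c_in`, which is a useful consistency check.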

38 Slack Stealing
Take advantage of the timing behavior of level-sensitive registers (latches). No change to the logic! (Figure: latches clocked by C1 and C2; a stage with slack = +1 followed by a stage with slack = -1 becomes slack = 0 after stealing.)
One cycle runs from the rise or fall of a clock until the next rise or fall of the same clock. Level-sensitive latches start switching at the rise of the clock but don't latch the value until the fall of the clock. If signals are early, they pass through and are available at the next latch before the clock has fallen. The clock falling edge is normally considered the "launch" time. In effect, this may provide as much as the full active time of the clock as extra available delay, as shown by the lowest green arrow.
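Here is a minimal sketch of time borrowing through level-sensitive latches. It rests on simplifying assumptions that do not come from the slide:

- Latch k becomes transparent at k * step and closes `width` later. For alternating C1/C2 latches, `step` would be the offset between the two phases.
- There is no clock skew.
- Latch delays are folded into the stage delays.

Data that arrives while a latch is open passes straight through. The time by which it arrives after the opening edge is borrowed from the next stage.

```python
def latch_pipeline(delays, step, width, setup=0.0):
    """Time borrowing through a chain of level-sensitive latches.

    Latch k (k = 1..len(delays)) is open from k*step to k*step + width.
    Data leaves latch 0 when it opens at time 0. Returns the time borrowed at
    each latch; raises ValueError if data misses a closing edge.
    """
    depart, borrowed = 0.0, []
    for k, d in enumerate(delays, start=1):
        arrive = depart + d
        open_t, close_t = k * step, k * step + width
        if arrive > close_t - setup:
            raise ValueError(f"latch {k}: arrival {arrive} after closing time {close_t - setup}")
        borrowed.append(max(0.0, arrive - open_t))  # time taken from the next stage
        depart = max(arrive, open_t)                # transparent latch: pass data through
    return borrowed


# A 12-unit stage followed by a 7-unit stage fits a 10-unit step by borrowing 2 units.
print(latch_pipeline([12, 7], step=10, width=5))  # [2.0, 0.0]
```

With edge-triggered flip-flops at the same 10-unit spacing, the 12-unit stage would simply fail. Here the short following stage absorbs the excess, as in the slide's +1/-1 example.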

39 Retiming
(Figure: forward and backward retiming moves; delay = 3 before, delay = 2 after.) Problem: verification.
Signals at the outputs are not affected by where the latches are or how many there are, only by the number of cycles from the inputs to the output. We can move latches through logic and maintain the output functions, but we may have to increase or decrease the number of latches. This is a more aggressive optimization, since it changes what is stored in the latches even though the outputs are preserved.
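Retiming is often formalized with the Leiserson-Saxe model. It is shown here as a sketch, not as the tool's implementation. Each connection u -> v carries w(u, v) registers, and each gate v gets an integer lag r(v). The retimed register count is w_r(u, v) = w(u, v) + r(v) - r(u), which must stay non-negative.

The toy circuit has three unit-delay gates, g, h and k, with registers at the input and the output. Its names and delays are hypothetical. Moving the output register backward across k reproduces the slide's improvement from delay 3 to delay 2.

```python
from functools import lru_cache


def retime(edges, lag):
    """Apply lags: w_r(u, v) = w(u, v) + lag[v] - lag[u]; reject illegal retimings."""
    new = [(u, v, w + lag.get(v, 0) - lag.get(u, 0)) for u, v, w in edges]
    if any(w < 0 for _, _, w in new):
        raise ValueError("illegal retiming: negative register count on an edge")
    return new


def clock_period(edges, delay):
    """Longest register-free path delay (edges with w == 0), assuming no combinational cycles."""
    comb_fanin = {}
    for u, v, w in edges:
        if w == 0:
            comb_fanin.setdefault(v, []).append(u)

    @lru_cache(maxsize=None)
    def arrival(v):
        return delay[v] + max((arrival(u) for u in comb_fanin.get(v, [])), default=0)

    return max(arrival(v) for v in delay)


delay = {"in": 0, "g": 1, "h": 1, "k": 1, "out": 0}
edges = [("in", "g", 1), ("g", "h", 0), ("h", "k", 0), ("k", "out", 1)]
print(clock_period(edges, delay))                    # 3
print(clock_period(retime(edges, {"k": 1}), delay))  # 2
```

A lag of 1 on k moves the register from k's output to its input. The number of cycles from input to output is unchanged, but the internal register now holds h's output instead of k's, which is why verification becomes harder.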

40 Solutions to Timing Closure
Carry the hierarchical logic design into physical design; hand/custom design; improved analysis; more sophisticated clock design; modify existing flows; more physically knowledgeable tools (many variations: combined synthesis/place/route, gain-based synthesis, etc.).

41 Agenda
Analysis methods overview; traditional design flows; summary of DSM problems; correction methods overview; hierarchy and timing closure; block-level timing closure; experimental results; summary.

42 Hierarchy and Physical Design
Logical hierarchy can be carried over into physical design. This seems a natural top-down approach, using floorplanning as a firm guide to physical design. Use of hierarchy offers many advantages and many possible problems. A new generation of tools addresses this problem.
It makes sense from a logic design perspective.

43 Pin Assignment and Timing Budgeting
(Figure: top level containing Block 1, Block 2, and Block 3.)
Each block requires: content definition; partitioning; pin locations; clock/timing definition (set_input_delay, set_output_delay, set_drive, set_load); path exceptions (false paths, multicycle paths).

44 Hierarchy and Physical Design Advantages
Run time of P&R tools; blocks can be built independently; early (and valuable) knowledge of global wires; limited wire delay within a macro may allow simpler methodologies; contains the problem size; extends naturally to SoC and mixed analog/digital chips; may be the only real method available.
Placement and routing are difficult, time-consuming jobs, and the run time for a flat design can be excruciating. Waiting until P&R to determine the location of long wires results in lots of surprises. See the Keutzer article in Spectrum (or was it Computer?).

45 Physical Hierarchy Disadvantages
It is possible to overconstrain the design in many ways (see next slide). Hierarchy is usually logic-based, not physically based: it is designed for logical correctness, not physical implementation.
A good logical hierarchy makes sense to the designer and helps with design and simulation tasks, but it doesn't usually translate into layout. Global optimization tools work best when they have the entire picture. Someone needs to manage the global wires, their delays, and the associated assertions at each macro to ensure consistency.

46 Physical Hierarchy Overconstraints
The placement solution is perhaps overconstrained: logical gates may not fit naturally in a rectangle. The ability to find a routable solution is hindered: routes can't detour through a neighboring cell. Boundary conditions explode and must be managed carefully to avoid surprises. A recent IBM design had 17,000 top-level connections; a bad timing constraint on any one of them can make the whole design infeasible.

47 Hierarchy Example Plots
RS/6000 S80 processor, MHz, 64-bit, 0.18 micron, Cu interconnect.
Notice the various hard-bounded macros, which are themselves flattened logical hierarchy. Red areas are decoupling caps used as fillers for noise.

48 Hierarchy Example Plots
Very large ASIC: >600,000 placeable objects, large arrays.

49 Hierarchy Example Plots
Another flat design, with a much higher percentage of logic (vs. arrays) than the previous examples.

50 The Challenges
How to derive a sensible partitioning? How to achieve die utilization similar to the "flat" approach? How to achieve clock speed and skews similar to the "flat" approach? How to automatically generate optimal pin assignments for each module? How to automatically come up with realistic timing budgets for each module?

51 Basic Approach to Solution
Example tool: First Encounter. Start with a Silicon Virtual Prototype: a near-final-quality "flat" placement with near-legal routing, i.e. a known feasible solution for timing and routability. Use this solution to guide the final implementation: partitioning, pin assignment, timing constraints. Build the blocks with more detailed tools.
(Figure: RTL/gates -> Silicon Virtual Prototype -> hierarchical partitioning and placement; top level: top-level buffering, clock balancing, and power grid; block level: physical synthesis / placement and routing; then chip assembly routing -> GDSII.)
Now I want to talk about our directions with SOC Encounter. We believe the silicon virtual prototype concept is central to being able to construct very large designs: it enables design teams to predict and control their back-end implementation again. Detailed synthesis, place, and route is so complex and time-consuming on the new high-end chips that having an accurate, full-chip prototype in advance is a major advantage in reaching design closure.

52 Basic Approach (continued)
Logic design (RTL, gates, IP, "black box") and physical data feed the Silicon Virtual Prototype, which gives accurate timing and routability data in hours instead of days or weeks. Hand off a floorplan OR a full placement to block-level physical synthesis and/or route.
(Figure: the prototype flow takes the netlist, constraints, and .lib data through placement, scan optimization, trial route, RC extraction, delay calculation, STA, clock tree, IPO, and power.) The physical prototype is a complete "flat" physical design, built very fast, that proves timing and routability. The full-chip physical prototype gives confidence that the design will work once the blocks are re-assembled into the complete IC.
The prototype is really useful for two things: <step through the flow here>. And as was mentioned in the previous presentation, we generate the prototype really, really fast. This is essential for big designs.

53 In-Context Hierarchical Partitioning
Pin assignment; timing budgeting; clock tree generation; power grid planning; independent block-level implementation; partitioning; SoC assembly.

54 In-Context Pin Assignment
(Figure: accurate flat full-chip physical prototype vs. the top-level partition view.)
The full-chip prototype results in optimal pin placement, which gives narrower channels and reduced die size, reduces routing congestion, and improves chip timing.

55 In-Context Timing Budgeting
(Figure: top level containing Block 1, Block 2, and Block 3.)
Each block requires: clock definition; set_input_delay; set_output_delay; set_drive; set_load; path exceptions (false paths, multicycle paths). Accurate timing budgets result in predictable timing convergence.
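Here is one simple way to derive per-block budgets from the flat prototype. It is a hypothetical heuristic, not the tool's method. Split the clock period along a cross-block path in proportion to each segment's estimated delay. The time used before a segment then becomes that block's input delay, and the time needed after it becomes its output delay. These are the quantities set with set_input_delay and set_output_delay.

```python
def budget_path(period, segments):
    """Split `period` across path segments in proportion to their estimated delays.

    segments: ordered list of (name, estimated_delay) from a prototype.
    Returns {name: {"budget", "input_delay", "output_delay"}} where input_delay
    is time consumed before the segment and output_delay is time needed after it.
    """
    scale = period / sum(d for _, d in segments)
    result, elapsed = {}, 0.0
    for name, d in segments:
        budget = d * scale
        result[name] = {
            "budget": budget,
            "input_delay": elapsed,
            "output_delay": period - elapsed - budget,
        }
        elapsed += budget
    return result


path = [("block1", 4.0), ("top_wire", 1.0), ("block2", 3.0)]  # hypothetical estimates
for name, c in budget_path(10.0, path).items():
    print(name, c)
```

A real budgeter must also handle paths that share segments, and it should leave margin rather than hand out the whole period.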

56 Agenda (as on slide 41)

57 Blocks Have Timing Closure Problems, Too
Didn't the big flat placement guarantee that the blocks are feasible? No, because: the block may not have been defined when the global constraints were set; the global placer does not deal with all DSM effects; the block may be too hard for the relatively simple global placer (which must be very fast); requirements change as the project progresses; the process technology may have changed; …

58 Hand/Custom Design
Mentioned for completeness. It hurts productivity but yields the highest performance. It can only fix a few things. For example, it can realistically fix timing or crosstalk problems on a few nets, but it cannot realistically change the size of blocks.

59 Improved Analysis Helps
(Plot: slack by net for two designs.) A 10% timing delta means many more bad nets. This is often the difference between success and failure.

60 More Accurate Analysis
Crosstalk-induced delay. The old approach is to overestimate coupling C; it is better to compute nominal timing plus a crosstalk delta. Customer example from CadMos:
Ignore crosstalk completely: 400 MHz (not an acceptable alternative).
Coupling caps overestimated by 60%: 300 MHz.
Nominal delays + computed crosstalk: 333 MHz.
More accurate analysis gains 10% margin.

61 Increased Accuracy Helps
Global/detailed route correlation. Any global route is better than wire-load models or Steiner trees, since global routes consider congestion. But to get that last 10%, we need a link between the global and detailed routers. Knowing that some nets must detour is good, but knowing which net takes which detour is needed for good correlation.

62 Modified Clock Design
Zero skew is not necessary, and often not even desirable. We have the freedom to adjust clock arrival times at memory elements. This obtains more margin and thus helps convergence. It is similar to retiming but less disruptive. The improvement is very design dependent: if the worst path is from a flip-flop to itself, it doesn't help. It may impact scan chains.
There is lots of opportunity to improve the design.
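The following sketch shows why adjustable clock arrival times add margin. It uses a simplified setup check that ignores clock-to-q, setup time, and uncertainty; these are assumptions made for brevity. For a path from launching flip-flop L to capturing flip-flop C, slack = period + clk(C) - clk(L) - delay. Delaying the clock at the middle flip-flop moves slack from one stage to its neighbor. A path from a flip-flop back to itself is unaffected, as the slide notes. Flip-flop names and numbers are hypothetical.

```python
def setup_slacks(period, paths, clock_arrival):
    """Simplified setup slack per (launch_ff, capture_ff, delay) path.

    slack = period + clock_arrival[capture] - clock_arrival[launch] - delay
    (clock-to-q, setup time, and uncertainty are ignored).
    """
    return [period + clock_arrival[cap] - clock_arrival[launch] - d
            for launch, cap, d in paths]


paths = [("FF1", "FF2", 12), ("FF2", "FF3", 6), ("FF3", "FF3", 11)]
zero_skew = {"FF1": 0, "FF2": 0, "FF3": 0}
useful_skew = {"FF1": 0, "FF2": 3, "FF3": 0}
print(setup_slacks(10, paths, zero_skew))    # [-2, 4, -1]
print(setup_slacks(10, paths, useful_skew))  # [1, 1, -1]
```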

63 Previous Attempts to Fix Block Closure
Without the radical step of combining synthesis and placement, designers have tried:
Allowing the placer to do sizing and buffering.
Post-placement optimization: simple transformations, using the existing placement.
Post-placement re-synthesis: complex transformations allowed, but it needs incremental placement and extraction.
But these have not been fully successful. Why? Re-examine the root cause of the discrepancies: wire-load models and their limitations. Combined synthesis/placement/routing.

64 Post-Placement Optimization
Design Entry -> Synthesis w/Timing -> Place -> in-place optimizations, minimally disturbing optimizations, or a re-run of Synthesis w/Timing -> Route -> Timing.
After placement, net delays can be estimated much more accurately using Steiner trees and Elmore delays. Use logic synthesis techniques here, limited to those that have little or no effect on the layout, to fix the timing and electrical violations caused by differences between pre-P&R estimates and the actual physical layout.
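After placement, a net's Steiner tree can be turned into an RC tree and its sink delays estimated with the Elmore formula. The delay to a node is the sum, over each resistor on the path from the driver, of that resistance times all the capacitance downstream of it. The tree encoding and values below are hypothetical. The driver's output resistance can be modeled as the resistance of the first segment.

```python
def elmore_delays(parent, r, c):
    """Elmore delay from the root of an RC tree to every node.

    parent[n]: upstream node of n (None for the root/driver)
    r[n]: resistance of the segment from parent[n] to n (unused for the root)
    c[n]: capacitance at node n
    """
    children = {n: [] for n in parent}
    root = None
    for n, p in parent.items():
        if p is None:
            root = n
        else:
            children[p].append(n)

    down = {}  # total capacitance in the subtree rooted at each node

    def downstream(n):
        down[n] = c[n] + sum(downstream(k) for k in children[n])
        return down[n]

    downstream(root)

    delay, stack = {root: 0.0}, [root]
    while stack:
        n = stack.pop()
        for k in children[n]:
            delay[k] = delay[n] + r[k] * down[k]  # each resistor charges everything below it
            stack.append(k)
    return delay


# Driver -> Steiner point s -> sinks a and b (arbitrary units).
parent = {"drv": None, "s": "drv", "a": "s", "b": "s"}
r = {"s": 1.0, "a": 2.0, "b": 3.0}
c = {"drv": 0.0, "s": 1.0, "a": 1.0, "b": 2.0}
print(elmore_delays(parent, r, c))
```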

65 Post-Placement Optimization
In-place (little or no placement impact): resizing (carefully); pin swapping and some tree rebuilding; wire sizing/typing. Minimally disruptive: resizing, buffering, cloning, tree rebuilding, cell removal.
Resizing can grow a cell, causing overlaps or overflowing a region's area, so we must be careful. Small disruptions in placement (overflows, new cells) can be handled effectively by good P&R tools.

66 In-place Optimization
Not too difficult. It can use the extracted electrical data (C, RC) from the placement tool. Some changes affect pin locations, but these may be ignored; tree rebuilding needs incremental extraction. It can use timing reports for timing data. But accuracy suffers as changes are made, because real RC data is replaced by estimates again.
The biggest advantage is that these methods are not terribly difficult. Just using timing reports is really insufficient at the current level of technology.

67 In-place Optimization
(Figure: the placed netlist and C/RC data from placement & extraction feed the optimization step (resize, swap pins, rebuild trees), which produces the optimized netlist.)
In-place optimization flow. It is the simplest and fairly effective, but it cannot solve the problem of long nets (RC).

68 Place-disruptive Optimization
Changing nets implies that we must be able to recompute C and RC, may need to incrementally place new cells, and need an incremental timing capability.
The next step is to allow some bigger changes, such as adding new cells for buffering. Extracted data cannot be used for nets that are changed substantially.

69 Place-disruptive Optimization
Flow diagram: the placed netlist and the C/RC data from placement & extraction feed an optimizer that includes a placer, a timer and an extractor (resize, buffer, clone, cell removal, rebuild trees), which produces the optimized netlist.
The optimizer must include a placer, a timer, and an extractor. The payoff is a much more effective bag of tricks.
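Buffering is the classic fix for long RC nets that in-place optimization cannot solve. A minimal sketch, assuming a uniform wire split into equal segments by identical repeaters, with each stage modeled by its Elmore delay: the wire's own RC term grows with the square of its length, so splitting the wire cuts that term, while each buffer adds a fixed intrinsic delay. All parameter values are hypothetical (lengths in mm, kOhm, fF, ps).

```python
def stage_delay(r_drv, length, r, c, c_load):
    """Elmore delay of a driver resistance into a uniform RC wire with a lumped load.

    r and c are the wire resistance and capacitance per unit length."""
    return r_drv * (c * length + c_load) + r * length * (c * length / 2 + c_load)


def buffered_delay(n_buf, length, r, c, r_driver, c_sink, r_buf, c_buf, t_buf):
    """Total delay with n_buf identical buffers evenly spaced along the wire."""
    seg = length / (n_buf + 1)
    total = 0.0
    for i in range(n_buf + 1):
        r_drv = r_driver if i == 0 else r_buf     # first stage: original driver
        c_load = c_sink if i == n_buf else c_buf  # last stage: the real sink
        total += stage_delay(r_drv, seg, r, c, c_load)
        if i > 0:
            total += t_buf                        # intrinsic delay of each buffer
    return total


# Hypothetical 10 mm net: 0.1 kOhm/mm, 200 fF/mm, 1 kOhm drivers, 10 fF pins, 50 ps buffers.
params = dict(length=10.0, r=0.1, c=200.0, r_driver=1.0, c_sink=10.0,
              r_buf=1.0, c_buf=10.0, t_buf=50.0)
delays = {k: buffered_delay(k, **params) for k in range(8)}
best = min(delays, key=delays.get)
print(round(delays[0]), best, round(delays[best]))  # 3020 3 2450
```

The optimum is an interior point: past three buffers, each extra buffer costs more intrinsic delay than it saves in wire delay. Inserting those buffers is exactly the kind of change that needs the placer, timer and extractor inside the optimizer.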

70 What are the problems?
Getting the timing right: different timers are used at different stages. Do the optimizer and placer see the same worst paths as the static timer?
Design size / tool capacity: synthesis technology is being used on flat designs.
Post-placement optimization brings several challenges that must be met.

71 More problems
Incompatible tools and formats: the placer, synthesizer, and timer may all use different file formats and may all come from different vendors, which raises basic interoperability issues.
An incremental placer is needed for new cells. It doesn't have to be smart, but it might produce some infeasible solutions, and it must be integrated with the optimizer.
Compatibility is often a problem even within a single vendor's tool set. At minimum, a simple placer is needed for new cells; designs with many blockages require that the placer at least avoid those areas.

72 Still more challenges/problems
Extraction/estimation of net data: any optimization that significantly alters net topology (inserting cells, removing cells, moving connections from one cell to another) needs this ability. It requires Steiner tree estimation and a net C and delay (RC) calculator. Do the results match the detail router and other extraction tools? How well do they correlate with sign-off results?
Steiner trees may vary quite a bit.
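To make the point that Steiner estimates vary, here is a minimal sketch of two common net-length estimators: half-perimeter wirelength (HPWL) and a rectilinear minimum spanning tree (RMST) built with Prim's algorithm. For nets of up to three pins the optimal rectilinear Steiner tree length equals the HPWL, and the RMST is never more than 1.5 times the optimal Steiner length, so different estimators can disagree by tens of percent on the same net. The pin coordinates are hypothetical.

```python
def hpwl(pins: list[tuple[int, int]]) -> int:
    """Half-perimeter of the pins' bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def rmst_length(pins: list[tuple[int, int]]) -> int:
    """Length of a rectilinear minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(pins)
    if n < 2:
        return 0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest Manhattan link from each pin to the tree
    best[0] = 0
    total = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = abs(pins[u][0] - pins[v][0]) + abs(pins[u][1] - pins[v][1])
                best[v] = min(best[v], d)
    return total


three = [(0, 0), (2, 0), (1, 1)]
square = [(0, 0), (4, 0), (0, 4), (4, 4)]
print(hpwl(three), rmst_length(three))    # 3 4  (optimal Steiner tree: 3)
print(hpwl(square), rmst_length(square))  # 8 12 (optimal Steiner tree: 12)
```

HPWL is exact for the three-pin net but underestimates the four-pin one, while the RMST overestimates the three-pin net; an optimizer's estimator and the detail router can therefore disagree on the same net.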

73 Sample Optimization Results
Design  Worst slack / # misses                  Cycle time  Tech
        Synthesized  Placed      Optimized
C1      -1 / 2000    -12 / 38k   -2 / 1400     7.5 ns      0.25 µm
V1      0 / 0        -12 / 15k   -0.3 / 100                0.18 µm
T1      -0.5 / 2000  -48 / 164k  -6 / 62k      ns
P1      -0.4 / 100   -97 / 43k   -13 / 20k     8 ns
V2      -0.5 / 500   -11 / 2000  -4 / 1000
All these designs were placed flat. The first two designs were final parts; the last three were early versions and required changes at the design (e.g. VHDL) level. Notice the comparatively small number of misses for the first two designs, which were easily fixed in physical design.

74 Root Problem is Wire Load Models
Main problem: correlation between pre-P&R estimates and post-P&R extraction.
If correlation is good, problems are detected, and potentially fixed, early.
If correlation is bad, problems are detected late. Not a good situation!
Needing to rewrite RTL is the worst case for timing closure. Hierarchy helps with the correlation problem.

75 Why are Wire Load Models Used?
Can't complete layout until logic design is complete. Can't complete logic design without timing. Can't time without load and net delay data. Can't extract load and net delay data until layout is complete. Can't complete layout …
This is the great circle, or chicken-and-egg, problem.

76 WLM solution – use statistics
We don't know specific layout data, but we do know something about its statistical properties: average net load and average net delay. These can be refined further using other characteristics: number of sinks, size of the design (number of circuits), and physical size.
Given a technology, its general characteristics can be estimated, and the technology house can adjust these data once a few real designs have been processed. Characteristics of the logic design can then be used to tune the estimates to model a specific design more closely.
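A minimal sketch of how such a statistical model is typically applied, loosely modeled on the fanout-to-length table, extrapolation slope, and per-unit-length capacitance and resistance found in Liberty `wire_load` groups. The class, the linear interpolation rule, the design-size-based selection, and all numbers here are illustrative assumptions, not any tool's exact behavior.

```python
from bisect import bisect_right


class WireLoadModel:
    """Estimate net length (and hence C and R) from fanout alone."""

    def __init__(self, fanout_length: dict[int, float], slope: float,
                 cap_per_len: float, res_per_len: float):
        self.fanouts = sorted(fanout_length)
        self.lengths = [fanout_length[f] for f in self.fanouts]
        self.slope = slope  # extra length per fanout beyond the table
        self.cap_per_len = cap_per_len
        self.res_per_len = res_per_len

    def length(self, fanout: int) -> float:
        f, lens = self.fanouts, self.lengths
        if fanout <= f[0]:
            return lens[0]
        if fanout >= f[-1]:
            return lens[-1] + self.slope * (fanout - f[-1])  # extrapolate
        i = bisect_right(f, fanout)  # f[i - 1] <= fanout < f[i]
        t = (fanout - f[i - 1]) / (f[i] - f[i - 1])
        return lens[i - 1] + t * (lens[i] - lens[i - 1])    # interpolate

    def net_cap(self, fanout: int) -> float:
        return self.cap_per_len * self.length(fanout)

    def net_res(self, fanout: int) -> float:
        return self.res_per_len * self.length(fanout)


def select_wlm(models: list[tuple[int, WireLoadModel]], n_cells: int) -> WireLoadModel:
    """Pick the first model whose design-size limit covers n_cells."""
    for max_cells, model in models:
        if n_cells <= max_cells:
            return model
    return models[-1][1]  # largest model for anything bigger


# Hypothetical models: lengths in um, 0.2 fF/um, 0.1 Ohm/um.
small = WireLoadModel({1: 20.0, 2: 40.0, 4: 70.0}, slope=30.0, cap_per_len=0.2, res_per_len=0.1)
large = WireLoadModel({1: 60.0, 2: 120.0, 4: 200.0}, slope=80.0, cap_per_len=0.2, res_per_len=0.1)
models = [(10_000, small), (1_000_000, large)]

wlm = select_wlm(models, n_cells=5_000)
print(wlm.length(3), wlm.length(6))  # 55.0 130.0
```

Every net with the same fanout in the same design gets the same estimate, which is exactly what makes the model statistical rather than physical.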

77 Correlation Pre/Post-P&R using averages
Wire load models give synthesis an estimate of physical design. We can correlate averages pre- and post-P&R as accurately as needed: if a specific design has average behavior, its timing can, on average, be predicted; otherwise, a pass through placement can provide the correct WLM for that design and get the averages right.
Wire load models generally specify load based on the number of sinks and the number of cells in the design. A pass through P&R can verify whether these averages are met; if not, adjust the wire load model to ensure average correlation.
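Getting the averages right after one pass through P&R can be as simple as replacing each fanout's table entry by the mean extracted wire capacitance of the nets with that fanout. A minimal sketch with hypothetical extracted data (fanout, wire capacitance in fF); by construction the recalibrated model predicts the design's total wire capacitance exactly, even though individual nets can still be far off.

```python
from collections import defaultdict
from statistics import fmean


def calibrate(extracted: list[tuple[int, float]]) -> dict[int, float]:
    """Mean extracted wire capacitance per fanout, from one P&R pass."""
    by_fanout: dict[int, list[float]] = defaultdict(list)
    for fanout, cap in extracted:
        by_fanout[fanout].append(cap)
    return {f: fmean(caps) for f, caps in sorted(by_fanout.items())}


extracted = [(1, 10.0), (1, 14.0), (2, 20.0), (2, 30.0), (3, 45.0)]
table = calibrate(extracted)
print(table)  # {1: 12.0, 2: 25.0, 3: 45.0}

# Average correlation: predicted and extracted totals agree.
predicted_total = sum(table[f] for f, _ in extracted)
actual_total = sum(cap for _, cap in extracted)
print(predicted_total, actual_total)  # 119.0 119.0
```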

78 Timing and averages
WLMs are OK for area and power (properties that are sums are well handled by statistics), but timing is dictated by the worst specific path. That path is built of individual nets, and one net can determine the speed of an entire design. Reality: poor correlation for relatively few nets can cause major headaches.
If it were just one net, we could cope with that. But it's a problem when the number of long nets is small with respect to the entire design, yet large with respect to what we can handle manually.
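The point that statistics handle sums but not worst paths can be shown in a few lines. A minimal sketch with hypothetical net delays (ps) in which one long net dominates: replacing every net by the average preserves the total exactly, but picks the wrong critical path and misses a real violation of a hypothetical 18 ps cycle.

```python
net_delay = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 1.0, "f": 19.0}
avg = sum(net_delay.values()) / len(net_delay)  # 4.0: what an average-based model sees

paths = {"P1": ["a", "b", "c", "d"], "P2": ["e", "f"]}
predicted = {p: avg * len(nets) for p, nets in paths.items()}
actual = {p: sum(net_delay[n] for n in nets) for p, nets in paths.items()}

# Sums agree: total predicted wire delay equals total actual wire delay.
assert avg * len(net_delay) == sum(net_delay.values())

cycle = 18.0
print(max(predicted, key=predicted.get), max(actual, key=actual.get))  # P1 P2
print([p for p, d in predicted.items() if d > cycle])  # []
print([p for p, d in actual.items() if d > cycle])     # ['P2']
```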

79 Correlation Pre/Post-P&R Averages and Wire Loads
Plot: distribution of net wire load, with the median and the mean marked.
Most nets are short. Using the average wire load per fan-out gives pessimistic delays for most nets, optimistic delays for some, and very optimistic delays for a few, and those few tend to be the ones that limit overall performance. Note the very long tail of this distribution.
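The mean-versus-median effect can be reproduced with a small, hypothetical sample of wire capacitances (fF) for nets of the same fanout: a single long-tail net pulls the mean well above the median, so an average-based estimate is pessimistic for most nets and very optimistic for the one that matters.

```python
from statistics import mean, median

caps = [2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 6.0, 48.0]  # mostly short, one long tail

avg, mid = mean(caps), median(caps)
pessimistic = sum(1 for c in caps if c < avg)  # estimate too high for these
optimistic = [c for c in caps if c > avg]      # estimate too low for these
print(avg, mid, pessimistic, optimistic)  # 8.0 3.5 9 [48.0]
```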

80 Correlation Pre/Post-P&R Cwire Data by Logic Design
Plot: Cwire (load, Y axis) versus number of fan-outs (X axis) for several logic designs, all using the same technology.
We expect (or hope) to see a nice straight line. In reality the points are all over the graph: correlation is poor.
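One way to quantify poor correlation is to fit the straight line we hoped for and look at R². A minimal sketch with hypothetical (fanout, Cwire) points: data lying on a line gives R² = 1, while scattered data gives a low R², meaning fanout alone explains little of the wire load.

```python
def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r2


fanout = [1, 2, 3, 4]
ideal = [10.0, 20.0, 30.0, 40.0]      # Cwire (fF) on a perfect line
scattered = [10.0, 45.0, 12.0, 60.0]  # Cwire (fF) spread all over the graph

print(linear_fit(fanout, ideal))                   # (0.0, 10.0, 1.0)
print(round(linear_fit(fanout, scattered)[2], 2))  # 0.37
```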

81 Better Wire Load Models
How can we use information from one pass through physical design? Adjust the wire load model coefficients, or back-annotate specific net load and delay data to the logic design. New problem: correlation of the logic before and after synthesis. But there are fundamental limits to statistical models; a new approach is needed.
Synthesis makes DSM design possible, but correlating a signal in the high-level design to a net in the physical design may be impossible: synthesis merges, subsumes, removes, and creates nets.
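A minimal sketch of back-annotation and the correlation problem it runs into, assuming net names are the only link between the two passes. Loads extracted for specific nets override the wire load model, but nets that synthesis renamed or restructured since the P&R pass silently fall back to the statistical estimate. The net names, fanouts, capacitances (fF), and the linear stand-in for a WLM are all hypothetical.

```python
def wlm_cap(fanout: int) -> float:
    """Hypothetical stand-in for a wire load model: 4 fF per fanout."""
    return 4.0 * fanout


# Wire caps back-annotated from the previous P&R pass, keyed by net name.
annotated = {"u1/n5": 38.0, "u2/n9": 6.5, "u3/n2": 12.0}

# Netlist after re-synthesis (net -> fanout): u1/n5 was restructured into two nets.
netlist = {"u2/n9": 2, "u3/n2": 1, "u1/n5_a": 3, "u1/n5_b": 1}

caps = {net: annotated.get(net, wlm_cap(fo)) for net, fo in netlist.items()}
hit_rate = sum(net in annotated for net in netlist) / len(netlist)
print(caps)      # {'u2/n9': 6.5, 'u3/n2': 12.0, 'u1/n5_a': 12.0, 'u1/n5_b': 4.0}
print(hit_rate)  # 0.5
```

The 38 fF long net, the one most worth knowing about, is exactly the one whose measured data is lost after re-synthesis.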

82 A better (but harder) approach: Combine Synthesis, P & R
Don't use wire load models at all. Synthesis does a trial placement as it runs, and loading is found from estimated routes. For best results, this must include global routing. Then either feed the global route to the detailed router, or do the detailed route itself. The result is much better correlation and timing closure, with no inter-tool data transfer headaches.

83 Example of Combined SP&R
Video graphics engine: 160k instances, 70 macros (blocks), 5 layers, 0.18 micron. Target frequency: 100 MHz.

84 Conventional Flow
More than 20 iterations; 89 MHz best result with manual changes.
Flow diagram: DC synthesis, PT static timing, syn2GCF, SE placement and base optimization, global route, detail route, extraction, Pearl delay calculation, DRC. Inputs: functional & timing .lib, floorplan DEF, functional & timing .TLF, physical LEF.

85 Combined SP&R Flow
100 MHz final result; met timing. Correlation within %. One pass, 12 hr 20 min runtime.
Flow diagram: EDIF netlist and TCL constraints, PKS optimization with global route and static timing, SE-PKS write_constraints, detail route, HE extraction, Pearl delay calculation, PT static timing, DRC. Inputs: floorplan DEF, functional & timing .TLF, physical LEF.

86 Slack Correlation
Plot series: PKS, Routed, Wire Load Based.

87 Enlargement of SP&R slack

88 Results from combined SP&R
Table columns: case; size (k instances); macros; PKS timing error (%); maximum frequency (MHz) for the conventional flow and for SP&R.

89 Agenda
Traditional design flows; Summary of DSM problems; Analysis methods overview; Correction methods overview; Approaches to fixing timing closure; Experimental results; Summary.
First, talk about how designs are done without logical/physical interaction. Then, a brief discussion of timing analysis, a key for timing correction. Next, automatic transformations to correct timing. Then, an outline of some of the problems arising today, and how we can address them by doing some simple optimizations after placement. Finally, sum up.

90 Experimental Results
For hierarchical design, two objectives: it should be faster and have higher capacity than a fully detailed flat design, so that problems are found earlier; and the resulting partitions should be realizable.
For block design, compare different strategies: overconstrain the clock or the wire models; run IPO after placement or not; allow the placer to change gate sizes or not; test combined synthesis/placement against running the two tools separately.

91 Hierarchical Experimental results
Design: 580K cells, 0.25 µm process, 5 metal layers (5LM), 100 MHz. Data collected on a workstation with a 500 MHz processor. Resulting blocks were realizable (*).
SPC Trial Route First Encounter Flow Traditional Flow 7 hr 30 min 9 hr 5 hr 25 min 35 hr 40 min 5 hr 45 min 2 hr 50 min 3 hr 50 min 1 hr 50 min 3 hr 20 min 6x 2 hr 15 min 4 hr 20 min 1x 5x 4 min 8 min 6 min 7 min 7x 60x 56x 57x 33x Design Import Detail Place Detail Route* RC Extract Delay Calculation Timing Analysis Design Iteration IPO

92 How do different block design approaches compare?
Jay McDougal of Agilent ran many flows on the same design: overconstraining the clock by various amounts; accurate or conservative WLMs, at many levels of conservatism; allowing the placer to size gates or not; doing post-placement optimization or not; and physically knowledgeable synthesis.

93 Characteristics of sample design
The design was not very difficult: a ColdFire processor with 80K instances, a 0.25 micron library, and a 5-layer process that was not congestion dominated. The design goal was 180 MHz, known to be achievable with this design. 85% of the delay was in gates and 15% in interconnect; at 0.18/0.13 micron, bigger designs will show bigger differences between techniques.

94 Key to the plot of results
Basic flow: Design Compiler & QPlace.
TDD = timing-driven design: in addition to minimizing wire length and congestion, the placer is given timing constraints and allowed to change gate sizes.
IPO and PBO are post-placement optimizers: IPO runs on the synthesis DB with back-annotation; PBO runs on the physical DB with synthesis transforms.
PKS = Physically Knowledgeable Synthesis (combined synthesis/place/route).

95 Comparison of Approaches
Plot: required cycle time for each approach.
The spread between approaches becomes bigger for more challenging designs and deeper-submicron (DSM) technologies.

96 Comparison of Approaches
Plot annotations: "Good area, but iterates between placement and synthesis, worst TTM, didn't hit timing target" and "One tool, no iteration, better TTM, hit timing target".
The spread between approaches becomes bigger for more challenging designs and deeper-submicron (DSM) technologies.

97 Agenda
Traditional design flows; Summary of DSM problems; Analysis methods overview; Correction methods overview; Approaches to fixing timing closure; Experimental results; Summary.
First, talk about how designs are done without logical/physical interaction. Then, a brief discussion of timing analysis, a key for timing correction. Next, automatic transformations to correct timing. Then, an outline of some of the problems arising today, and how we can address them by doing some simple optimizations after placement. Finally, sum up.

98 Good News
At least we understand the problem: analysis of timing is well understood; transformations that help timing are well understood; DSM effects are painful but can be controlled.
The rule: the first 90% comes for 10% of the cost, and the next 10% for 90% of the cost.

99 Bad News
Cycle time and technology advances demand increasingly sophisticated optimization techniques. In previous flows, corrections must be applied in separate tools, and disconnects among the various tools involved increase turn-around time and limit optimization.
No let-up in sight.

100 Good News
The bad news is commonly recognized: many tool vendors, academics, and in-house EDA researchers are working to solve these problems. A new generation of tools, designed from the ground up to address timing closure for both hierarchical and block design, is already available.
DAC is full of vendors working on these problems (or claiming victory!).

101 Bad News
These problems won't be the last! Each process generation brings new problems: increased size, weird process rules (antenna), and possible new effects (single event upset).
Good news: full employment!

102 Summary
Timing closure is a very real problem. Large chips need hierarchical tools to help with partitioning and budgeting. Block tools must understand synthesis, placement, and routing, and wire load models have serious limitations. The best approach is combined synthesis/P&R, and experimental data backs this up.
The DSM problems everyone is talking about are real. We have methods to attack some of them, and we have good ideas about future directions.

103 Acknowledgements
Tony Drumm wrote the original set of slides for this lecture, including many of the examples. He credits Alex Suess, José Neves, Bill Joyner, and the IBM Rochester EDA folks. But the conclusions, and any mistakes, are mine.

104 The End
Good Luck!
