Presentation is loading. Please wait.

Presentation is loading. Please wait.

Avoiding Metastability in FPGA Devices MAPLD 2009 David Landoll Applications Architect Mentor Graphics Corp.

Similar presentations


Presentation on theme: "Avoiding Metastability in FPGA Devices MAPLD 2009 David Landoll Applications Architect Mentor Graphics Corp."— Presentation transcript:

1 Avoiding Metastability in FPGA Devices MAPLD 2009 David Landoll Applications Architect Mentor Graphics Corp.

2 Today’s FPGAs DMA Controller UART CPU AMBA AHB Fabrication advances provide more available silicon area More functionality can weigh less and take up less space Integrating/reusing capabilities lowers cost Fabrication advances provide more available silicon area More functionality can weigh less and take up less space Integrating/reusing capabilities lowers cost 2

3 Flight Management Maintenance Diagnostics Image Processing Integrated Avionics Processing Flight Control & Avoidance Systems Boeing 787: Integration’s Next Step From its central processor to its common data network, surveillance system and navigation system, the theme of the Boeing 787 Dreamliner is integration. James W. Ramsey Communications Weather Radar 3 Integration Presents New Challenges Such integration usually involves multiple independent clock domains, which leads to clock-domain crossings and metastability errors!

4 Clock Domain Crossing (CDC) Errors Unpredictable Loss of Data ■ CDC problems — corrupt control and data signals — are subtle, intermittent, unpredictable — are the 2 nd major cause of respins — are difficult to reproduce and debug — are temperature, voltage, and process sensitive — will only occur in hardware; often in the final design ■ Traditional verification techniques do not work for CDC signals 4 A CDC Verification methodology is needed to reduce the risk of CDC related data errors 4

5 Metastability What the heck is it, anyway? ■ What is a clock? — Periodic pulsing signal — Digital logic uniformly connected to this signal — Acts as the Symphony Conductor – keeps logic in sync — Action happens across the logic at one specific point ■ Typically the “rising edge” 55 Vee, Vss: GND: 0V Vcc, Vdd : +5V, +3.3V

6 Metastability What the heck is it, anyway? ■ What’s in a register? — (Also known as a latch, flip-flop, etc) — Contain transistors that “trap” the input value at the appropriate time ■ E.g. rising edge of the clock — How does this happen? 66

7 Metastability The Physics of a Register ■ Let’s take a look at a register — CMOS D-type transmission gate flip-flop 7 CLK D Q D Q 0 0 0 -- simple D-type flip-flop process(CLK) begin if rising_edge(CLK) then Q <= D; end if; end process; -- simple D-type flip-flop process(CLK) begin if rising_edge(CLK) then Q <= D; end if; end process; 7 Transistor Model of a D Flip-Flop

8 Metastability The Physics of a Register ■ Let’s take a look at a register — CMOS D-type transmission gate flip-flop 8 CLK D Q D Q 1 0 0 -- simple D-type flip-flop process(CLK) begin if rising_edge(CLK) then Q <= D; end if; end process; -- simple D-type flip-flop process(CLK) begin if rising_edge(CLK) then Q <= D; end if; end process; 8 Transistor Model of a D Flip-Flop

9 Metastability The Physics of a Register ■ Let’s take a look at a register — CMOS D-type transmission gate flip-flop 9 CLK D Q D Q 1 1 1 -- simple D-type flip-flop process(CLK) begin if rising_edge(CLK) then Q <= D; end if; end process; -- simple D-type flip-flop process(CLK) begin if rising_edge(CLK) then Q <= D; end if; end process; 9 Transistor Model of a D Flip-Flop

10 Transistor Model of a D flip-flop Metastability The Physics of a Register ■ Let’s take a look at a register — CMOS D-type transmission gate flip-flop 10 CLK D Q D Q 0 0 1 Only works if D has a “good value” at the rising edge of the clock (no Set-up/hold time violations) Only works if D has a “good value” at the rising edge of the clock (no Set-up/hold time violations) -- simple D-type flip-flop process(CLK) begin if rising_edge(CLK) then Q <= D; end if; end process; -- simple D-type flip-flop process(CLK) begin if rising_edge(CLK) then Q <= D; end if; end process; 10

11 Metastability The Physics of a Register ■ When setup/hold conditions are violated, the output of a storage element becomes unpredictable ■ This effect is called metastability ■ If not contained, metastability can propagate… 11 f clk = Clock Frequency f in = Input Signal Frequency t d = Duration of critical time window f clk = Clock Frequency f in = Input Signal Frequency t d = Duration of critical time window CLK D Q D Q Setup/hold window Metastability is UNAVOIDABLE in designs with multiple asynchronous clocks 11

12 Clock Domain Crossing signal Clock Domain Crossings Guaranteed to Cause Metastability When 2 or more designs run on disparate clocks: — The clocks will continually skew, guaranteeing setup/hold violations — Signals from one design to another are “Clock Domain Crossings” (CDCs) 12 D CLK Q Sensor SystemGuidance System Tx Clock B Clock A Setup/hold window Signals that cross asynchronous clock domains (CDC signals) WILL violate setup and hold conditions 12 D CLK Q

13 Mitigating Clock Domain Crossing Issues Problem: — Signals crossing a clock domain will violate set-up/hold — Impact: Control/data signals will be dropped/corrupted ■ Loss of Data Approaches: — Avoid having systems that have multiple clocks ■ Although sensible, it’s becoming impossible — Design around the problem ■ Designer can add “synchronizers” to the design ■ Metastability still happens, but nobody else sees it — E.g. 2DFF, FIFO, etc. — “Fences in” metastability 13

14 Isolate Metastability: Synchronizers ■ Designers add synchronizers to reduce the probability of metastable signals ■ Synchronizers are sub-circuits that can prevent metastable values from being sampled across clock domains — Take unpredictable metastable signals and create predictable behavior 14

15 Mitigating Clock Domain Crossing Issues Isolate Metastability: Synchronizers 15 Q Clock A Clock B i i +1 i +2 i -1 i i +1 i +2 i -1 Tx Metastability window When metastability occurs, the delay through a synchronizer becomes unpredictable Rx i i +1 i +2 i -1 i +3 15

16 Invalid Command with unexpected results Synchronizer Delays Can Reconverge with unexpected results ■ CDC signals cross with an assumed relationship ■ Can be combinational, sequential, or deeply sequential ■ Unpredictable delays on CDC paths lead to reconvergence errors — Designs need logic to correctly handle reconvergence — Can occur on single-bit or multiple-bit signals 16 S1 S2 S3S4 FSM Input tx_d0 tx_d2 tx_d1 Sync 2 Sync 1 Sync 0 0 0 00 1 0 0 0 0 0 1 0 Valid Command – but delayed Grey Encoder Grey Decoder 0 0 01 1 1 0 0 01 0 11 1 1 Sync 1 0 0 0 1 1 1 16

17 And, Synchronizers Fail if Misused ■ Synchronization between clock domains requires a transfer protocol — Ensures data is predictably transferred between domains ■ These protocols must be verified ■ When protocol is violated — Data is lost — Simulation may not show a failure — Silicon will eventually show a functional error Synchronizer won’t function properly if the required Transfer Protocol is violated 17

18 Verification Must Cover All Three CDC Problems 18 Clock domain crossings need: — Structured synchronization — Transfer protocols — Global reconvergence checking Clock domain crossings need: — Structured synchronization — Transfer protocols — Global reconvergence checking Reconvergence problem Missing sync problem Possible protocol problem 18

19 Mitigating Clock Domain Crossing Issues ■ Problem: — Signals crossing a clock domain will violate set-up/hold — Impact: Control/data signals will be dropped/corrupted ■ Approaches: — Avoid having systems that have multiple clocks — Designer can add “synchronizers” to the design — Designer-added synchronizers + full CDC verification ■ Assures synchronizers are present and used correctly 19

20 Recommendations During design planning 1. Create systems/designs using 1 clk, 1 edge when possible 2. If multiple clocks are required, try to use 1 designer for both clock domains, and use coding guidelines 1. Use signal naming conventions 2. Many clock domain errors come from design changes, not the initial design 3. Limit “clock domain crossings” to specific areas or blocks in the design, when possible. 4. NOTE: These techniques can help assure synchronizers are present, but are unlikely to help identify reconvergence or CDC protocol issues. 3. When multi-clock design is required, plan for proper verification 1. How to we accomplish this? 20 My_sig_Tx My_data_Tx_Reg My_data_Rx_sync1 My_data_Rx_sync2 Comb_sig_Rx For Example: Append “_A_reg” to signals leaving A-clk register, “_A” for A-clk combo signals Leverage during code reviews - help identify missing synchronizers Make sure ONLY _A_reg signals go to synchronizers (no combo logic) For Example: Append “_A_reg” to signals leaving A-clk register, “_A” for A-clk combo signals Leverage during code reviews - help identify missing synchronizers Make sure ONLY _A_reg signals go to synchronizers (no combo logic)

21 Verifying CDC Synchronization ■ Problem: — Missing synchronizers will create metastability — Correctly placed but misused synchronizers won’t work — Reconvergence of synchronized signals can create unexpected behavior ■ Approaches: — Simulation ■ Digital logic simulators do NOT model transistor behavior ■ Do not model “metastability” 21

22 For example … 22 Q D CLK Simulation captures a ‘1’ while silicon produces either a ‘1’ or ‘0’ Setup Violation Q D CLK Hold Violation Simulation Does NOT Reflect Silicon Behavior Q in silicon Simulation captures a ‘0’ while silicon produces either a ‘1’ or ‘0’ 22 Q in simulation

23 Verifying CDC Synchronization ■ Problem: — Missing synchronizers will create metastability — Correctly placed but misused synchronizers won’t work — Reconvergence of synchronized  Control logic bugs ■ Approaches: — Simulation ■ Won’t model CDC’s correctly to detect errors — Static Timing Analysis ■ Can be used to identify signals that cross domains ■ Can be used as input for a manual review ■ But…Won’t detect missing or incorrectly used synchronizers, or reconvergence 23

24 Verifying CDC Synchronization ■ Problem: — Missing synchronizers will create metastability — Correctly placed but misused synchronizers won’t work — Reconvergence of synchronized  Control logic bugs ■ Approaches: — Simulation ■ Won’t model CDC’s correctly to detect errors — Static Timing Analysis ■ Identifies signals for manual review, but otherwise useless — Manual Design Reviews ■ Error prone (and very time consuming) ■ Typically only identifies synchronizer structures, misses reconvergence and invalid sync protocol usage ■ Evidence suggests at least some synchronizers will be missed 24

25 For Example… Trivial Reconvergence Error ■ Reconverging synchronized CDC signals - timing is unpredictable. ■ Need to verify the downstream logic can handle variations — Manually identifying the reconvergence is very hard — Manually identifying all possible behaviors is harder — Manually assuring logic will behave correctly – typically intractable 25

26 Verifying CDC Synchronization ■ Problem: — Missing synchronizers will create metastability — Correctly placed but misused synchronizers won’t work — Reconvergence of synchronized  Control logic bugs ■ Approaches: — Simulation - Won’t model CDC’s correctly to detect errors — Timing Analysis - Identifies signals for review, but otherwise useless — Manual Design Reviews - error prone, incomplete — Lab Verification? ■ Problem is intermittent, debug is impossible — Spice simulation? – It *does* model transistors, but… ■ Where will you get the “Spice deck”? (transistor level model) ■ Would be far too slow on a large FPGA 26

27 Verifying CDC Synchronization ■ Problem: — Missing synchronizers will create metastability — Correctly placed but misused synchronizers won’t work — Reconvergence of synchronized  Control logic bugs ■ Approaches: — So - we need a new method that reliably: ■ Identifies ALL CDC signals, structures, reconvergence ■ Assures ALL connected, functioning correctly ■ Creates reports for manual reviews ■  The EDA industry has responded — 6 commercial tools now available…and counting — But…most won’t identify all 3 of our CDC issues 27

28 Mentor’s CDC Verification Technology Who’s using our technology? ■ Mil-Aero — Honeywell, Inc. — L-3 Communications — Lockheed Martin Co — Ministry of Aerospace & Aeronautics — Northrop Grumman Corp — Raytheon — Rockwell Collins Inc. — SAAB Group — Thales ■ Commercial — Widely used in commercial space The market leader in CDC verification ■ The market leader in CDC verification 28

29 Example Value from One Customer ■ Design — IEEE standard serial communications core — Used in 50-60 other COMMERCIAL ASIC products — Widely deployed (millions in use daily) ■ Placed core in a sensor guidance system — Found issues in the lab — Debugged FPGA for weeks — Suspected a CDC issue, but not sure… ■ Deployed Mentor’s CDC solution — Results same day — Found 199 serious CDC bugs! ■ 45 Missing Synchronizers ■ 83 Incorrect Synchronizers ■ 76 Reconverging Signals ■ 11 other problems — Most resulting from “more stressful” usage ■ In production: — Commercial ASIC : Customer issue – device is erratic, locks up — Avionics: Could result in an Airworthiness Directive 29

30 Summary Recommendations During design planning 1. Create systems/designs using 1 clk, 1 edge when possible 2. If multiple clocks are required, try to use 1 designer for all clock domains 3. When multi-clock design is required, plan for proper verification During verification 1. Watch for multiple clocks in designs (Tip – Count PLLs) 2. Ask how CDC issues are mitigated (remember there are 3) Utilize commercial tools designed for detecting these problems 1. Verify all 3 classes of CDC problems 1. Structural Verification 2. Protocols Verification 3. Reconvergance Verification 2. Use reports to aid manual reviews 3. Use CDC tools to support ROBUSTNESS 30

31 In Conclusion … ■ Every multi-clock design is subject to metastability ■ Traditional verification methodologies CANNOT assure robustness ■ To properly mitigate the dangers of CDC, we strongly recommend a solution that… : — Supports Manual Reviews — Automatically reports all sources of CDC problems — Has a proven CDC verification methodology & customer success 31

32 32


Download ppt "Avoiding Metastability in FPGA Devices MAPLD 2009 David Landoll Applications Architect Mentor Graphics Corp."

Similar presentations


Ads by Google