Presentation is loading. Please wait.

Presentation is loading. Please wait.

BL-TMR AND M ITIGATION A PPROACHES FOR FPGA S Mike Wirthlin BYU.

Similar presentations


Presentation on theme: "BL-TMR AND M ITIGATION A PPROACHES FOR FPGA S Mike Wirthlin BYU."— Presentation transcript:

1 BL-TMR AND M ITIGATION A PPROACHES FOR FPGA S Mike Wirthlin BYU

2 1. TMR Overview

3 Triple Modular Redundancy (TMR) A form of N Modular Redundancy –Triplicate hardware resources –Majority Vote on hardware outputs Tolerates any single fault –Tolerates many multiple fault combinations Mike Wirthlin, BYU

4 TMR Granularity System LevelDevice Level Logic LevelModule Level Mike Wirthlin, BYU RTL Level process(clk_int_a) begin if clk_int_a'event and clk_int_a='1' then locked_d_a <= locked_a_int; if (all_locked_a = '0') then all_locked_a <= (locked_d_a and locked_d_b and locked_d_c); else all_locked_a <= tmr_voter( locked_d_a, locked_d_b, locked_d_c); end if; end process

5 TMR Reliability TMR has lower reliability than non- redundant for long mission times Effective TMR almost always is coupled with “repair” Non-redundant TMR Mike Wirthlin, BYU

6 TMR + Repair = Very Reliable! Mike Wirthlin, BYU

7 x Configuration Upset FPGA Configuration “Repair” Mike Wirthlin, BYU

8 x Configuration Upset Repaired FPGA Configuration “Repair” Mike Wirthlin, BYU

9 TMR & Scrubbing Example Mike Wirthlin, BYU

10 Voters Before Flip Flops Mike Wirthlin, BYU

11 Voters After Flip-Flops Mike Wirthlin, BYU

12 More Frequent Voting Mike Wirthlin, BYU

13 TMR Synchronization Fault repair through scrubbing –Fixes the cause of the error –Does NOT fix the state of the circuit State of circuit must be synchronized to working circuits Mike Wirthlin, BYU

14 Synchronizing Voters Mike Wirthlin, BYU

15 Synchronizing Voters Mike Wirthlin, BYU

16 Clock Domain Crossing Mike Wirthlin, BYU

17 Partial TMR TMR may be applied selectively –Failures in some circuit areas cause more harm than others –Some circuit areas are protected by other SEE mitigation techniques (TMR not needed) Challenge: deciding where to apply TMR –Circuits with feedback (state machines) –Circuits with high “functional influence” Mike Wirthlin, BYU

18 Persistent vs. Non-persistent Upset Non-Persistent Upset time cycle error magnitude Upset Correct Output Bitstream Repair Upset Bitstream Repair Incorrect Output Persistent Upset time cycle error magnitude Some upsets repaired through scrubbing –Non-persistent upsets: repairable through scrubbing –Persistent upsets: requires reconfiguration

19 Non-Persistent Structure – Feed-forward Persistent Structures – Contribute to feedback Partial TMR – Priority given to persistent structures FF Logic FF Logic Persistent Circuit Structures Mike Wirthlin, BYU

20 Full TMR

21 Partial TMR Mike Wirthlin, BYU

22 TMR Automation TMR is relatively easy to automate –Analyze design –Replicate resources –Insert voters –Verify resulting circuit Different Strategies for Automated TMR –Netlist level –HDL Level –Selective/Partial Several tools available for Automatic TMR Mike Wirthlin, BYU

23 Automated TMR Tools BL-TMR Mike Wirthlin, BYU (and other several other academic projects)

24 2. BL-TMR

25 BL-TMR BYU-LANL TMR Tool –BYU-LANL Triple Modular Redundancy –Developed at BYU under the support of Los Alamos National Laboratory (Cibola Flight Experiment) –Used to test TMR on many designs Fault injection, Radiation testing, in Orbit –Testbed for experimenting with various TMR application techniques (used for research) Mike Wirthlin, BYU

26 Ongoing Development Based on the success of BL-TMR, additional funding has been provided to extend BL-TMR for additional devices, environments, and address new problems –Commercial companies concerned about SER rates Cisco Systems –High Energy Physics Brookhaven National Laboratory (BNL), CERN –Space system developers SEAKR systems, Sandia, LANL, Lockheed Martin Interest in BL-TMR is growing –Commercialization currently under consideration

27 EDIF data structure & API –Parse, represent, and manipulate EDIF Available tools: –EDIF parser –Half-latch removal –SRL replacement –Feedback cutset tool –Full and partial TMR –Detection circuitry insertion –EDIF output Project size –~50 Java packages –350+ Java classes –478,401 lines of code –Includes contributions from CHREC member LANL BL-TMR (BYU/LANL TMR) [brian@tiger:test] java -cp ~/jars/BLTmr.jar byucc.edif.tools.tmr.FlattenTMR../no_tmr/synth/counters80.edf -- removeHL --full_tmr --technology virtex -p xcv1000fg680 --log counters80.log BLTmr Tool version 0.2.3, 12 Oct 2006 Search for EDIF files in these directories: [.] Parsing file../no_tmr/synth/counters80.edf Removing half-latches... Flattening Flattened circuit contains 3451 primitives, 3461 nets, and 13692 net connections Processing: ASUF 1.0 Forcing triplication of instance safeConstantCell_zero Analyzing design... Full TMR requested. Triplicating design... domainreport=BLTmr_domain_report.txt Added 1931 voters. 3431 instances out of 3451 cells triplicated (99% coverage) 6862 new instances added to design. 3431 nets triplicated (6862 new nets added). 0 ports triplicated. Tools and code available at: http://sourceforge.net/projects/byuediftools/ Mike Wirthlin, BYU

28 BL-TMR User Control Provides significant control to user Can be scripted for complex BL-TMR runs Usage: java byucc.edif.tools.tmr.FlattenTMR [(-o|--output) ] [(-d|--dir) dir1,dir2,...,dirN ] [(-f|--file) file1,file2,...,fileN ] [--tmrSuffix suffix1,suffix2,...,suffixN ] [--full_tmr] [--tmr_inports] [--tmr_outports] [--no_tmr_p port1,port2,...,portN ] [--tmr_c cell_type1,cell_type2,...,cell_typeN ] [--tmr_i cell_instance1,cell_instance2,...,cell_instanceN ] [--no_tmr_c cell_type1,cell_type2,...,cell_typeN ] [--no_tmr_i cell_instance1,cell_instance2,...,cell_instanceN ] [--notmrFeedback] [--notmrInputToFeedback] [--notmrFeedBackOutput] [--notmrFeedForward] [--noInoutCheck] [--SCCSortType ] [--doSCCDecomposition] [--inputAdditionType ] [--outputAdditionType ] [--mergeFactor ] [--optimizationFactor ] [--factorType ] [--factorValue ] [--low ] [--high ] [--inc ] [--removeHL] [--hlConst ] [--hlUsePort ] [--technology ] [(-p|--part) ] [--summary] [--log ] [--domainReport ] [--writeConfig[: ]] [-h|--help] [-v|--version] For detailed usage, try `--help'

29 Sample Execution [brian@tiger:test] java -cp ~/jars/BLTmr.jar byucc.edif.tools.tmr.FlattenTMR../no_tmr/synth/counters80.edf --removeHL --full_tmr --technology virtex -p xcv1000fg680 --log counters80.log BLTmr Tool version 0.2.3, 12 Oct 2006 Search for EDIF files in these directories: [.] Parsing file../no_tmr/synth/counters80.edf Removing half-latches... Flattening Flattened circuit contains 3451 primitives, 3461 nets, and 13692 net connections Processing: ASUF 1.0 Forcing triplication of instance safeConstantCell_zero Analyzing design... Full TMR requested. Triplicating design... domainreport=BLTmr_domain_report.txt Added 1931 voters. 3431 instances out of 3451 cells triplicated (99% coverage) 6862 new instances added to design. 3431 nets triplicated (6862 new nets added). 0 ports triplicated.

30 Cost of TMR Size Increase Critical Path Before TMR Critical Path After TMR % Increase in Critical Path blowfish3.1X28.3 ns31.7 ns12.0% des33.4X11.1 ns13.6 ns22.5% qpsk3.1X80.0 ns83.9 ns4.9% free65023.3X29.6 ns33.1 ns11.8% T803.3X27.8 ns33.7 ns21.2% macfir3.9X14.4 ns19.5 ns35.4% serial_divide4.1X9.2 ns12.2 ns32.6% planet3.1X10.9 ns12.6 ns15.6% s14883.1X9.9 ns12.0 ns21.2% s14943.1X10.4 ns12.2 ns17.3% s2983.1X15.8 ns19.1 ns20.9% tbk3.9X10.3 ns12.9 ns25.2% synthetic4.0X9.9 ns10.4 ns5.1% lfsrs6.3X9.0 ns12.7 ns41.1% ssra_core3.5X6.1 ns7.2 ns18.0% mean3.6X8.17 ns12.08 ns16.0% Mike Wirthlin, BYU

31 BL-TMR Incremental Results Mike Wirthlin, BYU

32 3. Design Flow

33 Design Flow RTL Synthesis RTL EDIF Netlist pTMR Tool Modified Netlist Xilinx Map, Par, etc. FPGA bitfile pTMR Property Tags Tagged EDIF Netlist Signal List pTMR Parameters

34 pTMR Steps 1.Component Merging 2.Design Flattening 3.Graph Creation and Analysis 4.IOB Analysis 5.Clock Domain Analysis 6.Instance Removal 7.Feedback Analysis 8.Illegal Crossing identification 9.TMR Prioritization & Selection 10.Voter Selection 11.Netlist generation

35 11. Netlist Generation Circuit generated from pTMR rules –Cells triplicated –Voters inserted Netlist created for new circuit

36 3. Verifying BL-TMR

37 FPGA 1 FPGA 2 Comparator Configure user design onto two identical FPGAs Compare results of two designs using Comparator FPGA Insert configuration SEUs into design under test (FPGA2) and compare results If discrepancies between FPGAs are found, record configuration error Fault Injection Mike Wirthlin, BYU

38 SEU Insertion Example #1 FPGA 1 FPGA 2 Comparator x Insert configuration SEU into FPGA #2 Apply test vector to circuit input x FPGA1  FPGA2 x Compare circuit results Mike Wirthlin, BYU

39 Unmitigated Experimental Results – Design #2 Synthetic (LFSR/Mult) 3,005 slices (24%)254,840 (4.39%)46,368 (0.80%) Full TMR Applied 12,165 slices (99%)2,395 (0.041%) 671 (0.005%) FPGA Editor Layout Sensitivity Map Persistence Map Mike Wirthlin, BYU

40 LANL Cibola Flight Experiment Cibola Flight Experiment 560 km, 35.4º inclination  Los Alamos National Laboratory technology pathfinder  validate FPGAs for high performance computing  Investigate SEU behavior of Xilinx Virtex FPGAs  Several BYU experiments validated in orbit  TMR (including BL-TMR tool)  Duplication with Compare  DRAM controllers Mike Wirthlin, BYU

41 Sandia MISSE-8 BYU Experiments on ISS –TMR PicoBlaze (Successful mitigation event!) –Smart signal detection –Reduced Precision Redundancy –BRAM Scrubbing & BRAM ECC Endeavor (STS-134) May 16, 2012 Photo courtesy of Sandia National Labs Photo courtesy of NASA V4 FX60 V5QV (SIRF) Under direction of Sandia National Laboratory Photo courtesy of NASA Mike Wirthlin, BYU

42 Radiation Testing Apply Ionizing Radiation to Design with TMR –Verify accuracy of artificial simulator –Identify upset in non-configuration state –Identify other failure modes FPGA Board Proton Beam UC Davis, Crocker Nuclear Laboratory  Medium-energy particle accelerator (76-inch cyclotron)  63 MeV proton source  Flux: 1e7 particles/cm 2 /second: (~1 upset/second)  16 hour test (~25,000 upsets) Mike Wirthlin, BYU

43 5. TMR Summary Pros: –Significant improvements in reliability –Easy to apply (limited design effort) –Can be applied selectively Cons –Requires significant hardware resources –Negative impact on timing –Difficult to verify Mike Wirthlin, BYU

44 Alternatives to TMR Exploit specific circuit structures/styles –Memories, state machines, processors, etc. –Arithmetic structures Detection+ –Detecting a fault quickly opens up many lower cost mitigation strategies Temporal Redundancy Duplication with Compare Mike Wirthlin, BYU

45 Future Plans Clock domain aware TMR Timing aware TMR Improved support for clock and I/O resources Integrated Duplication with Compare (DWC) More frequent voting NMR (5-MR, 7-MR, etc.) Support for New FPGA Architectures Improved verification (formal verification) GUI support Improved partial TMR selection (Algorithmic pTMR)

46 Questions? Mike Wirthlin, BYU


Download ppt "BL-TMR AND M ITIGATION A PPROACHES FOR FPGA S Mike Wirthlin BYU."

Similar presentations


Ads by Google