Presentation on theme: "Prof. Fernanda Lima Kastensmidt, Ph.D. Instituto de Informatica". Presentation transcript:
1 SEE Mitigation Strategies for Digital Circuit Design Applicable to ASICs and FPGAs. Prof. Fernanda Lima Kastensmidt, Ph.D., Instituto de Informatica, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
2 Motivation: A large set of electronic devices used in avionics, space and ground-level applications can be upset by ionizing particles. A general system combines FPGAs, memories, processors, ASICs and analog electronics. Hardened components offer high reliability at a high cost; COTS components cost much less but offer low reliability.
3 Motivation, Solution I: If hardened components are too expensive, the solution may be to design your own hardened device. Which fault-tolerance techniques should be used? How much fault tolerance is enough? The hardened design must also be qualified.
4 Motivation, Solution II: Use low-reliability COTS components, qualifying the device to analyze its robustness for the application. Is it possible to apply some fault-tolerance technique, at the software level or at the component-replication level?
5 Types of SEE: Single event phenomena can be classified into three effects, in order of permanency: single event upset and single event transient (soft errors); single event latchup (soft or hard error); single event burnout (hard failure). Hard errors such as single event latchup (SEL) are due to a short between ground and power and can cause permanent functional damage.
6 Collected Charge: Depending on the circuit, the transistor size and the particle energy, the resulting current pulses have different amplitudes, durations and shapes.
8 Fault Tolerance: Fault masking is any technique that prevents faults from introducing errors at the output (failure). [Figure: ionization in a junction leading to a failure]
9 Fault Tolerance: The chain of effects is: ionization → transient current injected into or extracted from the junction → transient voltage pulse at the capacitive node (fault) → bit-flip captured at the clock edge (error) → failure. The delay from fault to error is the fault latency; from error to failure, the error latency. Shielding acts before the ionization. Fault masking (hardening by design): hardware and time redundancy, hardened memory cells, error-correction codes, self-checking mechanisms with recovery. Sensors provide detection.
10 Fault Tolerance: The same chain (ionization, fault, error, failure), now with redundant spare components. Fault masking (hardening by design): hardware and time redundancy, hardened memory cells, error-correction codes, self-checking mechanisms with recovery; sensors for detection. A failure still occurs when the number of faults overcomes the mitigation technique.
11 Outline: Radiation Effects on Digital ICs; Radiation Hardening by Design: Strategies for ASICs; Radiation Effects on FPGAs; Radiation Hardening by Design: Strategies for FPGAs; Final Remarks
13 Single Event Effects (SEEs), transient effects: Single Event Upset (SEU): a bit-flip in a sequential logic element. Digital Single Event Transient (DSET): a transient voltage pulse in the combinational logic. [Figure: combinational logic between two sequential logic stages]
14 SEU in Sequential Logic: [Figure: memory cell with word lines (WL); ionization at the drain of an OFF transistor flips the stored value (bit-flip)]
15 Hardened Memories, Approach 1: use decoupling resistors to slow down the cell's regenerative feedback response, avoiding the bit-flip [Rockett, R., IEEE TNS, 1992].
16 Hardened Memory, Approach 2: add transistors to create a feedback path dedicated to restoring the corrupted data. IBM memory cell [Rockett cell, 88]; HIT memory cell [Velazco, 92].
17 Hardened Memories: The principle is to store the data in two different locations within the cell, so that the corrupted part can be restored. Whitaker/Liu memory cell [Liu, 92]; DICE memory cell [Calin, 96].
20 Dual Interlocked Storage Cell (DICE): after a strike on a single node, the original value is restored. [Figure: DICE cell with clk inputs and outputs Qa and Qb]
21 Challenges in Sequential Logic: Multiple bit upsets. Whether a particle upsets a single memory cell or multiple memory cells depends on the particle incidence angle, the transistor dimensions, the supply voltage and the memory array density.
22 Charge Sharing (NMOS transistor): [Figure: charge collection snapshots at T = 50 ps, 100 ps, 250 ps, 800 ps and 2 ns] [Reed et al., New Electronic Technologies Insertion into Flight Programs Workshop, 2007]
23 Limitations of Hardened Memory: Multiple nodes collecting charge can upset hardened memory cells. Solutions: shallow trench isolation (STI) structures; suitable transistor placement and routing; hardened memory cells combined with hardware redundancy.
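24 Triple Modular Redundancy: The sequential logic is triplicated and a majority voter (MAJ) produces the output, so a bit-flip (X) in one flip-flop is masked (OK). Majority truth table (inputs → MAJ): 000 → 0, 001 → 0, 010 → 0, 011 → 1, 100 → 0, 101 → 1, 110 → 1, 111 → 1. Each master-slave flip-flop can be composed of: standard latches, robust to charge collected by multiple nodes within the same latch; or hardened latches, robust also to charge collected by multiple nodes across latches of different domains.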
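The majority function in the truth table above is a two-out-of-three vote. A minimal Python sketch (the name `maj` is ours, not from the slides); because it uses bitwise operators, it also votes packed multi-bit words bit by bit, as a bank of per-bit voters would.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """2-out-of-3 majority vote, applied bit by bit to ints of any width."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    # Reproduce the truth table from the slide.
    for a, b, c in product((0, 1), repeat=3):
        print(f"{a}{b}{c} -> {maj(a, b, c)}")

    # A bit-flip in one copy of a word is masked by the vote.
    golden = 0b1011
    print(bin(maj(golden, golden ^ 0b0100, golden)))  # -> 0b1011
```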
25 Triple Modular Redundancy: The voter's output can show a transient wrong value (an SET in the voter itself) that may be captured by the next memory cell.
26 Triple Modular Redundancy: Solutions for the voter: a triple MAJ voter, one voter per redundant domain; or a voter with increased current strength, whose higher current drive helps keep the node at its original value.
27 Triple Modular Redundancy: Catastrophic effect: all three redundant copies hold the same wrong value, so the triple MAJ voter outputs it and the result is assumed to be correct.
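28 SET in Combinational Logic: Each node has an associated capacitance and resistance. The SET pulse is characterized by its amplitude and width. The critical charge QCRIT is the minimum collected charge that causes an upset. [Figure: current versus time, with a prompt drift component (Q_drift) followed by a slower diffusion component (Q_diffusion); collected charge Qi]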
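The slide does not give a formula for the current pulse. A model commonly used in circuit-level SET simulation (an assumption here, not taken from the slides) is the double-exponential current source I(t) = Q/(τf − τr)·(e^(−t/τf) − e^(−t/τr)), whose integral over all time equals the deposited charge Q. The sketch evaluates the pulse and the charge collected up to time t, and applies the first-order criterion that a node flips when the deposited charge reaches QCRIT; all numeric values are illustrative.

```python
import math


def set_current(t: float, q: float, tau_f: float, tau_r: float) -> float:
    """Double-exponential SET current (A) at time t (s); q in C, tau_f > tau_r."""
    if t < 0:
        return 0.0
    return q / (tau_f - tau_r) * (math.exp(-t / tau_f) - math.exp(-t / tau_r))


def collected_charge(t: float, q: float, tau_f: float, tau_r: float) -> float:
    """Closed-form integral of set_current from 0 to t (C); tends to q."""
    if t <= 0:
        return 0.0
    return q / (tau_f - tau_r) * (
        tau_f * (1 - math.exp(-t / tau_f)) - tau_r * (1 - math.exp(-t / tau_r))
    )


def upsets(q: float, q_crit: float) -> bool:
    """First-order criterion: the node flips if the deposited charge reaches QCRIT."""
    return q >= q_crit


if __name__ == "__main__":
    # Hypothetical values: 50 fC deposited, tau_f = 200 ps, tau_r = 50 ps.
    q, tau_f, tau_r = 50e-15, 200e-12, 50e-12
    for t in (50e-12, 200e-12, 1e-9):
        print(f"t = {t * 1e12:6.0f} ps  I = {set_current(t, q, tau_f, tau_r) * 1e6:7.2f} uA"
              f"  Q(t) = {collected_charge(t, q, tau_f, tau_r) * 1e15:5.1f} fC")
    print("upset with QCRIT = 30 fC:", upsets(q, 30e-15))
```

Real cells do not flip on a single threshold; the QCRIT comparison is only the usual first approximation.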
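29 SET in Combinational Logic: Not all SETs are captured by a memory cell. They can be logically masked, electrically masked or latch-window masked. Logical masking: the SET reaches a gate whose other input holds the controlling value, so it does not propagate to Q.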
30 SET in Combinational Logic: Electrical masking: the pulse is attenuated by the gates along the path until it becomes negligible before reaching Q.
31 SET in Combinational Logic: Latch-window masking: the pulse reaches the memory cell outside the latching window around the clock edge, so it is not captured.
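A common first-order model of latch-window masking (our assumption; it ignores electrical masking) treats the SET arrival time as uniformly distributed over the clock period. The pulse is latched only if it covers the whole setup-plus-hold window, so the capture probability is roughly (TW − t_setup − t_hold)/T_clk, clipped to [0, 1]. This is consistent with the frequency dependence shown in slides 33 and 34. The numbers below are hypothetical.

```python
def latch_probability(tw_ps: float, t_setup_ps: float, t_hold_ps: float,
                      f_clk_hz: float) -> float:
    """Probability that an SET of width tw, arriving at a uniformly random time,
    covers the whole setup+hold window around a clock edge and is latched."""
    period_ps = 1e12 / f_clk_hz
    p = (tw_ps - (t_setup_ps + t_hold_ps)) / period_ps
    return min(1.0, max(0.0, p))


if __name__ == "__main__":
    # Hypothetical 500 ps SET and a 50 ps setup+hold window.
    for f in (500e6, 1e9, 2.5e9):
        print(f"{f / 1e9:.1f} GHz: P(latched) = {latch_probability(500, 30, 20, f):.3f}")
```

Doubling the clock frequency doubles the capture probability until it saturates, at which point every SET wider than the window is latched.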
32 Electrical Masking: Heavy-ion radiation results, 180 nm CMOS: pulses that are too narrow do not propagate. [Bruguier, G., et al., IEEE TNS, 1996]
33 SET vs. Frequency: Radiation results, DSET in 180 nm versus clock frequency. [Benedetto et al., IEEE TNS, 2004]
34 Challenges in Combinational Logic: The SET transient width (TW) may vary from a few hundred picoseconds to a few nanoseconds, depending on the LET. [Figure: critical transient width (ps) versus process technology (nm), with curves for clock frequencies of 500 MHz, 1 GHz, 2.5 GHz, 5 GHz and 100 GHz; TW compared with the clock period] [Dodd, P., IEEE TNS 2004]
36 Challenges in Combinational Logic: Depending on the fan-out of the logic topology, a single SET may originate multiple SETs. [Figure: an SET (X) at a shared node propagating to outputs y0 and y1 and captured in Q0 and Q1]
37 Identifying the Most Sensitive Nodes: Fault injection performed by electrical (SPICE) and logic simulation can identify the most sensitive nodes, i.e., those with the lowest critical charge (QCRIT) and the lowest probability that an SET is logically masked. [Figure: example circuit with nodes A to F and output Z]
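A logic-level version of this fault injection can be sketched in a few lines: flip one node at a time for every input vector and count how often the flip reaches the output. The four-gate circuit below (nodes n1, n2, n3, z) is hypothetical, not the A to F example from the slide; in a real flow the logical-masking probability would be combined with the QCRIT obtained from SPICE simulation.

```python
from itertools import product

# Hypothetical circuit (not from the slides):
#   n1 = a AND b,  n2 = c AND d,  n3 = n1 OR n2,  z = n3 AND e
NODES = ("n1", "n2", "n3", "z")


def evaluate(a, b, c, d, e, flip=None):
    """Evaluate the circuit; if `flip` names a node, invert it (an injected SET)."""
    def node(name, value):
        return value ^ 1 if name == flip else value

    n1 = node("n1", a & b)
    n2 = node("n2", c & d)
    n3 = node("n3", n1 | n2)
    return node("z", n3 & e)


def propagation_probability(name):
    """Fraction of input vectors for which a flip at `name` changes the output."""
    vectors = list(product((0, 1), repeat=5))
    hits = sum(evaluate(*v) != evaluate(*v, flip=name) for v in vectors)
    return hits / len(vectors)


if __name__ == "__main__":
    # Most sensitive first: z (never masked), then n3, then n1 and n2.
    for name in sorted(NODES, key=propagation_probability, reverse=True):
        print(f"{name}: {propagation_probability(name):.3f}")
```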
38 Transistor Resizing: Resizing the transistors that drive the most sensitive nodes increases their QCRITICAL. [Zhou et al., IRPS 2004] [Cazeaux et al., IOLTS 2005] [Dhillon et al., IEEE Transaction on ISVLSI 2006]
39 Gate Replication: Replicating the gates that drive the most sensitive nodes increases the current drive, helping keep the node at its original value. [Lisboa, C., et al., SBCCI 2005] [Nieuwland et al., IOLTS 2006]
40 Temporal Filtering: The SET is voted out by time redundancy, implemented with delays on the clock lines or at the latch/flip-flop inputs. The combinational output is sampled at clk, clk + T and clk + 2T, and a single or triple MAJ voter masks the wrong sample (X), producing the correct value (OK). [Nicolaidis, VTS 1999], [Anghel et al., DATE 2000]
41 Full Time Redundancy [Nicolaidis, VTS 1999] [Anghel et al., DATE 2000]: Three flip-flops (ffp0, ffp1, ffp2) sample the combinational output at clk, clk + T and clk + 2T, so an SET of transient width TW corrupts at most one sample. The delay T is directly proportional to the SET transient width (TW), and the clock period must accommodate T in addition to the combinational and MAJ delays.
42 Full Time Redundancy: For a wider SET transient width (TW), the sampling instants move to clk + 2T and clk + 4T, and the clock period grows accordingly, since it must include the combinational and MAJ delays plus the sampling delays.
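The time-redundancy principle can be checked with a small behavioral model (ours; delays and widths are in arbitrary time units, not values from the slides): sample the combinational output at clk, clk + T and clk + 2T and take the majority. An SET shorter than T can cover at most one of the three sampling instants, so it is always voted out; a wider one can corrupt two samples and defeat the vote.

```python
def make_signal(correct, set_start, set_width):
    """Combinational output that holds `correct` except during an SET pulse."""
    def signal(t):
        in_pulse = set_start <= t < set_start + set_width
        return correct ^ 1 if in_pulse else correct
    return signal


def maj(a, b, c):
    return (a & b) | (a & c) | (b & c)


def temporal_vote(signal, t_clk, delay):
    """Sample at clk, clk + T and clk + 2T, then take the majority."""
    return maj(signal(t_clk), signal(t_clk + delay), signal(t_clk + 2 * delay))


if __name__ == "__main__":
    t_clk, delay = 1000, 150
    narrow = make_signal(1, set_start=990, set_width=100)   # TW < T
    wide = make_signal(1, set_start=990, set_width=400)     # TW > T
    print("single sample:", narrow(t_clk))                            # 0: SET captured
    print("narrow SET, voted:", temporal_vote(narrow, t_clk, delay))  # 1: masked
    print("wide SET, voted:", temporal_vote(wide, t_clk, delay))      # 0: vote defeated
```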
43 Temporal Latching to Trigger SETs: The error cross-section decreases as T increases. [Benedetto et al., IEEE TNS 2004]
44 Triple Sample Memory, robust to multiple bit upsets and SETs [Mavis, IRPS 2002]: the output of the combinational logic is sampled by three latches driven by shifted clocks.
45 Triple Sample Memory [Mavis, IRPS 2002]: an SET in the combinational logic corrupts only one sample (X); the voted output is correct (OK).
46 Triple Sample Memory [Mavis, IRPS 2002]: charge collected by multiple nodes corrupts one sample (X); the other samples and the output remain correct (OK).
47 Triple Sample Memory [Mavis, IRPS 2002]: with charge collected by multiple nodes, all samples and the output end up correct (OK).
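48 Full Triple Modular Redundancy (TMR) with self-recovery: Each redundant domain i (i = 0, 1, 2) has its own combinational logic, input Di, clock clki, register TRi, voter output TRVi and signal Ei. The voter of each domain combines its own register with those of the other two domains (e.g., voter 0 receives TR1 and TR2), so an upset (X) in one register is masked while the other domains remain correct (OK).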
49 Full Triple Modular Redundancy (TMR) with self-recovery: the voted value (OK) is loaded back into the upset domain (X) at the next clock, correcting it.
50 Full Triple Modular Redundancy (TMR) with self-recovery: registers TR0 to TR2 with voters TRV0 to TRV2, inputs D0 to D2, signals E0 to E2, clocks clk0 to clk2, triplicated combinational logic, and triplicated output pads combined by a wired voter.
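The self-recovery loop can be modeled at the register-transfer level. In this sketch a hypothetical 4-bit counter stands in for the combinational logic, and each domain i loads next_state(TRVi), with TRVi = MAJ(TR0, TR1, TR2). A single upset register is therefore overwritten with the voted value at the next clock, while upsets in two domains defeat the vote (the multiple-fault case discussed next).

```python
MASK = 0xF  # hypothetical 4-bit state


def maj(a, b, c):
    return (a & b) | (a & c) | (b & c)


def next_state(value):
    """Stand-in for each domain's combinational logic: a 4-bit counter."""
    return (value + 1) & MASK


def clock(regs):
    """One clock edge of the full TMR: every domain votes TR0..TR2 (its TRVi)
    and loads next_state(TRVi) into its own register TRi."""
    tr0, tr1, tr2 = regs
    return [next_state(maj(tr0, tr1, tr2)) for _ in range(3)]


if __name__ == "__main__":
    regs = [5, 5, 5]
    regs[1] ^= 0b0100            # SEU in TR1 -> [5, 1, 5]
    print(maj(*regs))            # 5: output still correct
    print(clock(regs))           # [6, 6, 6]: TR1 recovered

    regs = [5, 1, 1]             # upsets in TR1 and TR2 (multiple faults)
    print(clock(regs))           # [2, 2, 2]: wrong state propagated
```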
51 How Much Mitigation Is Enough? Circuits are becoming more and more complex. Hardware and time redundancy techniques can provide a certain level of protection against single event upsets (SEU), single event transients (SET) and multiple bit or node upsets. Problem: in some cases multiple faults can overcome the mitigation techniques, provoking a system failure.
52 Multiple Faults in the Full TMR: upsets (X) in two domains (domains 0 and 1) make the voters output the wrong value.
53 How Much Mitigation Is Enough? How can we know that the mitigation technique is working properly for a certain soft error rate (SER)? The system needs a mechanism that informs it when the number of multiple faults has passed a certain level. Built-in self-test (BIST) mechanism: sensors working as watchdogs, so that each time an ionization occurs, the system is informed.
54 How About Sensors Working as Watchdogs? Full TMR with sensors: the redundant domains (registers TR0 to TR2, voters TRV0 to TRV2, inputs D0 to D2, clocks clk0 to clk2 and their combinational logic) are monitored by sensors.
55 How About Sensors Working as Watchdogs? Full TMR with sensors: if the sensors detect one upset at a time, the technique is working.
56 How About Sensors Working as Watchdogs? Full TMR with sensors: if the sensors detect two or more upsets in distinct redundant modules at the same time, the technique is not working.
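The decision rule on these slides can be written as a small monitor over sensor reports. In this sketch the event format and the `window` parameter are our assumptions: the window would be the time the mitigation needs to correct an upset (e.g., one clock cycle for the self-recovering TMR). The technique is declared not working when sensors in two distinct redundant modules fire within the same window.

```python
def technique_ok(events, window):
    """events: iterable of (time, module_id) sensor reports.

    Returns False if upsets hit two or more distinct redundant modules
    within `window` time units of each other, True otherwise."""
    events = sorted(events)
    for i, (t_i, m_i) in enumerate(events):
        for t_j, m_j in events[i + 1:]:
            if t_j - t_i > window:
                break                 # later events are even farther away
            if m_j != m_i:
                return False          # two domains corrupted at once
    return True


if __name__ == "__main__":
    reports = [(0, 0), (10, 1)]               # upsets in modules 0 and 1
    print(technique_ok(reports, window=5))    # True: one upset at a time
    print(technique_ok(reports, window=20))   # False: overlapping upsets
```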
57 Bulk Built-in Current Sensors: During normal operation, the current in the bulk is approximately zero. When an energetic particle ionizes the silicon, it creates a current that flows between the struck node and Vdd or ground. The bulk-BICS senses this ionization current at the bulk terminal. [Henes Neto et al., IEEE Micro, 2006]
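58 Bulk Built-in Current Sensors: [Figure: BICS-P (transistors p1 to p6, between Vdd and the PMOS bulk Vdd') and BICS-N (transistors n1 to n6, between Gnd and the NMOS bulk Gnd') around the circuit under design; the ionization current flips the BICS latch, which is cleared by RST]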
59 Trade-offs: There is always some penalty to pay when protecting circuits against upsets. Each technique may present a combination of area overhead, performance penalty and increased power dissipation. The challenge is to select the most cost-effective techniques for the target application.
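60 Case Study: Adder, SEU and SET detection: recomputing with shifted operands (since S = A + B implies 2S = 2A + 2B, the adder is run again with the operands shifted left (<<), the result is shifted right (>>) and compared with S); duplication with comparison (DWC); bulk-BICS.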
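Both detection schemes can be illustrated with a behavioral adder model; the faulty adder, with result bit 2 stuck at 0, is a hypothetical fault model. Recomputing with shifted operands (RESO) runs the same adder twice, the second time on A << 1 and B << 1, so a fault at a fixed bit position corrupts different result bits in the two runs and the comparison exposes it. Duplication with comparison (DWC) compares two adder copies instead. Both schemes only detect; neither can tell which result is correct.

```python
def good_adder(a: int, b: int) -> int:
    return a + b


def faulty_adder(a: int, b: int) -> int:
    """Hypothetical permanent fault: result bit 2 stuck at 0."""
    return (a + b) & ~(1 << 2)


def reso_add(adder, a: int, b: int):
    """RESO: S = A + B is checked against (2A + 2B) >> 1 on the same adder."""
    s = adder(a, b)
    s_shifted = adder(a << 1, b << 1) >> 1
    return s, s == s_shifted


def dwc_add(adder_a, adder_b, a: int, b: int):
    """DWC: two adder copies compute the sum and a comparator checks them."""
    s = adder_a(a, b)
    return s, s == adder_b(a, b)


if __name__ == "__main__":
    print(reso_add(good_adder, 1, 3))                # (4, True)
    print(reso_add(faulty_adder, 1, 3))              # (0, False): fault detected
    print(dwc_add(good_adder, faulty_adder, 1, 3))   # (4, False): mismatch detected
```

In hardware, the shifted recomputation needs one extra adder bit so that 2A + 2B does not overflow.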
61 Case Study: Adder, SEU correction: error-correction code (Hamming), with an encoder (enc) and decoder (dec) around the adder's registers; hardened flip-flops.
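The slides do not specify the code parameters. As an illustration, assuming a Hamming(7,4) single-error-correcting code (our choice), each 4-bit data word is stored with three parity bits at positions 1, 2 and 4. On decoding, the XOR of the positions of all 1 bits (the syndrome) is 0 for a valid word and equals the position of a single flipped bit otherwise.

```python
DATA_POS = (3, 5, 6, 7)  # data bit positions in the 7-bit codeword


def hamming74_encode(nibble: int) -> list:
    """Return codeword bits for positions 1..7 (list index 0 is position 1)."""
    code = [0] * 8  # index 0 unused
    for i, pos in enumerate(DATA_POS):
        code[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        for pos in range(1, 8):
            if pos != p and pos & p:
                code[p] ^= code[pos]
    return code[1:]


def hamming74_decode(word):
    """Correct up to one flipped bit; return (nibble, syndrome)."""
    code = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1  # the syndrome is the position of the flipped bit
    nibble = sum(code[pos] << i for i, pos in enumerate(DATA_POS))
    return nibble, syndrome


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    stored[4] ^= 1                      # SEU at position 5
    print(hamming74_decode(stored))     # (11, 5): corrected
```

A Hamming(7,4) code miscorrects double errors; adding an overall parity bit (SEC-DED) allows them to be detected.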
62 Case Study: Adder, SEU and SET correction: TMR with a single voter; TMR with a triple voter.
63 Case Study: Adder, SEU and SET correction: time redundancy (samples at T and 2T) with TMR in the registers.
64 Area vs. Performance: [Figure: area and performance of the adder implementations; labels: SEU and SET correction, more than 200%; SEU correction; less than 50%; less than 50%; SEU and SET detection]
65 How About Qualifying for SEE? Testing by fault injection, modeling the SEU and SET effects at the SPICE level or at the logic/RTL level; testing in a laser facility; testing at ground-level facilities (in front of a beam of protons, heavy ions or neutrons); testing in space (the actual environment). Accuracy and cost increase from simulation toward testing in space.
66 When Testing at a Ground-Level Facility for SEE: Static testing: no application is running during the test. The register files are read during or after the test to check for SEUs and/or SETs, and compared to a golden file. Used for memories, microprocessors and ASICs in general. Dynamic testing: applications are running during the test. The outputs are analyzed and compared to a golden design, so SEUs and SETs can be checked during the test. Used for memories, microprocessors, ASICs in general, analog circuits, etc.
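For a static test, the post-irradiation check amounts to comparing the readback with the golden file and normalizing by the particle fluence. A sketch, assuming the readback and golden data are lists of equal-width words (the function names and numbers are hypothetical). The cross-section σ = N_errors / Φ, with Φ the fluence in particles/cm², is the usual figure of merit; here it is also divided by the number of tested bits.

```python
def upset_words(readback, golden):
    """(word index, XOR mask of flipped bits) for every word that differs."""
    return [(i, r ^ g) for i, (r, g) in enumerate(zip(readback, golden)) if r != g]


def bit_flips(readback, golden):
    """Total number of flipped bits between readback and golden data."""
    return sum(bin(r ^ g).count("1") for r, g in zip(readback, golden))


def cross_section_per_bit(n_errors, fluence_cm2, n_bits):
    """sigma = N_errors / (fluence * N_bits), in cm^2 per bit."""
    return n_errors / (fluence_cm2 * n_bits)


if __name__ == "__main__":
    golden = [0x00, 0x00, 0x00, 0x00]     # hypothetical golden file (8-bit words)
    readback = [0x00, 0x01, 0x00, 0x03]
    flips = bit_flips(readback, golden)
    print(upset_words(readback, golden))  # [(1, 1), (3, 3)]
    print(cross_section_per_bit(flips, fluence_cm2=1e7, n_bits=8 * len(golden)))
```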
67 General System: FPGA, memory, processors, ASIC, analog logic.
69 Field-Programmable Gate Arrays: An array of configurable logic blocks (CLBs) and interconnections, customizable by programmable switches. High logic density; customizable by the end user to realize different designs.
70 Programmable Technologies: Programmable switches can be based on: Antifuse (antifuse-based FPGAs): an electrically programmable switch forms a low-resistance path between two metal layers; one-time configurable. SRAM (SRAM-based FPGAs): the state of a static latch controls pass transistors or multiplexers connected to predefined metal layers; reconfigurable. Flash (flash-based FPGAs): a floating gate controls the switches.
71 Antifuse-based FPGAs: Non-volatile: they hold their configuration even when not connected to the power supply. They can be programmed only once. FPGA products for space: Actel; Aeroflex (based on QuickLogic).
72 Actel: RTAX-S Device: [Figure: array of super clusters built from C-cells and R-cells, plus embedded RAM blocks] [Actel, RTAX-S RadTolerant FPGAs 2007]
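73 Actel: RTAX-S Device: The C-cell (combinational) is susceptible to SETs; the R-cell (register) is robust to SEUs. [Figure: C-cell multiplexer structure (inputs D0 to D3, A0, A1, B0, B1, DB, CFN, FCI/FCO) with an SET producing an ERROR at output Y] [Actel, RTAX-S RadTolerant FPGAs 2007]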
74 Effects of Frequency: Test circuit: a shift register with 8 levels of C-cells between R-cells. The error cross-section increases as the frequency increases, because more SETs coincide with a clock edge. [Berg, M., et al., IEEE TNS 2006]
75 RadHard Eclipse FPGA from Aeroflex: hardened flip-flops, robust to SEU; ViaLink connections.
76 Antifuse FPGAs: Summary: The customized routing is not sensitive to SEU. The flip-flops are not sensitive to SEU: Actel and Aeroflex provide solutions in which all flip-flops are hardened. The logic is susceptible to DSETs. The user may protect the logic by applying high-level mitigation techniques (TMR, duplication and others) in the VHDL/Verilog description of the design.
77 SRAM-based FPGAs: Volatile: they lose their configuration when the memories are not connected to the power supply. They can be reprogrammed as many times as necessary at the work site, by loading a bitstream. FPGA products for space: Xilinx, Atmel, Honeywell.
78 SRAM-based FPGAs: A basic board must include the FPGA, a power supply (core and I/O), an oscillator, an EEPROM with a loader, the programming interface and the I/O interface. The original design bitstream must be stored in a memory outside the FPGA; depending on the device, the bitstream ranges from kilobytes to several megabytes.
79 Reconfigurability can benefit space and remote applications by: saving space in the system, since the same circuitry can be used with different configurations at different stages of a mission, reducing weight and power requirements; allowing in-orbit design changes, reducing the mission cost by correcting errors; and, if part of an FPGA fails, allowing the circuitry to be reprogrammed to use the remaining functional portions of the chip.
81 Technology Scaling in Xilinx FPGAs: nanometer technologies; embedded hard microprocessors; embedded memories (BRAM).
82 SRAM-based FPGA Architecture (Xilinx FPGA): configurable logic blocks (CLBs) containing slices, connected by the general routing matrix (GRM), plus BRAMs. A lookup table (LUT) with inputs A, B, C and D stores the truth table of a Boolean function F(A,B,C,D). The configuration memory is spread throughout the device in a large array, and each logic block contains its own configuration cells locally. A data frame is a one-bit-wide vertical slice through the array. The configuration logic identifies each frame with a unique "minor" address as well as a unique address for the column in which it lies. Configuration data is loaded serially or byte-parallel into the interface, where it is assembled into a frame in a shift register; the entire frame is then loaded into memory at once.
83 SEU in SRAM-based FPGAs: CLB Slice: Two basic elements implement the logic in an FPGA, both located in the CLB slices: the lookup table (LUT), which implements logic functions as truth tables, and the flip-flop. These elements are connected through programmable routing, made of multiplexers or pass transistors controlled by static memory cells. When an SEU hits one of these memory cells, the stored value flips. In the configuration memory bits (LUTs and routing), this creates a fault with a persistent effect, corrected only by scrubbing. In a CLB flip-flop, it creates a fault with a transient effect, corrected at the next flip-flop load.
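The persistent effect can be seen in a behavioral model of a 4-input LUT whose 16 configuration bits hold the truth table of F(A,B,C,D). The function names and the bit ordering (A as the least significant select bit) are illustrative assumptions. Flipping one configuration bit corrupts F for exactly one input combination, and F stays wrong for that combination until the bit is rewritten by scrubbing; a flip-flop upset, by contrast, is overwritten at the next load.

```python
from itertools import product


def lut_eval(config: int, a: int, b: int, c: int, d: int) -> int:
    """4-input LUT: the inputs select one of the 16 configuration bits."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (config >> index) & 1


AND4 = 1 << 15  # F = A AND B AND C AND D: only entry 15 holds a 1


def wrong_entries(config: int, golden: int):
    """Input vectors (a, b, c, d) for which the configured LUT differs from golden."""
    return [v for v in product((0, 1), repeat=4)
            if lut_eval(config, *v) != lut_eval(golden, *v)]


if __name__ == "__main__":
    upset = AND4 ^ (1 << 3)              # SEU in configuration bit 3
    print(wrong_entries(upset, AND4))    # [(1, 1, 0, 0)]: a=1, b=1, c=0, d=0
```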
84 SET in SRAM-based FPGAs: CLB Slice: The transistors that implement the CLB lookup-table (LUT) logic and the routing are also susceptible to transient faults known as SETs. This type of fault is transient, and it generates an error in the design only if the SET pulse is captured by the CLB flip-flop.
85 General Routing Matrix (GRM): CLBs are interconnected by direct lines (direct connections), double lines, hex lines (hex connections), long lines and fast connects.
86 SEU in SRAM-based FPGAs: Routing: an upset in a routing configuration bit can open an existing connection or create a short between lines, in both direct and hex connections.
87 Other Sensitive Structures: Power-on reset (POR): low probability of occurrence; signature: the DONE pin transitions low, the I/Os become tri-stated and no user functionality is available; solution: reconfigure the device. Single-event functional interrupts (SEFI) in the SelectMAP and JTAG controllers: low probability of occurrence; signature: loss of communication, or read access to the configuration memory returns a constant value; solution: reconfigure the device. Also sensitive: digital clock manager (DCM), PowerPC hard IP, input and output blocks (IOB), multi-gigabit transceivers (MGT).
88 SEE Characterization, Heavy Ion: Static Testing in Virtex-4: It is common to evaluate by static testing the SEU sensitivity of the configuration memory bits, the BRAM bits and the CLB flip-flops, as well as the power-on reset (POR). BRAMs present a higher error cross-section than CLBs. The error cross-section of the POR in Virtex-4 has improved compared to Virtex-II. But what can we do if we want to measure the SET cross-section of the FPGA logic circuitry described earlier? [George et al., IEEE Radiation Effects Data Workshop, 2006]
89 Scrubbing: Design flow: hardware description language → TMR by hand → ISE tool (synthesis optimizations, logic mapping, placement, routing) → configuration bitstream → scrubbing (full or partial reconfiguration) and fault injection (fault-tolerance verification) → output.
90 Scrubbing: Continuous Configuration: The configuration bits are continuously rewritten from the original bitstream, with no application interruption. Scrubbing does not correct upsets in the embedded memory (BRAM) or in the CLB flip-flops. [Figure: SRAM-based FPGA configured from an XQR18V04 PROM holding the original bitstream (signals DATA[7:0], CE, OE/RESET, CLK, CCLK, DONE, INIT), and a scrub controller with its own XQR18V04 and oscillator]
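Scrubbing works frame by frame. The sketch below models the configuration memory as a list of frames and repairs every frame that differs from the golden bitstream. This readback-and-compare variant is shown for clarity; the continuous configuration on the slide simply rewrites the frames from the PROM without comparing. Frames holding BRAM contents or flip-flop state must not be rewritten, since that would overwrite user data; the hypothetical `skip` parameter stands for that exclusion.

```python
def scrub(config_frames, golden_frames, skip=frozenset()):
    """Rewrite every configuration frame that differs from the golden copy.

    `skip` holds frame addresses that must not be rewritten (e.g., frames
    that store BRAM contents). Returns the list of repaired frame addresses."""
    repaired = []
    for addr, gold in enumerate(golden_frames):
        if addr in skip:
            continue
        if config_frames[addr] != gold:
            config_frames[addr] = gold  # partial reconfiguration of one frame
            repaired.append(addr)
    return repaired


if __name__ == "__main__":
    golden = [(0, 0, 1), (1, 1, 0), (0, 1, 0)]
    config = list(golden)
    config[1] = (1, 0, 0)          # SEU in frame 1 (persistent effect)
    print(scrub(config, golden))   # [1]
    print(config == golden)        # True
```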
91 Configuration Scrubbing Example, correcting persistent-effect faults: an upset in the configuration bits of column x.
92 Configuration Scrubbing Example, correcting persistent-effect faults: scrubbing the column repairs the upset. The scrubbing rate is important to reduce the probability of multiple upsets. Scrubbing can be performed from outside the FPGA, by an external controller such as another FPGA, or from inside the FPGA, through the Hardware Internal Configuration Access Port (HWICAP).
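Why the scrubbing rate matters can be quantified under a simple assumption (ours, not the slides'): configuration upsets arrive as a Poisson process with rate λ, and every scrub cycle of period T repairs all of them. The probability that two or more upsets accumulate within one cycle, which a TMR design cannot always tolerate, is P = 1 − e^(−λT)(1 + λT) ≈ (λT)²/2 for small λT, so halving the scrub period roughly quarters this probability. The upset rate below is hypothetical.

```python
import math


def p_multiple_upsets(rate_per_s: float, scrub_period_s: float) -> float:
    """P(two or more upsets within one scrub period) for Poisson arrivals."""
    x = rate_per_s * scrub_period_s
    # 1 - P(0) - P(1) = (1 - e^-x) - x e^-x; expm1 keeps accuracy for small x
    return -math.expm1(-x) - x * math.exp(-x)


if __name__ == "__main__":
    rate = 1e-3  # hypothetical: one configuration upset every ~17 minutes
    for period in (100.0, 10.0, 1.0):
        print(f"T_scrub = {period:6.1f} s: P = {p_multiple_upsets(rate, period):.2e}")
```

This treats any two upsets as harmful; only pairs that hit different redundant domains actually defeat TMR, so the figure is a pessimistic bound.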
93 Mitigation Techniques: Design flow: hardware description language → TMR by hand → ISE tool (synthesis optimizations, logic mapping, placement, routing) → configuration bitstream → scrubbing (full or partial reconfiguration) and fault injection (fault-tolerance verification) → output.
94 X-TMR: Full TMR in the combinational logic, the sequential logic and the input/output pads. Why do we need full TMR? To guarantee the correct output in the presence of persistent-effect errors, which are corrected only by loading the correct bitstream. [Figure: FPGA with three redundant logic domains (tr0, tr1, tr2) running from the input package pins, through TMR flip-flops, to the TMR output voter at the output package pin]
95 X-TMR: The recovery path is mandatory to correct the state of the flip-flops, especially in FSMs. TMR flip-flop: each domain's register (R0, R1, R2, clocked by clk0, clk1, clk2) feeds majority voters (MAJ) implemented in LUTs, which receive R0, R1 and R2, and the voted value is fed back to the registers.
96 X-TMR Output Voter: each domain (tr0, tr1, tr2) drives the output package pin through a 3-state buffer. Its control signal (3-state_0, 3-state_1, 3-state_2) is computed from R0, R1 and R2 by a voter implemented in a LUT. A control value of 0 allows the data to pass to the output pad; 1 blocks the data.
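This output stage can be modeled at the logic level. The sketch reflects our reading of the slide rather than a vendor netlist: each domain's LUT drives its 3-state control to 1 (block) when the domain's value Ri disagrees with the majority of R0, R1, R2, and to 0 (pass) otherwise. The enabled drivers all carry the majority value, so the shared package pin behaves as a wired voter.

```python
def maj(a, b, c):
    return (a & b) | (a & c) | (b & c)


def tristate_controls(r0, r1, r2):
    """3-state control per domain: 0 lets the data through, 1 blocks it."""
    m = maj(r0, r1, r2)
    return [r ^ m for r in (r0, r1, r2)]


def output_pin(r0, r1, r2):
    """Value on the shared pin: only unblocked drivers drive it."""
    controls = tristate_controls(r0, r1, r2)
    driven = {r for r, ctrl in zip((r0, r1, r2), controls) if ctrl == 0}
    if len(driven) != 1:
        raise ValueError("enabled drivers disagree (bus contention)")
    return driven.pop()


if __name__ == "__main__":
    print(tristate_controls(1, 0, 1))  # [0, 1, 0]: domain tr1 is blocked
    print(output_pin(1, 0, 1))         # 1
```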
97 Evaluating TMR I/O Pads: inputs at 66 MHz. [Swift et al., IEEE TNS 2004]
98 Evaluating TMR I/O Pads: heavy ion. [Swift et al., IEEE TNS 2004]
99 Evaluating Multiple Bit Upsets: heavy-ion radiation static test of the Virtex family (220 nm CMOS) and the Virtex-II family (130 nm CMOS). [Quinn et al., IEEE TNS, 2005]
100 Domain Crossing Events: Bit-flips in the routing can create short connections between different blocks of the TMR (tr0, tr1 and tr2). Bit-flip a affects only the redundant logic tr0, so the majority voter chooses the correct result (two out of three outputs). [Figure: redundant logic tr0, tr1, tr2 feeding TMR registers with voters and refresh, and the TMR output majority voter]
101 Domain Crossing Events: Bit-flips in the routing can create short connections between different blocks of the TMR (tr0, tr1 and tr2). Bit-flip b affects two redundant logic parts, so the majority voter no longer chooses the correct result, since two of the three outputs are wrong.
102 Solution to Reduce Domain Crossing Events: Voter insertion: a barrier of voters between logic partitions can reduce the probability that a bit-flip in the routing causes a short connection between two or more redundant blocks. [Kastensmidt et al., DATE 2005] [Figure: redundant logic tr0, tr1, tr2 split into logic partitions separated by TMR majority voters]
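One effect of the voter barrier can be reproduced with a small behavioral model (ours, with hypothetical stages and faults): upsets that hit different domains in different partitions accumulate into two wrong domains when the only vote is at the output, but are voted out partition by partition when voters separate the partitions. A single routing short that merges two domains inside one partition still defeats the vote, which is why the barrier reduces the risk rather than removing it.

```python
def maj(a, b, c):
    return (a & b) | (a & c) | (b & c)


def run_tmr(x, stages, faults, voter_barrier):
    """Propagate input x through the triplicated stages.

    `faults` is a set of (stage, domain) pairs whose output gets bit 0 flipped.
    With `voter_barrier`, all three domains are re-voted after every stage."""
    values = [x, x, x]
    for s, stage in enumerate(stages):
        values = [stage(v) ^ (1 if (s, d) in faults else 0)
                  for d, v in enumerate(values)]
        if voter_barrier:
            voted = maj(*values)
            values = [voted, voted, voted]
    return maj(*values)


if __name__ == "__main__":
    stages = [lambda v: v + 1, lambda v: v + 1]   # two logic partitions
    faults = {(0, 0), (1, 1)}                      # tr0 hit in partition 0, tr1 in partition 1
    print(run_tmr(4, stages, set(), False))   # 6: fault-free
    print(run_tmr(4, stages, faults, False))  # 7: wrong, two domains corrupted
    print(run_tmr(4, stages, faults, True))   # 6: errors voted out per partition
```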
103 TMR BRAM (Embedded Memory): Upsets in BRAMs are not corrected by scrubbing, so TMR with refresh must be used to mitigate them. This requires dual-port BRAMs: a counter scans the addresses through the second port, and voters write the voted value back, refreshing the memory contents.
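A behavioral sketch of the refresh mechanism, with class and method names of our choosing: three copies of the memory, reads voted word by word, and a counter that, through the second port, votes one address per step and writes the result back. Without the refresh, an upset would stay in its copy, and a later upset in another copy of the same word could defeat the vote.

```python
def maj(a, b, c):
    return (a & b) | (a & c) | (b & c)


class TmrMemory:
    """Three copies of a memory with a refresh counter on the second port."""

    def __init__(self, depth: int):
        self.copies = [[0] * depth for _ in range(3)]
        self.refresh_addr = 0

    def write(self, addr: int, value: int) -> None:
        for copy in self.copies:
            copy[addr] = value

    def read(self, addr: int) -> int:
        return maj(*(copy[addr] for copy in self.copies))

    def refresh_step(self) -> None:
        """Vote one address and write the result back to all three copies."""
        addr = self.refresh_addr
        self.write(addr, self.read(addr))
        self.refresh_addr = (addr + 1) % len(self.copies[0])


if __name__ == "__main__":
    mem = TmrMemory(depth=4)
    mem.write(2, 0xAB)
    mem.copies[0][2] ^= 0xFF       # SEU in copy 0 (not fixed by scrubbing)
    print(hex(mem.read(2)))        # 0xab: masked by the vote
    for _ in range(4):             # one full pass of the refresh counter
        mem.refresh_step()
    print([hex(c[2]) for c in mem.copies])  # all copies repaired
```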
104 Verifying the Mitigated Design: Design flow: hardware description language → TMR by hand → ISE tool (synthesis optimizations, logic mapping, placement, routing) → configuration bitstream → scrubbing (full or partial reconfiguration) and fault injection (fault-tolerance verification) → output checking.
107 Summary: Antifuse FPGAs: fault-tolerance techniques are applied in VHDL/Verilog to protect against SETs (SEUs are handled by the vendor). SRAM FPGAs: scrubbing cleans persistent faults; the design must be protected against SETs and SEUs; new FPGAs protected by the vendor are coming out. Flash FPGAs: protect against SEUs and SETs; the SEE sensitivity of flash transistors is low but still under investigation.
109 Final Remarks: Mitigation techniques for ASICs and FPGAs must take into account SEUs and SETs, considering both single and multiple effects. ASICs: integrated systems fabricated in nanometer technologies should apply mitigation techniques at different levels to ensure robustness: charge dissipation (transistor resizing, capacitors, resistors); sensors (bulk-BICS); hardware and time redundancy; error-correction codes (ECCs); self-checking and recomputation.
110 Final Remarks: FPGAs: new FPGA generations bring more flexibility and design capabilities, but also more reliability challenges. The design can always be protected by high-level techniques (VHDL, Verilog) such as TMR. To reduce the cost of TMR, solutions are needed at the FPGA architecture level, in the CLB logic (combinational blocks, sequential blocks, programmable switches) and in the routing programmable switches, to mitigate SEUs and SETs.
111 Conferences: NSREC, IEEE Nuclear and Space Radiation Effects Conference; RADECS, European Conference on Radiation Effects on Components and Systems (2011: RADECS in Sevilla, Spain).