For Official Use Only
Napier 131/MAPLD 2004
MAPLD 2004
SINGLE EVENT EFFECT (SEE) ANALYSIS, TEST, MITIGATION & IMPLEMENTATION OF THE XILINX VIRTEX-II INPUT OUTPUT BLOCK (IOB)
Mathew Napier(1), Jason Moore(2), Kurt Lanes(1), Sana Rezgui(2), Gary Swift(3)
(1) Sandia National Laboratories, Albuquerque, NM, USA
(2) Xilinx, San Jose, CA, USA
(3) JPL/Caltech, Pasadena, CA, USA
"This work was carried out in part by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration."
"Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology."
Purpose & Outline
Analyze and evaluate the different types of TMR IOB mitigation structures.
Discuss the trade-offs (SEE, electrical/timing, and resources) and how these trade-offs affect the operation and MTBF of a system.
OUTLINE
–IOB
–SEE IOB Mitigation
    Triple Module Redundant IOB
    JPL Dual-MR
–SEE Trade-offs
    Cross Section
    Signal Integrity and Timing
–System Implementation
    TMR, EDAC, I/O Count
    High-speed Interfaces
SEU Hazards for Xilinx Technology
Configuration Memory
–Configuration memory controls logic function and routing
–Configuration memory upsets cause
    Changes to logic function
    Changes to routing
    Changes to IO configuration
Transient and Static Bit Errors
–Change data and control states
Single Event Functional Interrupt (SEFI)
–Power-On State Machine Upsets (POR Upset)
    Cause a power-on reset to occur
–SelectMAP and JTAG
    Disable part configuration/scrub
Effective mitigation techniques exist for each of these error modes
[Figure labels: SRAM configuration memory controls logic function (look-up tables); internal registers store state data; SRAM configuration memory controls routing (switch matrix)]
Input Output Buffer (IOB)
IOBs interconnect the Xilinx FPGA fabric with external devices.
They support a wide range of I/O operating standards.
–Differential – LVDS… ECL
–Single Ended – LVCMOS…HSTL
Silicon features greatly increase system performance.
–Flip-flops in the IOB
–Double Data Rate flip-flops
–Digital impedance control
An IOB consists of the following parts
–Input path
    Two DDR registers
–Output path
    Two DDR registers
    Two 3-state DDR registers
–Separate clocks for I & O
–Set and reset signals are shared
    Separated sync/async
    Separated Set/Reset attribute per register
[Figure: IOB block diagram. 3-state path (Reg, DDR mux; OCK1, OCK2), output path (Reg, DDR mux; OCK1, OCK2), input path (Reg; ICK1, ICK2), PAD]
IOB Details
[Figure: IOB detailed view (FPGA Editor). Callouts: output registers; 3-state control registers; input registers; IO standard options (LVDS, etc.)]
Xilinx Triple Module Redundancy (XTMR): Inputs
SEU immunity requires triple-redundant input pins for every input signal.
Not triplicating global input signals (clk, rst, etc.) can seriously compromise SEU resistance.
Triplication of input data paths can be traded for EDAC.
–Reduces I/O count
SEU resistance is sometimes traded off for resource utilization.
Xilinx input capacitance is 10pF per I/O, so the user needs to verify that interfacing parts can drive 30pF (three inputs in parallel) at speed.
XTMR: Triplicated Outputs with Minority Voters
Outputs can be triplicated, using three pins for each output signal.
Minority voters monitor each of the triplicated design modules.
If one module differs from the others, its output pin is driven to High-Z.
Voters are triplicated.
The convergence point is outside the FPGA, at the trace.
[Figure: TR0, TR1 and TR2 each drive a pin (P) through a minority voter]
XTMR: Triplicated Output Operation - Datapath SEU
If a datapath SEU occurs, the minority voter places its pin in High-Z.
The remaining valid outputs drive the output to the correct value.
If an SEU occurs on a minority voter, the worst it can do is disable a valid output.
–To pass an incorrect output, two upsets would have to occur on the same path
–Active scrubbing of the part will eliminate the accumulation of double SEUs in configuration logic
[Figure: TR0, TR1 and TR2 with minority voters and pins (P); the pin of the upset path is shown at High-Z (Z)]
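The triplicated-output behaviour described on this slide can be sketched as a small behavioural model. This is an illustration only, not the XTMR tool's implementation: the function names `is_minority`, `xtmr_output_pins` and `trace_level`, and the use of `None` to stand for a High-Z pin, are assumptions made for the example.

```python
from typing import List, Optional

HIGH_Z = None  # assumption: None models a tri-stated (High-Z) pin


def is_minority(own: int, other_a: int, other_b: int) -> bool:
    """True when `own` disagrees with both other domains.

    For 1-bit signals, disagreeing with both others implies the other
    two agree with each other, so `own` is the minority.
    """
    return own != other_a and own != other_b


def xtmr_output_pins(tr0: int, tr1: int, tr2: int) -> List[Optional[int]]:
    """Each domain drives its own pin unless its voter flags it as the minority."""
    domains = (tr0, tr1, tr2)
    pins: List[Optional[int]] = []
    for i, own in enumerate(domains):
        a, b = (domains[j] for j in range(3) if j != i)
        pins.append(HIGH_Z if is_minority(own, a, b) else own)
    return pins


def trace_level(pins: List[Optional[int]]) -> int:
    """Logic level at the point where the pins converge on the board trace."""
    driven = {p for p in pins if p is not HIGH_Z}
    if len(driven) != 1:
        raise ValueError("trace is floating or has contention")
    return driven.pop()


if __name__ == "__main__":
    print(xtmr_output_pins(1, 1, 1))               # no upset: all pins drive
    print(xtmr_output_pins(1, 0, 1))               # TR1 upset: its pin is tri-stated
    print(trace_level(xtmr_output_pins(1, 0, 1)))  # remaining pins set the level
```

An upset voter that wrongly tri-states a good pin corresponds to a pin list such as `[None, 1, 1]`, for which `trace_level` still returns the correct value. The model is purely logical and says nothing about the Voh/Vol levels that result when a failed pin fights the other two.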
XTMR: Duplicated Outputs with Minority Voters (JPL)
In this scheme (by Gary Swift at JPL), the triplicated design domains are driven onto two pins.
Two minority voters monitor the triplicated design modules.
If a module differs from the others, its output pin is driven to High-Z.
Voters are duplicated.
If an SEU occurs on the datapath without a pin, the outputs continue operating as normal.
The convergence point is outside the FPGA, at the trace.
[Figure: TR0, TR1 and TR2 feed two minority voters, each driving one pin (P)]
XTMR: Duplicated Output Operation - Datapath SEU (2)
If an SEU occurs on a datapath with a pin, that pin is driven to High-Z.
The main advantage of this technique is that it uses two rather than three pins, reducing pin count while maintaining SEU immunity.
If an SEU occurs on a minority voter, the worst it can do is disable a valid output (same as XTMR).
[Figure: TR0, TR1 and TR2 feed two minority voters and pins (P); the pin of the upset path is shown at High-Z (Z)]
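The two-pin scheme can be modelled the same way. The sketch below is a hedged illustration rather than JPL's design: it assumes that TR0 and TR1 are the domains that own pins and that TR2 serves only as a voting reference, and the names `dual_output_pins` and `active_drivers` are invented for the example.

```python
from typing import Optional, Tuple

HIGH_Z = None  # assumption: None models a tri-stated (High-Z) pin


def dual_output_pins(tr0: int, tr1: int, tr2: int) -> Tuple[Optional[int], Optional[int]]:
    """Two-pin scheme: TR0 and TR1 own a pin, TR2 is only a voting reference.

    Which two domains own pins is an assumption made for this sketch.
    """
    pin0 = HIGH_Z if (tr0 != tr1 and tr0 != tr2) else tr0
    pin1 = HIGH_Z if (tr1 != tr0 and tr1 != tr2) else tr1
    return pin0, pin1


def active_drivers(pins: Tuple[Optional[int], Optional[int]]) -> int:
    """Number of pins still driving the shared trace."""
    return sum(p is not HIGH_Z for p in pins)


if __name__ == "__main__":
    print(dual_output_pins(1, 1, 0))  # upset in the domain without a pin
    print(dual_output_pins(0, 1, 1))  # upset in a domain with a pin
    print(active_drivers(dual_output_pins(0, 1, 1)))
```

In this model, an upset in a pinned domain leaves a single driver on the trace. An upset in the unpinned domain leaves both drivers active.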
XTMR: Single Output Pin
If a design is pin-limited, you can elect not to triplicate some outputs.
A single majority voter can be placed in series with a single output.
This adds output delay and leaves the output path susceptible to SEU.
[Figure: TR0, TR1 and TR2 feed a majority voter that drives one OBUF]
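A minimal sketch of the single-pin arrangement shows why it masks an upset in one domain but not an upset after the voter. The `obuf_fault` parameter and its values are hypothetical, introduced only to illustrate a fault in the unprotected output path.

```python
from typing import Optional


def majority(a: int, b: int, c: int) -> int:
    """2-of-3 majority vote on 1-bit values."""
    return (a & b) | (b & c) | (a & c)


def single_pin_output(tr0: int, tr1: int, tr2: int,
                      obuf_fault: Optional[str] = None) -> int:
    """Value seen at the single pin.

    `obuf_fault` is a hypothetical knob for this sketch: None, "stuck_high",
    "stuck_low" or "inverted". It models an upset downstream of the voter.
    """
    voted = majority(tr0, tr1, tr2)
    if obuf_fault == "stuck_high":
        return 1
    if obuf_fault == "stuck_low":
        return 0
    if obuf_fault == "inverted":
        return voted ^ 1
    return voted


if __name__ == "__main__":
    print(single_pin_output(1, 0, 1))                         # domain upset is masked
    print(single_pin_output(1, 1, 1, obuf_fault="inverted"))  # output-path upset is not
```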
XTMR Output Analysis
How many configuration bits are in the TMR I/O after the minority voter?
–Errors in these bits will change the IOB function and NOT be caught by the voter.
–How many one-bit upsets will really change the function?
Does an XTMR structure still function correctly with a stuck-at-high, stuck-at-low or inverted IOB failure?
Can two I/Os overdrive the failed one?
–Voltage output high
–Voltage output low
–Rise/fall timing
–How does this change for different I/O types and switching speeds?
How to design a system that balances
–SEE sensitivity
–System performance and speed
–Resource utilization
Schematic Analysis
Determine the number of Configuration Memory Cells (CMC) needed to configure unprotected and TMR I/O configurations by analyzing Xilinx schematics.
Guidelines/Assumptions
–Not all SEUs will be catastrophic, so SEUs are divided into two types (hard and soft failures)
    Hard failure: 100% certainty that, when it occurs, it will cause a system failure
    –Causing the output to become inverted
    –Causing the output to be stuck high or low
    –Changing the signaling standard to something completely different (e.g. LVCMOS to HSTL)
    –Causing the output to be tri-stated
    Soft failure: the effect is uncertain
    –Changing the signaling standard to something similar (LVCMOS to LVTTL)
    –Changing the drive strength or slew rate
    –Changing the termination
Schematic Analysis Results
Schematic analysis of this path = 109 bits (but only 92 "essential")
–26 Hard Failures
–66 Soft Failures
[Figure: analyzed path: CLB LUT, routing to IOB, IOB]
TMR Output Results
Schematic analysis of this configuration = 173 bits
–27 Hard Failures
–122 Soft Failures
TMR has a larger cross section than unprotected.
AC analysis will determine which type is more robust.
[Figure: CLB and routing, IOB]
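The bit counts from the two schematic-analysis slides can be put side by side with a few lines of arithmetic. This is only a bookkeeping sketch: treating hard plus soft bits as the "sensitive" count for the TMR case is an assumption (the slides state an essential count only for the unprotected path), and the `summarize` helper and its field names are invented for the example.

```python
# Bit counts as quoted on the schematic-analysis slides.
unprotected = {"total_bits": 109, "hard": 26, "soft": 66}
tmr = {"total_bits": 173, "hard": 27, "soft": 122}


def summarize(cfg: dict) -> dict:
    """Derive simple ratios from the quoted counts.

    Assumption: hard + soft is taken as the number of sensitive bits.
    """
    sensitive = cfg["hard"] + cfg["soft"]
    return {
        "sensitive": sensitive,
        "unclassified": cfg["total_bits"] - sensitive,
        "hard_fraction": cfg["hard"] / sensitive,
    }


if __name__ == "__main__":
    for name, cfg in (("unprotected", unprotected), ("TMR", tmr)):
        s = summarize(cfg)
        print(f"{name}: {s['sensitive']} sensitive bits, "
              f"{s['unclassified']} unclassified, "
              f"hard fraction {s['hard_fraction']:.1%}")
```

For the unprotected path the derived sensitive count reproduces the 92 "essential" bits quoted on the slide. The TMR configuration has more bits overall but nearly the same number of hard-failure bits (27 versus 26).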
SEE Mitigated IOB Signal Integrity and Timing
A MEMEC Insight MB-2000 board was used as the test platform for the electrical and timing characteristics of XTMR.
–Three I/Os were tied together and run through four cases: Normal, Stuck at High, Stuck at Low, Inverted
–For each case the following were recorded:
    Voh, Vol, Tr, Tf
    4GHz scope pictures
–I/O types evaluated included 1.8V/2.5V/3.3V LVCMOS & LVTTL, LVDCI (impedance control) & LVDS, at fast and slow slew rates.
HyperLynx simulations were performed on all of the above cases to verify correlation between measured and simulated data.
JPL's dual-redundant minority-voter mitigation scheme will fail all of the above operating conditions if one of the I/Os fails.
SEE Mitigated IOB Signal Integrity and Timing
XTMR 1.8V LVCMOS, one output inverted
–Voh drops to 1.4V from 1.8V
–Vol rises to 0.4V from 0V
–Noise is due to lack of termination
[Figure: scope captures, Normal and Inverted]
SEE Mitigated IOB Signal Integrity and Timing
HyperLynx IBIS model simulation, LVCMOS 1.8V
Stuck at Low
–Measured: Voh = 1.44V, Vol = -0.04V, Tr = 0.62ns, Tf = 0.52ns
–Simulated: Voh = 1.26V, Vol = -0.06V, Tr = 0.60ns, Tf = 0.70ns
Stuck at High
–Measured: Voh = 1.72V, Vol = 0.4V, Tr = 0.58ns, Tf = 0.51ns
–Simulated: Voh = 1.79V, Vol = 0.54V, Tr = 0.80ns, Tf = 0.60ns
Simulation data correlates with measured data
[Figure: simulated waveforms, Stuck at High and Stuck at Low]
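To make the stated correlation between simulation and measurement concrete, the quoted values can be differenced directly. The sketch below only restates the numbers on this slide; the dictionary layout and the `deltas` helper are assumptions made for the example, with voltages in volts and times in nanoseconds.

```python
# Values as quoted on the slide (LVCMOS 1.8V, three I/Os tied together).
# Voh/Vol are in volts, Tr/Tf in nanoseconds.
measured = {
    "stuck_low":  {"Voh": 1.44, "Vol": -0.04, "Tr": 0.62, "Tf": 0.52},
    "stuck_high": {"Voh": 1.72, "Vol": 0.40,  "Tr": 0.58, "Tf": 0.51},
}
simulated = {
    "stuck_low":  {"Voh": 1.26, "Vol": -0.06, "Tr": 0.60, "Tf": 0.70},
    "stuck_high": {"Voh": 1.79, "Vol": 0.54,  "Tr": 0.80, "Tf": 0.60},
}


def deltas(case: str) -> dict:
    """Simulated minus measured for each quantity, rounded to 2 decimals."""
    return {
        key: round(simulated[case][key] - measured[case][key], 2)
        for key in measured[case]
    }


if __name__ == "__main__":
    for case in measured:
        print(case, deltas(case))
```

The largest voltage gap in this data is 0.18V (Voh, stuck-at-low case) and the largest timing gap is 0.22ns (Tr, stuck-at-high case).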
SEE Mitigated IOB Signal Integrity and Timing
[Table: measured data spreadsheet. Labels: Normal, Stuck At Low, INV, SAH]
Failure limits V output low margin or violates level
CMC Failure Comparison
Inverted TMR displayed zero failures at 3.3V and 1.8V.
Naked I/O has a much larger CMC failure cross section than the TMR setup.
The I/O test design is only running at 30MHz; TMR failures may show up at higher speeds.
How does naked I/O compare to TMR in dynamic in-beam testing and fault injection?
The test will show CMC sensitivity due to switching failures large enough to break the output switching state.
System Goals & Implementation
GOALS
–Xilinx FPGA technology is a mission-enabling technology
–SEU Goal: develop a design whose SEU performance is comparable to that of a fully hardened design while exploiting the capabilities of state-of-the-art CMOS process technologies
–SEU Result: system upset rate is superior to that which could be achieved with unmitigated SEU hard logic
IMPLEMENTATION
–Command and control logic is implemented in SEU hard logic
    Processor memory includes parity protection
    Fail over to boot code
–SEU detection and recovery for SEU soft devices is automatic and occurs without ground intervention
–SEU-induced outages that do not require ground intervention are booked against mission availability
–Although not a specific requirement, good SEU performance under nominal solar flare conditions is desired
SEU Mitigation and Error Control
Mitigate IO Upsets
–TMR of IO for clocks and address signals
–EDAC for data path signals
Mitigate Configuration Memory Upsets
–TMR internal logic
–Configuration memory scrubbing to prevent error accumulation
Design approach does not include POR upset mitigation
–Use of shadow devices is effective against POR errors
–POR error rate is very low
The flight system makes extensive use of several techniques to exploit the advantages of nanometer CMOS technology while maintaining excellent SEU performance
–Multiple-bit Reed-Solomon forward error correction codes
–Single-bit error correcting codes
–Simple parity error detection
–Cyclic Redundancy Check for burst error correction
–Triple Modular Redundancy
–Error scrubbing
Mitigation technique is selected based upon error rate, vulnerability, system impact, and implementation complexity
Mitigation techniques provide coverage for dynamic SEU errors
Error correction techniques implemented for SEU mitigation improve the overall design robustness and reliability
Mitigation Overview – Sensor Data Processor (SDP)
–Processes 8Gbps of data.
–Outputs 340Mbits of processed data.
–Architecture
    Fiber receiver and SERDES link, 4 channels at a maximum of 160Mpix each.
    Four Quadrant Processors for data processing.
    Contains 640 Mbytes of SDRAM for data storage
    –320-bit 85MHz SDRAM, 1.8V
    –Can generate up to 340Mbits/s of source packet data
    One central Virtex for data networking
    –De-muxes data from SERDES chip outputs to 4 processing channels/Quadrant Xilinx
    –Controls frame summation rates and reference frame generation rates.
    –Transfers source packets to downlink modules at up to 340Mbits/s max
    –USES compresses source packets.
Mitigation Overview – Sensor Data Processor (SDP)
[Figure: SDP block diagram. Five XC2V3000 devices, four of them with 640MB + ECC memory on a 320-bit bus; fiber input and SERDES. Other labels: Gilgamesh, A-I2C, CTM, Voltage, Temp., To DLM / DLC, Interface Control, JTAG, I2C, TIME, System CLK, Osc, SDP I2C, PIX/Packet, Packets, PXS. Mitigation labels: ECC, RS-ECC, TMR, ECC/CRC, ECC/TMR]
SDP - SDRAM
SDRAM interface, 1 per Quadrant Virtex
–20 1.8V Micron Mobile SDRAMs
–1.8V LVTTL I/O
–320-bit data bus: 240 pixel data, 80 ECC
–Data is Reed-Solomon encoded
–TMR'd outputs from Virtex: address, control and clock
–Address and control signals are AC terminated.
–TMR'd input to Virtex: clock feedback, used to de-skew the SDRAM clock
–Currently running at 85MHz; designed to operate at 100MHz
Test
–Measured TMR SDRAM Addr, RAS and CAS signals for the following cases:
    Inverted, Stuck High, Stuck Low
    Measured Voh, Vol, Tr and Tf.
    Counted the number of Reed-Solomon errors, if any.
[Figure label: SDRAM ADDRESS & CONTROL]
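The bus figures on this slide imply a few derived numbers that are easy to check. The sketch below is back-of-envelope arithmetic only. It assumes the 320 bits are split evenly across the 20 devices and that the bus moves one word per clock; neither is stated on the slide, and the constant and function names are invented for the example.

```python
BUS_BITS = 320    # total data bus width (from the slide)
DATA_BITS = 240   # pixel data bits (from the slide)
ECC_BITS = 80     # ECC bits (from the slide)
DEVICES = 20      # SDRAM devices per quadrant (from the slide)

# Assumption: the bus is divided evenly across the devices.
bits_per_device = BUS_BITS // DEVICES

# Share of the bus spent on ECC, and ECC overhead relative to payload.
ecc_share_of_bus = ECC_BITS / BUS_BITS
ecc_overhead_vs_data = ECC_BITS / DATA_BITS


def peak_gbps(bits: int, clock_mhz: float) -> float:
    """Peak rate in Gbit/s, assuming one transfer per clock cycle."""
    return bits * clock_mhz / 1000.0


if __name__ == "__main__":
    print("bits per device:", bits_per_device)
    print(f"ECC share of bus: {ecc_share_of_bus:.0%}")
    print(f"ECC overhead vs data: {ecc_overhead_vs_data:.1%}")
    for mhz in (85, 100):
        print(f"{mhz} MHz: raw {peak_gbps(BUS_BITS, mhz):.1f} Gbit/s, "
              f"payload {peak_gbps(DATA_BITS, mhz):.1f} Gbit/s")
```

Under these assumptions each device contributes 16 bits, a quarter of the bus carries ECC, and the peak payload rate is 20.4 Gbit/s at 85MHz or 24 Gbit/s at 100MHz.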
SDP - SDRAM (2)
[Figure: scope captures, SDRAM address normal and SDRAM address with one I/O inverted]
SDP - SDRAM (3)
No SDRAM errors for all three failure cases
Upset Rates for Various SEU Mitigated IO Configurations
Lessons Learned
Triple-redundant outputs for >2.5V LVCMOS or LVTTL achieve correct Vol and Voh levels for all failure cases.
For low-voltage I/O (<1.8V), thresholds are very close to margins under failure conditions and may violate other parts' specs.
For the SDRAM interface, 1.8V I/O tolerated all three failure cases at room temperature.
Double-redundant outputs will not meet the correct Vol and Voh levels under I/O failure.
Rise and/or fall times are lengthened due to I/O failure, which may cause more failures at higher speeds.
Recommendation
If resources permit, XTMR output for all control signals is recommended regardless of I/O type.
Outputs to high-speed, jitter-sensitive or duty-cycle-sensitive devices need special consideration.
EDAC on data buses is ideal for IOB failure protection.