P173/MAPLD 2005 Swift1 Upset Susceptibility and Design Mitigation of PowerPC405 Processors Embedded in Virtex II-Pro FPGAs.

1 P173/MAPLD 2005 Swift1 Upset Susceptibility and Design Mitigation of PowerPC405 Processors Embedded in Virtex II-Pro FPGAs

2 P173/MAPLD 2005 Swift2 Authors Gary Swift Jet Propulsion Laboratory/California Institute of Technology Gregory Allen Jet Propulsion Laboratory/California Institute of Technology Jeffrey George The Aerospace Corporation

3 P173/MAPLD 2005 Swift3 Authors Sana Rezgui Xilinx Corporation Carl Carmichael Xilinx Corporation Fayez Chayab MDRobotics

4 P173/MAPLD 2005 Swift4 Abstract We show recent results for the upset susceptibility of the registers and caches in the embedded PowerPC405 in the Xilinx V2P40 FPGA. For critical flight designs where configuration upsets are mitigated effectively, these upsets can dominate the system error rate. We consider several techniques for implementing various levels of redundancy to reduce system errors, including single-, dual- and triple-chip options. We conclude that the dual-chip option may often be the best choice and warrants further study.

5 P173/MAPLD 2005 Swift5 Background - Reconfigurable FPGA Upsets The basic building blocks are soft to upset [Ref. 1]

6 P173/MAPLD 2005 Swift6 Background - Upset Mitigation Critical applications require design-level upset mitigation Design Triplication –The use of TMR (or triple modular redundancy) in a design allows correct function through triplicated majority voters even when a configuration element is upset. –The extra design effort is now largely automated by new software (TMRtool). Active Configuration Scrubbing –Upsets in the configuration must not be allowed to accumulate or TMR will “break” –Scrubbing uses some resources, but can be implemented so that it is transparent to system operation.
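A minimal sketch of why triplication needs scrubbing, written in Python purely for illustration. The 4-bit `GOLDEN` word, the helper name `majority_vote`, and the modelling of a scrub as rewriting every copy from a known-good value are assumptions for this example and are not taken from the presentation or from TMRtool.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise two-of-three majority of three redundant copies."""
    return (a & b) | (a & c) | (b & c)

GOLDEN = 0b1011                      # hypothetical known-good value
copies = [GOLDEN, GOLDEN, GOLDEN]    # the three redundant copies

# One upset: a single bit flips in one copy; the voter masks it.
copies[0] ^= 0b0100
after_one_upset = majority_vote(*copies)

# Without scrubbing, a second upset at the same bit of another copy
# accumulates, and the majority is now wrong: TMR has "broken".
copies[1] ^= 0b0100
after_two_upsets = majority_vote(*copies)

# Scrubbing (modelled here as rewriting all copies from the golden value)
# removes accumulated upsets before they can outvote the good copy.
copies = [GOLDEN] * 3
after_scrub = majority_vote(*copies)
```

In this toy model the voter tolerates any single upset, and scrubbing only has to run often enough that two upsets rarely land on the same bit between passes.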

7 P173/MAPLD 2005 Swift7 Embedded “Hard-Core” Processor(s) Upset PowerPC 405 cores in Virtex II-Pro family FPGAs offer unprecedented computational power inside an FPGA, but include additional upsetable storage elements

8 P173/MAPLD 2005 Swift8 Processor Upsets – Data Cache Processor caches are very important features for increased performance; however, upsets in the caches can lead to system errors.

9 P173/MAPLD 2005 Swift9 Processor Upset Mitigation The “obvious” solution of implementing TMR with three processor cores is not an available single chip option because the maximum number of processors per FPGA is currently two. Tradeoffs between upset robustness and system complexity, possibly spanning multiple FPGAs, must be considered.

10 P173/MAPLD 2005 Swift10 One-Chip Solution Running two processors in lockstep is conceptually simple, especially as they can reside in a single FPGA. A fast TMR-ed comparison block is required to contain errors and not allow them to propagate into the rest of the system. A processor upset will appear to the comparison block as a disagreement, necessitating that both processors be stopped within the current clock cycle. Then they both must be forced to roll back to a known good software “bookmark” or, alternatively, to reboot.
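The lockstep-and-rollback behaviour described on this slide can be sketched as a small software model. Everything here is hypothetical: the `Core` class is a toy accumulator standing in for a processor, `run_lockstep`, `upset_at`, and `bookmark_every` are illustrative names, and the upset is injected as a single bit flip. It is not the flight design.

```python
from dataclasses import dataclass

@dataclass
class Core:
    """Toy stand-in for one processor: it accumulates its inputs."""
    acc: int = 0

    def step(self, x: int) -> int:
        self.acc += x
        return self.acc

def run_lockstep(inputs, upset_at=None, bookmark_every=4):
    """Run two cores in lockstep; roll both back to the bookmark on mismatch."""
    a, b = Core(), Core()
    bookmark = (0, 0)            # (input index, saved accumulator state)
    rollbacks = 0
    i = 0
    while i < len(inputs):
        if i % bookmark_every == 0:
            bookmark = (i, a.acc)        # cores agreed up to here
        if i == upset_at:
            b.acc ^= 0x10                # inject one upset into core b
            upset_at = None              # the upset happens only once
        out_a, out_b = a.step(inputs[i]), b.step(inputs[i])
        if out_a != out_b:
            # The comparator cannot tell which core is wrong, so both
            # are stopped and restored to the last known good bookmark.
            i, saved = bookmark
            a.acc = b.acc = saved
            rollbacks += 1
            continue
        i += 1
    return a.acc, rollbacks

clean = run_lockstep([1, 2, 3, 4, 5, 6, 7, 8])
upset = run_lockstep([1, 2, 3, 4, 5, 6, 7, 8], upset_at=5)
```

The final result is the same with or without the upset, but the upset run pays for one rollback, which is the system outage the one-chip approach accepts.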

11 P173/MAPLD 2005 Swift11 Flow Chart One-Chip Solution

12 P173/MAPLD 2005 Swift12 Advantages Contained in one chip –No chip-to-chip interconnects (minimal latency and propagation delay) –Lower power consumption –Less board area –No chip-to-chip synchronization Technology is more developed and tested [See Reference 2]

13 P173/MAPLD 2005 Swift13 Disadvantages More system outages –Reboot or rollback on every error –Not suitable for some critical real-time applications Twice as many errors as on a single processor, but at least they are detected Note: Requires extra device – either watchdog timer or external configuration scrubber

14 P173/MAPLD 2005 Swift14 Two-Chip Solution With four processors in lockstep (necessitating two chips), a solution as robust as full TMR is possible. In this scheme, a pair of processors that get into a disagreement due to an upset will be stopped while the system runs without interruption on the processor pair that is in agreement. Correct internal state information is available in the working pair, so the stopped pair can be re-synchronized, preferably soon. Thus, it is possible to re-synchronize almost transparently and rapidly get back to full four-processor lockstep operation with minimal intrusion. As a side effect of using two separate FPGAs, additional robustness is possible by adding on cross-strapped configuration control.
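A rough model of the two-pair scheme, under stated assumptions: each chip holds one lockstep pair, `Core` is a toy accumulator rather than a PowerPC405, and `run_two_chip`, `upset`, and `resync_after` are hypothetical names chosen for this sketch. Re-synchronization is modelled as copying state from the working pair after a fixed delay.

```python
from dataclasses import dataclass

@dataclass
class Core:
    """Toy stand-in for one processor: it accumulates its inputs."""
    acc: int = 0

    def step(self, x: int) -> int:
        self.acc += x
        return self.acc

def run_two_chip(inputs, upset=None, resync_after=2):
    """upset = (cycle, pair index, core index) of one injected bit flip."""
    pairs = [[Core(), Core()], [Core(), Core()]]   # one pair per chip
    resync_at = [None, None]     # cycle at which a stopped pair may rejoin
    outputs = []
    for cycle, x in enumerate(inputs):
        # Re-synchronize a stopped pair from the pair that kept running.
        for p in (0, 1):
            due = resync_at[p] is not None and cycle >= resync_at[p]
            if due and resync_at[1 - p] is None:
                good = pairs[1 - p][0].acc
                for core in pairs[p]:
                    core.acc = good
                resync_at[p] = None
        if upset is not None and upset[0] == cycle:
            _, p, c = upset
            pairs[p][c].acc ^= 0x10
        agreed = []
        for p in (0, 1):
            if resync_at[p] is not None:
                continue                       # this pair is stopped
            out0, out1 = (core.step(x) for core in pairs[p])
            if out0 == out1:
                agreed.append(out0)
            else:
                resync_at[p] = cycle + resync_after   # stop the pair
        if not agreed:
            # Both pairs disagree internally: only here is a reboot needed.
            raise RuntimeError("no agreeing pair; reboot required")
        outputs.append(agreed[0])
    return outputs, [core.acc for pair in pairs for core in pair]

outputs, final_states = run_two_chip([1, 2, 3, 4, 5, 6], upset=(2, 0, 1))
```

The output stream is uninterrupted by the upset, and all four cores end in the same state once the stopped pair has rejoined.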

15 P173/MAPLD 2005 Swift15 Flow Chart Two-Chip Solution

16 P173/MAPLD 2005 Swift16 Advantages Reboots rare; requires simultaneous errors in two separate processors Processor upsets are transparently handled without system outage until convenient re-synchronization opportunities Enhanced robustness – outages lowered to less than the SEFI rate of ~1 in 80 years per device Allows added configuration robustness –Chips check each other (not self-checking) –Eliminates need for external watchdog timer

17 P173/MAPLD 2005 Swift17 Disadvantages Complicated –Inter-chip communication/synchronization –Transparent reboot/resynchronization of both processors in chip with error Twice the power consumption In-beam testing is not yet done (although planned for the near future)

18 P173/MAPLD 2005 Swift18 Three-Chip Solution The three-chip implementation (also known as the “virtual FPGA” solution [Ref. 3]) takes the responsibility of error detection out of the hands of the upsetable FPGAs by adding a Radiation-Hardened ASIC. Note that only one processor per FPGA is needed. The ASIC handles stopping error propagation and re-synchronizing an upset processor. Additionally, the ASIC can be used for configuration control of all three FPGAs.
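The ASIC's role of voting three processor outputs, blocking the bad one, and re-synchronizing the upset processor can be illustrated with a short sketch. The function name `vote_and_resync` and the representation of each processor as a single integer state are assumptions made for this example; the actual ASIC logic is not described in the slides.

```python
def vote_and_resync(states):
    """Vote three processor states and identify any dissenter.

    Returns (voted value, indices of upset processors, resynced states).
    """
    a, b, c = states
    if a == b or a == c:
        voted = a
    elif b == c:
        voted = b
    else:
        # No two processors agree: voting cannot recover from this.
        raise RuntimeError("no majority among the three processors")
    upset = [i for i, s in enumerate(states) if s != voted]
    # Only the voted value propagates; every processor is restored to it.
    return voted, upset, [voted] * 3

result = vote_and_resync([7, 7, 23])   # third processor has been upset
```

Unlike the lockstep pair, the voter can tell which processor is wrong, so the other two keep running while the dissenter is reloaded.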

19 P173/MAPLD 2005 Swift19 Flow Chart Three-Chip Solution

20 P173/MAPLD 2005 Swift20 Advantages Maximum robustness to upsets Only three processors in lockstep (but in 3 chips) More fabric available for other functions No system outages; errors and SEFIs are handled transparently Most implementation details are confined to the ASIC and don’t affect the IP in the FPGAs significantly

21 P173/MAPLD 2005 Swift21 Disadvantages Complex ASIC development for controller to vote outputs and re-load/re-sync upset processor ASIC development cost (currently funded though) Board area

22 P173/MAPLD 2005 Swift22 Conclusions Both two-chip and three-chip solutions have about the same robustness, power consumption, and system complication, but handle upsets better than the one-chip solution. The two- vs. three-chip decision mostly boils down to the familiar FPGA vs. ASIC debate. Three-chip solution may use less power than the two-chip. (Is the ASIC’s power consumption less than that of one processor core?) At present, the JPL-preferred approach is the two-chip implementation achieving maximum flexibility and near maximum robustness to upsets.

23 P173/MAPLD 2005 Swift23 References [1] J. George et al., “Initial Single-Event Effects Testing and Mitigation in the Xilinx Virtex II-Pro FPGA,” Paper 211, MAPLD 2005. [2] M. Wang and G. Bolotin, “SEU Mitigation Techniques for Xilinx Virtex-II Pro FPGA,” Paper D110, MAPLD 2004, http://klabs.org/mapld04/presentations/session_d/1_d110_wang_s.ppt [3] J. Lyke and B. Marty, “Virtual Field Programmable Gate Array Triple Modular Redundant Cell Design,” Air Force Research Laboratory: Space Vehicles Directorate, AFRL-VS-PS-TR-2004-1093, April 28, 2004.

