

Partially Reconfigurable System-on-Chips for Adaptive Fault Tolerance
Shaon Yousuf, Adam Jacobs, Ph.D. Students, NSF CHREC Center, University of Florida
Dr. Ann Gordon-Ross, Assistant Professor of ECE, NSF CHREC Center, University of Florida

Slide 2: Introduction
- Many space systems use remote sensing applications
  - These gather information about a target of interest from a distance
  - The gathered information requires processing
  - Data is sent to a ground station or to other space systems over a communication link
- Modern remote sensing applications are complex
  - They gather a large amount of data
  - Sending all of the data through the communication link is impractical
  - System performance is bottlenecked by the limited communication bandwidth
- Solution: pre-process the data on board and transmit only the results
  - On-board processing using system-on-chips (SoCs)
(Figure: data is pre-processed on board before crossing the limited-bandwidth link.)

Slide 3: SoCs for Space Applications
- SoCs increase on-board data processing capabilities
  - However, they also increase the system's payload
- Optimized, customized SoCs for use in space (space SoCs) are required
  - They must provide cost-effective, high-performance, and reliable data processing
- Traditionally, space SoCs consist of radiation-hardened (rad-hard) devices
  - Specialized devices enable reliable on-board data processing
  - A fixed/static design provides all of the application's required functionality all of the time
(Figure labels: rad-hard devices; specialized equals expensive; increased payload.)

Slide 4: SoCs for Space Applications
- Is there a better choice?
  - Why not use commercial-off-the-shelf (COTS) SRAM-based FPGAs?
  - They are cheaper than rad-hard devices
  - They allow reprogrammability (time-multiplexing hardware resources to reduce payload)
- Is it that simple?
  - No. In space, cosmic radiation corrupts FPGA SRAM; these corruptions are called single event upsets (SEUs)
  - Fault tolerance (FT) techniques are used for reliability (they provide redundant copies of the required functionality)
  - Efficient SoC design is needed to ensure that a particular functionality, along with the required FT, is available when required
(Figure labels: COTS FPGA devices; payload still an issue; increased design complexity.)

Slide 5: SoCs for Space Applications
- So what do we do?
  - Mitigate payload issues by adapting to varying levels of radiation in space
    - The same degree of FT (reliability) is not required all of the time
    - Reconfigure the FPGA to provide adaptive fault tolerance (AFT)
  - Mitigate design complexity by designing an AFT base platform
    - Enables rapid design and deployment of space applications
(Figure: in a low-radiation orbit, low reliability will suffice; in a high-radiation orbit, high reliability is required.)

Slide 6: AFT using FPGA Reconfiguration
- FPGAs offer two reconfiguration (reprogrammability) methods
  - Full reconfiguration (FR) halts and reconfigures the entire FPGA
    - Can impose significant performance overhead
  - Partial reconfiguration (PR) halts and reconfigures only a portion of the FPGA
    - Mitigates FR's performance issues by isolating reconfiguration to selected parts, the partially reconfigurable regions (PRRs)
(Figure: example with 2 PRRs on the FPGA fabric. The static region holds static modules: a central controlling agent, the ICAP, and a memory controller. Partially reconfigurable modules (PRMs) A and B can be loaded into PRR 1, and modules C and D into PRR 2.)
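To make the FR/PR distinction concrete, the following minimal Python sketch models the fabric as a static region plus named PRRs and reports which regions each reconfiguration style halts. The `Fabric` class and its methods are hypothetical illustrations, not part of any Xilinx tool flow or of the authors' design; on real hardware, PR is performed by writing a partial bitstream through the ICAP.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Fabric:
    """Hypothetical toy model: a static region plus partially reconfigurable regions."""
    static_modules: list[str]
    prrs: dict[str, str | None] = field(default_factory=dict)  # PRR name -> loaded PRM

    def full_reconfigure(self, static_modules: list[str],
                         prrs: dict[str, str | None]) -> set[str]:
        """FR rewrites the whole device, so every region (static included) halts."""
        halted = {"static", *self.prrs}
        self.static_modules = list(static_modules)
        self.prrs = dict(prrs)
        return halted

    def partial_reconfigure(self, prr: str, module: str | None) -> set[str]:
        """PR rewrites one PRR; the static region and the other PRRs keep running."""
        if prr not in self.prrs:
            raise KeyError(f"unknown PRR: {prr}")
        self.prrs[prr] = module  # None models an emptied (blanked) PRR
        return {prr}
```

For the two-PRR example above, swapping module A for module B in PRR 1 halts only PRR 1, while module C keeps running in PRR 2.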

Slide 7: Contribution
- In this work, we present an adaptive fault tolerant partially reconfigurable system-on-chip (AFT PR SoC)
  - Leverages VAPRES*, a Virtual Architecture for Partially Reconfigurable Embedded Systems
  - Contains a data flow controller to manage data flow to and from the PRRs
    - Enables high SoC throughput through continuous data stream processing
  - Contains a software-based AFT controller to vary the degree of FT
    - Dynamically reconfigures the PRRs and changes the reliability mode according to the current orbital position
- The AFT PR SoC decreases the payload and cost of space systems compared to traditional static FT systems
- The AFT PR SoC can be leveraged as a base platform to deploy a multitude of different space applications
* A. Jara-Berrocal, A. Gordon-Ross, "VAPRES: A Virtual Architecture for Partially Reconfigurable Embedded Systems," Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2010.

Slide 8: Why VAPRES?
(Figure: VAPRES block diagram. A MicroBlaze CPU on the PLB bus, with other peripherals (SDRAM, UART), a GPIO peripheral, and the ICAP, connects over Fast Simplex Links (FSLs) to PR sockets for PR Region 1 and PR Region 2 and to an IO module; the diagram also shows Switch 1 and Switch 2, interfaces (IF), slice macros, and regional clock buffers (BUFR). Legend: independent clocks, control functions, reconfiguration, streaming data channels.)
- VAPRES is a multipurpose, scalable, flexible architecture
  - Flexible, scalable parameters: PRR count, PRR size, number of FSLs per PRR/IOM, MACS bandwidth
  - A good platform for developing complex reconfigurable applications
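The scalable parameters listed above can be thought of as a design-time configuration record. The sketch below is a hypothetical Python bundle of three of them (PRR count, PRR size in CLBs, FSLs per PRR); the class and field names are illustrative, not VAPRES's actual configuration interface, and MACS bandwidth is left out. It uses the fact that a Virtex-5 CLB contains two slices, which is how the experimental setup's 40 × 21-CLB PRRs come to 1,680 slices each.

```python
from dataclasses import dataclass

SLICES_PER_CLB = 2  # a Virtex-5 CLB contains two slices


@dataclass(frozen=True)
class VapresConfig:
    """Hypothetical bundle of VAPRES design-time parameters (names are illustrative)."""
    prr_count: int
    prr_clb_rows: int   # vertical extent of each PRR, in CLBs
    prr_clb_cols: int   # horizontal extent of each PRR, in CLBs
    fsls_per_prr: int   # FSL links between the MicroBlaze and each PR socket

    def __post_init__(self) -> None:
        if min(self.prr_count, self.prr_clb_rows,
               self.prr_clb_cols, self.fsls_per_prr) < 1:
            raise ValueError("all VAPRES parameters must be positive")

    @property
    def slices_per_prr(self) -> int:
        return self.prr_clb_rows * self.prr_clb_cols * SLICES_PER_CLB

    @property
    def total_prr_slices(self) -> int:
        return self.prr_count * self.slices_per_prr

    @property
    def total_fsls(self) -> int:
        return self.prr_count * self.fsls_per_prr
```

For example, `VapresConfig(prr_count=4, prr_clb_rows=40, prr_clb_cols=21, fsls_per_prr=1)` gives 1,680 slices per PRR and 6,720 PRR slices in total; the one-FSL-per-PRR value is an assumption, not a figure from the slides.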

Slide 9: AFT PR SoC Design Consists of Two Steps
- Data flow controller step
  - Creates an HDL-based finite state machine (FSM) to orchestrate the data flow between the MicroBlaze and the PRRs
- Software-based AFT controller step
  - Creates a C-based AFT controller module that allows the MicroBlaze to adaptively change the reliability mode

Slide 10: Data Flow Controller
(Figure: FSM state diagram with states Idle, Read_Data, Read_Write_Data, Write_Data, and Stall. Its arcs are labeled with the following condition / output pairs.)
- p_consumerfsl_rdy / ce = 1, start = 1
- p_consumerfsl and rfd and done / ce = 1, start = 1
- !p_consumerfsl_rdy
- p_consumerfsl and rfd and !done / ce = 1, start = 1, p_consumer_en = 1, p_consumer_data(32) = input_data(32)
- !p_producer_rdy and !rfd / p_consumer_en = 0
- dv and p_producer_rdy / p_producerfsl_en = 1, p_producerfsl_data(32) = output_data(32)
- !p_producer_rdy / ce = 0, start = 0 (on three arcs)
- p_producer_rdy / ce = 1, start = 1
- !data_valid / ce = 0, start = 0
- p_consumerfsl and rfd and dv and p_producer_rdy / p_consumer_en = 1, p_consumer_data(32) = input_data(32), p_producerfsl_en = 1, p_producerfsl_data(32) = output_data(32)
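The following cycle-level Python model is a simplified, hypothetical sketch of such a controller, not the authors' HDL. It reuses the state names above and approximates the signals (input FSL ready, core rfd/dv, output FSL ready); which arc connects which pair of states is an assumption. In particular, the sketch assumes that any active state stalls the core when the output FSL is full, that the stall returns to the state it interrupted once p_producer_rdy reasserts, and that Write_Data returns to Idle when the core's data-valid output drops.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    READ_DATA = auto()
    READ_WRITE_DATA = auto()
    WRITE_DATA = auto()
    STALL = auto()


@dataclass
class Inputs:
    consumer_rdy: bool  # input FSL holds a word (cf. p_consumerfsl_rdy)
    rfd: bool           # core is ready for data
    dv: bool            # core output word is valid
    producer_rdy: bool  # output FSL has room (cf. p_producer_rdy)


@dataclass
class Outputs:
    ce: bool = False           # core clock enable
    start: bool = False        # core start strobe
    consumer_en: bool = False  # pop one 32-bit word from the input FSL into the core
    producer_en: bool = False  # push one 32-bit core output word to the output FSL


class DataFlowFSM:
    """Hypothetical, simplified data flow controller; one step() call = one clock cycle."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.resume = State.IDLE  # state to return to when a stall clears

    def step(self, i: Inputs) -> Outputs:
        s = self.state
        if s is State.IDLE:
            if i.consumer_rdy:  # data has arrived: enable and start the core
                self.state = State.READ_DATA
                return Outputs(ce=True, start=True)
            return Outputs()
        if s is State.STALL:
            if i.producer_rdy:  # output FSL has room again: resume where we stopped
                self.state = self.resume
                return Outputs(ce=True, start=True)
            return Outputs()
        if not i.producer_rdy:  # output FSL full: freeze the core so no result is lost
            self.resume, self.state = s, State.STALL
            return Outputs()
        read = i.consumer_rdy and i.rfd
        if read or i.dv:
            self.state = {
                (True, True): State.READ_WRITE_DATA,
                (True, False): State.READ_DATA,
                (False, True): State.WRITE_DATA,
            }[(read, i.dv)]
            return Outputs(ce=True, start=True, consumer_en=read, producer_en=i.dv)
        if s is State.WRITE_DATA:  # no more valid output: the stream is finished
            self.state = State.IDLE
            return Outputs()
        return Outputs(ce=True, start=True)  # core busy computing; keep it clocked
```

Freezing the core through `ce` instead of dropping output words mirrors the slide's `ce = 0, start = 0` action on the `!p_producer_rdy` arcs.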

Slide 11: Software-based AFT Controller
- The AFT controller brings efficient resource management to traditional fault tolerant (FT) systems
  - The required FT level varies to match the radiation level at the current orbital position
  - Offers four reliability modes (software-based switching)
- Reliability mode switching depends on thresholds
  - The required FT level dictates which hardware tasks (PRMs) are loaded into or unloaded from the PRRs
  - Unused PRRs are turned off to save power (power saving mode)
  - A software voter detects anomalies and refreshes PRRs (configuration scrubbing) when errors are detected (refresh mode)
- Reliability modes
  - High reliability: TMR
  - Medium reliability: SCP
  - Low reliability: PRM loaded into a single PRR
  - Hybrid reliability: use low reliability mode for PRMs with ABFT; use medium/high reliability for PRMs without ABFT
(Figure: MicroBlaze CPU with a voter + controller, the PLB bus (other peripherals: SDRAM, UART), a GPIO peripheral, the ICAP, and FSLs connecting PR sockets for PR Regions 1 to 4, which host PRMs such as FFT, matrix multiply, and CORDIC.)
Abbreviations: TMR: triple modular redundancy; SCP: self-checking pairs; ABFT: algorithm-based fault tolerance; PRM: partially reconfigurable module.
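A software voter of the kind described above can be sketched as a bitwise 2-of-3 majority over the 32-bit words returned by three PRR replicas in TMR mode, plus an equality check for SCP mode. This is a minimal Python sketch under that assumption; the function names are hypothetical, and treating every replica that disagrees with the voted word as a candidate for refresh is an illustrative policy, not necessarily the authors' voter logic.

```python
from __future__ import annotations

MASK32 = 0xFFFFFFFF  # FSL words are 32 bits wide


def tmr_vote(a: int, b: int, c: int) -> tuple[int, list[int]]:
    """Bitwise 2-of-3 majority over three replicas' 32-bit output words.

    Returns the voted word and the indices of replicas that disagree with it;
    those replicas' PRRs are candidates for a configuration refresh (scrub).
    """
    a, b, c = a & MASK32, b & MASK32, c & MASK32
    voted = (a & b) | (a & c) | (b & c)
    faulty = [k for k, word in enumerate((a, b, c)) if word != voted]
    return voted, faulty


def scp_check(a: int, b: int) -> bool:
    """Self-checking pair: detects (but cannot correct) a mismatch between two replicas."""
    return (a & MASK32) == (b & MASK32)
```

The asymmetry between the two modes is the point: TMR both masks the error and identifies which PRR to refresh, while an SCP mismatch only reveals that one of the two PRRs is wrong, so one reasonable policy is to refresh both and recompute.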

Slide 12: Experimental Setup
- Software
  - Xilinx ISE Design Suite 12.4
  - AFT VAPRES SoC compared to an SoC without AFT
    - Both SoCs have 4 PRRs
    - PRRs are reconfigured with 1k-point FFTs
    - Each PRR spans 40 vertical and 21 horizontal configurable logic blocks (CLBs), 1,680 slices each
  - The SoC without AFT always operates in TMR mode (worst-case condition)
  - The AFT SoC switches modes according to thresholds
    - Low SEU rate threshold of 2.0 SEUs per day for switching between low and medium reliability
    - High SEU rate threshold of 8.0 SEUs per day for switching between medium and high reliability
  - Virtex-5 LX110T ISS orbit fault rates applied (see notes)
- Hardware
  - XUPV5-LX110T board
Notes:
- Quinn, H.; Morgan, K.; Graham, P.; Krone, J.; Caffrey, M., "Static Proton and Heavy Ion Testing of the Xilinx Virtex-5 Device," Radiation Effects Data Workshop, 2007 IEEE, vol.0, no., pp , July 2007, doi: /REDW
- Virtex-5 LX110T ISS orbit fault rates calculated using the CRÈME tool. ISS: International Space Station.
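The threshold-based switching above can be sketched as a small mode-selection function. This Python sketch assumes that a rate exactly on a threshold selects the higher-reliability mode and that no hysteresis is applied; the slides do not specify either, and the names are hypothetical. Each mode's value doubles as the number of PRRs one PRM occupies (1 for a single PRR, 2 for SCP, 3 for TMR), with the hybrid rule letting ABFT-protected PRMs run unreplicated.

```python
from enum import Enum


class Mode(Enum):
    LOW = 1     # PRM loaded into a single PRR
    MEDIUM = 2  # self-checking pair (SCP)
    HIGH = 3    # triple modular redundancy (TMR)


LOW_THRESHOLD = 2.0   # SEUs/day: low <-> medium boundary (experimental setup)
HIGH_THRESHOLD = 8.0  # SEUs/day: medium <-> high boundary (experimental setup)


def select_mode(seu_rate_per_day: float) -> Mode:
    """Pick a reliability mode for the current orbital SEU rate.

    Assumption: a rate exactly on a threshold selects the higher-reliability mode.
    """
    if seu_rate_per_day >= HIGH_THRESHOLD:
        return Mode.HIGH
    if seu_rate_per_day >= LOW_THRESHOLD:
        return Mode.MEDIUM
    return Mode.LOW


def replicas(mode: Mode, has_abft: bool, hybrid: bool = False) -> int:
    """Number of PRRs one PRM occupies; in hybrid mode, ABFT-protected PRMs run unreplicated."""
    if hybrid and has_abft:
        return 1
    return mode.value
```

In use, the controller would re-evaluate `select_mode` as the orbit-dependent SEU rate changes and reload PRRs only when the returned mode differs from the current one.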

Slide 13: Virtex-5 LX110T ISS orbit SEU rates
(Figure: SEU rates along the ISS orbit, calculated using the CRÈME 96 tool; annotated regions: South Atlantic Anomaly (SAA) and the poles.)

Slide 14: AFT PR SoC Resource Requirements and Analysis
- The SoC operates at 100 MHz
- 71% of the total device slices are used
- Normalized PRR resource utilization calculation:

  Symbol     Definition
  P_nru      Normalized resource utilization
  P_av       Total PRRs available
  P_req      Number of PRRs required per PRM
  P_used     Number of PRRs used per PRM
  P_ex       Number of extra PRRs used
  P_free     Number of free PRRs
  P_usable   Number of usable free PRRs

(The slide's defining equations, ending with the expression for P_nru, are not preserved in this transcript.)
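The slide's P_nru formula is not recoverable, so the following Python sketch does not reproduce it. It only illustrates the P_av / P_req bookkeeping underneath it, under the assumption that every PRM in a given mode needs the same number of PRRs. Integer division gives how many redundant PRM instances fit, and the remainder gives the PRRs left over, which cannot host another instance and could be powered down. The function name is hypothetical.

```python
def prr_allocation(p_av: int, p_req: int) -> tuple:
    """Return (PRM instances hosted, PRRs left over) when each instance needs p_req PRRs.

    Hypothetical helper: assumes all PRMs in the current mode share the same p_req.
    """
    if p_av < 0 or p_req < 1:
        raise ValueError("need p_av >= 0 and p_req >= 1")
    return divmod(p_av, p_req)


# With the 4 PRRs of the experimental setup:
for mode, p_req in (("TMR", 3), ("SCP", 2), ("single PRR", 1)):
    hosted, leftover = prr_allocation(4, p_req)
    print(f"{mode}: {hosted} PRM(s) hosted, {leftover} PRR(s) left over")
```

This shows why the always-TMR baseline underuses the fabric: with 4 PRRs it hosts one FFT and leaves one PRR idle, whereas the lower-reliability modes fill all four PRRs.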

Slide 15: AFT PR SoC Resource Utilization
(Figure: PRR resource utilization over a 24-hour period; labels: 100% PRR utilization, 50% PRR utilization.)
- Average 21% increase in PRR resource utilization over a 24-hour period

Slide 16: Conclusions and Future Work
- Conclusions
  - We designed and implemented an adaptive fault tolerant partially reconfigurable system-on-chip (AFT PR SoC) leveraging VAPRES, the Virtual Architecture for Partially Reconfigurable Embedded Systems
  - A novel MicroBlaze-based software controller (the AFT controller) adapts the AFT PR SoC's fault tolerance to changing space radiation levels
    - Achieves higher resource utilization than a traditional triple modular redundancy (TMR)-based fault tolerant (FT) PR SoC
    - Our results indicate that the AFT PR SoC can achieve an average of 22% higher resource utilization in the International Space Station (ISS) orbit compared to a traditional FT SoC
  - The AFT PR SoC is an ideal platform for space SoCs
    - System designers can implement a wide variety of applications using the AFT PR SoC's PRRs
- Future work
  - Integrating an operating system into our space SoC to allow parallel software processes to control voting and reliability mode switching
  - Upgrading the AFT PR SoC's MicroBlaze processor to a LEON3FT fault tolerant processor to provide additional system reliability
  - Using fault injection techniques to test our space SoC's robustness

QUESTIONS?
This work was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. EEC. We also gratefully acknowledge tools provided by Xilinx.