Fault Tolerant Systems in a Space Environment

Archana
EE585: Fault Tolerance Computing

Overview
* Introduction
* Error Detection Techniques
  * Watchdog Processor
  * Control Flow Error Detection
  * Types of Signatures
* Fault Injection
* Conclusion

Introduction
* The experiments were carried out by the Stanford Center for Reliable Computing (CRC) on the Advanced Research and Global Observation Satellite (ARGOS).
* The approach targets space missions whose equipment combines the two basic approaches of fault avoidance and fault tolerance.
* It mainly uses software techniques to detect errors.

Error Detection Techniques
Watchdog Processor
* A watchdog processor is a small processor that sits on the bus, passively observes the bus transactions generated by the main processor, and detects errors by monitoring them.

[Figure: Watchdog Processor]
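A passive watchdog can be modeled as an observer that receives a copy of every bus transaction and never drives the bus itself. The sketch below is illustrative only: it assumes the watchdog knows the valid code and data address ranges and flags any access outside them, which is just one simple check a watchdog might perform. `BusTransaction`, `WatchdogMonitor`, and the address ranges are hypothetical, not taken from the ARGOS design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusTransaction:
    kind: str      # "fetch" (instruction fetch) or "data" (load/store)
    address: int


class WatchdogMonitor:
    """Passive monitor: it only reads transactions and never modifies them."""

    def __init__(self, code_range, data_range):
        # Hypothetical valid address ranges, as half-open (start, end) intervals.
        self.code_range = code_range
        self.data_range = data_range
        self.errors = []

    def observe(self, tx: BusTransaction) -> None:
        lo, hi = self.code_range if tx.kind == "fetch" else self.data_range
        if not (lo <= tx.address < hi):
            # Record the error; real hardware might raise an interrupt instead.
            self.errors.append(tx)


monitor = WatchdogMonitor(code_range=(0x1000, 0x2000), data_range=(0x8000, 0x9000))
for tx in [BusTransaction("fetch", 0x1004),
           BusTransaction("data", 0x8010),
           BusTransaction("fetch", 0x7FFC)]:   # fetch from outside the code range
    monitor.observe(tx)
print(monitor.errors)
```

Because the monitor only reads the observed transactions, the main processor runs at full speed; the cost is the extra hardware and the need to know in advance what "valid" behavior looks like.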

Control Flow Error Detection
* The main goal is to check that instructions are executed in the correct sequence.
* This is done by signature analysis: a signature is associated with each block of instructions and saved at compile time.
* At runtime, the generated signature is compared with the saved one, and a mismatch indicates an error.
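The compile-time/runtime split of signature analysis can be sketched as follows. The signature function here is CRC-32 from Python's `zlib` module, chosen only for illustration; the slides do not say which signature function is used, and the instruction words and the block name `block0` are made-up values.

```python
import zlib


def block_signature(instruction_words):
    """Compress a block's 32-bit instruction words into a 32-bit signature."""
    data = b"".join(w.to_bytes(4, "little") for w in instruction_words)
    return zlib.crc32(data)


# "Compile time": compute and save the reference signature of each block.
block = [0x8C220004, 0x00431020, 0xAC220008]   # hypothetical instruction words
reference = {"block0": block_signature(block)}


def check_block(name, fetched_words):
    """Runtime: recompute the signature from the fetched words and compare."""
    return block_signature(fetched_words) == reference[name]


print(check_block("block0", block))       # True: correct sequence
corrupted = block.copy()
corrupted[1] ^= 1 << 5                    # single bit flip in one fetched word
print(check_block("block0", corrupted))   # False: CRC-32 catches any single-bit flip
```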

Types of Signatures
1. Path Signature Analysis:
   * Signatures are computed for sequences of nodes, i.e., paths, rather than for single nodes.
   * Two bits are used to differentiate signatures.
   * A special tag signals when to compare the computed signature with the embedded one.
2. Signature Instruction Streams (SIS)


Contd…
* Paths are grouped into sets, and each set has a signature, called the justifying signature.
[Figure: Control flow diagram of three basic blocks]
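A minimal sketch of path signatures for three basic blocks, labeled A, B, and C here for illustration. It assumes an XOR-based running signature so the arithmetic stays simple; XOR ignores instruction order, so it is only a stand-in for the compressors real monitors use. Two paths, A->B->C and A->C, reach the tagged block C. Because XOR is its own inverse, a justifying value folded in on the A->C path makes both paths end with the same signature, so one embedded value covers the whole set of paths. All block contents are hypothetical.

```python
from functools import reduce


def fold(words, start=0):
    """Running signature: XOR of all instruction words seen so far."""
    return reduce(lambda s, w: s ^ w, words, start)


# Hypothetical instruction words for the three basic blocks.
A = [0x11111111, 0x22222222]   # ends with a conditional branch
B = [0x0000FFFF]               # executed only when the branch is not taken
C = [0x12345678]               # merge point that carries the check tag

# Justifying value for the A->C path; with XOR it is simply B's contribution.
justify_AC = fold(B)

# Embedded (reference) signature for the set of paths ending at C.
embedded = fold(C, fold(B, fold(A)))


def run_path(blocks, justify=0):
    """Accumulate the running signature along a path of blocks."""
    sig = 0
    for blk in blocks:
        sig = fold(blk, sig)
    return sig ^ justify


def tag_check(sig):
    """At the tag, compare the computed signature with the embedded one."""
    return sig == embedded


print(tag_check(run_path([A, B, C])))            # True: legal path
print(tag_check(run_path([A, C], justify_AC)))   # True: legal path, justified
print(tag_check(run_path([B, C])))               # False: erroneous jump into B skips A
```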

2. Signature Instruction Streams (SIS)

Contd…
* To reduce the number of signatures embedded in the code, branch address hashing is used.

[Figure: Branch Address Hashing]
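The slides show branch address hashing only as a figure, so the sketch below follows one common description of it, which is an assumption here: the compiler XORs each branch target address with the reference signature of the block that ends in that branch, so no separate signature word has to be stored. At runtime the hashed address is XORed with the signature actually computed; if the block executed correctly the two signatures cancel and the true target is recovered, otherwise the branch goes to a wrong address that can then be caught, for example by an address-range check. The address and signature values are hypothetical.

```python
def hash_target(target, reference_signature):
    """Compile time: fold the block's reference signature into the branch target."""
    return target ^ reference_signature


def resolve_target(hashed_target, runtime_signature):
    """Runtime: undo the hash with the signature computed while the block ran."""
    return hashed_target ^ runtime_signature


target = 0x00401200     # hypothetical branch target address
ref_sig = 0x0000BEEF    # hypothetical reference signature of the block
hashed = hash_target(target, ref_sig)

print(hex(resolve_target(hashed, ref_sig)))         # 0x401200: fault-free
print(hex(resolve_target(hashed, ref_sig ^ 0x4)))   # 0x401204: wrong target
```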

Stutter Step Mode (SSM)
* Each group of instructions is executed two or more times and the results are compared.
* It detects errors missed by the other techniques.
* Disadvantages:
  * Lower performance.
  * Memory overhead.
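A software-level sketch of the SSM idea: run the same group of instructions twice on the same inputs and compare the results before they are used. Here the group of instructions is modeled as a Python function, and the fault is a hypothetical bit flip injected into the first execution only; `stutter_step`, `SSMError`, and `faulty_add` are illustrative names.

```python
class SSMError(Exception):
    """Raised when repeated executions of a block disagree."""


def stutter_step(block, *inputs, repeats=2):
    """Execute `block` `repeats` times on the same inputs and compare results."""
    results = [block(*inputs) for _ in range(repeats)]
    if any(r != results[0] for r in results[1:]):
        raise SSMError(f"mismatch between executions: {results}")
    return results[0]


def add_block(b, c):
    return b + c


print(stutter_step(add_block, 10, 7))   # 17

# Simulated transient fault: flips bit 0 of the result in the first execution only.
calls = {"n": 0}


def faulty_add(b, c):
    calls["n"] += 1
    result = b + c
    return result ^ 1 if calls["n"] == 1 else result


try:
    stutter_step(faulty_add, 10, 7)
except SSMError as e:
    print("detected:", e)
```

Since both executions run on the same processor, this form of time redundancy targets transient faults; a fault that corrupts every execution in the same way produces matching results.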

Application of SSM to One Instruction
* The overhead is 300%.

Contd…
* The overhead is reduced by extending the duplication to a whole basic block.
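The 300% figure for one instruction can be reproduced under one assumed accounting, which the slides do not spell out: protecting an instruction costs its re-execution, one compare, and one conditional branch to an error handler, i.e., three extra instructions for one original. If duplication covers a basic block of n instructions with k result registers to check, the extra cost becomes n + 2k, so the relative overhead falls as the block grows relative to its number of outputs. `ssm_overhead` and the block sizes are hypothetical.

```python
def ssm_overhead(n_instructions, n_outputs):
    """Relative SSM overhead under the assumed accounting: every instruction
    is re-executed once, and each output register needs one compare plus
    one conditional branch."""
    extra = n_instructions + 2 * n_outputs
    return extra / n_instructions


print(f"{ssm_overhead(1, 1):.0%}")    # one instruction: 300%
print(f"{ssm_overhead(10, 2):.0%}")   # 10-instruction block with 2 outputs: 140%
```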

Error Masking in SSM

Contd…
* Assume registers B = 10 and C = 7, so A = B + C = 17 and D = A / 5 = 3 (integer division: any value of A from 15 to 19 divided by 5 gives 3).
* If an error makes A = 18 instead of 17, D is still 3, so the error is not detected when only D is compared.
* Therefore, the error detection technique must be selected carefully.
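The masking example can be checked directly. The sketch assumes, as the register values imply, that the block computes A = B + C and then D = A / 5 with integer division, and that only D is compared between the two executions; `fault_in_a` is a hypothetical parameter that models the error in A.

```python
def block(b, c, fault_in_a=0):
    a = b + c + fault_in_a   # A = B + C, optionally corrupted
    d = a // 5               # D = A / 5 (integer division)
    return a, d


_, d_good = block(10, 7)                  # A = 17, D = 3
_, d_bad = block(10, 7, fault_in_a=1)     # A = 18 due to an error, D is still 3

print(d_good == d_bad)                    # True: comparing D alone misses the error
print([a // 5 for a in range(15, 20)])    # [3, 3, 3, 3, 3]
```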

Fault Injection
* One way to validate fault-tolerance mechanisms.
* Advantages:
  1. Flexibility
  2. Controllability
  3. Predictability
* Disadvantages:
  1. It is questionable whether the injected faults are a good representation of faults in the real environment.

Contd…
* In ARGOS, the system is tested in the actual space environment.
* Different approaches to fault injection in electronic systems:
  1. Disturbing the signals on the pins of the ICs.
  2. Radiation.
  3. Power supply disturbance.
  4. Logic simulation.
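Of the approaches listed, simulation is the easiest to illustrate in software. The sketch below is a simplified stand-in, not the ARGOS test setup: it injects every possible single-bit flip into register A of one execution of the block A = B + C, D = A / 5, and counts how many faults SSM detects when it compares only D versus both A and D. The 8-bit register width and the names `run_block` and `campaign` are assumptions for illustration.

```python
WIDTH = 8   # assumed register width for this illustration


def run_block(b, c, flip_bit=None):
    """A = B + C, D = A // 5; optionally flip one bit of A (the injected fault)."""
    a = b + c
    if flip_bit is not None:
        a ^= 1 << flip_bit
    return {"A": a, "D": a // 5}


def campaign(b, c, compared):
    """Inject each single-bit flip into A of one execution; count detections."""
    golden = run_block(b, c)                    # fault-free execution
    detected = 0
    for bit in range(WIDTH):
        faulty = run_block(b, c, flip_bit=bit)  # execution with the injected fault
        if any(faulty[r] != golden[r] for r in compared):
            detected += 1
    return detected


print(campaign(10, 7, compared=["D"]), "of", WIDTH)        # 6 of 8
print(campaign(10, 7, compared=["A", "D"]), "of", WIDTH)   # 8 of 8
```

The two masked cases are the flips of bits 0 and 1, which turn A = 17 into 16 or 19; both still give D = 3, which is the masking effect from the SSM example.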

Conclusion
* The trade-offs between fault-tolerance and fault-avoidance techniques were determined, leading to an efficient and suitable blend of techniques.
* Both hardware and software fault-tolerance techniques are studied.

References
* Philip P. Shirvani and Edward J. McCluskey. Fault Tolerant Systems in a Space Environment. Stanford University. http://www-crc.stanford.edu/crc_papers/CRC-TR-98-2.pdf

Queries?