Error Detection in Hardware
VO Hardware-Software-Codesign
Philipp Jahn


Error detection
- How to detect errors with hardware methods during system operation
- Conditions
  - Coverage (probability that an error is detected)
  - Latency (time between the start of an error and its detection)
  - Performance
Slide from VO „Echtzeitsysteme“, H. Kopetz

Hardware-based error detection
- Hardware redundancy
  - Passive (TMR, majority voting)
  - Active (duplication and comparison, standby)
  - Hybrid
- Information redundancy
  - Parity
  - Checksums
  - Arithmetic codes
- Time redundancy
- Watchdog timers
- Checking
  - Capability checking
  - Consistency checking
  - Control-flow checking
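
The passive and active hardware-redundancy schemes listed above can be illustrated with a small sketch. It is not a hardware design: replica outputs are modeled as Python integers, and the names `majority_vote` and `duplicate_and_compare` are hypothetical, chosen only for this example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Passive redundancy (TMR): bitwise 2-out-of-3 majority of three replicas.

    A fault in a single replica is masked, not just detected.
    """
    return (a & b) | (a & c) | (b & c)


def duplicate_and_compare(a: int, b: int) -> bool:
    """Active redundancy: return True if the two replicas disagree.

    A mismatch only signals an error; it does not say which replica is wrong.
    """
    return a != b


good = 0b1011
faulty = good ^ 0b0100  # assumed fault: one flipped bit in one replica

voted = majority_vote(good, faulty, good)          # fault is masked
mismatch = duplicate_and_compare(good, faulty)     # fault is detected
```

The voter masks the fault without any recovery step, whereas duplication and comparison needs a follow-up action (for example switching to a standby unit) once the mismatch is reported.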

Information redundancy (1)
- Detection / Correction
- Hamming distance
  - X = (1001), Y = (0111)
  - d(X,Y) = 3
- SEC-DED (single error correction, double error detection)
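
The Hamming distance example above can be checked with a short sketch. The helper names are hypothetical. The code relies on the standard result that a code with minimum distance d detects up to d-1 bit errors and corrects up to floor((d-1)/2), so SEC-DED needs a minimum distance of 4.

```python
from itertools import combinations


def hamming_distance(x: str, y: str) -> int:
    """Number of bit positions in which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))


def min_distance(codewords: list[str]) -> int:
    """Minimum pairwise Hamming distance of a code."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


d = hamming_distance("1001", "0111")  # the slide's X and Y, d = 3

# Illustrative code: 2 data bits plus an even parity bit.
parity_code = ["000", "011", "101", "110"]
d_min = min_distance(parity_code)
detectable = d_min - 1          # errors guaranteed to be detected
correctable = (d_min - 1) // 2  # errors guaranteed to be corrected
```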

Information redundancy (2)
- Parity
  - One extra bit (even / odd)
  - Decoding circuit (set of XOR gates)
  - Routine checking in buses, memory and registers
  - Detects single-bit errors (no stuck-at faults)
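
A parity check reduces to an XOR over all bits, which is why the decoder is just a tree of XOR gates. The sketch below models that in software; the function names are assumptions made for this example.

```python
from functools import reduce
from operator import xor


def parity_bit(bits: list[int], even: bool = True) -> int:
    """Parity bit that makes the total number of 1s even (or odd)."""
    p = reduce(xor, bits, 0)
    return p if even else p ^ 1


def parity_ok(word: list[int], even: bool = True) -> bool:
    """Check a word that already includes its parity bit."""
    return reduce(xor, word, 0) == (0 if even else 1)


data = [1, 0, 1, 1]
word = data + [parity_bit(data)]

single = word.copy()
single[0] ^= 1            # one flipped bit: detected

double = word.copy()
double[0] ^= 1
double[1] ^= 1            # two flipped bits: parity is unchanged, so undetected
```

The double-flip case shows the limit of a single parity bit: any even number of bit errors leaves the parity unchanged.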

Information redundancy (3)
- Overlapping parity
- m-of-n codes
- Duplication codes
- Cyclic redundancy checks (CRC)
  - Sender and receiver agree upon a generator polynomial G(x)
  - Sender appends a checksum (n-k bits) to the end of the data frame (k bits)
  - Receiver divides the received frame by G(x); remainder = 0 → correct
  - Simple implementation (linear feedback shift register and XOR gates)
  - Detects single-bit errors, multiple adjacent bit errors affecting fewer than n-k bits, and burst transient errors
  - Highly successful in serial transmission (communication channels: Ethernet, Token Ring)
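
The CRC procedure above is polynomial division over GF(2), which in hardware becomes a linear feedback shift register. A minimal software sketch follows. The generator x^3 + x + 1 and the 4-bit data word are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
def crc_remainder(bits: list[int], generator: list[int]) -> list[int]:
    """Remainder of bits(x) divided by generator(x), coefficients mod 2."""
    work = list(bits)
    r = len(generator) - 1  # number of check bits
    for i in range(len(work) - r):
        if work[i]:  # leading coefficient set: subtract (XOR) the generator
            for j, g in enumerate(generator):
                work[i + j] ^= g
    return work[-r:]


def crc_encode(data: list[int], generator: list[int]) -> list[int]:
    """Append the checksum so that the whole frame is divisible by G(x)."""
    r = len(generator) - 1
    return data + crc_remainder(data + [0] * r, generator)


def crc_ok(frame: list[int], generator: list[int]) -> bool:
    """Receiver side: a zero remainder means no error was detected."""
    return not any(crc_remainder(frame, generator))


G = [1, 0, 1, 1]                 # x^3 + x + 1 (assumed for illustration)
frame = crc_encode([1, 0, 0, 1], G)

corrupted = frame.copy()
corrupted[2] ^= 1                # single-bit transmission error
```

With these values, `crc_ok(frame, G)` holds and `crc_ok(corrupted, G)` does not. Production CRCs use longer generators such as the 32-bit polynomial in Ethernet.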

Information redundancy (4)
- Checksums

Information redundancy (5)
- Arithmetic codes
  - Detect errors in arithmetic units (parity would not be preserved)
  - Separate or nonseparate
  - Examples
    - AN codes
    - Residue codes
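
Arithmetic codes are preserved by the arithmetic operation itself, so the result can be checked without recomputing it. The sketch below shows one nonseparate code (AN) and one separate code (residue) for addition. The constants A = 3 and M = 3 and all function names are assumptions chosen for this example.

```python
A = 3  # AN-code multiplier (assumed); every valid code word is a multiple of A
M = 3  # residue-code modulus (assumed)


def an_encode(x: int) -> int:
    return A * x


def an_valid(word: int) -> bool:
    """A*x + A*y = A*(x + y), so a correct sum is still divisible by A."""
    return word % A == 0


def residue_add(x: int, rx: int, y: int, ry: int) -> tuple[int, int]:
    """Separate code: value and check part are processed independently."""
    return x + y, (rx + ry) % M


def residue_valid(value: int, check: int) -> bool:
    return value % M == check


# Nonseparate AN code: data and check information are mixed in one word.
total = an_encode(5) + an_encode(7)
corrupted_total = total ^ 1  # assumed single-bit fault in the adder output

# Separate residue code: the check part travels alongside the value.
value, check = residue_add(5, 5 % M, 7, 7 % M)
```

A single flipped bit changes the word by a power of two, which is never a multiple of 3, so `an_valid(corrupted_total)` is false.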

Time redundancy (1)
- Repeat a computation two or more times and compare the results (detection, or correction by majority)
- Error detected → possibly retry
- Good for detecting transient faults
- Does not protect against errors resulting from permanent faults
- No extra hardware needed, but longer processing time
- Suited to non-time-critical applications
- Alternating logic also detects permanent faults (self-checking circuits with f(x) = f'(x'))
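
The repeat-and-compare idea can be sketched as follows. The fault models are assumptions made for the example: a transient fault corrupts exactly one call, and a permanent fault corrupts every call in the same way. All names are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional


def run_with_time_redundancy(
    compute: Callable[[int], int], x: int, runs: int = 3
) -> tuple[bool, Optional[int]]:
    """Run `compute` several times; return (error_detected, majority_result)."""
    results = [compute(x) for _ in range(runs)]
    winner, count = Counter(results).most_common(1)[0]
    error_detected = count < runs
    majority = winner if count > runs // 2 else None
    return error_detected, majority


def make_unit_with_transient_fault(faulty_call: int) -> Callable[[int], int]:
    """Squaring unit whose result is corrupted on one specific call only."""
    calls = 0

    def square(x: int) -> int:
        nonlocal calls
        calls += 1
        result = x * x
        return result ^ 1 if calls == faulty_call else result

    return square


def unit_with_permanent_fault(x: int) -> int:
    """Squaring unit whose output bit 0 is stuck at 1 on every call."""
    return (x * x) | 1


transient = run_with_time_redundancy(make_unit_with_transient_fault(2), 7)
permanent = run_with_time_redundancy(unit_with_permanent_fault, 4)
```

The transient fault is detected and outvoted, while the permanent fault yields three identical wrong results and goes unnoticed, which matches the limitation stated above.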

Time redundancy (2)
- Handle permanent faults by encoding the second computation (the encoding must not alter the calculation), e.g. with a k-bit shift
- Errors in k-1 consecutive bits of an arithmetic or logical operation are detected
- Additional hardware required (two shifters, storage register, comparator)
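
Recomputing with shifted operands can be sketched with a simple fault model. Everything here is an assumption for illustration: a 16-bit adder whose output bit `stuck_bit` is permanently stuck at 0, operands small enough that the shifted sum still fits, and hypothetical function names.

```python
from functools import partial
from typing import Callable

WIDTH = 16
MASK = (1 << WIDTH) - 1


def faulty_add(a: int, b: int, stuck_bit: int) -> int:
    """Adder with a permanent fault: one output bit is stuck at 0."""
    return (a + b) & MASK & ~(1 << stuck_bit)


def add_with_shifted_recompute(
    a: int, b: int, k: int, adder: Callable[[int, int], int]
) -> tuple[int, bool]:
    """Compute a+b twice, the second time on operands shifted left by k.

    The faulty bit position hits a different logical bit in the second run,
    so the two results disagree. Returns (first_result, error_detected).
    """
    first = adder(a, b)                    # would be held in the storage register
    second = adder(a << k, b << k) >> k    # shift in, compute, shift back
    return first, first != second


adder = partial(faulty_add, stuck_bit=1)
result, detected = add_with_shifted_recompute(5, 6, k=2, adder=adder)
```

Plain repetition would return the same wrong sum twice here. The shift makes the permanent fault visible at the cost of the extra shifters, register and comparator.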

Watchdog timers
- Implemented in hardware (external timer) or software (process)
- If the timer expires → system reset or recovery
- Detects only a very specific type of error: control-flow errors
- If an error occurs but the timer is still reset → no detection
- Difficult to determine the run time
- High detection latency
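
The watchdog behaviour can be simulated with logical ticks instead of real time, which keeps the sketch deterministic. The class, the tick model and the kick schedules are assumptions made for this example.

```python
from typing import Optional


class WatchdogTimer:
    """Counts down once per tick; the monitored program must kick it in time."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout = timeout_ticks
        self.remaining = timeout_ticks
        self.expired = False

    def kick(self) -> None:
        """Called by the monitored program to signal that it is still alive."""
        self.remaining = self.timeout

    def tick(self) -> None:
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True  # real hardware would trigger a reset here


def first_expiry(kick_schedule: list[bool], timeout_ticks: int) -> Optional[int]:
    """Tick index at which the watchdog expires, or None if it never does."""
    wd = WatchdogTimer(timeout_ticks)
    for t, kicked in enumerate(kick_schedule):
        if kicked:
            wd.kick()
        wd.tick()
        if wd.expired:
            return t
    return None


healthy = [True] * 10                 # program kicks on every tick
hung = [True, True] + [False] * 8     # program stops kicking from tick 2 on
```

With a timeout of 3 ticks the healthy schedule never expires, and the hung schedule expires only after the timeout has run down. A faulty program that still calls `kick` would not be detected at all.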

Capability & Consistency Checking
- Capability checking limits access to objects (e.g. memory segments) to authorized users (processes)
  - Implemented in hardware (error traps) or software (firewall)
  - e.g. checking of address validity by the MMU
- Consistency checking determines whether states or results are reasonable
  - e.g. range checking, address checking, opcode checking
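
Both kinds of check can be sketched in software. The example loosely mimics an MMU-style segment check and a range check on a result. The `Segment` layout, the exception and the limits used are hypothetical and do not describe any particular MMU.

```python
from dataclasses import dataclass


class AccessViolation(Exception):
    """Raised where hardware would raise an error trap."""


@dataclass(frozen=True)
class Segment:
    base: int
    limit: int       # segment size in bytes
    writable: bool


def check_access(seg: Segment, addr: int, write: bool) -> None:
    """Capability check: address must lie in the segment and respect its rights."""
    if not (seg.base <= addr < seg.base + seg.limit):
        raise AccessViolation(f"address {addr:#x} outside segment")
    if write and not seg.writable:
        raise AccessViolation(f"write to read-only segment at {addr:#x}")


def in_range(value: float, low: float, high: float) -> bool:
    """Consistency check: is the value physically or logically plausible?"""
    return low <= value <= high


code_segment = Segment(base=0x1000, limit=0x400, writable=False)
check_access(code_segment, 0x1200, write=False)   # allowed, no exception

sensor_plausible = in_range(412.0, low=-40.0, high=125.0)  # assumed sensor limits
```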

Control-Flow Checking (1)
- Hardware scheme
  - Divide the application program into blocks
  - Each block has a single entry and a single exit point
  - A reference signature represents an encoding of the correct execution
  - A watchdog processor validates the application program by comparing the signature computed at run time with the reference signature
- 70% of transient faults lead to control-flow errors
- Limitations
  - Only suitable for processors running a single program (not multiple processes or threads)
  - Reduced coverage if transmission errors occur on the bus to the watchdog processor

Control-Flow Checking (2)
- Signatured Instruction Stream (SIS)
  - Hardware: watchdog processor with cyclic code signature generator
  - Software: modified assembler and loader
- Control-flow checking using shadow processing
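
Signature-based control-flow checking can be sketched as follows. This is a simplified model, not the SIS scheme itself: instruction words are plain integers, the block names and contents are made up, and `zlib.crc32` stands in for the cyclic code signature generator.

```python
import zlib


def signature(instructions: list[int]) -> int:
    """Cyclic-code signature over the instruction words of one block."""
    sig = 0
    for word in instructions:
        sig = zlib.crc32(word.to_bytes(4, "little"), sig)
    return sig


# Hypothetical program: each block has a single entry and a single exit.
program = {
    "B0": [0x1001, 0x1002, 0x1003],
    "B1": [0x2001, 0x2002],
}

# Reference signatures, computed ahead of time (the assembler/loader role).
reference = {name: signature(block) for name, block in program.items()}


def watchdog_check(block_name: str, executed: list[int]) -> bool:
    """Watchdog role: compare the run-time signature with the reference."""
    return signature(executed) == reference[block_name]


correct_run = watchdog_check("B0", program["B0"])
skipped_instruction = watchdog_check("B0", [0x1001, 0x1003])
wrong_branch = watchdog_check("B0", program["B1"])
```

A skipped instruction or a jump into the wrong block changes the run-time signature, so the comparison fails. A fault that leaves the executed instruction stream unchanged would not be caught by this check.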

Summary
- Hardware detection has low error latency
- Hardware is more expensive
- e.g. massively parallel multiprocessors
- Combining error detection mechanisms

References
- Ravishankar K. Iyer, Zbigniew Kalbarczyk - Hardware and Software Error Detection - Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign
- Hermann Kopetz - Real-Time Systems, Design Principles for Distributed Embedded Applications - 1997, 356 p., Hardcover, ISBN:
- Alireza Vahdatpour, Mahdi Fazeli, Seyed Ghassem Miremadi - Transient Error Detection in Embedded Systems Using Reconfigurable Components - IES, October 2006
- M. Dal Cin, W. Hohl, E. Michel, A. Pataricza - Error Detection Mechanisms for Massively Parallel Multiprocessors - IEEE Proceedings, 1993
- Evaluation of error detection coverage and fault-tolerance of digital plant protection system in nuclear power plants
- A. Steininger, C. Scherrer - Identifying Efficient Combinations of Error Detection Mechanisms Based on Results of Fault Injection Experiments - IEEE Transactions on Computers, Vol. 51, No. 2, February 2002