

DS -V - FDT - 1
HUMBOLDT-UNIVERSITÄT ZU BERLIN
INSTITUT FÜR INFORMATIK
Zuverlässige Systeme für Web und E-Business (Dependable Systems for Web and E-Business)
Lecture 5
FAULT DIAGNOSIS TECHNIQUES
Winter Semester 2000/2001
Lecturer: Prof. Dr. Miroslaw Malek

DS -V - FDT - 2
FAULT DIAGNOSIS TECHNIQUES
OBJECTIVES:
– TO INTRODUCE MAIN FAULT DETECTION AND FAULT LOCATION TECHNIQUES
CONTENTS:
– FAULT DETECTION TECHNIQUES
– FAULT LOCATION (ISOLATION) METHODS

DS -V - FDT - 3
FAULT DIAGNOSIS TECHNIQUES
FAULT DETECTION + FAULT LOCATION = FAULT DIAGNOSIS
FAULT DETECTION BY
– REPLICATION CHECKS
– TIMING CHECKS
– REVERSAL CHECKS
– CODING CHECKS
– REASONABLENESS CHECKS
– STRUCTURAL CHECKS
– DIAGNOSTIC CHECKS
– ALGORITHMIC CHECKS

DS -V - FDT - 4
REPLICATION CHECKS
POWERFUL, COMPLETE, EXPENSIVE
TESTS EXECUTION AGAINST ALTERNATE IMPLEMENTATION
EXAMPLES
– EXECUTE IDENTICAL COPIES ON SEPARATE HARDWARE
    ASSUMES DESIGN IS CORRECT AND ONLY COMPONENT FAILURES OCCUR, INDEPENDENTLY
– EXECUTE SEPARATE AND DIFFERENT VERSIONS WITH DIFFERENT DESIGNS
    ASSUMES DESIGN MAY BE INCORRECT AND DESIGN FAULTS OCCUR INDEPENDENTLY
    MAY PROVIDE CHECKING INFORMATION BUT BE UNEXECUTABLE
– EXECUTE SAME COPY MULTIPLE TIMES
    ASSUMES FAULT IS TRANSIENT
– REPLICATE ONLY A PORTION OF A SYSTEM
    ASSUMES REQUESTED RESPONSE IS CORRECT IF CORRECT FIXED RESPONSE IS ALSO GENERATED
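A minimal sketch of a replication check, assuming the replicas are plain Python callables with hashable results. The names `replicated_call`, `square_a`, `square_b` and `square_faulty` are hypothetical and only illustrate running different versions on the same input and comparing the outcomes by majority vote.

```python
from collections import Counter

def replicated_call(replicas, *args):
    """Run every replica on the same input and vote on the results.

    Returns (value, agreed). `agreed` is False when at least one replica
    disagreed with the majority. Raises if no strict majority exists.
    """
    results = [replica(*args) for replica in replicas]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        raise RuntimeError("no majority: fault detected but not masked")
    return value, votes == len(results)

# Three hypothetical "versions" of the same function.
def square_a(x):
    return x * x

def square_b(x):
    return x ** 2

def square_faulty(x):
    return x * x + 1  # injected design fault

print(replicated_call([square_a, square_b, square_faulty], 7))  # (49, False)
```

With only two replicas a disagreement can be detected but not masked, which is why the sketch raises when no strict majority exists.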

DS -V - FDT - 5 EXAMPLE: 3B20 FROM AT&T

DS -V - FDT - 6
TIMING CHECKS
A LIMITED REPLICATION CHECK
TESTS EXECUTION AGAINST TIMING CONSTRAINTS
EXAMPLES
– WATCHDOG TIMER
    PROCESS RESETS TIMER INDICATING SATISFACTORY OPERATION
    IF TIME EXPIRES, ASSUME FAILED PROCESS
– MESSAGE-BROADCASTING
    PROCESS BROADCASTS MESSAGE TO OTHER PROCESSES, RECIPIENTS CHECK FOR MESSAGE
    IF MESSAGE NOT RECEIVED, ASSUME FAILED SENDER
– MESSAGE-REQUESTING
    PROCESS SENDS REQUEST TO OTHER PROCESS
    IF RETURN MESSAGE NOT RECEIVED, ASSUME FAILED RECIPIENT PROCESS
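The watchdog timer can be sketched as follows. The class name `Watchdog` and the injectable `clock` parameter are assumptions made for illustration. The fake clock in the usage lines keeps the example deterministic, and a real monitor would poll `expired()` from a separate process or thread.

```python
import time

class Watchdog:
    """The monitored process calls reset(); a monitor polls expired()."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_reset = clock()

    def reset(self):
        # Signal satisfactory operation.
        self.last_reset = self.clock()

    def expired(self):
        # True means the timer ran out: assume the process has failed.
        return self.clock() - self.last_reset > self.timeout

# Usage with a hypothetical fake clock instead of real time.
now = [0.0]
wd = Watchdog(timeout=1.0, clock=lambda: now[0])
now[0] = 0.5
wd.reset()
now[0] = 1.2
print(wd.expired())  # False: 0.7 s since the last reset
now[0] = 2.0
print(wd.expired())  # True: 1.5 s since the last reset
```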

DS -V - FDT - 7 EXAMPLE: TANDEM SYSTEM

DS -V - FDT - 8
REVERSAL CHECKS
INPUTS AND OUTPUTS ARE ONE-TO-ONE
CALCULATES INPUTS FROM OUTPUTS AND TESTS AGAINST ACTUAL INPUTS
EXAMPLES
– REREAD DATA AFTER A WRITE
– MATHEMATICAL FUNCTIONS
    (SQRT(X))^2 = X ?
    A * A^(-1) = I ?
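A sketch of two reversal checks, assuming floating-point arithmetic (hence the tolerance) and a dict-like store. `checked_sqrt`, `checked_write` and the deliberately faulty `StuckStore` are hypothetical names used only for this example.

```python
import math

def checked_sqrt(x, rel_tol=1e-9):
    """Compute sqrt(x), then reverse it: (sqrt(x))^2 must give back x."""
    y = math.sqrt(x)
    if not math.isclose(y * y, x, rel_tol=rel_tol, abs_tol=1e-12):
        raise ArithmeticError(f"reversal check failed for sqrt({x})")
    return y

def checked_write(store, key, value):
    """Write a value, then reread it and compare with what was written."""
    store[key] = value
    if store[key] != value:
        raise OSError(f"read-after-write mismatch at {key!r}")
    return value

class StuckStore(dict):
    """Hypothetical faulty store: every write lands as 0."""
    def __setitem__(self, key, value):
        super().__setitem__(key, 0)

print(checked_sqrt(16.0))            # 4.0
print(checked_write({}, "a", 5))     # 5
try:
    checked_write(StuckStore(), "a", 5)
except OSError as err:
    print("detected:", err)
```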

DS -V - FDT - 9
CODING CHECKS
REDUNDANT REPRESENTATIONS OF OBJECTS
EXAMPLES
– PARITY BIT
    DETECT ODD NUMBER OF ERRORS
– HAMMING CODE
    CORRECT SINGLE ERRORS
– CYCLIC REDUNDANCY CODE
    DETECT ERRORS IN BLOCKS OF DATA
– ARITHMETIC CODE
    BASED ON REMAINDER THEOREMS FOR RESIDUE ARITHMETIC
– CHECKSUM
    DETECT ERRORS IN BLOCKS OF DATA
– BERGER CODE
    NUMBER OF 1'S OR 0'S
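Two of the codes above in a short sketch: an even parity bit and a Berger code that stores the number of 0's in binary. Bits are represented as Python lists of 0/1, and the function names are illustrative assumptions. The example shows that a double error slips past parity while the Berger code still catches it, because both flips go in the same direction (1 to 0).

```python
def even_parity_bit(bits):
    """Parity bit that makes the total number of 1's even."""
    return sum(bits) % 2

def parity_ok(codeword):
    """True if the codeword (data + parity bit) has an even number of 1's."""
    return sum(codeword) % 2 == 0

def berger_check_bits(bits):
    """Berger check symbol: the count of 0's in the data, in binary."""
    width = len(bits).bit_length()      # enough bits to count 0..len(bits)
    zeros = bits.count(0)
    return [(zeros >> i) & 1 for i in reversed(range(width))]

def berger_ok(bits, check):
    return berger_check_bits(bits) == check

data = [1, 0, 1, 1, 0, 0, 1]
parity = even_parity_bit(data)          # 0
check = berger_check_bits(data)         # [0, 1, 1], i.e. three 0's

single = [0, 0, 1, 1, 0, 0, 1]          # one bit flipped
double = [0, 0, 0, 1, 0, 0, 1]          # two bits flipped, both 1 -> 0

print(parity_ok(single + [parity]))     # False: odd number of errors detected
print(parity_ok(double + [parity]))     # True: even number of errors missed
print(berger_ok(double, check))         # False: unidirectional errors detected
```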

DS -V - FDT - 10
REASONABLENESS CHECKS
KNOWING THE SYSTEM INTERNAL DESIGN AND CONSTRUCTION
TESTS STATES OF OBJECTS AGAINST INTENDED USE AND PURPOSE
EXAMPLES
– RANGE CHECKING
    ANGLE IN DEGREES IS WITHIN [0, 360] ?
– BOUNDS CHECKING
    ARRAY INDEX IS WITHIN BOUNDS ?
– CONSISTENCY CHECKING
    ON-GROUND AIRCRAFT HAS UNRETRACTED WHEELS ?
– TYPE CHECKING
    I.NUM IS INTEGER ?
    MODULO 2 (EVEN_NUMBER) = 0 ?
– CAPABILITY CHECKING
    READ_ACCESS IS YES ?
– RELIABILITY CALCULATION
    IS IT WITHIN [0, 1] ?
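Several of these checks can be collected into one audit function. The sketch below assumes a hypothetical state dictionary. The field names (`heading_deg`, `reliability`, `waypoint`, `route`, `on_ground`, `gear_down`) are invented for illustration and do not come from the lecture.

```python
def reasonableness_violations(state):
    """Return the names of all reasonableness checks that the state fails."""
    index = state["waypoint"]
    checks = {
        "range: heading in [0, 360]":
            0 <= state["heading_deg"] <= 360,
        "range: reliability in [0, 1]":
            0.0 <= state["reliability"] <= 1.0,
        "type: waypoint index is an integer":
            isinstance(index, int),
        "bounds: waypoint index within route":
            isinstance(index, int) and 0 <= index < len(state["route"]),
        "consistency: on-ground aircraft has gear down":
            (not state["on_ground"]) or state["gear_down"],
    }
    return [name for name, ok in checks.items() if not ok]

good = {"heading_deg": 90, "reliability": 0.999, "waypoint": 1,
        "route": ["A", "B", "C"], "on_ground": True, "gear_down": True}
bad = dict(good, heading_deg=400, waypoint=5, gear_down=False)

print(reasonableness_violations(good))  # []
print(reasonableness_violations(bad))
```

Returning the list of violated checks, rather than stopping at the first failure, gives the diagnosis step more information about where the fault may lie.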

DS -V - FDT - 11
STRUCTURAL CHECKS
CONSISTENT STRUCTURE OF DATA
EXAMPLES
– COUNT OF NUMBER OF ELEMENTS IN STRUCTURE
– REDUNDANT POINTERS
– STATUS INFORMATION
    CHECK SYSTEM CONFIGURATION
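A sketch of a structural check on a linked list that carries two kinds of redundancy: a stored element count and back pointers that duplicate the forward links. `Node`, `CheckedList` and `audit` are hypothetical names chosen for this example.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None          # redundant back pointer

class CheckedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0            # redundant element count

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.count += 1

    def audit(self):
        """True if forward links, back pointers and stored count all agree."""
        seen, prev, node = 0, None, self.head
        while node is not None:
            # A wrong back pointer or more nodes than counted means corruption
            # (the count bound also stops the walk on a cyclic list).
            if node.prev is not prev or seen >= self.count:
                return False
            seen += 1
            prev, node = node, node.next
        return seen == self.count and prev is self.tail

lst = CheckedList()
for v in (10, 20, 30):
    lst.append(v)
print(lst.audit())                # True
lst.head.next = lst.tail          # corrupt a forward pointer
print(lst.audit())                # False
```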

DS -V - FDT - 12
DIAGNOSTIC CHECKS
TEST COMPONENTS USING A SET OF INPUTS FOR WHICH THE OUTPUTS ARE KNOWN
PROGRAMS WHICH TEST FOR HARDWARE FAULTS
EXAMPLES
– MEMORY TESTS
    WRITE AND READ TEST PATTERNS
– ENVIRONMENTAL TESTS
    RUN AT ABNORMAL VOLTAGES
– LOAD TESTS
    RUN AT SATURATION LEVELS
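The memory test can be illustrated with a simple pattern test. This is a sketch under stated assumptions: memory is modeled as a byte-addressable Python object, the four patterns are a common but arbitrary choice, and `StuckAtMemory` is a hypothetical fault model with one bit stuck at 1. The test overwrites memory, so it is destructive.

```python
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)

def memory_test(mem):
    """Write each pattern to every cell, read it back, return faulty addresses."""
    faulty = set()
    for pattern in PATTERNS:
        for addr in range(len(mem)):
            mem[addr] = pattern
        for addr in range(len(mem)):
            if mem[addr] != pattern:
                faulty.add(addr)
    return sorted(faulty)

class StuckAtMemory:
    """Hypothetical 8-bit memory in which some bits of one cell read as 1."""

    def __init__(self, size, bad_addr, stuck_mask):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.stuck_mask = stuck_mask

    def __len__(self):
        return len(self.cells)

    def __setitem__(self, addr, value):
        self.cells[addr] = value

    def __getitem__(self, addr):
        value = self.cells[addr]
        if addr == self.bad_addr:
            value |= self.stuck_mask   # stuck-at-1 fault on read
        return value

print(memory_test(bytearray(16)))                 # []
print(memory_test(StuckAtMemory(16, 5, 0x04)))    # [5]
```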

DS -V - FDT - 13
ALGORITHMIC CHECKS
CHECKING INVARIANTS OF AN ALGORITHM
– EXAMPLE: SORTING
    NUMBER OF ENTRIES
    CHECKSUM
INVARIANT CODES
– EXAMPLE: MATRIX MULTIPLICATION (Abraham)
    A x B = C
    (A WITH COLUMN CHECKSUM) x (B WITH ROW CHECKSUM) = (C WITH ROW/COLUMN CHECKSUMS)
    CHECKSUMS OBTAINED FROM C AND BY A x B: COMPARE
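Both examples can be sketched in a few lines of plain Python. The sketch assumes integer entries so that checksums compare exactly (floating-point data would need a tolerance), and all function names are illustrative. In the matrix case, a single wrong element of the product shows up as one inconsistent row and one inconsistent column, which also locates it.

```python
def sort_invariants_hold(before, after):
    """Sorting must preserve the number of entries and their checksum."""
    ordered = all(a <= b for a, b in zip(after, after[1:]))
    return ordered and len(after) == len(before) and sum(after) == sum(before)

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def with_column_checksums(a):
    """Append one row holding the sum of each column of A."""
    return a + [[sum(col) for col in zip(*a)]]

def with_row_checksums(b):
    """Append to each row of B the sum of that row."""
    return [row + [sum(row)] for row in b]

def checksum_errors(cf):
    """Return (bad_rows, bad_cols) of a matrix with row/column checksums."""
    data = [row[:-1] for row in cf[:-1]]
    bad_rows = [i for i, row in enumerate(data) if sum(row) != cf[i][-1]]
    bad_cols = [j for j, col in enumerate(zip(*data)) if sum(col) != cf[-1][j]]
    return bad_rows, bad_cols

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Cf = matmul(with_column_checksums(A), with_row_checksums(B))
print(Cf)                      # [[19, 22, 41], [43, 50, 93], [62, 72, 134]]
print(checksum_errors(Cf))     # ([], [])
Cf[1][0] += 1                  # inject a single faulty element
print(checksum_errors(Cf))     # ([1], [0])
```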

DS -V - FDT - 14
FAULT DIAGNOSIS TECHNIQUES IN MULTIPROCESSORS
THE FAULT DIAGNOSIS SHOULD LOCATE A FIELD REPLACEABLE UNIT WHICH COULD BE
– PROCESSOR(S) BOARD(S)
– MEMORY(IES) BOARD(S)
– SWITCHING ELEMENT(S) BOARD
– INTERFACE BOARD
– I/O BOARD(S)
– SUPPORT PROCESSOR BOARD
– SOFTWARE MODULES
PACKAGING DETERMINES THE REQUIRED LEVEL OF FAULT DIAGNOSIS LOCATABILITY
PACKAGING, TESTABILITY, DIAGNOSABILITY AND PERFORMANCE INSTRUMENTATION ARE USUALLY AFTERTHOUGHTS IN THE DESIGN PROCESS
CONCURRENT ERROR DETECTION IS INDISPENSABLE IN A MULTIPROCESSOR ENVIRONMENT DUE TO HIGH SYSTEM COMPLEXITY AND RAPID SYSTEM CONTAMINATION
PLACE STRONG EMPHASIS ON APPLICATION (ALGORITHMIC) LEVEL DIAGNOSIS BY EMPLOYING SOPHISTICATED ACCEPTANCE TESTS
CONCURRENT DIAGNOSIS SHOULD COVER ALL SYSTEM LEVELS

DS -V - FDT - 15
DIAGNOSIS TECHNIQUES
PARALLEL VS. LOCALIZED DIAGNOSIS
CENTRALIZED VS. DISTRIBUTED DIAGNOSIS
DEPENDING ON THE FUNCTIONALITY STATEMENT AND THE MODEL ASSUMPTIONS, THE PROBLEM MAY VARY FROM RELATIVELY EASY TO EXTREMELY COMPLEX.

DS -V - FDT - 16
CONCURRENT ERROR DETECTION
NUMEROUS ERROR DETECTION CODES ARE USED BUT FOR MULTIPROCESSORS THE FOLLOWING SEEM TO BE MOST EFFECTIVE:
PROCESSORS
– SIGNATURE ANALYSIS (AN EFFECTIVE ARITHMETIC CODE IS YET TO BE FOUND)
MEMORIES
– ERROR-CORRECTING CODES OR PARITY
NETWORKS
– PARITY OR BERGER CODE WITH RETRY; CRC
METRICS INCLUDE:
– FAULT CLASSES AND THEIR COVERAGE
– COST
– TIME (PERFORMANCE)
– RELIABILITY