
1 FTC (DS) - V - TT - 0
HUMBOLDT-UNIVERSITÄT ZU BERLIN, INSTITUT FÜR INFORMATIK
DEPENDABLE SYSTEMS, Lecture 5
FAULT RECOVERY AND TOLERANCE TECHNIQUES (SYSTEM LEVEL)
Winter semester 1999/2000
Instructor: Prof. Dr. Miroslaw Malek
www.informatik.hu-berlin.de/~rok/ftc

2 FAULT RECOVERY AND TOLERANCE TECHNIQUES (SYSTEM LEVEL)
OBJECTIVE:
– TO INTRODUCE MAIN FAULT RECOVERY AND FAULT TOLERANCE TECHNIQUES FOR COMPUTER SYSTEMS
CONTENTS:
– DYNAMIC TECHNIQUES
– STATIC TECHNIQUES
– HYBRID TECHNIQUES

3 FAULT RECOVERY TECHNIQUES
FAULT RECOVERY IS INITIATED BY SUCCESSFUL FAULT DETECTION AND/OR FAULT LOCATION
HARDWARE RECOVERY TECHNIQUES INCLUDE REPLACEMENT/REPAIR, RECONFIGURATION OR FAULT MASKING
SOFTWARE RECOVERY TECHNIQUES INCLUDE
– EXCEPTION HANDLING
– RECOVERY BLOCKS
– MASKING (N-VERSION PROGRAMMING)
– ROLL-BACKWARD
– ROLL-FORWARD
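As an illustration of the recovery-block idea listed above, the following is a minimal sketch, not taken from the slides. It assumes a checkpoint can be modeled as a deep copy of the state, and the names `recovery_block`, `buggy_sqrt`, `safe_sqrt` and `sqrt_ok` are hypothetical. Each alternate starts from the checkpointed state (roll-backward), and its result is returned only if it passes the acceptance test.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate in order; return the first result that is accepted."""
    for alternate in alternates:
        # Roll-backward: every attempt starts from a fresh copy of the checkpoint.
        working_copy = copy.deepcopy(state)
        try:
            result = alternate(working_copy)
        except Exception:
            continue  # an exception counts as a failed alternate
        if acceptance_test(state, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


# Hypothetical alternates: a faulty primary and a correct backup.
def buggy_sqrt(s):
    return -s["x"]


def safe_sqrt(s):
    return s["x"] ** 0.5


def sqrt_ok(s, r):
    """Acceptance test: r must be a non-negative square root of s['x']."""
    return r >= 0 and abs(r * r - s["x"]) < 1e-9
```

Calling `recovery_block({"x": 4.0}, [buggy_sqrt, safe_sqrt], sqrt_ok)` rejects the primary result and falls back to the second alternate.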

4 SYSTEM REPLICATION METHODS
DYNAMIC
– DUPLEX
– BACK-UP SPARING
– DUPLEX AND SPARE
– PAIR AND SPARE
– SOFTWARE-IMPLEMENTED FAULT TOLERANCE (SIFT)
STATIC
– TRIPLE MODULAR REDUNDANCY (TMR)
– N MODULAR REDUNDANCY (NMR)
– (4-2) CONCEPT
– SPECIAL LOGIC
– TMR WITH DUPLEX MODULES
HYBRID
– HYBRID REDUNDANCY (NMR WITH SPARES)
– TMR WITH TWO SPARES (SPACE SHUTTLE)
– SELF-PURGING REDUNDANCY
– SIFT-OUT MODULAR REDUNDANCY

5 DUPLEX SYSTEMS (1)
[Figure: duplex system with INPUT, units P1 and P2 (PRIMARY UNIT, SECONDARY UNIT), OUTPUT 1, OUTPUT 2, COMPARATOR, OUTPUT SWITCH and a "Test and Reconfigure" path; from Siewiorek and Swarz]

6 DUPLEX SYSTEMS (2)
If a mismatch occurs, the following methods can be used to identify the faulty unit:
– Self-diagnostic program
– Self-checking logic (capabilities)
– Watchdog timer method (periodically reset a timer of another processor)
– Outside arbiter (may check signatures or run tests)
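The self-diagnostic option can be sketched as follows. This is a simplified software model under stated assumptions, not the slide's design: units are modeled as plain functions, the self-test is a known input/answer pair, and the names `duplex_step` and `make_self_test` are hypothetical.

```python
def make_self_test(test_input, expected):
    """Build a self-diagnostic: run a unit on a known input and check the answer."""
    def self_test(unit):
        return unit(test_input) == expected
    return self_test


def duplex_step(primary, secondary, self_test, x):
    """Run both units; on a comparator mismatch, use the self-test to find the faulty one.

    Returns (output, faulty), where faulty is None, "primary" or "secondary".
    """
    y1, y2 = primary(x), secondary(x)
    if y1 == y2:  # comparator: outputs match
        return y1, None
    if self_test(primary):
        return y1, "secondary"
    if self_test(secondary):
        return y2, "primary"
    raise RuntimeError("mismatch, and both units failed self-diagnosis")
```

A self-test of this kind only catches faults that show up on the chosen test input, which is one reason the slide lists several alternative methods.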

7 DUPLEX SYSTEMS (3)
SYNCHRONIZATION METHODS
– At the end of each clock period (cycle or microcycle) (e.g., ESS systems, UDET)
– Update and match unit (UPM) compares every bus cycle (e.g., AXE telephone switching system)
– At the end of program execution: program or subroutine level comparison (e.g., COMTRAC railway control system)
RELIABILITY OF DUPLEX SYSTEMS
– C: coverage factor (represents the combined probability of successful fault detection and reconfiguration)
– R_k: reliability of the control, switching and matching circuitry
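The reliability expression itself is not present in this transcript. As a hedged sketch, one common model assumes two identical units of reliability R_m with independent failures: the system works if both units work, or if exactly one unit fails and the fault is covered (probability C), all scaled by R_k. The model, the symbol R_m and the function name `duplex_reliability` are assumptions rather than the slide's own formula.

```python
def duplex_reliability(r_m, c, r_k=1.0):
    """Assumed model: R = R_k * (R_m^2 + 2 * C * R_m * (1 - R_m)).

    r_m: reliability of each unit (assumed equal for both units)
    c:   coverage factor (detection and reconfiguration succeed)
    r_k: reliability of the control, switching and matching circuitry
    """
    both_work = r_m ** 2
    one_fails_and_is_covered = 2 * c * r_m * (1 - r_m)
    return r_k * (both_work + one_fails_and_is_covered)
```

With perfect coverage (C = 1, R_k = 1) this reduces to the parallel-system value 1 - (1 - R_m)^2; with C = 0 it drops to R_m^2, which is worse than a single unit.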

8 DUPLEX SYSTEMS (4): Back-up Sparing
[Figure: MODULES 1, 2, ..., n with INPUT, SWITCH and OUTPUT]
HOT, WARM AND COLD SPARES

9 DUPLEX AND SPARE
[Figure: MODULES 1, 2, 3 with INPUT, COMPARATOR, SWITCH and OUTPUT]

10 PAIR AND SPARE
[Figure: MODULES 1, 2, 3, 4 with INPUT, COMPARATOR, SWITCH/COMPARATOR and OUTPUT]

11 TRIPLE MODULAR REDUNDANCY (TMR) (1)
A method that incorporates static redundancy into system design.
The voter produces the correct output if the voter itself has not failed and at least two of the three modules are fault-free.
[Figure: Triple Modular Redundancy (TMR) configuration. The input feeds modules A, B and C, whose outputs go to the voter, which produces the voter output]
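The voting step can be sketched in software as a majority function. This is an illustrative model only (a hardware voter works bit by bit on signals), and the name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count > len(outputs):
        return value
    raise RuntimeError("no majority among module outputs")
```

For three modules, a single faulty output is masked (`majority_vote([7, 3, 7])` yields 7), while three mutually different outputs raise an error because no two modules agree.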

12 TRIPLE MODULAR REDUNDANCY (TMR) (2)
Reliability
– R_TMR = R_V × (reliability of 2 out of 3 modules)
– R_V: reliability of the voter
– R_m: reliability of each module
When does a TMR system have a higher reliability than the original single module?
– Must have R_TMR > R_m
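Assuming independent module failures, "2 out of 3 modules" means all three work (R_m^3) or exactly two work (3 R_m^2 (1 - R_m)), which sums to 3 R_m^2 - 2 R_m^3. The sketch below evaluates that expression; the function name `tmr_reliability` is hypothetical, and the independence assumption is mine rather than stated on the slide.

```python
def tmr_reliability(r_m, r_v=1.0):
    """R_TMR = R_V * (3 * R_m^2 - 2 * R_m^3), assuming independent module failures."""
    all_three_work = r_m ** 3
    exactly_two_work = 3 * r_m ** 2 * (1 - r_m)
    return r_v * (all_three_work + exactly_two_work)
```

For example, with R_m = 0.9 and a perfect voter the result is 0.972, and with R_m = 0.5 it is exactly 0.5, the break-even point against a single module.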

13 TRIPLE MODULAR REDUNDANCY (TMR) (3)
Assuming a perfect voter (R_V = 1), TMR is more reliable than a single module only if R_m > 0.5.
The voter must also be very reliable: R_TMR > R_m requires R_V > 8/9 ≈ 0.89 (roughly 0.9).
This technique can be generalized to any odd number of modules N.
[Plot: system reliability R_sys versus module reliability R_m, both from 0 to 1, for a single module and for TMR]
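The generalization to N modules and the voter bound can be checked numerically. The sketch below assumes independent module failures and a majority voter; `nmr_reliability` and `min_voter_reliability_for_tmr` are hypothetical names, and the grid search is just one simple way to locate the bound.

```python
from math import comb


def nmr_reliability(n, r_m, r_v=1.0):
    """Reliability of N-modular redundancy: a majority of the n modules must work."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    majority = n // 2 + 1
    majority_works = sum(
        comb(n, k) * r_m ** k * (1 - r_m) ** (n - k)
        for k in range(majority, n + 1)
    )
    return r_v * majority_works


def min_voter_reliability_for_tmr(steps=1000):
    """Smallest R_V for which R_TMR can exceed R_m, found by a grid search over R_m."""
    best = float("inf")
    for i in range(1, steps):
        r_m = i / steps
        # R_V * nmr_reliability(3, r_m) > r_m  <=>  R_V > r_m / nmr_reliability(3, r_m)
        best = min(best, r_m / nmr_reliability(3, r_m))
    return best
```

The search bottoms out at R_m = 0.75, where the required voter reliability is 8/9; for any less reliable voter, TMR cannot beat a single module at any R_m.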

14 TMR WITH DUPLEX MODULES (USED IN THE JAPANESE SHINKANSEN TRAIN)
[Figure: MODULES 1 to 6 with INPUT, COMPARATORS / SWITCHERS 1, 2, 3, a VOTER and OUTPUT]

15 HYBRID REDUNDANT SYSTEM (1)
One of the drawbacks of N-modular redundancy with voting (NMR) is that its fault-masking ability deteriorates as more copies fail.
Hybrid redundancy combines NMR with backup sparing.
[Figure: Basic organization of a hybrid-redundant system (Siewiorek & Swarz). N + S functional units M_1, M_2, M_3, ..., M_(N+S) feed a switch that selects N out of (N + S); a voter produces the voted output; a disagreement detector drives the control lines of the switch. Voter, switch and detector form the Voter-Switch-Detector (VSD)]
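The interplay of voter, disagreement detector and switch can be sketched as a small simulation. It is a simplified model under stated assumptions: modules are plain functions, the voter and switch never fail, and the class name `HybridRedundantSystem` is hypothetical.

```python
from collections import Counter


class HybridRedundantSystem:
    """N active modules vote; a module that disagrees is replaced by a spare."""

    def __init__(self, modules, n_active):
        self.active = list(modules[:n_active])   # the N voting modules
        self.spares = list(modules[n_active:])   # the S standby modules

    def step(self, x):
        outputs = [module(x) for module in self.active]
        voted, count = Counter(outputs).most_common(1)[0]
        if 2 * count <= len(outputs):
            raise RuntimeError("no majority among active modules")
        # Disagreement detector and switch: swap out modules that differ from the vote.
        for i, out in enumerate(outputs):
            if out != voted and self.spares:
                self.active[i] = self.spares.pop(0)
        return voted
```

Once the spares are used up, a disagreeing module simply stays in the voting core, so the model degrades to plain NMR.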

16 HYBRID REDUNDANT SYSTEM (2)
Assuming the same reliability of modules on-line and on standby, the system reliability is:
[equation missing from transcript]
P = ⌊N/2⌋ + S = the maximum number of modules that can fail without crashing the system
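Given the definition of P, one plausible reconstruction of the reliability is the probability that at most P of the N + S modules fail. The sketch below computes that binomial sum; it assumes independent failures, equal module reliability R_m on-line and on standby, and a fault-free voter-switch-detector. The function name `hybrid_reliability` is hypothetical, and this is not confirmed to be the slide's exact expression.

```python
from math import comb


def hybrid_reliability(n, s, r_m):
    """Probability that at most P = floor(n / 2) + s of the n + s modules fail."""
    total = n + s
    p = n // 2 + s
    return sum(
        comb(total, i) * (1 - r_m) ** i * r_m ** (total - i)
        for i in range(p + 1)
    )
```

With S = 0 and N = 3 this reduces to the TMR value 3 R_m^2 - 2 R_m^3; adding one spare at R_m = 0.9 raises the result from 0.972 to 0.9963.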

17 Plots of hybrid TMR system reliability (R_s) vs. individual module reliability (R_m); S is the number of spares. (Siewiorek and Swarz)
a. System with standby failure rate equal to on-line failure rate
b. System with standby failure rate 10% of on-line failure rate
[Plots: both panels show R_s against R_m with axis ticks at 0.80 and 1.00, and curves for Simplex, S = 0 (TMR) and S = 1, 2, 4, 6]

18 SELF-PURGING REDUNDANCY
[Figure: System using self-purging redundancy (Siewiorek and Swarz)]
– Potentially more reliable than hybrid
– Threshold gates are analog circuit elements
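A rough software model of self-purging is sketched below. It assumes binary module outputs, a threshold voter that outputs 1 when at least `threshold` enabled modules output 1, and a purge rule that disables any module whose output differs from the system output. These details and the name `SelfPurgingSystem` are assumptions, since the slide's figure is not preserved and a real threshold gate is an analog element.

```python
class SelfPurgingSystem:
    """All modules vote through a threshold gate; disagreeing modules purge themselves."""

    def __init__(self, modules, threshold=2):
        self.modules = list(modules)
        self.enabled = [True] * len(self.modules)
        self.threshold = threshold

    def step(self, x):
        outputs = [module(x) for module in self.modules]
        # Threshold gate: count the 1-outputs of modules that are still enabled.
        ones = sum(1 for out, on in zip(outputs, self.enabled) if on and out == 1)
        voted = 1 if ones >= self.threshold else 0
        # Self-purging: a module that disagrees with the system output is disabled.
        for i, out in enumerate(outputs):
            if self.enabled[i] and out != voted:
                self.enabled[i] = False
        return voted
```

Unlike the hybrid scheme, there is no central switch choosing among spares; every module takes part in the vote until it removes itself.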

19 SIFT-OUT MODULAR REDUNDANCY
(N-2)-fault-tolerant
BASIC CONCEPT:
– COMPARE EACH PAIR AND ELIMINATE FAULTY UNITS
[Figure: Basic configuration for sift-out redundancy. N redundant modules M_1, M_2, ..., M_N, operating synchronously, with blocks D_1, ..., D_N, a clock, a comparator, a detector and a collector that produces the output. C(N,2) lines E_12, E_13, ..., E_(N-1)N, each for signaling the disagreement of a pair of modules. N lines F_1, F_2, ..., F_N; line F_i signals the failure of module M_i]
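The comparator, detector and collector roles can be sketched as follows. The detector rule used here (flag module i when it disagrees with every other active module) is an assumption chosen for illustration, as are the class name `SiftOutSystem` and the modeling of modules as plain functions.

```python
from itertools import combinations


class SiftOutSystem:
    """Compare every pair of modules and sift out the ones that disagree with all others."""

    def __init__(self, modules):
        self.modules = list(modules)
        self.failed = [False] * len(self.modules)  # F_1 .. F_N

    def step(self, x):
        active = [i for i, f in enumerate(self.failed) if not f]
        outputs = {i: self.modules[i](x) for i in active}
        # Comparator: E_ij is True when modules i and j disagree (i < j).
        disagree = {
            (i, j): outputs[i] != outputs[j] for i, j in combinations(active, 2)
        }
        # Detector (assumed rule): module i failed if it disagrees with all others.
        for i in active:
            others = [j for j in active if j != i]
            if others and all(disagree[min(i, j), max(i, j)] for j in others):
                self.failed[i] = True
        # Collector: pass on the output of any module not flagged as failed.
        survivors = [i for i in active if not self.failed[i]]
        if not survivors:
            raise RuntimeError("no agreeing modules left")
        return outputs[survivors[0]]
```

With two modules left, a disagreement flags both and no output can be produced, which matches the (N-2) fault-tolerance figure on the slide.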

20 TMR WITH TWO SPARES (USED IN SPACE SHUTTLE)
[Figure: MODULES 1 to 5 with INPUT, VOTER / SWITCH and OUTPUT]
– PRIMARY MODULES 1, 2 and 3
– "WARM" SPARE 4
– "COLD" SPARE 5

