SENG 521 Software Reliability & Testing. Fault Tolerant Software Systems: Techniques (Part 4b)


1 SENG 521 (Fall 2002) Software Reliability & Testing
Fault Tolerant Software Systems: Techniques (Part 4b)
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far (far@enel.ucalgary.ca)
http://www.enel.ucalgary.ca/~far/Lectures/SENG521/04b/

2 Fault Tolerance (Review)
A fault-tolerant computing system must be capable of providing specified services in the presence of a bounded number of failures. These failures can occur because of faults in the system’s components or in its design. Most software faults are due to design deficiencies, and almost all hardware fault-tolerance techniques cannot be applied to software.

3 Acceptance Testing
A program-specific error detection mechanism that checks the results of program execution. It usually evaluates to either “true” or “false”.
ensure by P0 else-by P1 else fail
Examples: checksums for program parts; internal check points such as ABS[(SQRT(x)*SQRT(x)) - x] < E
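
The internal check point above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `sqrt_acceptance_test` and the default tolerance `eps` (standing in for E) are assumptions.

```python
import math


def sqrt_acceptance_test(x, result, eps=1e-9):
    """Return True when `result` is an acceptable square root of `x`.

    Mirrors the slide's check ABS[(SQRT(x)*SQRT(x)) - x] < E.
    `eps` plays the role of E; its default value is an assumption.
    """
    return abs(result * result - x) < eps


# A correct result passes the test, a wrong one is rejected.
good = sqrt_acceptance_test(2.0, math.sqrt(2.0))
bad = sqrt_acceptance_test(2.0, 1.5)
```

An absolute tolerance is used because the slide states the check that way; for very large or very small x a relative tolerance may be more suitable.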

4 External Consistency
A kind of external error detection mechanism used to judge whether a program executed correctly.
Examples: exception signal when dividing by zero; integer overflow signal; interrupt signal for a program loop; floating-point numerical failure check
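
As a rough sketch of how such external signals surface in a high-level language, the wrapper below turns Python's built-in `ZeroDivisionError` and `OverflowError` exceptions, plus a finiteness check on floating-point results, into status codes. The wrapper name `run_with_external_checks` and the status strings are hypothetical, and the loop-interrupt (watchdog) case is not covered.

```python
import math


def run_with_external_checks(operation, *args):
    """Run `operation` and report externally detectable errors.

    Returns a (status, result) pair. The status strings are
    illustrative only and not taken from the lecture.
    """
    try:
        result = operation(*args)
    except ZeroDivisionError:
        # Signal raised by the runtime when dividing by zero.
        return ("divide-by-zero", None)
    except OverflowError:
        # Signal raised when a float result exceeds the representable range.
        return ("overflow", None)
    # Floating-point numerical failure check: reject NaN and infinities.
    if isinstance(result, float) and not math.isfinite(result):
        return ("non-finite float", result)
    return ("ok", result)
```

For example, `run_with_external_checks(lambda a, b: a / b, 1, 0)` reports the divide-by-zero status instead of letting the exception propagate.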

5 Example
The correct answer is 8779, but an ordinary implementation of this will return zero due to rounding and the large differences in the order of magnitude of the summands.
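
The expression from this slide is not preserved in the transcript. The sketch below therefore uses an assumed set of summands, chosen only because their exact sum is 8779 and their magnitudes differ widely; it should not be read as the slide's original formula. It compares a plain left-to-right floating-point sum with Python's `math.fsum`, which tracks the lost low-order bits.

```python
import math

# Assumed summands (not from the slide): the huge terms cancel exactly,
# leaving 2446 + 6333 = 8779 as the true sum.
summands = [1e50, 2446.0, -1e50, 1e40, 6333.0, -1e40]


def naive_sum(values):
    """Ordinary left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v  # small terms are absorbed by the large running total
    return total


naive = naive_sum(summands)     # rounding swallows 2446 and 6333
exact = math.fsum(summands)     # error-free accumulation
```

An explicit loop is used for the naive version because the built-in `sum` applies compensated summation to floats in recent Python releases, which would hide the effect.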

6 Redundancy
Dual software technique: implement two (or more) distinct versions of the same software and execute them on the same set of inputs. Any discrepancy between the outputs of the versions may trigger an alarm. The effectiveness of redundancy techniques depends on coincident, correlated, and dependent faults.
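
A minimal sketch of the dual software technique, assuming each version is a plain function of one input; the name `dual_execute` and the two sample versions are hypothetical.

```python
def dual_execute(version_a, version_b, x):
    """Run two versions on the same input and flag any discrepancy.

    Returns (output_a, output_b, alarm), where `alarm` is True when
    the outputs differ. The comparison cannot tell which one is wrong.
    """
    out_a = version_a(x)
    out_b = version_b(x)
    return out_a, out_b, out_a != out_b


# Two hypothetical versions of "double the input"; the second is faulty
# for negative inputs.
def double_v1(x):
    return 2 * x


def double_v2(x):
    return x + abs(x)


agree = dual_execute(double_v1, double_v2, 3)      # no alarm
disagree = dual_execute(double_v1, double_v2, -3)  # alarm raised
```

Note that the alarm stays silent when both versions fail identically, which is why coincident faults limit the technique.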

7 Coincident Faults
Coincident faults: two or more functionally equivalent software components fail on the same input. When two or more software versions give the same incorrect response, an identical-and-wrong (IAW) answer is obtained.

8 Correlated & Dependent Faults
Correlated faults: two faults are correlated when the measured probability of coincident failures is significantly higher than would be expected by chance. In that case the versions do not fail independently.
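
One way to make this comparison concrete is to estimate, from test logs, how often two versions fail together and compare that with the product of their individual failure rates, which is what independence would predict. The sketch below assumes per-input failure flags are available; the function name `coincidence_report` is hypothetical, and a real analysis would also need a statistical significance test.

```python
def coincidence_report(failed_a, failed_b):
    """Compare observed and independence-predicted coincident failure rates.

    `failed_a[i]` and `failed_b[i]` say whether versions A and B failed
    on test input i. Returns (observed, expected_if_independent).
    """
    n = len(failed_a)
    if n == 0 or n != len(failed_b):
        raise ValueError("need two non-empty logs of equal length")
    p_a = sum(failed_a) / n
    p_b = sum(failed_b) / n
    p_both = sum(1 for a, b in zip(failed_a, failed_b) if a and b) / n
    return p_both, p_a * p_b


# Both versions fail on exactly the same 2 of 10 inputs.
log_a = [True, True] + [False] * 8
log_b = [True, True] + [False] * 8
observed, expected = coincidence_report(log_a, log_b)
```

Here the observed coincident failure rate is well above the rate predicted under independence, which is the pattern the slide describes as correlation.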

9 Possible Failure Scenario
What if the software components produce doublet or triplet IAW responses?
[Figure: versions P1, P2, and P3 feed an adjudication algorithm; doublet and triplet IAW faults]

10 Adjudication by Voting
A “voter” compares results from two or more functionally equivalent software components and decides which of the answers provided by those components is correct.
Versions of the voting algorithm: majority voting; 2-of-N voting; consensus voting
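
The three voting variants can be sketched as below. The slide gives only their names, so the exact rules here are assumptions based on common usage: majority voting needs more than half of the outputs to agree, 2-of-N voting needs any two to agree, and consensus voting picks the largest agreeing group. Returning `None` on ties or when no group qualifies is also an assumption.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by more than half of the versions."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def two_of_n_vote(outputs):
    """Return the value on which at least two versions agree."""
    agreed = [v for v, c in Counter(outputs).items() if c >= 2]
    # Ambiguous when several different values each have two supporters.
    return agreed[0] if len(agreed) == 1 else None


def consensus_vote(outputs):
    """Return the value with the largest agreement, or None on a tie."""
    ranked = Counter(outputs).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

With outputs `[99, 99, 42]`, where 42 is the correct answer and two versions share a fault, `majority_vote` returns 99: a doublet IAW response defeats the voter.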

11 Techniques
Recovery blocks; N-version programming; consensus recovery block; acceptance voting; N self-checking programming

12 Recovery Blocks (RB)
Uses multiple versions of a software module together with an acceptance test. The output of the 1st module is tested for acceptability; if it fails, the 2nd module is executed after backward state recovery.
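
A minimal recovery-block sketch, assuming the program state is a dictionary that the modules may modify in place. The name `recovery_block` and the sample modules are hypothetical; backward state recovery is approximated by restoring a deep copy taken on entry.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate in order until one passes the acceptance test.

    `state` is a dict mutated by the alternates. Before the next
    alternate runs, the state is rolled back to the checkpoint.
    """
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        try:
            result = alternate(state)
            if acceptance_test(result):
                return result
        except Exception:
            pass  # a crash counts as a failed alternate
        # Backward state recovery: restore the checkpointed state.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("all alternates failed the acceptance test")


def primary(state):       # hypothetical faulty module
    state["balance"] = -1
    return state["balance"]


def secondary(state):     # hypothetical correct module
    state["balance"] -= 30
    return state["balance"]


account = {"balance": 100}
result = recovery_block(account, [primary, secondary], lambda r: r >= 0)
```

The secondary module sees the original balance of 100 rather than the corrupted value left by the primary, which is the point of the state recovery step.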

13 N-Version Programming
Parallel execution of N independently developed, functionally equivalent modules. Adjudication is via voting: the voter receives all N outputs and selects the one it judges correct. Advantage of NVP: no service interruption.
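
A small N-version programming sketch, assuming the versions are independent functions of one input and that a majority voter is used (the slide does not fix the voting rule). Threads from `concurrent.futures` stand in for parallel execution; the name `n_version_execute` is hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def n_version_execute(versions, x):
    """Run all versions on `x` in parallel and majority-vote the outputs.

    An exception raised inside any version propagates to the caller;
    a fuller implementation would treat it as a failed version.
    """
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        outputs = list(pool.map(lambda version: version(x), versions))
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) / 2:
        return value
    raise RuntimeError("voter found no majority")


# Three hypothetical versions of "square the input"; the third is faulty.
versions = [lambda x: x * x, lambda x: x ** 2, lambda x: x + x]
squared = n_version_execute(versions, 3)
```

The faulty third version is outvoted without any rollback or retry, which illustrates why NVP avoids service interruption.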

14 Consensus Recovery Block
Composed of NVP and RB. If NVP fails, the system reverts to RB using the same blocks.
[Figure: input goes to NVP; on success, correct output; on failure, RB; RB success gives correct output, RB failure gives system failure]
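
The NVP-then-RB flow can be sketched as below, under several assumptions: the versions are pure functions (so no state restoration is needed), the NVP stage uses majority voting, and the RB stage applies the acceptance test to the outputs already computed, in order, rather than re-running the versions. The name `consensus_recovery_block` is hypothetical.

```python
import math
from collections import Counter


def consensus_recovery_block(versions, acceptance_test, x):
    """Vote first; if the vote fails, fall back to acceptance testing."""
    outputs = [version(x) for version in versions]

    # NVP stage: accept a majority result if there is one.
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) / 2:
        return value

    # RB stage: take the first output that passes the acceptance test.
    for output in outputs:
        if acceptance_test(output):
            return output
    raise RuntimeError("system failure: vote and acceptance test both failed")


def is_sqrt_of_4(r):
    return abs(r * r - 4.0) < 1e-9


# No two versions agree, so the acceptance test decides.
fallback = consensus_recovery_block(
    [lambda x: 1.9, math.sqrt, lambda x: 2.1], is_sqrt_of_4, 4.0
)
```

In the example the vote fails because all three outputs differ, and the RB stage selects the one output that passes the acceptance test.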

15 Acceptance Voting
As in NVP, all versions are executed in parallel. The output of each module goes to an acceptance test. If the acceptance test is successful, the output goes to a voter.
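
A sketch of acceptance voting, assuming a majority vote among the outputs that pass the acceptance test. For brevity the versions run sequentially here, although the technique executes them in parallel; the name `acceptance_voting` is hypothetical.

```python
from collections import Counter


def acceptance_voting(versions, acceptance_test, x):
    """Filter version outputs through the acceptance test, then vote."""
    accepted = []
    for version in versions:
        output = version(x)
        if acceptance_test(output):
            accepted.append(output)  # only accepted outputs reach the voter
    if not accepted:
        raise RuntimeError("no output passed the acceptance test")
    value, count = Counter(accepted).most_common(1)[0]
    if count > len(accepted) / 2:
        return value
    raise RuntimeError("voter found no majority among accepted outputs")


def non_negative_sqrt_of_4(r):
    return r >= 0 and abs(r * r - 4.0) < 1e-9


voted = acceptance_voting(
    [lambda x: 2.0, lambda x: -2.0, lambda x: 2.0, lambda x: 5.0],
    non_negative_sqrt_of_4,
    4.0,
)
```

Because rejected outputs never reach the voter, the number of votes varies from run to run, so the majority is taken over the accepted outputs only.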

16 N Self-Check Programming
In N Self-Check Programming (NSCP), N modules are executed in pairs. Each pair’s outputs can be compared or assessed for correctness.
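
A minimal sketch of the comparison variant, assuming each pair is a tuple of two functions and that the first pair whose outputs agree supplies the result. The name `n_self_checking` and the sample modules are hypothetical.

```python
def n_self_checking(pairs, x):
    """Return the output of the first pair whose two modules agree."""
    for first, second in pairs:
        out_first = first(x)
        out_second = second(x)
        if out_first == out_second:
            return out_first  # the pair checked itself successfully
    raise RuntimeError("every pair reported a disagreement")


# Hypothetical modules for "double the input": the first pair disagrees
# (one module is faulty), the second pair agrees.
pairs = [
    (lambda x: x + 1, lambda x: 2 * x),
    (lambda x: 2 * x, lambda x: x + x),
]
doubled = n_self_checking(pairs, 5)
```

Agreement within a pair does not guarantee correctness: two modules with an identical-and-wrong answer would still pass the comparison.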

