Software Quality Assurance


1 Software Quality Assurance
Fault Tolerant Software
Rajiv Krishna Nekkanti, Dept. of Computer Science, Auburn University
Lei Chen, Dept. of Computer Science, Auburn University

2 Software Quality Assurance
Basic Definitions
Error: a mistake
Fault: the result of an error
Failure: occurs when a fault executes
Dealing with faults:
  Fault Avoidance
  Fault Removal
  Bug free?
  Fault Tolerance (our focus)

3 Fault Tolerant Software
Software fault tolerance is not a license to ship the system with bugs. The real objective is to improve system performance and availability in cases when the system encounters a software or hardware fault.

4 Why SFT?
In January 1990, the AT&T long-distance network suffered a nine-hour nationwide outage in the United States when one switch experienced abnormal behavior and attempted recovery. Because of a flaw in the recovery-recognition software and a network design that allowed the effects to propagate, the problem spread to all switches.
During the Persian Gulf War, clock drift in the Patriot system caused it to miss a Scud missile that hit an American barracks in Dhahran. The strike killed 28 people and injured about 100 others. The clock drift was reportedly caused by the software's use of two different and unequal representations (24-bit and 48-bit) of the value 0.1.

5 SFT - Key Concepts Error Recovery Redundancy

6 Error Recovery
Error Detection
Error Diagnosis
Error Containment/Isolation
Error Recovery

7 Redundancy
Provides the additional capabilities and resources needed to detect and tolerate faults.
Several forms:
  Hardware
  Software
  Information/Data
  Time (temporal redundancy)

8 Redundancy
Software redundancy includes additional programs, modules, functions, or objects that tolerate faults arising from specification and design errors or implementation (coding) mistakes.
Such faults cannot be detected by simply replicating identical software units, because every copy contains the same fault.
Diversity therefore has to be introduced into the software replicas.

9 Diversity Design Diversity Data Diversity Temporal Diversity

10 Design Diversity
Design diversity is the production of two or more systems (e.g., software modules) aimed at delivering the same service through independent designs and realizations. The systems produced this way from a common service specification are called variants. When a system incorporates two or more variants, tolerating design faults requires an adjudicator, which applies a previously defined decision strategy to produce what is assumed to be an error-free result from the outcomes of variant execution.

11 Design Diversity (cont.)
Techniques:
  Recovery Blocks
  N-Version Programming
  Distributed Recovery Blocks
  N-Self Checking Programming
  Consensus Recovery Block

12 Recovery Blocks
A recovery block consists of an executive, an acceptance test (AT), and alternate try blocks (variants). The executive first runs the primary alternate (the first try block) and checks its result against the AT. If the primary's result does not pass the AT, up to n-1 further alternates are attempted in turn until one produces a result that passes. If no alternate succeeds, an error is raised.

13 Recovery Blocks (cont.)
Example: Sorting of Numbers
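
The slide's figure did not survive in this transcript. As a rough illustration only, the recovery block scheme described on the previous slide could be sketched in Python as follows. All function names are hypothetical, the primary carries a deliberately seeded fault, and the acceptance test (result is ordered and is a permutation of the input) is one possible choice, not necessarily the one used in the original slide.

```python
from collections import Counter
from typing import Callable, List, Sequence


def acceptance_test(original: Sequence[int], result: Sequence[int]) -> bool:
    """AT: the result must be ordered and contain exactly the input elements."""
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return ordered and Counter(original) == Counter(result)


def faulty_primary_sort(data: List[int]) -> List[int]:
    """Hypothetical primary with a seeded fault: it silently drops duplicates."""
    return sorted(set(data))


def insertion_sort(data: List[int]) -> List[int]:
    """Hypothetical alternate: an independently written insertion sort."""
    out: List[int] = []
    for x in data:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def recovery_block(data: Sequence[int],
                   variants: Sequence[Callable[[List[int]], List[int]]]) -> List[int]:
    """Executive: try each variant in order until one passes the AT."""
    checkpoint = list(data)  # saved input, restored before every try
    for variant in variants:
        try:
            result = variant(list(checkpoint))
        except Exception:
            continue  # a crash counts as a failed try
        if acceptance_test(checkpoint, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


if __name__ == "__main__":
    # The primary loses one of the 3s, so the alternate's result is returned.
    print(recovery_block([3, 1, 2, 3], [faulty_primary_sort, insertion_sort]))
```

Only one variant's result is ever returned, and later alternates run only when earlier ones fail, which is what distinguishes this scheme from running all variants every time.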

14 N-Version Programming
Consists of an executive, n variants (versions), and a Decision Mechanism (DM). Uses at least two independently designed, functionally equivalent versions (variants) of a program developed from the same specification.
  run Version 1, Version 2, ..., Version n
  if (Decision Mechanism (Result 1, Result 2, ..., Result n))
    return Result
  else
    failure exception
NVP is called a "static" technique because every version always performs the task, regardless of which result(s) the DM determines to be acceptable.

15 N-Version Programming (cont.)
Example: Sorting of Numbers
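
The figure for this slide is also missing. A minimal sketch of N-version programming for the same sorting task might look like the following, assuming three hypothetical versions (one with a seeded fault) and an exact majority voter as the decision mechanism. None of these names come from the original presentation.

```python
from collections import Counter
from typing import Callable, List, Sequence


def version_builtin(data: List[int]) -> List[int]:
    """Hypothetical version 1: relies on the built-in sort."""
    return sorted(data)


def version_bubble(data: List[int]) -> List[int]:
    """Hypothetical version 2: an independently written bubble sort."""
    out = list(data)
    for end in range(len(out) - 1, 0, -1):
        for i in range(end):
            if out[i] > out[i + 1]:
                out[i], out[i + 1] = out[i + 1], out[i]
    return out


def version_faulty(data: List[int]) -> List[int]:
    """Hypothetical version 3 with a seeded fault: sorts in descending order."""
    return sorted(data, reverse=True)


def majority_vote(results: Sequence[List[int]], total: int) -> List[int]:
    """DM: return the result that more than half of all versions agree on."""
    if not results:
        raise RuntimeError("no version produced a result")
    counts = Counter(tuple(r) for r in results)
    winner, votes = counts.most_common(1)[0]
    if 2 * votes > total:
        return list(winner)
    raise RuntimeError("no majority agreement among versions")


def n_version_sort(data: Sequence[int],
                   versions: Sequence[Callable[[List[int]], List[int]]]) -> List[int]:
    """Executive: run every version, then let the DM adjudicate."""
    results = []
    for version in versions:
        try:
            results.append(version(list(data)))
        except Exception:
            pass  # a crashed version simply casts no vote
    return majority_vote(results, total=len(versions))


if __name__ == "__main__":
    print(n_version_sort([3, 1, 2], [version_builtin, version_bubble, version_faulty]))
```

Unlike the recovery block, every version runs on every call, and the faulty version is simply outvoted. With only two versions that disagree, no majority exists and the executive raises a failure exception.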

16 Diversity Design Diversity Data Diversity Temporal Diversity

17 Data Diversity
Data diversity applies the principle of redundancy (not simple like-copies) to a module's input: diversified input data is used to detect and tolerate software faults. This is done through data re-expression. A data re-expression algorithm (DRA) produces different representations of a module's input data.

18 Data Diverse SFT Techniques
Retry Blocks N-Copy Programming

19 Retry Blocks (RtB)
The RtB technique uses acceptance tests (ATs) to accomplish fault tolerance.
AT: acceptance test
DRA: data re-expression algorithm

20 Retry Blocks Structure and Operation
Similar to Recovery Blocks Difference from Recovery Block: the DRA

21 Retry Blocks Example:
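
The example figure is not part of this transcript. The sketch below is one hypothetical way to show the idea: a single algorithm is re-run on re-expressed input whenever its result fails the acceptance test. The program `fragile_total`, its seeded fault, the rotation-based DRA, and the range-check AT are all assumptions made for illustration. Full retry block designs typically also include a watchdog timer and a backup algorithm, which are omitted here.

```python
from typing import List, Sequence


def fragile_total(values: List[int]) -> float:
    """Hypothetical program with a seeded fault: wrong answer if values[0] == 0."""
    if values and values[0] == 0:
        return -1.0  # seeded wrong result
    return float(sum(values))


def rotate(values: List[int], k: int) -> List[int]:
    """DRA: rotating the list is an exact re-expression for an order-independent sum."""
    if not values:
        return values
    k %= len(values)
    return values[k:] + values[:k]


def acceptance_test(values: Sequence[int], total: float) -> bool:
    """AT: the total must lie between n*min and n*max of the inputs."""
    if not values:
        return total == 0
    n = len(values)
    return n * min(values) <= total <= n * max(values)


def retry_block(values: Sequence[int], max_retries: int = 3) -> float:
    """Run the same algorithm; on AT failure, retry with re-expressed input."""
    candidate = list(values)
    for attempt in range(max_retries + 1):
        result = fragile_total(candidate)
        if acceptance_test(values, result):
            return result
        candidate = rotate(list(values), attempt + 1)  # re-express and retry
    raise RuntimeError("no re-expression produced an acceptable result")


if __name__ == "__main__":
    # The first try hits the seeded fault; the rotated input avoids it.
    print(retry_block([0, 4, 6]))
```

The only difference from the recovery block structure is what changes between tries: the data, not the algorithm.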

22 N-Copy Programming (NCP)
NCP is the data-diverse complement of N-version programming (NVP). It uses a decision mechanism (DM) to accomplish fault tolerance.
DM: decision mechanism
DRA: data re-expression algorithm

23 N-Copy Programming
N copies of a program execute in parallel, each on a different set of re-expressed data.

24 N-Copy Programming Example:
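
With the original figure missing, here is a small hypothetical sketch of N-copy programming: N copies of the same program run in parallel, each on a differently re-expressed input, and a majority voter acts as the DM. The program, its seeded fault, and the rotation-based DRA are illustrative assumptions, and a thread pool stands in for whatever parallel execution the original design intended.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def fragile_total(values: List[int]) -> float:
    """Hypothetical program with a seeded fault: wrong answer if values[0] == 0."""
    if values and values[0] == 0:
        return -1.0  # seeded wrong result
    return float(sum(values))


def rotate(values: List[int], k: int) -> List[int]:
    """DRA: rotation is an exact re-expression for an order-independent sum."""
    if not values:
        return values
    k %= len(values)
    return values[k:] + values[:k]


def n_copy_total(values: Sequence[int], n: int = 3) -> float:
    """Run n copies of the same program on re-expressed inputs, then vote."""
    inputs = [rotate(list(values), k) for k in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fragile_total, inputs))
    winner, votes = Counter(results).most_common(1)[0]
    if 2 * votes > n:  # DM: exact majority
        return winner
    raise RuntimeError("no majority agreement among copies")


if __name__ == "__main__":
    # Only the copy that sees the unrotated input hits the seeded fault.
    print(n_copy_total([0, 4, 6]))
```

Because the re-expression here is exact, the copies' outputs can be compared for equality. An approximate DRA would need a tolerance-based voter instead.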

25 Diversity Design Diversity Data Diversity Temporal Diversity

26 Temporal Diversity Temporal diversity involves the performance or occurrence of an event at different times. It can be implemented by beginning software execution at different times or using inputs that are produced or read at different times. It can be an effective means of overcoming transient faults because the temporary conditions that cause problems in one execution may be absent when the software is re-executed.
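
As a minimal sketch of temporal diversity, the code below re-executes the same operation at later times so that a transient condition has a chance to clear. The flaky reader, the exception type, and the delay values are hypothetical stand-ins for a real transient fault.

```python
import time
from typing import Callable


class TransientFault(Exception):
    """Hypothetical error type for a temporary condition, such as a busy device."""


def make_flaky_reader(failures: int) -> Callable[[], int]:
    """Build a reader that fails the first `failures` calls, then succeeds."""
    remaining = {"n": failures}

    def read() -> int:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientFault("device busy")
        return 42

    return read


def run_with_temporal_diversity(operation: Callable[[], int],
                                attempts: int = 3,
                                delay_s: float = 0.01) -> int:
    """Re-execute the same operation at different times until it succeeds."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except TransientFault as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_s * (attempt + 1))  # wait longer each time
    raise RuntimeError("fault persisted across all attempts") from last_error


if __name__ == "__main__":
    # Two transient failures, then the third execution succeeds.
    print(run_with_temporal_diversity(make_flaky_reader(2)))
```

Neither the code nor the data changes between attempts. Only the time of execution differs, so this helps with transient faults but not with a fault that is present on every run.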

27 Adjudicators
Adjudicators decide which result to choose as the output. The output should be not only correct but also the best one available, so more than one type of adjudicator can be used:
  Voters
  Acceptance tests (ATs)

28 Adjudicating

29 Voters

30 Exact majority voter

31 Acceptance Tests

32 Summary These days more and more people depend daily on services provided by computer control systems. These computers control ordinary systems such as automobiles, elevators, aircraft, banking systems, power plants, and so on. Should these computers fail, the consequences could be disastrous, such as severe economic losses or even the loss of human lives. Since design faults cannot be totally eradicated from such control systems, they will have to be tolerated during operation without the loss of service.

34 Questions & Comments?

