Software Quality Assurance

Fault Tolerant Software
Rajiv Krishna Nekkanti, Dept. of Computer Science, Auburn University
Lei Chen, Dept. of Computer Science, Auburn University

Basic Definitions
- Error – a mistake
- Fault – the result of an error
- Failure – occurs when a fault is executed
Dealing with faults:
- Fault avoidance
- Fault removal (bug free?)
- Fault tolerance (our focus)

Fault Tolerant Software
Software fault tolerance is not a license to ship the system with bugs. The real objective is to improve system performance and availability when the system encounters a software or hardware fault.

Why SFT?
In January 1990, AT&T's long-distance telephone network suffered a nine-hour outage across the United States when one switch experienced abnormal behavior and attempted recovery. Because of a flaw in the recovery-recognition software and a network design that allowed the effects to propagate, the problem spread to all switches.
During the Persian Gulf War, clock drift in the Patriot missile defense system caused it to miss a Scud missile that hit an American barracks in Dhahran, killing 29 people and injuring 97 others. The clock drift was reportedly caused by the software's use of two different and unequal representations (24-bit and 48-bit) of the value 0.1.

SFT - Key Concepts
- Error recovery
- Redundancy

Error Recovery
- Error detection
- Error diagnosis
- Error containment/isolation
- Error recovery
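The four phases can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the hypothetical `faulty_transfer` operation (with an injected fault), a balance-conservation check as the error detector, running the operation on a private copy as containment, and backward recovery to a saved checkpoint.

```python
import copy

def faulty_transfer(accounts, src, dst, amount):
    """Hypothetical operation with an injected fault: transfers above 100 never credit dst."""
    accounts[src] -= amount
    if amount <= 100:
        accounts[dst] += amount

def execute_with_recovery(accounts, operation, *args):
    """Run `operation` through detection, diagnosis, containment and recovery."""
    checkpoint = copy.deepcopy(accounts)  # recovery point saved before the operation
    scratch = copy.deepcopy(accounts)     # containment: the operation only touches a copy
    operation(scratch, *args)

    # Error detection: a transfer must conserve the total balance.
    if sum(scratch.values()) == sum(checkpoint.values()):
        accounts.update(scratch)          # no error detected: commit the new state
        return "committed", []

    # Error diagnosis: identify which accounts the erroneous operation changed.
    suspects = sorted(k for k in scratch if scratch[k] != checkpoint[k])

    # Error containment: the erroneous state exists only in `scratch` and is discarded,
    # so it never reaches `accounts` or any component that reads it.
    # Error recovery (backward): reinstate the checkpointed state.
    accounts.clear()
    accounts.update(checkpoint)
    return "rolled back", suspects

accounts = {"A": 500, "B": 200}
print(execute_with_recovery(accounts, faulty_transfer, "A", "B", 50))   # ('committed', [])
print(execute_with_recovery(accounts, faulty_transfer, "A", "B", 300))  # ('rolled back', ['A'])
print(accounts)                                                          # {'A': 450, 'B': 250}
```

Working on a copy keeps detection and containment simple: nothing is committed until the check passes, so rolling back never has to undo partial updates.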

Redundancy
Provides the additional capabilities and resources needed to detect and tolerate faults. Several forms:
- Hardware
- Software
- Information/data
- Time (temporal redundancy)

Redundancy (cont.)
Software redundancy includes additional programs, modules, functions, or objects, used to tolerate faults arising from specification and design errors or implementation (coding) mistakes.
Such faults cannot be detected by simple replication of identical software units, because every copy contains the same fault.
Diversity therefore has to be introduced into the software replicas: DIVERSITY.
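A short Python sketch of why identical replicas do not help against design faults: two copies of the same faulty routine always agree, so comparing them detects nothing, while an independently written variant disagrees and exposes the fault. The routines `average_v1` and `average_v2` are hypothetical, and the fault is injected for illustration.

```python
def average_v1(values):
    """Hypothetical variant with an injected design fault: integer division truncates."""
    return sum(values) // len(values)

def average_v2(values):
    """Independently designed variant of the same specification: true division."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

def duplex_compare(replica_a, replica_b, data):
    """Run two replicas on the same input and report whether their results agree."""
    ra, rb = replica_a(data), replica_b(data)
    return "agree" if ra == rb else f"disagree ({ra} vs {rb})"

data = [1, 2]
print(duplex_compare(average_v1, average_v1, data))  # agree: identical copies share the fault
print(duplex_compare(average_v1, average_v2, data))  # disagree (1 vs 1.5): diversity exposes it
```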

Diversity
- Design diversity
- Data diversity
- Temporal diversity

Design Diversity
Design diversity is the production of two or more systems (e.g., software modules) aimed at delivering the same service through independent designs and realizations. The systems produced through the design diversity approach from a common service specification are called variants.
When a system incorporates two or more variants, tolerating design faults requires an adjudicator. The adjudicator applies a previously defined decision strategy to the outcomes of the variants' execution and aims to provide (what is assumed to be) an error-free result.

Design Diversity (cont.)
Techniques:
- Recovery Blocks
- N-Version Programming
- Distributed Recovery Blocks
- N-Self Checking Programming
- Consensus Recovery Block

Recovery Blocks
Consists of an executive, an acceptance test (AT), and alternate try blocks (variants). The executive first attempts to satisfy the AT using the primary alternate (try block). If the primary algorithm's result does not pass the AT, the remaining n-1 alternates are attempted, one at a time, until an alternate's result passes the AT. If no alternate succeeds, an error occurs.

Recovery Blocks (cont.)
Example: Sorting of Numbers
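The recovery-block scheme described above, applied to sorting, can be sketched in Python under assumed details. The executive saves the input as a checkpoint, runs the primary try block, checks the result with the AT, and falls back to an alternate. Following the classic recovery-block scheme, each try block starts from the checkpointed state, so anything a failed alternate changed is discarded. The faulty `primary_sort`, the `alternate_insertion_sort`, and the AT (output ordered and a permutation of the input) are hypothetical choices for illustration, not the authors' code.

```python
from collections import Counter

class RecoveryBlockFailure(Exception):
    """Raised when no alternate passes the acceptance test."""

def acceptance_test(original, result):
    """AT: the result must be in non-decreasing order and a permutation of the input."""
    in_order = all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    return in_order and Counter(result) == Counter(original)

def primary_sort(xs):
    """Primary try block: in-place bubble sort with an injected off-by-one fault."""
    n = len(xs)
    for i in range(n):
        for j in range(n - i - 2):        # fault: should be range(n - i - 1)
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def alternate_insertion_sort(xs):
    """Alternate try block: an independently designed in-place insertion sort."""
    for i in range(1, len(xs)):
        key, j = xs[i], i - 1
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs

def recovery_block(data, alternates, test):
    """Executive: try each alternate in turn until one passes the AT."""
    checkpoint = list(data)                          # recovery point established on entry
    for alternate in alternates:
        try:
            candidate = alternate(list(checkpoint))  # each try block starts from the checkpoint
        except Exception:
            continue                                 # an exception counts as a failed try block
        if test(checkpoint, candidate):
            return candidate, alternate.__name__
    raise RecoveryBlockFailure("all alternates failed the acceptance test")

print(recovery_block([3, 1, 2], [primary_sort, alternate_insertion_sort], acceptance_test))
# ([1, 2, 3], 'alternate_insertion_sort')
```

On `[3, 1, 2]` the primary returns `[1, 3, 2]`, which fails the AT, so the alternate runs on a fresh copy of the checkpoint.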

N-Version Programming
Consists of an executive, n variants (versions), and a decision mechanism (DM). Uses at least two independently designed, functionally equivalent versions (variants) of a program developed from the same specification.
    run Version 1, Version 2, …, Version n
    if (Decision Mechanism (Result 1, Result 2, …, Result n))
        return Result
    else
        failure exception
NVP is a “static technique” because all the versions perform the task, regardless of which result(s) the DM determines to be acceptable.

N-Version Programming (cont.)
Example: Sorting of Numbers
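A matching N-version sketch in Python, again under assumed details: three hypothetical sorting versions (one with an injected fault) run on the same input, and a decision mechanism returns the strict-majority result or raises a failure exception. Unlike the recovery block, every version always runs. The versions run sequentially here for simplicity; an NVP executive would typically run them concurrently.

```python
from collections import Counter

class NVPFailure(Exception):
    """Raised when the decision mechanism cannot find a majority result."""

def version_1(xs):
    """Version 1: selection sort."""
    xs = list(xs)
    for i in range(len(xs)):
        m = min(range(i, len(xs)), key=xs.__getitem__)
        xs[i], xs[m] = xs[m], xs[i]
    return xs

def version_2(xs):
    """Version 2: merge sort with an injected fault (the merge drops leftover right-half items)."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = version_2(xs[:mid]), version_2(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:]                 # fault: right[j:] is never appended

def version_3(xs):
    """Version 3: delegate to the built-in sort."""
    return sorted(xs)

def decision_mechanism(results):
    """DM: return the result produced by a strict majority of the versions."""
    value, votes = Counter(tuple(r) for r in results).most_common(1)[0]
    if votes > len(results) // 2:
        return list(value)
    raise NVPFailure("no majority among version results")

def n_version(data, versions):
    """Executive: every version runs on the same input, then the DM adjudicates."""
    results = [version(data) for version in versions]
    return decision_mechanism(results)

print(n_version([3, 1, 2], [version_1, version_2, version_3]))  # [1, 2, 3]
```

Version 2 returns `[1, 3]` for this input, but versions 1 and 3 agree and outvote it.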

Data Diversity
Data diversity uses the principle of redundancy (not simple like copies) and diversifies the input data to detect and tolerate software faults. This is done using DATA RE-EXPRESSION: a data re-expression algorithm (DRA) produces different representations of a module's input data.
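A few hypothetical DRAs, sketched in Python for illustration; real DRAs are application specific. For a sorting module, any permutation of the input is an exact re-expression, since the correct output does not change. For a numeric routine, a mathematical identity such as sin(π − x) = sin(x) can be used, but the floating-point results then agree only approximately, so any comparison needs a tolerance.

```python
import math
import random

def reexpress_reverse(xs):
    """Exact DRA for a sort module: reversing the input leaves the correct output unchanged."""
    return list(reversed(xs))

def reexpress_shuffle(xs, seed):
    """Exact DRA for a sort module: a seeded pseudo-random permutation of the input."""
    ys = list(xs)
    random.Random(seed).shuffle(ys)
    return ys

def reexpress_sine_input(x):
    """DRA for a sine routine: sin(pi - x) equals sin(x) mathematically, but in floating
    point the two results agree only approximately, so a tolerance is needed."""
    return math.pi - x

data = [5, 3, 9, 1]
for variant in (data, reexpress_reverse(data), reexpress_shuffle(data, seed=7)):
    # A correct sort produces the same output from every re-expression of the input.
    print(variant, "->", sorted(variant))

print(math.isclose(math.sin(1.0), math.sin(reexpress_sine_input(1.0)), rel_tol=1e-12))  # True
```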

Data Diverse SFT Techniques
- Retry Blocks
- N-Copy Programming

Retry Blocks (RtB)
The RtB technique uses acceptance tests (ATs) to accomplish fault tolerance.
AT: acceptance test
DRA: data re-expression algorithm

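Retry Blocks: Structure and Operation
Similar to recovery blocks. The difference from recovery blocks is the DRA: instead of switching to an alternate program, the retry block re-executes the same algorithm on re-expressed input data.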

Retry Blocks
Example:
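A minimal retry-block sketch in Python, under assumed details. The hypothetical `fragile_sqrt` fails on a made-up band of inputs [10, 20) (an injected fault). The executive runs it, checks the result with an AT, and, if the AT fails, re-expresses the input and retries the same algorithm. The DRA multiplies the input by 4^k and the result is mapped back by dividing by 2^k, since sqrt(4^k · x) = 2^k · sqrt(x). The retry limit and the DRA are illustrative choices.

```python
import math

class RetryBlockFailure(Exception):
    """Raised when no retry yields a result that passes the acceptance test."""

def fragile_sqrt(x):
    """Hypothetical primary algorithm with an injected fault for inputs in [10, 20)."""
    if 10 <= x < 20:
        return x / 2                      # wrong answer inside the fault region
    return math.sqrt(x)

def acceptance_test(x, root):
    """AT: the result must be non-negative and its square must reproduce the input."""
    return root >= 0 and math.isclose(root * root, x, rel_tol=1e-9, abs_tol=1e-12)

def dra_scale(x, k):
    """DRA: re-express x as x * 4**k (k = 0 leaves the input unchanged)."""
    return x * 4 ** k

def retry_block(x, max_retries=2):
    """Run the same algorithm on the original input, then on re-expressed inputs."""
    for k in range(max_retries + 1):
        try:
            root = fragile_sqrt(dra_scale(x, k)) / 2 ** k  # undo the re-expression
        except ValueError:                # e.g. a negative input; counts as a failed try
            continue
        if acceptance_test(x, root):
            return root, k
    raise RetryBlockFailure(f"no retry passed the acceptance test for x={x}")

print(retry_block(4.0))   # (2.0, 0): the original input passes the AT
print(retry_block(12.0))  # fails at k=0, passes after one re-expression (k=1)
```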

N-Copy Programming (NCP)
NCP is the data diverse complement of N-version programming (NVP). It uses a decision mechanism (DM) to accomplish fault tolerance.
DM: decision mechanism
DRA: data re-expression algorithm

N-Copy Programming
N copies of a program execute in parallel, each on a different set of re-expressed data.

N-Copy Programming
Example:
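An N-copy programming sketch in Python, reusing the same assumptions as a retry-block illustration would: a hypothetical `fragile_sqrt` with an injected fault on [10, 20) and a DRA that scales the input by 4^k. Here n copies run concurrently (threads stand in for parallel execution), copy k works on the input re-expressed with k, and a decision mechanism votes on the mapped-back results. Because the outputs are floating point, this DM counts results that agree within a tolerance rather than requiring exact equality; that is a design choice for the sketch, not something the slides specify.

```python
import math
from concurrent.futures import ThreadPoolExecutor

class NCPFailure(Exception):
    """Raised when the decision mechanism finds no majority."""

def fragile_sqrt(x):
    """Hypothetical program with an injected fault for inputs in [10, 20)."""
    if 10 <= x < 20:
        return x / 2
    return math.sqrt(x)

def run_copy(x, k):
    """Copy k: run the program on x re-expressed as x * 4**k, then map the result back."""
    return fragile_sqrt(x * 4 ** k) / 2 ** k

def decision_mechanism(results, rel_tol=1e-9):
    """DM: return a result that a strict majority of copies match within rel_tol."""
    for candidate in results:
        matches = sum(math.isclose(candidate, r, rel_tol=rel_tol) for r in results)
        if matches > len(results) // 2:
            return candidate
    raise NCPFailure("no majority among the copies")

def n_copy(x, n=3):
    """Executive: n copies run concurrently, each on a differently re-expressed input."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(run_copy, [x] * n, range(n)))
    return decision_mechanism(results)

print(n_copy(12.0))  # copy 0 hits the fault (6.0); copies 1 and 2 outvote it
```

Because the copies see inputs x, 4x and 16x, at most one of them can fall in the fault region here, so the majority always holds in this toy setup.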

Temporal Diversity
Temporal diversity involves the performance or occurrence of an event at different times. It can be implemented by beginning software execution at different times or by using inputs that are produced or read at different times. It can be an effective means of overcoming transient faults, because the temporary conditions that cause problems in one execution may be absent when the software is re-executed.
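A minimal Python sketch of using temporal diversity against a transient fault: the operation is simply executed again at a later time. The hypothetical `FlakySensor` simulates a temporary condition by failing its first few reads, and the wrapper waits an assumed fixed delay before each re-execution; a real system would choose the delay from the expected duration of the transient condition.

```python
import time

class TransientError(Exception):
    """A failure caused by a temporary condition."""

class FlakySensor:
    """Hypothetical sensor whose first `bad_reads` reads fail transiently."""

    def __init__(self, bad_reads):
        self.bad_reads = bad_reads
        self.calls = 0

    def read(self):
        self.calls += 1
        if self.calls <= self.bad_reads:
            raise TransientError("temporary bus glitch")
        return 42.0

def execute_at_different_times(operation, attempts=3, delay_s=0.01):
    """Re-execute `operation` after a delay until it succeeds or attempts run out."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise                     # give up: the condition outlasted every attempt
            time.sleep(delay_s)           # the temporary condition may have cleared by then

sensor = FlakySensor(bad_reads=2)
print(execute_at_different_times(sensor.read), sensor.calls)  # 42.0 3
```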

Adjudicators
Adjudicators decide which result to choose as the output. The output should be not only a correct one but also the best one, so more than one type of adjudicator can be used:
- Voters
- Acceptance tests (ATs)

Adjudicating

Voters

Exact majority voter
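A hedged Python sketch of an exact majority voter: given n version outputs, it returns a value only if at least ⌈(n+1)/2⌉ of them are exactly equal, and otherwise signals that no decision can be made. Representing "no decision" as `None` and requiring hashable outputs are assumptions of this sketch. The last line shows a known limitation: floating-point outputs that differ by tiny amounts never match exactly, which is why tolerance-based (inexact) voters are also used.

```python
from collections import Counter

def exact_majority_vote(outputs):
    """Exact majority voter: return the value produced by at least ceil((n + 1) / 2)
    of the n outputs, using exact equality; return None when no value qualifies.
    Outputs must be hashable."""
    n = len(outputs)
    if n == 0:
        return None
    needed = n // 2 + 1                   # equals ceil((n + 1) / 2) for integer n
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= needed else None

print(exact_majority_vote([7, 7, 9]))     # 7
print(exact_majority_vote([7, 8, 9]))     # None
print(exact_majority_vote([1, 1, 2, 2]))  # None: n = 4 needs 3 agreeing outputs
# Floating-point outputs that differ by tiny amounts never match exactly:
print(exact_majority_vote([1.0, 1.0 + 1e-12, 1.0 - 1e-12]))  # None
```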

Acceptance Tests
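A few illustrative acceptance tests, sketched in Python under assumptions. They follow kinds of ATs commonly described in the software fault tolerance literature (satisfaction-of-requirements and reasonableness tests), but the specific checks and thresholds are hypothetical. A satisfaction-of-requirements test checks the output against the specification, here the defining property of integer division; reasonableness tests check that a value, or its change between samples, is plausible.

```python
def at_division(a, b, q, r):
    """Satisfaction-of-requirements test for integer division of a by b (b > 0):
    the outputs must satisfy a == q * b + r with 0 <= r < b."""
    return a == q * b + r and 0 <= r < b

def at_temperature_range(reading, low=-40.0, high=125.0):
    """Reasonableness test: the value must lie in a plausible range
    (hypothetical sensor limits, degrees Celsius)."""
    return low <= reading <= high

def at_rate_of_change(previous, current, max_step=5.0):
    """Reasonableness test: consecutive samples may not differ by more than
    a hypothetical maximum step."""
    return abs(current - previous) <= max_step

print(at_division(17, 5, 3, 2), at_division(17, 5, 2, 7))          # True False
print(at_temperature_range(21.5), at_temperature_range(300.0))     # True False
print(at_rate_of_change(20.0, 22.0), at_rate_of_change(20.0, 80.0))  # True False
```

Note that `at_division(17, 5, 2, 7)` fails even though 2 · 5 + 7 = 17: the specification also bounds the remainder, and an AT is only as strong as the part of the specification it encodes.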

Summary
Today, more and more people depend daily on services provided by computer control systems. These computers control ordinary systems such as automobiles, elevators, aircraft, banking systems, and power plants. Should these computers fail, the consequences could be disastrous: severe economic losses or even the loss of human lives. Since design faults cannot be totally eradicated from such control systems, they must be tolerated during operation without loss of service.

Questions & Comments?