SENG521 (Fall SENG 521 Software Reliability & Testing Fault Tolerant Software Systems: Techniques (Part 4b) Department of Electrical.

Slides:



Advertisements
Similar presentations
SOFTWARE TESTING. INTRODUCTION  Software Testing is the process of executing a program or system with the intent of finding errors.  It involves any.
Advertisements

©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development 2.
Fault-Tolerant Systems Design Part 1.
COE 444 – Internetwork Design & Management Dr. Marwan Abu-Amara Computer Engineering Department King Fahd University of Petroleum and Minerals.
EECE499 Computers and Nuclear Energy Electrical and Computer Eng Howard University Dr. Charles Kim Fall 2013 Webpage:
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development.
3. Hardware Redundancy Reliable System Design 2010 by: Amir M. Rahmani.
Theoretical Program Checking Greg Bronevetsky. Background The field of Program Checking is about 13 years old. Pioneered by Manuel Blum, Hal Wasserman,
Making Services Fault Tolerant
Software Fault Tolerance – The big Picture RTS April 2008 Anders P. Ravn Aalborg University.
Software Testing Using Model Program DESIGN BY HONG NGUYEN & SHAH RAZA Dec 05, 2005.
1 Availability Study of Dynamic Voting Algorithms Kyle Ingols and Idit Keidar MIT Lab for Computer Science.
Fault Tolerance: Basic Mechanisms mMIC-SFT September 2003 Anders P. Ravn Aalborg University.
8. Fault Tolerance in Software 8.1 Introduction Is it true that a program that has once performed a given task as specified will continue to do so? Yes,
1 Chapter Fault Tolerant Design of Digital Systems.
DS -V - FDT - 1 HUMBOLDT-UNIVERSITÄT ZU BERLIN INSTITUT FÜR INFORMATIK Zuverlässige Systeme für Web und E-Business (Dependable Systems for Web and E-Business)
Modified from Sommerville’s originals Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development.
An Experimental Evaluation on Reliability Features of N-Version Programming Xia Cai, Michael R. Lyu and Mladen A. Vouk ISSRE’2005.
ABCSG - Dependable Systems - 01/06/ ABCSG Dependable Systems.
7. Fault Tolerance Through Dynamic or Standby Redundancy 7.5 Forward Recovery Systems Upon the detection of a failure, the system discards the current.
Developing Dependable Systems CIS 376 Bruce R. Maxim UM-Dearborn.
Constructing Reliable Software Components Across the ORB M. Robert Rwebangira Howard University Future Aerospace Science and Technology.
Reliability Modeling for Design Diversity: A Review and Some Empirical Studies Teresa Cai Group Meeting April 11, 2006.
SENG521 (Fall SENG 521 Software Reliability & Testing Defining Necessary Reliability (Part 3b) Department of Electrical & Computer.
1 Making Services Fault Tolerant Pat Chan, Michael R. Lyu Department of Computer Science and Engineering The Chinese University of Hong Kong Miroslaw Malek.
Page 1 Copyright © Alexander Allister Shvartsman CSE 6510 (461) Fall 2010 Selected Notes on Fault-Tolerance (12) Alexander A. Shvartsman Computer.
Design of SCS Architecture, Control and Fault Handling.
1 Software Testing Techniques CIS 375 Bruce R. Maxim UM-Dearborn.
Fundamentals of Python: From First Programs Through Data Structures
Software Testing Sudipto Ghosh CS 406 Fall 99 November 9, 1999.
Software faults & reliability Presented by: Presented by: Pooja Jain Pooja Jain.
System/Software Testing
1 Fault-Tolerant Computing Systems #2 Hardware Fault Tolerance Pattara Leelaprute Computer Engineering Department Kasetsart University
Critical systems development. Objectives l To explain how fault tolerance and fault avoidance contribute to the development of dependable systems l To.
CS, AUHenrik Bærbak Christensen1 Fault Tolerant Architectures Lyu Chapter 14 Sommerville Chapter 20 Part II.
 Chapter 13 – Dependability Engineering 1 Chapter 12 Dependability and Security Specification 1.
Fault-Tolerant Systems Design Part 1.
Building Dependable Distributed Systems Chapter 1 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development 3.
Fault Tolerance Mechanisms ITV Model-based Analysis and Design of Embedded Software Techniques and methods for Critical Software Anders P. Ravn Aalborg.
CprE 458/558: Real-Time Systems
5 May CmpE 516 Fault Tolerant Scheduling in Multiprocessor Systems Betül Demiröz.
RELIABILITY ENGINEERING 28 March 2013 William W. McMillan.
FTC (DS) - V - TT - 0 HUMBOLDT-UNIVERSITÄT ZU BERLIN INSTITUT FÜR INFORMATIK DEPENDABLE SYSTEMS Vorlesung 5 FAULT RECOVERY AND TOLERANCE TECHNIQUES (SYSTEM.
Fault-Tolerant Systems Design Part 1.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development.
SOFTWARE TESTING. Introduction Software Testing is the process of executing a program or system with the intent of finding errors. It involves any activity.
NASA Software Assurance Symposium 2001 Metrics for Fault-Tolerant Real-Time Software Afzel Noore Computer Science and Electrical Engineering West Virginia.
Middleware for Fault Tolerant Applications Lihua Xu and Sheng Liu Jun, 05, 2003.
Condition Testing. Condition testing is a test case design method that exercises the logical conditions contained in a program module. A simple condition.
CSE 8377 Software Fault Tolerance. CSE 8377 Motivation Software is becoming central to many life- critical systems Software is created by error-prone.
A Survey of Fault Tolerance in Distributed Systems By Szeying Tan Fall 2002 CS 633.
Testing Overview Software Reliability Techniques Testing Concepts CEN 4010 Class 24 – 11/17.
Structuring Redundancy for Fault Tolerance Chapter 2 Designed by: Hadi Salimi Instructor: Dr. Mohsen Sharifi.
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
SENG521 (Fall SENG 521 Software Reliability & Testing Fault Tolerant Software Systems: Techniques (Part 4a) Department of Electrical.
Week#2 Software Quality Assurance Software Quality Engineering.
SENG521 (Fall SENG 521 Software Reliability & Testing Preparing for Test (Part 6a) Department of Electrical & Computer Engineering,
Fundamentals of Fault-Tolerant Distributed Computing In Asynchronous Environments Paper by Felix C. Gartner Graeme Coakley COEN 317 November 23, 2003.
18/05/2006 Fault Tolerant Computing Based on Diversity by Seda Demirağ
Week#3 Software Quality Engineering.
Software Quality Assurance
Reading This lecture: B. Littlewood, P. Popov, L. Strigini, "Modelling software design diversity - a review", ACM Computing Surveys, Vol. 33, No. 2, June.
Software Testing.
Algorithm Analysis CSE 2011 Winter September 2018.
Fault Tolerance In Operating System
Critical systems development
Fault Injection: A Method for Validating Fault-tolerant System
Chapter 10 – Software Testing
Model-based Adaptation for Self-Healing Systems David Garlan, Bradley Schmert ELSEVIER Sciences of Computer Programming 57 (2005) 이경렬
Presentation transcript:

SENG521 (Fall SENG 521 Software Reliability & Testing Fault Tolerant Software Systems: Techniques (Part 4b) Department of Electrical & Computer Engineering, University of Calgary B.H. Far ( )

SENG521 (Fall Fault Tolerance (Review) A fault-tolerant computing system must be capable of providing specified services in the presence of a bounded number of failures. These failures could occur because of faults present in either the components of the system or in the system’s design. Most of the software faults are due to deficiencies of design and almost all of the hardware fault tolerance techniques cannot be applied in software.

SENG521 (Fall Acceptance Testing A program-specific error detection mechanism to check on the results of program execution. Usually evaluates to either “true” or “false”. ensure by P 0 else-by P 1 else fail Examples: Checksums for program parts Internal check points: ABS[(SQRT(x)*SQRT(x)) – x] < E

SENG521 (Fall External Consistency A kind of external error detection mechanism to judge correctness of execution of a program. Examples: Exception signal when dividing by zero Integer overflow signal Interrupt signal for program loop Float point numerical failure check

SENG521 (Fall Example The correct answer is But ordinary implementation of this will return zero due to rounding and large differences in the order of magnitude of the summands.

SENG521 (Fall Redundancy Dual software technique: Implementing two (or more) distinct versions of the same software and executing them for the same set of inputs. Any discrepancy in the outputs of the two versions may trigger an alarm. Redundancy techniques’ efficiency depends on coincident, correlated and dependent faults.

SENG521 (Fall Coincident Faults Coincident Faults: Coincident Faults: when two or more functionally equivalent software components fail on the same input. When two or more software versions give the same incorrect response, an identical-and- wrong (IAW) answer is obtained.

SENG521 (Fall Correlated & Dependent Faults Correlated Faults: Correlated Faults: Two faults are correlated when the measured probability of the coincidence failures is significantly higher than what would be expected from coincidence. If There will be no failure independence.

SENG521 (Fall Possible Failure Scenario What if the software components produce doublet or triplet IAW responses? P1 P2 P3 Doublet & triplet IAW faults Adjudication Algorithm

SENG521 (Fall Adjudication by Voting A “voter” compares results from two or more functionally equivalent software components and decides which of the answers provided by those components is correct. Various versions of voting algorithm: Majority voting 2-of-N voting Consensus voting

SENG521 (Fall Techniques Recovery blocks N-version programming Consensus recovery block Acceptance voting N self-checking programming

SENG521 (Fall Recovery Blocks (RB) Using multiple versions of software module and acceptance test. The output of the 1 st module is tested for acceptability and if fails, the 2 nd module is executed after backward state recovery.

SENG521 (Fall N-Version Programming Parallel execution of N independently developed functionally equivalent modules. Adjudication is via voting. The voter accepts all N outputs and selects the correct one among them. Advantage of NVP: no service interrupt.

SENG521 (Fall Consensus Recovery Block Composed of NVP and RB. IF NVP fails, the system reverts to RB using the same blocks. NVP RB input failure System failure Correct output Correct output success

SENG521 (Fall Acceptance Voting Like NVP all versions are executed in parallel. The output of ach module goes to an acceptance test. If acceptance test is successful, the output goes to a voter.

SENG521 (Fall N Self-Check Programming In N Self-Check Programming (NSCP), N modules are executed in pairs. The pairs’ outputs can be compared or accessed for correctness.