18/05/2006 Fault Tolerant Computing Based on Diversity by Seda Demirağ 2005701688.

INTRODUCTION
The software faults in a real-time system include:
- Concurrency-control faults: these involve inter-process communication and synchronization, data coherence and protection, and deadlock.
- Timing faults: a task is not completed within its specified amount of time.
- Error-detection and error-recovery faults: these occur when the detection and recovery mechanism cannot handle an error, or is invoked when no error exists.

INTRODUCTION
Software fault tolerance techniques:
- are designed to allow a system to tolerate software faults that remain in the system after its development
- provide the software system with mechanisms that prevent system failure from occurring
- have been used mostly in the aerospace, nuclear power, healthcare, telecommunications and ground transportation industries, where failures can be catastrophic.
In this term paper, I will discuss fault tolerance techniques based on design and data diversity.

SOFTWARE FAULT TOLERANCE TECHNIQUES: DATA and DESIGN DIVERSITY
Multiple data representation environment:
- Data diverse techniques are used in a multiple data representation environment.
- They use different representations of the input data to provide tolerance to software design faults.
Multiple version software environment:
- Design diverse techniques are used in a multiple version software environment.
- They use the functionality of independently developed software versions to provide tolerance to software design faults.

Design Diversity Techniques
Two or more variants of the software, developed by different teams but to a common specification, are used. These variants are executed in a time-redundant or space-redundant manner (for example, one after another as in the recovery block, or in parallel as in N-version programming) to achieve fault tolerance. The main disadvantage of design diversity is the high cost of developing multiple variants of the software.

Design Diversity Techniques
Popular techniques based on the design diversity concept for fault tolerance in software are:
- Recovery Block
- N-Version Programming
- N-Self-Checking Programming

Design Diversity Techniques: Recovery Block (RcB)
RcB was introduced in 1974 by Horning, with early implementations developed by Randell in 1975 and Hecht in 1981. The variant whose result is used is selected during program execution, based on the result of the acceptance test (AT). The basic RcB scheme consists of an executive, an acceptance test, and primary and alternate try blocks (variants). Many implementations of RcB, especially for real-time applications, include a watchdog timer. RcB is categorized as a dynamic technique.

Design Diversity Techniques: Recovery Block (RcB)
This figure illustrates the structure and operation of the basic RcB technique with a watchdog timer. The technique first tries to satisfy the AT using the primary alternate. If the primary algorithm's result does not pass the AT, the remaining n-1 alternates are tried in turn until an alternate's result passes the AT. If no alternate is successful, an error occurs.
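The RcB operation described above can be sketched in Python as follows. This is a minimal sketch under several assumptions: the program state is a plain dict, the try blocks and the acceptance test are ordinary functions (the names `recovery_block`, `primary_sort`, `alternate_sort` and `is_sorted` are hypothetical), the rollback to a `copy.deepcopy` checkpoint models the backward recovery a recovery block performs before each alternate, and the watchdog timer is omitted.

```python
import copy
from typing import Any, Callable, Dict, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when no try block produces a result that passes the AT."""


def recovery_block(
    state: Dict[str, Any],
    try_blocks: Sequence[Callable[[Dict[str, Any]], Any]],
    acceptance_test: Callable[[Dict[str, Any], Any], bool],
) -> Any:
    # Checkpoint the state so that every alternate starts from the same point.
    checkpoint = copy.deepcopy(state)
    for block in try_blocks:  # primary first, then the n-1 alternates in order
        try:
            result = block(state)
            if acceptance_test(state, result):
                return result
        except Exception:
            pass  # a crash is treated as failing the AT
        # Backward recovery: undo the failed block's side effects.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryBlockFailure("no try block passed the acceptance test")


# Hypothetical example: a faulty primary sort and a simpler alternate.
def primary_sort(state):
    state["data"].reverse()  # design fault: reverses instead of sorting
    return state["data"]


def alternate_sort(state):
    return sorted(state["data"])


def is_sorted(state, result):
    # Acceptance test: same items as the input, in non-decreasing order.
    same_items = sorted(result) == sorted(state["data"])
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return same_items and in_order


state = {"data": [3, 1, 2]}
print(recovery_block(state, [primary_sort, alternate_sort], is_sorted))  # [1, 2, 3]
```

Because the primary mutates the state before failing its AT, the rollback matters: the alternate sees the original `[3, 1, 2]`, not the primary's corrupted output.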

Design Diversity Techniques: N-Version Programming (NVP)
NVP was suggested by Elmendorf in 1972 and developed by Avizienis and Chen in 1977–1978. Unlike RcB, NVP is a static technique: a task is executed by several processes or programs, and a result is accepted only if it is adjudicated as acceptable, usually via a majority vote.

Design Diversity Techniques: N-Version Programming (NVP)
This figure illustrates the structure and operation of the basic NVP technique. The NVP technique uses a decision mechanism (DM) and forward recovery to accomplish fault tolerance. The technique uses at least two independently designed, functionally equivalent versions (variants) of a program developed from the same specification. The variants are run in parallel, and a DM examines the results and selects the “best” result, if one exists.

Design Diversity Techniques: N-Version Programming (NVP)
General syntax:

    run Version 1, Version 2, ..., Version n
    if (Decision Mechanism (Result 1, Result 2, ..., Result n))
        return Result
    else failure exception

The NVP syntax above states that the technique executes the n versions concurrently. The results of these executions are provided to the DM, which operates on them to determine whether a correct result can be adjudicated. If one can, it is returned. If a correct result cannot be determined, an error occurs.
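A minimal Python sketch of the NVP syntax above follows, assuming that results are hashable and that the DM is an exact majority vote over all n versions (one common choice; the slides only say "usually via a majority vote"). The function and version names are hypothetical, and threads stand in for the concurrent execution of the versions.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence


class NVPFailure(Exception):
    """Raised when the decision mechanism cannot adjudicate a result."""


_FAILED = object()  # marker for a version that raised instead of returning


def n_version(versions: Sequence[Callable[..., Any]], *args: Any) -> Any:
    def run(version):
        try:
            return version(*args)
        except Exception:
            return _FAILED  # a crashed version casts no vote

    # Static redundancy: every version runs, concurrently.
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        results = list(pool.map(run, versions))

    # Decision mechanism: a value needs a strict majority of all n versions.
    votes = Counter(r for r in results if r is not _FAILED)
    if votes:
        value, count = votes.most_common(1)[0]
        if count > len(versions) // 2:
            return value
    raise NVPFailure("no majority among the version results")


# Hypothetical, independently written versions of an integer square root.
def version_a(n: int) -> int:
    return math.isqrt(n)


def version_b(n: int) -> int:
    return int(n ** 0.5)  # float-based; adequate for small n


def version_c(n: int) -> int:
    return n // 2  # faulty version


print(n_version([version_a, version_b, version_c], 49))  # 7
```

Counting a majority of all n versions, rather than of those that returned, is a conservative choice: two crashed versions out of three cannot let a single surviving result through.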

Design Diversity Techniques: N Self-Checking Programming (NSCP)
NSCP is a design diverse technique developed by Laprie. The hardware fault tolerance architecture related to NSCP is active dynamic redundancy. A self-checking program checks its own behavior during execution; the check results either from applying an AT to a variant's results or from applying a comparator to the results of two variants.

Design Diversity Techniques: N Self-Checking Programming (NSCP)
This figure illustrates the structure and operation of the basic NSCP technique.

Design Diversity Techniques: N Self-Checking Programming (NSCP)
General syntax:

    run Variants 1 and 2 on Hardware Pair 1, Variants 3 and 4 on Hardware Pair 2
    compare Results 1 and 2
        if not (match) set NoMatch1
        else set Result Pair 1
    compare Results 3 and 4
        if not (match) set NoMatch2
        else set Result Pair 2
    if NoMatch1 and not NoMatch2, Result = Result Pair 2
    else if NoMatch2 and not NoMatch1, Result = Result Pair 1
    else if NoMatch1 and NoMatch2, raise exception
    else if not NoMatch1 and not NoMatch2
        then compare Result Pair 1 and 2
            if not (match), raise exception
            if (match), Result = Result Pair 1 or 2
    return Result

The NSCP syntax above states that the technique executes the n variants (here n = 4) concurrently, on n/2 hardware pairs. The results of the paired variants are compared. If a pair's results do not match, a flag is set indicating that the pair has failed. If exactly one pair has failed, the non-failing pair's result is returned as the NSCP result. If both pairs have failed, an exception is raised. If both pairs' results match, the results of the two pairs are compared with each other. If they match, the result is set to one of the matching values and returned as the NSCP result; if they do not match, an exception is raised.
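The four-variant NSCP logic above can be sketched in Python as below. Assumptions: each "hardware pair" is simulated by one thread that runs its two variants one after the other, self-checking is done with an exact-equality comparator, and a variant that raises is treated as a mismatch. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Tuple

Variant = Callable[[Any], Any]


class NSCPFailure(Exception):
    """Raised when NSCP cannot produce a trusted result."""


def run_pair(variant_1: Variant, variant_2: Variant, x: Any) -> Tuple[bool, Any]:
    # Comparator-based self-checking component: returns (match, result).
    try:
        r1, r2 = variant_1(x), variant_2(x)
    except Exception:
        return False, None  # a crash sets NoMatch, like a disagreement
    return r1 == r2, r1


def nscp(pair_1: Tuple[Variant, Variant], pair_2: Tuple[Variant, Variant], x: Any) -> Any:
    # One worker per simulated hardware pair; the two pairs run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_1 = pool.submit(run_pair, pair_1[0], pair_1[1], x)
        future_2 = pool.submit(run_pair, pair_2[0], pair_2[1], x)
        match_1, result_pair_1 = future_1.result()
        match_2, result_pair_2 = future_2.result()
    no_match_1, no_match_2 = not match_1, not match_2

    if no_match_1 and not no_match_2:
        return result_pair_2
    if no_match_2 and not no_match_1:
        return result_pair_1
    if no_match_1 and no_match_2:
        raise NSCPFailure("both pairs failed their self-check")
    # Both pairs self-checked: their results must also agree with each other.
    if result_pair_1 != result_pair_2:
        raise NSCPFailure("the two pair results do not match")
    return result_pair_1


# Hypothetical absolute-value variants; abs_faulty contains a design fault.
def abs_builtin(x):
    return abs(x)


def abs_branch(x):
    return x if x >= 0 else -x


def abs_max(x):
    return max(x, -x)


def abs_faulty(x):
    return x


print(nscp((abs_builtin, abs_branch), (abs_max, abs_faulty), -5))  # 5
```

Here pair 2 disagrees internally (5 versus -5), so NoMatch2 is set and the result of pair 1 is returned.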

Data Diversity Techniques
Data diversity, a technique for fault tolerance in software, was introduced by Ammann and Knight. While design diversity approaches rely on multiple versions of the software written to the same specification, the data diversity approach uses only one version of the software. It relies on the observation that software sometimes fails only for certain values in its input space, and that such a failure can be averted by a minor perturbation of the input data that is still acceptable to the software.

Data Diversity Techniques
Data diversity is cheaper to implement than design diversity, since only one version of the software has to be developed. Popular techniques based on the data diversity concept for fault tolerance in software are:
- Retry Blocks
- N-Copy Programming

Data Diversity Techniques: Retry Blocks (RtB)
A retry block is a modification of the recovery block structure that uses data diversity instead of design diversity. Rather than the multiple alternate algorithms used in a recovery block, a retry block uses only one algorithm. A retry block's acceptance test has the same form and purpose as a recovery block's acceptance test.

Data Diversity Techniques: Retry Blocks (RtB)
This figure illustrates the structure and operation of the basic RtB technique. A retry block executes the single algorithm normally and evaluates the acceptance test. If the acceptance test passes, the retry block is complete. If the acceptance test fails, the algorithm is executed again after the data have been re-expressed. The system repeats this process until it either produces a satisfactory output or violates a deadline.

Data Diversity Techniques: Retry Blocks (RtB)
General syntax:

    ensure Acceptance Test
        by Primary Algorithm (Original Input)
        else by Primary Algorithm (Re-expressed Input)
        [Deadline Expires]
        else by Backup Algorithm (Original Input)
        else failure exception

The RtB syntax above states that the technique first tries to satisfy the AT using the primary algorithm on the original input. If the primary algorithm's result does not pass the AT, the input data are re-expressed and the same algorithm is executed again, until a result passes the AT or the watchdog timer (WDT) deadline expires. If the deadline expires, the backup algorithm is invoked with the original input. If the backup algorithm is not successful either, an error occurs.
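A minimal Python sketch of the RtB syntax follows. Assumptions: the WDT deadline is modeled by checking `time.monotonic()` between attempts rather than interrupting a running attempt, the hypothetical `faulty_sort` fails only for inputs that start with their maximum, and the data re-expression algorithm is an exact one (rotating the list, which cannot change the correct sorted output). All names are illustrative.

```python
import time
from collections import Counter
from typing import Callable, List, Optional

Sorter = Callable[[List[int]], List[int]]


class RetryBlockFailure(Exception):
    """Raised when neither the retries nor the backup pass the AT."""


def retry_block(
    primary: Sorter,
    reexpress: Callable[[List[int], int], List[int]],
    acceptance_test: Callable[[List[int], List[int]], bool],
    data: List[int],
    backup: Optional[Sorter] = None,
    deadline_s: float = 0.1,
) -> List[int]:
    start = time.monotonic()
    candidate, attempt = data, 0
    while time.monotonic() - start < deadline_s:
        try:
            result = primary(candidate)
            if acceptance_test(data, result):
                return result
        except Exception:
            pass  # a crash is treated as failing the AT
        attempt += 1
        candidate = reexpress(data, attempt)  # always re-express the original input
    # Deadline expired: run the backup algorithm on the original input.
    if backup is not None:
        result = backup(data)
        if acceptance_test(data, result):
            return result
    raise RetryBlockFailure("no result passed the acceptance test")


def faulty_sort(xs: List[int]) -> List[int]:
    # Hypothetical fault: returns the input unchanged when it starts with its maximum.
    if xs and xs[0] == max(xs):
        return list(xs)
    return sorted(xs)


def rotate(xs: List[int], k: int) -> List[int]:
    # Exact re-expression: a rotated list has the same sorted output.
    if not xs:
        return list(xs)
    k %= len(xs)
    return xs[k:] + xs[:k]


def is_sorted_permutation(original: List[int], result: List[int]) -> bool:
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and Counter(result) == Counter(original)


print(retry_block(faulty_sort, rotate, is_sorted_permutation, [9, 4, 7], backup=sorted))  # [4, 7, 9]
```

The first attempt on `[9, 4, 7]` hits the fault and fails the AT; the re-expressed input `[4, 7, 9]` does not, so the retry block completes without reaching the backup.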

Data Diversity Techniques: N-Copy Programming (NCP)
An N-copy system is similar to an N-version system but uses data diversity instead of design diversity. N copies of a program execute in parallel, each on a set of data produced by re-expression. The system selects the output to be used by an enhanced voting scheme.

Data Diversity Techniques: N-Copy Programming (NCP)
This figure illustrates the structure and operation of the basic NCP technique. The NCP technique uses a decision mechanism (DM) and forward recovery to accomplish fault tolerance. The technique uses one or more data re-expression algorithms (DRAs) and at least two copies of a program. The system inputs are run through the DRA(s) to re-express them, and the copies execute in parallel using the re-expressed data as input. A DM examines the results of the copy executions and selects the “best” result, if one exists.

Data Diversity Techniques: N-Copy Programming (NCP)
The basic NCP technique consists of an executive, 1 to n DRAs, n copies of the program or function, and a DM. The executive orchestrates the NCP technique operation, which has the general syntax:

    run DRA 1, DRA 2, ..., DRA n
    run Copy 1 (result of DRA 1), Copy 2 (result of DRA 2), ..., Copy n (result of DRA n)
    if (Decision Mechanism (Result 1, Result 2, ..., Result n))
        return Result
    else failure exception

The NCP syntax above states that the technique first runs the DRAs concurrently to re-express the input data, then executes the n copies concurrently. The results of the copy executions are provided to the DM, which operates on them to determine whether a correct result can be adjudicated. If one can (i.e., the Decision Mechanism statement above evaluates to TRUE), it is returned. If a correct result cannot be determined, an error occurs.
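The NCP syntax can be sketched in Python as below. Note one deliberate simplification: the slides mention an "enhanced voting scheme", while this sketch uses a plain exact majority vote, which is only appropriate because the hypothetical DRAs here (list rotations) are exact re-expressions that should not change the correct output. The program `faulty_sort`, the `dras` list and `n_copy` are all illustrative names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple


class NCPFailure(Exception):
    """Raised when the decision mechanism cannot adjudicate a result."""


def n_copy(
    program: Callable[[List[int]], List[int]],
    dras: Sequence[Callable[[List[int]], List[int]]],
    data: List[int],
) -> List[int]:
    def run_copy(reexpressed: List[int]) -> Optional[Tuple[int, ...]]:
        try:
            return tuple(program(reexpressed))  # tuples are hashable for voting
        except Exception:
            return None  # a crashed copy casts no vote

    with ThreadPoolExecutor(max_workers=len(dras)) as pool:
        inputs = list(pool.map(lambda dra: dra(data), dras))  # run DRA 1..n
        results = list(pool.map(run_copy, inputs))            # run Copy 1..n

    # Decision mechanism: a simple exact majority vote over the n copies.
    votes = Counter(r for r in results if r is not None)
    if votes:
        value, count = votes.most_common(1)[0]
        if count > len(results) // 2:
            return list(value)
    raise NCPFailure("no majority among the copy results")


def faulty_sort(xs: List[int]) -> List[int]:
    # Hypothetical fault: returns the input unchanged when it starts with its maximum.
    if xs and xs[0] == max(xs):
        return list(xs)
    return sorted(xs)


# Hypothetical DRAs: identity and rotations by one and two positions.
dras = [
    lambda xs: list(xs),
    lambda xs: xs[1:] + xs[:1],
    lambda xs: xs[2:] + xs[:2],
]

print(n_copy(faulty_sort, dras, [9, 4, 7]))  # [4, 7, 9]
```

The copy that receives the unmodified input hits the fault, but the two copies on rotated inputs agree and outvote it.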

Environment Diversity Techniques
Environment diversity is the newest approach to fault tolerance in software. It requires re-executing the software in a different environment. Transient faults typically occur in computer systems because design faults in the software leave the OS environment in unacceptable, erroneous states. When the software fails, it is restarted in a different, error-free OS environment state, which is obtained through clean-up operations.
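The restart-after-clean-up idea can be sketched in Python as follows. This is a toy model, not an OS-level implementation: the hypothetical `Environment` class stands in for OS state, `leaky_task` contains an assumed design fault that leaks one handle per call, and `clean_up` represents the clean-up operations that bring the environment back to an error-free state before the unchanged software is re-executed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Environment:
    """Hypothetical stand-in for OS state that a faulty program can corrupt."""
    leaked_handles: int = 0
    max_handles: int = 3


def leaky_task(env: Environment, x: int) -> int:
    # Hypothetical design fault: every call leaks a handle, and the task
    # fails once the environment's handles are exhausted.
    if env.leaked_handles >= env.max_handles:
        raise OSError("too many open handles")
    env.leaked_handles += 1
    return 2 * x


def clean_up(env: Environment) -> None:
    # Clean-up operation that returns the environment to an error-free state.
    env.leaked_handles = 0


def run_with_environment_diversity(
    task: Callable[[Environment, int], int],
    env: Environment,
    x: int,
    max_restarts: int = 1,
) -> int:
    for _ in range(max_restarts + 1):
        try:
            return task(env, x)  # the same, unmodified software each time
        except OSError:
            clean_up(env)  # restart in a different (cleaned) environment state
    raise RuntimeError("task still failed after clean-up and restart")


env = Environment()
print([run_with_environment_diversity(leaky_task, env, x) for x in range(1, 5)])  # [2, 4, 6, 8]
```

The fourth call fails in the exhausted environment, but the same code succeeds once it is re-executed after clean-up; neither the software nor its input is changed, only the environment.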

CONCLUSION
Many techniques have been developed for achieving fault tolerance in software. Their application is still relatively new to the field of fault tolerance. Furthermore, each technique must be tailored to the particular application, taking into account the cost of the fault tolerance effort that the customer requires. The differences between the techniques give some flexibility in choosing one for a given application.

REFERENCES
[1] P. E. Ammann and J. C. Knight, “Data Diversity: An Approach to Software Fault Tolerance”, IEEE Transactions on Computers, Vol. 37, No. 4, April 1988.
[2] C. Inacio, “Software Fault Tolerance”, Dependable Embedded Systems, Carnegie Mellon University, Spring.
[3] P. Popov, B. Littlewood and L. Strigini, “Design Diversity: An Update from Research on Reliability Modelling”, Safety-Critical Systems Symposium 2001, Springer, 2001.
[4] B. Littlewood, P. Popov and L. Strigini, “Modelling Software Design Diversity: A Review”, ACM Computing Surveys, 33(2):177–208, 2001.
[5] Z. Xie, H. Sun and K. Saluja, “A Survey of Software Fault Tolerance Techniques”.

Thank You!! Any Questions?