Langley Research Center Why is SPIDER Design Assurance based on Formal Methods? Paul S. Miner NASA Langley Internal Formal Methods.


Langley Research Center
Why is SPIDER Design Assurance based on Formal Methods?
Paul S. Miner
NASA Langley Internal Formal Methods Workshop
Wednesday, October 22, 2003

Langley Research Center, October 22, 2003, FM for SPIDER

Short Answer
One goal of the SPIDER project was to demonstrate the use of Formal Methods in support of certification for complex digital systems.
Which leads to the real question: why should we use formal methods as part of our design assurance for safety-critical systems?

Longer Answer
Claim: A mature engineering discipline is characterized by principled use of analytical models
– Test is used to determine adequacy of analysis
– Modern engineering is rooted in Culmann's Graphic Statics (1865)
Claim: Computer engineering is an immature discipline
– Design assurance based on testing and process assurance
– Emphasis on build-and-test is analogous to the state of engineering prior to Culmann
– Analysis used to determine adequacy of test (e.g., test coverage analysis)

Longer Answer (2)
Claim: Increasing logical complexity of digital avionics systems increases risk of catastrophic failure
– Current design and test paradigm is insufficient
– Need better engineering mathematics for analysis of digital systems
Claim: Formal Methods is the engineering mathematics for digital system design (both hardware and software)

Required Knowledge
Mature engineering disciplines require familiarity with a variety of mathematical models
– For example, Statics and Dynamics, Maxwell's equations
What do we expect of a computer engineer?
– Familiarity with C

What about SPIDER?
Why is current practice insufficient for SPIDER design assurance?

Lessons from History
In Design Paradigms, Petroski argues that we should learn from the patterns of past engineering failures
– Many failures are rooted in unfounded extrapolations from earlier successful designs
We are asking for trouble if we continue to rely on process assurance and test
– We need better analysis techniques

What is SPIDER?
A family of fault-tolerant Integrated Modular Avionics (FT-IMA) architectures
[Figure: SPIDER nodes PE & BIU 1, PE & BIU 2, PE & BIU 3 and RMU 1, RMU 2, RMU 3]

Fault-Tolerant Integrated Modular Avionics
Avionics are evolving from several special-purpose on-board computational platforms (each supporting a few aircraft functions) to a few computational platforms (each supporting many aircraft functions)
The trend toward integrated modular avionics requires that we revisit fault-tolerance (FT) strategies
– IMA supports many functions of mixed criticality
– Design must enforce partitioning between functions of different criticality, even when physical faults have occurred
– Increasing dependence on COTS devices, which are becoming less reliable

Fault-Tolerant Integrated Modular Avionics (2)
Loss of FT for some application might impact all critical functions
– System restart is not an attractive option; when necessary, it should be fast and (perhaps) automatic
– Must protect against cascading failures
Shared resources make it possible to dynamically reallocate computational and communication resources from less critical functions
We are placing increased reliance on core computation resources, while simultaneously pushing the design envelope

SPIDER Design Objectives
FT-IMA architecture proven to survive a bounded number of physical faults
– Both permanent and transient
– Must survive Byzantine faults
Capability to survive or quickly recover from massive correlated transient failure (e.g., in response to High-Intensity Radiated Fields, HIRF)

Byzantine Faults
Characterized by asymmetric error manifestations
– Different manifestations to different fault-free observers
– Including dissimilar values
Can cause redundant computations to diverge
– Triplex Clock Synchronization Demo
If not properly handled, a single Byzantine fault can defeat several layers of redundancy
Many architectures neglect this class of fault
– Assumed to be rare or even impossible
Hard to simulate, harder to test
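A small numeric illustration of how an asymmetric fault can split redundant channels, sketched in Python. It is not the Triplex Clock Synchronization Demo from the talk; it assumes a hypothetical triplex in which each of two good channels votes by mid-value select (median of three) over its own reading, the other good channel's reading, and whatever the faulty third channel sent it. The readings 10 and 20 and the function names are illustrative only.

```python
def mid_value_select(values):
    """Median of three values: a common triplex voting rule."""
    if len(values) != 3:
        raise ValueError("mid-value select expects exactly three values")
    return sorted(values)[1]


def triplex_round(good_a, good_b, faulty_to_a, faulty_to_b):
    """Hypothetical triplex round with one faulty channel.

    Good channels A and B see each other's values correctly, but the faulty
    channel may send a different value to each of them.
    Returns the voted value computed by A and by B.
    """
    vote_a = mid_value_select([good_a, good_b, faulty_to_a])
    vote_b = mid_value_select([good_a, good_b, faulty_to_b])
    return vote_a, vote_b


if __name__ == "__main__":
    # Symmetric fault: the same wrong value reaches both observers, so votes agree.
    print(triplex_round(10, 20, 100, 100))  # (20, 20)
    # Asymmetric (Byzantine) fault: a different value reaches each observer.
    print(triplex_round(10, 20, 0, 100))    # (10, 20)
```

Both voted values stay within the range of the good readings, so the vote bounds the error, but the two good channels no longer hold the same value; an exact-match comparison between A and B would flag a mismatch even though only one channel is faulty.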

Byzantine Faults Are Real
Several examples cited in Byzantine Faults: From Theory to Reality, Driscoll, et al. (SAFECOMP 2003)
– Byzantine failures nearly grounded a large fleet of aircraft
– A quad-redundant system failed in response to a single fault
– Typical cases are faulty transmitters (resulting in indeterminate voltage levels at receivers) or faults that cause timing violations (so that multiple observers perceive the same event differently)
Heavy-ion fault-injection results for TTP/C (Sivencrona, et al.)
– More than 1 in 1000 of the observed errors had Byzantine manifestations
Many current architectures do not explicitly address this failure mode

SPIDER Advantages
Fault tolerance independent of applications
Tolerates more failures
– Including any single Byzantine fault (and some combinations)
– Including many combinations of less severe failures
– Hybrid fault model: good, asymmetric, symmetric, benign
Does not require that nodes fail silent
– But can take advantage when they do
Simpler, stronger protocols with stronger assurance
Can gracefully evolve to accommodate parts obsolescence
– Off-the-shelf processors and low-level communication
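The four categories of the hybrid fault model can be read as a classification of how one node's output appears to the fault-free observers. The Python sketch below assumes each observer records either the value it received or None when it detected the message as missing or invalid; the names classify and Fault, and the use of None for a detected error, are illustrative assumptions rather than SPIDER's formal definitions.

```python
from enum import Enum


class Fault(Enum):
    GOOD = "good"
    BENIGN = "benign"          # every good observer detects the error
    SYMMETRIC = "symmetric"    # every good observer sees the same wrong value
    ASYMMETRIC = "asymmetric"  # good observers see different manifestations


def classify(correct_value, received):
    """Classify one node's behavior from what each good observer received.

    received: one entry per fault-free observer; None means that observer
    detected the message as missing or invalid (a hypothetical convention).
    """
    if not received:
        raise ValueError("need at least one observer")
    detected = [r is None for r in received]
    if all(detected):
        return Fault.BENIGN
    if any(detected) or len(set(received)) > 1:
        # Detected by some observers but not others, or dissimilar values.
        return Fault.ASYMMETRIC
    return Fault.GOOD if received[0] == correct_value else Fault.SYMMETRIC


if __name__ == "__main__":
    print(classify(7, [7, 7, 7]).value)           # good
    print(classify(7, [None, None, None]).value)  # benign
    print(classify(7, [9, 9, 9]).value)           # symmetric
    print(classify(7, [7, 9, None]).value)        # asymmetric
```

Because of the order of the checks, any mixture of detected and undetected manifestations counts as asymmetric: different good observers end up with different inputs.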

Strength of Formal Verification
Proofs are equivalent to testing the protocols
– For all specified configurations
– For all possible combinations of faults that satisfy the maximum fault assumption for each specified configuration
– For all specified message values
The formal proofs provide verification coverage equivalent to an infinite number of test cases
– Provided that the model of the protocols is faithful to the VHDL design and physical implementation
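To make the contrast concrete, the Python sketch below does by brute force what a test campaign would do for one tiny configuration. It runs the textbook oral-messages algorithm OM(1) of Lamport, Shostak, and Pease (a commander and three lieutenants, at most one Byzantine node) against every placement of the faulty node and every faulty behavior over a hypothetical three-value message domain, and checks agreement and validity. This is not a SPIDER protocol; the names, the domain, and the default value are assumptions for illustration. The enumeration covers 108 cases for this one configuration and value domain only.

```python
from collections import Counter
from itertools import product

VALUES = (0, 1, 2)     # hypothetical message domain
DEFAULT = 0            # decision when no strict majority exists
LIEUTENANTS = 3        # commander + 3 lieutenants = 4 nodes, tolerates 1 fault


def majority(values):
    """Value held by a strict majority of `values`, else DEFAULT."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else DEFAULT


def om1(sends, relays):
    """Decisions of all lieutenants in one run of OM(1).

    sends[i]     : value the commander sent to lieutenant i
    relays[j][i] : value lieutenant j relayed to lieutenant i (j != i)
    """
    return [
        majority([sends[i]] + [relays[j][i] for j in range(LIEUTENANTS) if j != i])
        for i in range(LIEUTENANTS)
    ]


def single_fault_scenarios():
    """Yield (faulty, sends, relays) for every single-fault case."""
    # Faulty commander: arbitrary value to each lieutenant; lieutenants relay honestly.
    for sends in product(VALUES, repeat=LIEUTENANTS):
        relays = [[sends[j]] * LIEUTENANTS for j in range(LIEUTENANTS)]
        yield "commander", list(sends), relays
    # Loyal commander, one faulty lieutenant that relays arbitrary values.
    for v, bad in product(VALUES, range(LIEUTENANTS)):
        others = [i for i in range(LIEUTENANTS) if i != bad]
        for lies in product(VALUES, repeat=len(others)):
            relays = [[v] * LIEUTENANTS for _ in range(LIEUTENANTS)]
            for i, lie in zip(others, lies):
                relays[bad][i] = lie
            yield bad, [v] * LIEUTENANTS, relays


def check_all():
    """Check agreement and validity in every scenario; return the case count."""
    cases = 0
    for faulty, sends, relays in single_fault_scenarios():
        decisions = om1(sends, relays)
        # A faulty commander ("commander") never equals a lieutenant index,
        # so every lieutenant counts as loyal in that case.
        loyal = [d for i, d in enumerate(decisions) if i != faulty]
        assert len(set(loyal)) == 1, ("agreement violated", faulty, sends, relays)
        if faulty != "commander":
            assert loyal[0] == sends[0], ("validity violated", faulty, sends, relays)
        cases += 1
    return cases


if __name__ == "__main__":
    print(check_all(), "single-fault cases checked")
```

Widening VALUES or adding lieutenants makes the case count grow quickly, and any finite enumeration still stops at the domain it was given; a proof over the protocol model is not bounded that way.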

Summary
The trend to FT-IMA is pushing the design envelope, while simultaneously placing increased reliance on core resources
– History suggests an increased risk of engineering failure
The SPIDER project is developing formal analytical models that will allow greater design assurance for FT-IMA systems