Towards a Contract-based Fault-tolerant Scheduling Framework for Distributed Real-time Systems Abhilash Thekkilakattil, Huseyin Aysan and Sasikumar Punnekkat.

Presentation transcript:

Towards a Contract-based Fault-tolerant Scheduling Framework for Distributed Real-time Systems Abhilash Thekkilakattil, Huseyin Aysan and Sasikumar Punnekkat Mälardalen Real-Time Research Centre, Mälardalen University, Sweden

Introduction
- Complexity of real-time systems
- Component-based software engineering
- Reliability requirements
- Contracts for real-time components
  - Enable correct composition of components
  - Ensure correctness by construction
- Pervasiveness of real-time systems

Improving Reliability of Real-time Systems
- Zonal and functional hazard analyses
  - Check whether the redundancies indeed exist
  - Ensure that independent components are not affected by common causes
  - Provide input to the design, e.g., separation and segregation of components
- Zonal analysis for software systems
  - Improves the reliability of software components
  - Removes failures on independent components due to common causes
  - Provides input to the design, e.g., allocation
- Transient errors: most widespread cause of failure
  - Solution: re-execute the failed component
(Image taken from toonpool.com)

Problem
Allocation and scheduling of real-time components on a distributed platform:
- Satisfy the re-execution requirements of critical components
- Satisfy the distribution requirements of critical components
- Maximize service to the non-critical components
- Fulfill real-time requirements

Component  Period (T_i)  WCET (C_i)  Re-executions (R_i)  Re-executions on a different node (m_i)  Criticality
A          10            2           2                    1                                        C
B          5             2           1                    1                                        C
C          5             1           0                    0                                        N
D          ?             6           0                    0                                        N
(C = critical, N = non-critical)
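The component parameters above can be captured in a small data model. The following is a minimal sketch, not the authors' formulation: the class and field names are hypothetical, and it assumes that every re-execution costs a full C_i and that all R_i re-executions occur in every period (the worst case). Component D is left out because its row is incomplete in the transcript. The resulting utilization figure gives only a lower bound on the number of nodes.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import ceil


@dataclass(frozen=True)
class Component:
    name: str
    period: int          # T_i
    wcet: int            # C_i
    reexecutions: int    # R_i
    remote_reexecs: int  # m_i, re-executions required on a different node
    critical: bool

    def worst_case_utilization(self) -> Fraction:
        # Assumption: primary execution plus all R_i re-executions in every period,
        # each costing a full C_i.
        return Fraction(self.wcet * (1 + self.reexecutions), self.period)


# Rows A, B and C of the table; D is omitted because its row is incomplete.
COMPONENTS = [
    Component("A", 10, 2, 2, 1, True),
    Component("B", 5, 2, 1, 1, True),
    Component("C", 5, 1, 0, 0, False),
]

total = sum((c.worst_case_utilization() for c in COMPONENTS), Fraction(0))
min_nodes = ceil(total)  # utilization-based lower bound only, not a feasibility test

print(total, min_nodes)
```

Exact fractions are used instead of floats so that comparisons against a full processor (utilization 1) are not affected by rounding.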

Overview of the Solution
- Task allocation problem: an NP-hard problem
  - We use known optimization methods to achieve efficient allocation
- Satisfying the reliability requirements: an NP-hard problem
  - We simplify by introducing Feasibility Windows
- Feasibility Windows: temporal intervals for task executions
  - Fault Tolerant Feasibility Windows for critical components
  - Fault Aware Feasibility Windows for non-critical components
- Contracts for fault tolerance
  - Contract: task parameters which provide the required guarantees
  - Offline contracts: offline guarantees for critical components
  - Online contracts: maximize service to non-critical components

Method
1. Allocate the components on the minimum number of processors
2. Derive Fault Tolerant Feasibility Windows for critical components
3. Derive Fault Aware Feasibility Windows for non-critical components
4. Derive contractual parameters to ensure that the executions are within the derived windows
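To give a feel for the allocation step, here is a sketch that uses first-fit decreasing bin packing. This is a swapped-in textbook heuristic, not the optimization method used in the presentation: it does not guarantee the minimum number of processors, it ignores the distribution requirement m_i, and it treats "utilization at most 1 per processor" as the only capacity constraint. The function name and the input values are illustrative assumptions; the utilizations are C_i * (1 + R_i) / T_i for components A, B and C, assuming each re-execution costs a full C_i.

```python
from fractions import Fraction


def first_fit_decreasing(utilizations):
    """Assign components to processors with first-fit decreasing.

    `utilizations` maps a component name to its worst-case utilization.
    Returns a dict mapping each name to a processor index.
    """
    loads = []       # current total utilization of each opened processor
    placement = {}
    ordered = sorted(utilizations.items(), key=lambda kv: kv[1], reverse=True)
    for name, u in ordered:
        if u > 1:
            raise ValueError(f"{name} cannot fit on any single processor")
        for idx, load in enumerate(loads):
            if load + u <= 1:   # keep every processor at or below full utilization
                loads[idx] += u
                placement[name] = idx
                break
        else:
            loads.append(u)     # no opened processor has room: open a new one
            placement[name] = len(loads) - 1
    return placement


# Assumed worst-case utilizations for components A, B and C.
demo = {"A": Fraction(3, 5), "B": Fraction(4, 5), "C": Fraction(1, 5)}
placement = first_fit_decreasing(demo)
print(placement)
```

With these inputs the heuristic opens two processors, placing B and C together and A on its own.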

Optimization Formulation
- Minimum size of a component's window = WCET of the component
  - Guarantees feasible execution of the component
- Feasibility windows of the same component are disjoint in time
  - Ensures timely execution in order to enable the re-execution
  - Preserves the order of execution of the component and its re-executions
- During allocation, the processor utilization demand during any interval should not exceed the size of the interval, to avoid overloads
  - New method to deal with offsets
  - Derived from the classical feasibility analysis by Baruah et al.
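The three constraints above can be illustrated with a short sketch. It is not the new offset-aware method from the presentation: the demand check below is the classical processor-demand test for synchronously released periodic tasks without offsets, assuming deadlines no later than periods. All function names, the tuple layouts and the example values are assumptions made for illustration.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def demand(tasks, t):
    """Processor demand in [0, t] for tasks given as (wcet, period, deadline)."""
    # A job contributes only if its release and its deadline both lie in [0, t].
    return sum(max(0, (t - d) // p + 1) * c for c, p, d in tasks)


def demand_fits(tasks):
    """Check demand(t) <= t at every absolute deadline up to the hyperperiod."""
    horizon = hyperperiod([p for _, p, _ in tasks])
    deadlines = sorted(
        {d + k * p for _, p, d in tasks for k in range(horizon // p)}
    )
    return all(demand(tasks, t) <= t for t in deadlines)


def windows_valid(windows, wcet):
    """Windows are (start, end) pairs for one component and its re-executions."""
    ordered = sorted(windows)
    # Each window must be at least as long as the WCET.
    long_enough = all(end - start >= wcet for start, end in ordered)
    # Windows are treated as half-open, so touching windows count as disjoint.
    disjoint = all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))
    return long_enough and disjoint


# Illustrative task sets as (wcet, period, deadline) with deadline == period.
light_node = [(2, 10, 10), (2, 5, 5), (1, 5, 5)]
heavy_node = [(6, 10, 10), (2, 5, 5), (1, 5, 5)]

print(demand_fits(light_node), demand_fits(heavy_node))
print(windows_valid([(0, 3), (3, 6), (6, 10)], wcet=2))
```

Checking only at absolute deadlines is enough here because the demand changes value only at those points.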

Example
[Figure: schedules on Node1 and Node2 for two scenarios. Worst case: maximum fault occurrence. Better than worst case: less than maximum fault occurrence. The timelines show executions labelled A, A1, A2, B, B1, C and D within the windows FT_FW(A), FT_FW(A1), FT_FW(A2), FT_FW(B), FT_FW(B1) and FA_FW(C).]
Component parameters: same table as on the Problem slide.

Conclusions
We have proposed a methodology for the allocation and scheduling of components with mixed criticalities which:
- Guarantees the re-execution requirements of the critical components: offline contracts
- Maximizes the service to non-critical components: online contracts
- Is scheduler independent
- Allocates on the minimum number of processors
Future work includes:
- Feasibility of real-time components with offsets: complexity reduction
- Optimality

?