Design of Distributed Real-Time Systems
Ramani Arunachalam

Case Study: MARS
● MARS (MAintainable Real-time System)
  – Distributed, fault-tolerant, hard real-time
  – Objectives
    ● Guaranteed timeliness
    ● Testability
    ● Maintainability
    ● Fault tolerance
    ● Systematic software development
  – Time-triggered architecture

Objectives
● Guaranteed timeliness
  – Based on resource adequacy at peak load
  – Statistical assurances are not enough
● Testability
  – The architecture should support testing of timeliness
● Maintainability
  – Needed to remedy hardware faults and design errors, and to respond to change requests
  – Localized consequences -> minimized effort

Objectives
● Fault tolerance
  – Redundancy
  – On-line maintenance
● Systematic software development
  – No 'trial and error' integration
  – The OS guarantees predictable temporal behaviour

State View
● Time-triggered observation of states
  – Observe RT entities at predefined intervals
● Intelligent input/output
  – Observation grid
  – Intelligent sensor
    ● Preprocesses raw data from the input device
    ● Observes at a finer granularity, called the perception granularity

State View
● Intelligent actuator
  – Post-processes data from the computer system before sending it to the output device
● State messages
  – Produced at observation points
  – Minimal synchronization requirements
  – No need for buffer management
  – Unidirectional (from the RT entity)
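The state-message properties listed above (no buffer management, minimal synchronization) follow from overwrite-on-write, non-consuming-read semantics. The following is a minimal Python sketch of that idea, not the MARS implementation; the class names `StateMessage` and `StateVariable` and the integer-tick representation of global time are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class StateMessage(Generic[T]):
    value: T
    observed_at: int  # global time of the observation point (assumed integer ticks)


class StateVariable(Generic[T]):
    """Hypothetical single-slot holder for the latest state of one RT entity.

    No queue is kept: a new message overwrites the previous one, and reading
    does not consume it, so no buffer management is needed.
    """

    def __init__(self) -> None:
        self._latest: Optional[StateMessage[T]] = None

    def update(self, msg: StateMessage[T]) -> None:
        # Overwrite semantics: only the most recent observation is relevant.
        self._latest = msg

    def read(self) -> Optional[StateMessage[T]]:
        # Non-consuming read: any number of readers may see the same message.
        return self._latest


if __name__ == "__main__":
    temperature = StateVariable[float]()
    temperature.update(StateMessage(20.5, observed_at=100))
    temperature.update(StateMessage(21.0, observed_at=200))
    print(temperature.read())  # StateMessage(value=21.0, observed_at=200)
```

Because a reader never removes anything, a slow consumer cannot cause an overflow and a fast producer cannot block; the price is that intermediate observations are simply lost, which is acceptable when only the current state of the RT entity matters.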

Structure
● Clusters
  – Autonomous subsystems
  – Disjoint name spaces
  – Exchange state messages
  – Composed of fault-tolerant units (FTUs)
  – Real-time communication channel (TDMA)
● FTU
  – Composed of replicated components
  – Active and shadow components
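With a TDMA channel, each sender owns a fixed slot in a repeating round, so which component may transmit is a pure function of the synchronized global time. The sketch below assumes equal-length slots, exactly one slot per component, microsecond time units, and hypothetical component names; actual MARS slot layouts may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TdmaRound:
    """Static TDMA round: slot i of every round belongs to senders[i]."""

    slot_us: int               # assumed fixed slot length in microseconds
    senders: Tuple[str, ...]   # hypothetical component names, one slot each

    @property
    def round_us(self) -> int:
        return self.slot_us * len(self.senders)

    def owner_at(self, global_time_us: int) -> str:
        """Component that may transmit at this instant of global time."""
        offset = global_time_us % self.round_us
        return self.senders[offset // self.slot_us]

    def next_slot_start(self, sender: str, global_time_us: int) -> int:
        """Earliest global time >= global_time_us at which `sender` may start sending."""
        slot_index = self.senders.index(sender)
        round_start = global_time_us - global_time_us % self.round_us
        start = round_start + slot_index * self.slot_us
        if start < global_time_us:
            start += self.round_us  # this round's slot has begun or passed; use the next round
        return start


if __name__ == "__main__":
    bus = TdmaRound(slot_us=1000, senders=("ftu_a.c1", "ftu_a.c2", "ftu_b.c1"))
    print(bus.owner_at(1500))                     # ftu_a.c2
    print(bus.next_slot_start("ftu_b.c1", 2500))  # 5000
```

Since every node computes `owner_at` from the same global time, no arbitration is needed at run time, and a monitor can detect a component sending outside its own slot.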

[Figure: FTU]

Structure
● Component
  – Smallest replaceable unit
  – Fail-silent (produces correct results or none)
  – Terminates upon failure
● Task execution
  – Task: the software inside a component
  – Starts at a predefined time
  – Proceeds without any communication or synchronization
  – Execution time is deterministic

Operation
● Results of periodic tasks are sent as state messages
● Communication timing is also predefined
● A real-time transaction is a progression of processing and communication actions between a stimulus from the environment and the response to it
● Static scheduling (at compile time)
● At run time, no surprises
● Modes (e.g. operating, emergency)
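Static scheduling means the run-time system only looks up a precomputed table; modes correspond to different tables. The Python sketch below is illustrative only: the class name, the task names, the abstract tick unit, and the rule that a mode change takes effect at the next cycle boundary are all assumptions, and it assumes `tasks_due` is called once per tick.

```python
from typing import Dict, List, Optional, Tuple

# Offline-computed schedule: (start offset within the cycle, task name).
Schedule = List[Tuple[int, str]]


class TimeTriggeredDispatcher:
    """Hypothetical dispatcher: looks up precomputed tables, makes no run-time scheduling decisions."""

    def __init__(self, cycle: int, schedules: Dict[str, Schedule], mode: str) -> None:
        self.cycle = cycle
        self.schedules = {m: sorted(s) for m, s in schedules.items()}
        self.mode = mode
        self._pending_mode: Optional[str] = None

    def request_mode_change(self, mode: str) -> None:
        # Assumption: the change takes effect at the next cycle boundary,
        # so every replica switches tables at the same global instant.
        self._pending_mode = mode

    def tasks_due(self, global_time: int) -> List[str]:
        """Tasks whose predefined start time is this instant of global time."""
        offset = global_time % self.cycle
        if offset == 0 and self._pending_mode is not None:
            self.mode, self._pending_mode = self._pending_mode, None
        return [task for start, task in self.schedules[self.mode] if start == offset]
```

For example, with a 50-tick cycle and an "operating" table of `[(0, "read_sensors"), (10, "control_law"), (20, "send_state")]`, `tasks_due(60)` returns `["control_law"]` in every cycle; a requested switch to an "emergency" table only becomes visible at the next multiple of 50.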

Fault-tolerance
● Two levels of redundancy
  – Active redundancy at the FTU level
    ● If a component fails, the standby becomes active
  – Time redundancy at the component level
    ● Every task is executed twice and the results are compared
● TDMA monitor
  – Monitors temporal behaviour
  – Controls the output from the component
● Distributed clock synchronization
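Time redundancy at the component level can be sketched as "run twice, compare, release only on agreement", which is one way a component can stay fail-silent. This minimal Python sketch assumes tasks are side-effect-free callables; the function name is hypothetical, and a task whose legitimate result is `None` cannot be distinguished from a suppressed one here.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_time_redundancy(task: Callable[[], T]) -> Optional[T]:
    """Hypothetical helper: execute `task` twice and compare the results.

    Matching results are released; a mismatch suppresses the output
    (returns None), mirroring a fail-silent component that sends nothing.
    """
    first = task()
    second = task()
    if first != second:
        return None  # fail-silent: no (possibly wrong) result leaves the component
    return first
```

Repeating the same computation on the same hardware can only reveal faults that do not reproduce identically on both runs, such as transient faults; a permanent fault or design error that corrupts both runs the same way needs the FTU-level replication instead.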

Fault-tolerance
● Replica determinism
  – All replicated components perform the same state changes at the same point in time
  – Reading the local clock is prohibited
  – All replicas must agree on when to change mode
● Component reintegration
  – i-state (initialization state), h-state (history state)
  – Reintegration point: an instant at which the h-state is small
  – The new component receives the h-state at this point

Summary
● Maintenance
  – A failed component does not affect its FTU
  – On-line reintegration after repair
  – Software changes
    ● Does the change fit in the current schedule?
    ● Otherwise, a new mode with a new schedule
● In summary
  – Strict separation of functionality, timeliness and dependability
  – Designed for predictable temporal behaviour, which simplifies testing
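The question "does it fit in the current schedule?" can be illustrated, in its simplest form, as a search for a free gap in one processor's static cycle. This Python sketch is deliberately simplistic and hypothetical: it assumes busy intervals are half-open `[start, end)` offsets within one cycle and ignores precedence constraints, deadlines and TDMA communication slots, all of which a real MARS schedule check would have to respect.

```python
from typing import List, Optional, Tuple


def find_free_slot(cycle: int, busy: List[Tuple[int, int]], wcet: int) -> Optional[int]:
    """Hypothetical check: earliest start offset at which a new task of length
    `wcet` fits without overlapping any busy [start, end) interval.

    Returns None if it does not fit, i.e. a new mode with a new schedule is needed.
    """
    t = 0  # earliest instant not yet known to be occupied
    for start, end in sorted(busy):
        if start - t >= wcet:
            return t  # the gap before this interval is large enough
        t = max(t, end)
    return t if cycle - t >= wcet else None
```

With a 50-unit cycle already occupied by `[(0, 10), (10, 25), (30, 45)]`, a 5-unit task fits at offset 25, while a 6-unit task does not fit anywhere and would require a new schedule.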

Delta-4 XPA
● Objectives
  – “A real-time system is not assured to meet deadlines outside its operational envelope”
  – Bounded-demand school
    ● The operational envelope is predictable
    ● An impractical assumption for complex systems
  – Unbounded-demand school
    ● A complete definition of the operational envelope is not possible
    ● Graceful degradation when the system falls outside the envelope
  – XPA implements hard real-time behaviour but falls back to best-effort behaviour when required

Layers
● DELTASE
● Group management layer
● Time and group communication
● Abstract network layer (physical + MAC + firmware)

Architecture
● Network infrastructure
  – FDDI supports urgent traffic and has built-in fault tolerance
  – Token bus/ring provides media redundancy for availability
● Time
  – Internal time maintained by a distributed time server
  – Clocks synchronized to within tens of microseconds
  – External time: one of the standard time references
● Group communication
  – Services ranging from atomic multicast to datagrams
  – Very fast services of varying reliability

Architecture
● Group communication
  – Distributed replication management
    ● BestEffortN: guarantees delivery to N elements
    ● BestEffortTo: guarantees delivery to named elements
    ● AtLeastN, AtLeastTo: service guaranteed even when the sender fails
● Group management
  – Distributed group manager object
  – Management and distribution of groups of objects
  – Incorporates knowledge of the various replication modes
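The difference between BestEffortN and BestEffortTo is the success criterion: a count of acknowledged members versus a specific set of named members. The Python sketch below only illustrates those two criteria; the `send` transport, the function names and the exception-based failure reporting are hypothetical, and the sender-failure tolerance of AtLeastN/AtLeastTo is not modeled.

```python
from typing import Callable, Iterable, List

# Hypothetical transport: returns True if `member` acknowledged `message`.
Send = Callable[[str, bytes], bool]


def best_effort_n(message: bytes, members: Iterable[str], n: int, send: Send) -> List[str]:
    """Succeed only if at least n group members acknowledged delivery."""
    acked = [m for m in members if send(m, message)]
    if len(acked) < n:
        raise RuntimeError(f"only {len(acked)} of {n} required deliveries acknowledged")
    return acked


def best_effort_to(message: bytes, targets: Iterable[str], send: Send) -> List[str]:
    """Succeed only if every named target acknowledged delivery."""
    target_list = list(targets)
    missing = [t for t in target_list if not send(t, message)]
    if missing:
        raise RuntimeError(f"no acknowledgement from {missing}")
    return target_list
```

With a group `["a", "b", "c"]` in which `"c"` is unreachable, `best_effort_n(..., n=2, ...)` succeeds with `["a", "b"]`, whereas `n=3` or a named target list containing `"c"` fails.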

Architecture
● Application support environment (Deltase)
  – Client-server and producer-consumer interactions
  – Applications are written using Deltase or converted using preprocessors
● Timeliness
  – What to do under overload conditions?
    ● Static off-line scheduling: too many possibilities
    ● On-line scheduling: can find feasible schedules if the system is not overloaded

Timeliness
● The scheduling policy uses “precedence”
  – A combination of priority and earliest deadline
  – Few priority classes, to avoid unfairness
  – Within a priority class, earliest-deadline-first
● Design-time and run-time timeliness
  – Targetline: instant chosen by the designer for provision of the service
  – Liveline and deadline: the earliest and latest times at which the service may be provided
  – Violations are detected at run time, and the actions to take are defined at design time
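“Precedence” as described above orders work first by priority class and then by earliest deadline within a class, which maps naturally onto a lexicographic sort key. The Python sketch below is an illustration under assumptions: the convention that class 0 is most urgent, the names `Job`, `PrecedenceQueue` and `check_timeliness`, and the string results are hypothetical, and the targetline is not modeled.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Job:
    # Ordering key = (priority_class, deadline): class first, then EDF within a class.
    priority_class: int  # assumed convention: 0 is the most urgent class
    deadline: int        # absolute deadline (latest acceptable completion time)
    name: str = field(compare=False)
    liveline: int = field(default=0, compare=False)  # earliest acceptable completion time


class PrecedenceQueue:
    """Hypothetical ready queue ordered by precedence."""

    def __init__(self) -> None:
        self._heap: List[Job] = []

    def push(self, job: Job) -> None:
        heapq.heappush(self._heap, job)

    def pop(self) -> Job:
        """Job with the highest precedence: lowest class, then earliest deadline."""
        return heapq.heappop(self._heap)


def check_timeliness(job: Job, completion_time: int) -> str:
    """Run-time check of a completion time against the [liveline, deadline] window."""
    if completion_time < job.liveline:
        return "too early"
    if completion_time > job.deadline:
        return "deadline missed"
    return "on time"
```

Pushing `Job(1, 50, "logging")`, `Job(0, 80, "alarm")` and `Job(0, 30, "control")` pops them as control, alarm, logging: the class-1 job waits despite its earlier deadline, which is how a small number of classes bounds EDF's behaviour under overload.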

Preemption
● Leader-follower model for replication
  – Decisions are made by a privileged replica, the leader
  – Preemption point
    ● Point at which an interrupt is served
  – A high-precedence message arrives for a process that is not currently running
    ● Raise the process's precedence to that of the message
    ● This causes the process to be scheduled
    ● These actions are propagated to the followers
    ● Followers perform identical operations

Desynchronization
● Followers must not drift too far from the leader
● Followers too fast
  – Reach the preemption point before the leader
  – Remain blocked until the leader notifies them
● Followers too slow
  – The leader timestamps each notification with its time T
  – If a follower has not executed the action by T + t(desync)
    ● A desynchronization event is raised
    ● Another follower takes over
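Both desynchronization cases can be captured by tracking which preemption points the leader has announced (with timestamp T) and which followers have acted on them. This Python sketch is a simplified, single-process model under assumptions: the class and method names, integer preemption-point ids and a shared integer time base are hypothetical, and choosing the follower that takes over is left to group management and not modeled.

```python
from typing import Dict, List, Set, Tuple


class LeaderFollowerSync:
    """Hypothetical tracker of leader notifications and follower progress."""

    def __init__(self, followers: List[str], t_desync: int) -> None:
        self.followers = followers
        self.t_desync = t_desync                     # tolerated lag, same time unit as T
        self.notified: Dict[int, int] = {}           # preemption point id -> leader timestamp T
        self.executed: Set[Tuple[str, int]] = set()  # (follower, point) pairs already acted on

    def leader_notifies(self, point: int, t: int) -> None:
        """Leader reached `point` at global time `t` and tells the followers."""
        self.notified[point] = t

    def follower_reaches(self, follower: str, point: int) -> bool:
        """Follower too fast: it may act at `point` only after the leader's notification."""
        if point not in self.notified:
            return False  # remain blocked until the leader notifies
        self.executed.add((follower, point))
        return True

    def desynchronized(self, now: int) -> List[str]:
        """Follower too slow: has not acted on some notified point by T + t_desync."""
        return [
            f
            for f in self.followers
            if any(
                (f, point) not in self.executed and now > t + self.t_desync
                for point, t in self.notified.items()
            )
        ]
```

For example, with `t_desync=5` and a notification for point 1 at T = 100, a follower that has not acted on point 1 is reported by `desynchronized(106)` but not by `desynchronized(104)`.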

Summary
● Communication support using groups
  – Oriented to distributed computing
● Trade-offs between QoS and efficiency
  – The group manager uses atomic multicast for ordered delivery
  – Leader-follower uses reliable, non-ordered delivery
● Group management service
  – Executes the leader-follower protocol and detects replica failures
  – Clones the replica at another node