RTOS Scheduling 2.0 Problems - Solutions

RTOS Scheduling 2.0: Problems and Solutions
Recommended book (PDF available): Giorgio Buttazzo, “Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications”, 3rd edition. 4/16

RTOS Scheduling Problems
Causes:
- Mutual exclusion and deadlocks
- Priority inversion
Two solutions:
- Priority inheritance
- Priority ceiling

Synchronization
Example (review): object recognition on a conveyor belt, using two cameras.
Tasks:
- Acquisition: image_1 and image_2
- Edge detection: edge_1 and edge_2
- Object shape and height
- Recognition
[Figure: task graph with nodes image_1, image_2, edge_1, edge_2, shape, height, recognition]

Mutual Exclusion (mutex): review
- Locks affect scheduling.
- A data structure shared across threads must be accessed atomically; mutexes enforce this.
- Critical section: the code executed while holding a lock.

Missed Deadlines?
- Preemption: time spent waiting for higher-priority tasks.
- Execution: the task's own execution time.
- Blocking (priority inversion): delay caused by lower-priority tasks; a major source of missed deadlines.
- A task is schedulable if preemption + execution + blocking < deadline.

Note: For a task to meet its deadline, it must accommodate preemption from higher-priority tasks, its own execution time, and delays caused by lower-priority tasks (known as priority inversion or blocking). Remember that higher-priority tasks, from a rate monotonic perspective, are those with higher rates (or shorter periods). These can occur more than once in a task's period. A task's execution occurs once during its period. Blocking can occur at most once during the task's period; it comes from lower-priority tasks, those that have slower rates (or longer periods).

Preemption and execution are unavoidable. If these exceed capacity, one is faced with a classical throughput problem, and the only remedy is to reduce the workload (which means changing the system requirements) or to increase the capacity by using a faster computer.

Experience has shown that priority inversion (blocking), the delay from lower-priority tasks, is a major source of missed deadlines. So we focus on identifying sources of priority inversion and try to reduce their blocking effects to enhance schedulability.
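The schedulability condition above can be sketched in C. This is a minimal illustration, not a full response-time analysis: the type task_t and the functions preemption_time and meets_deadline are hypothetical names, and the preemption term is an assumed pessimistic bound in which every higher-priority task is charged for each release that can fall inside a window equal to the deadline. The strict inequality follows the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task descriptor; the field names are illustrative only. */
typedef struct {
    unsigned period;     /* T: time between releases */
    unsigned exec_time;  /* C: worst-case execution time per release */
} task_t;

/* Upper bound on preemption over a window of length `window`:
 * higher-priority task j can be released ceil(window / T_j) times,
 * and each release costs C_j. */
unsigned preemption_time(const task_t *higher, size_t n, unsigned window)
{
    unsigned total = 0;
    for (size_t j = 0; j < n; j++) {
        assert(higher[j].period > 0);
        unsigned releases = (window + higher[j].period - 1) / higher[j].period;
        total += releases * higher[j].exec_time;
    }
    return total;
}

/* Condition from the slide: preemption + execution + blocking < deadline.
 * Returns 1 if the condition holds, 0 otherwise. */
int meets_deadline(unsigned preemption, unsigned execution,
                   unsigned blocking, unsigned deadline)
{
    return preemption + execution + blocking < deadline;
}
```

For example, with two higher-priority tasks (period 10, cost 2) and (period 25, cost 5) and a deadline of 50, the preemption bound is 5*2 + 2*5 = 20. A task with execution time 10 and blocking 5 passes (35 < 50), while the same task with blocking 20 fails (50 is not below 50), which shows how blocking alone can make an otherwise feasible task miss its deadline.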

Priority Inversion: example with a lock
Priorities: Task 1 (highest) > Task 2 > Task 3 (lowest).
- Task 3 acquires the lock and enters its critical section.
- Task 1 preempts Task 3, tries to acquire the lock, and blocks waiting for it.
- Task 2 preempts Task 3 at time 4, while the higher-priority Task 1 is still blocked.
- In effect, the priorities of Tasks 1 and 2 are inverted: Task 2 keeps Task 1 waiting.

Priority Inversion (same example)
- Task 1 (highest priority) tries to acquire the lock held by Task 3 and is blocked.
- Task 2 (unrelated to the lock) preempts Task 3 before it releases the lock, keeping Task 1 waiting.
[Figure: timeline of Task 1, Task 2, and Task 3, marking where Task 3 acquires the lock, where Task 1 attempts to lock the data resource and blocks (B), and where Task 3 releases the lock]

Mars Rover Pathfinder
- Landed July 1997; airbags deployed at 350 m.
- Worked until September 1997.
- A few days into the mission, it started missing deadlines.

The Mars Pathfinder mission was widely proclaimed as "flawless" in the early days after its July 4th, 1997 landing on the Martian surface. Successes included its unconventional "landing" -- bouncing onto the Martian surface surrounded by airbags, deploying the Sojourner rover, and gathering and transmitting voluminous data back to Earth, including the panoramic pictures that were such a hit on the Web. But a few days into the mission, not long after Pathfinder started gathering meteorological data, the spacecraft began experiencing total system resets, each resulting in losses of data. The press reported these failures in terms such as "software glitches" and "the computer was trying to do too many things at once".

This week at the IEEE Real-Time Systems Symposium I heard a fascinating keynote address by David Wilner, Chief Technical Officer of Wind River Systems. Wind River makes VxWorks, the real-time embedded systems kernel that was used in the Mars Pathfinder mission. In his talk, he explained in detail the actual software problems that caused the total system resets of the Pathfinder spacecraft, how they were diagnosed, and how they were solved. I wanted to share his story with each of you.

VxWorks provides preemptive priority scheduling of threads. Tasks on the Pathfinder spacecraft were executed as threads with priorities that were assigned in the usual manner reflecting the relative urgency of these tasks.

Pathfinder contained an "information bus", which you can think of as a shared memory area used for passing information between different components of the spacecraft. A bus management task ran frequently with high priority to move certain kinds of data in and out of the information bus. Access to the bus was synchronized with mutual exclusion locks (mutexes).

The meteorological data gathering task ran as an infrequent, low priority thread, and used the information bus to publish its data. When publishing its data, it would acquire a mutex, do writes to the bus, and release the mutex. If an interrupt caused the information bus thread to be scheduled while this mutex was held, and if the information bus thread then attempted to acquire this same mutex in order to retrieve published data, this would cause it to block on the mutex, waiting until the meteorological thread released the mutex before it could continue. The spacecraft also contained a communications task that ran with medium priority.

Most of the time this combination worked fine. However, very infrequently it was possible for an interrupt to occur that caused the (medium priority) communications task to be scheduled during the short interval while the (high priority) information bus thread was blocked waiting for the (low priority) meteorological data thread. In this case, the long-running communications task, having higher priority than the meteorological task, would prevent it from running, consequently preventing the blocked information bus task from running. After some time had passed, a watchdog timer would go off, notice that the data bus task had not been executed for some time, conclude that something had gone drastically wrong, and initiate a total system reset.

Pathfinder: deadlines missed due to priority inversion
- VxWorks RTOS with preemptive priority scheduling.
- A few days into the mission, Pathfinder sporadically missed deadlines.
- The watchdog timer generated system resets, causing loss of data.
- T1: information bus manager (high priority).
- T2: communication task (medium priority).
- T3: meteorological data collection (low priority).
- T1 and T3 share the information bus, which is protected by a mutex; T1 ended up blocked.
- PIP was available but had been disabled for performance (more on PIP later).
- Solution: PIP was re-enabled from the ground.

Pathfinder problem
- bc_sched detects that bc_dist has not completed and generates a system reset.
http://research.microsoft.com/~mbj/Mars_Pathfinde/Authoritative_Account.html

Priority Inheritance Protocol (PIP): solution I to priority inversion
Priorities: Task 1 > Task 2 > Task 3.
- Task 3 acquires the lock and enters its critical section.
- Task 1 preempts Task 3, tries to acquire the lock, and blocks.
- Task 3 inherits Task 1's priority, but only once Task 1 blocks; this prevents preemption by Task 2.
PIP rule: when a task blocks while attempting to acquire a lock, the task holding the lock inherits the priority of the blocked task (Task 1's priority in the example above). The lock holder therefore cannot be preempted by a task whose priority is lower than that of the blocked task (Task 2 in the example above).
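The effect of the PIP rule can be checked with a small tick-based simulation of the three-task example. This is a sketch under stated assumptions, not how any particular RTOS implements inheritance: the task model (task_t, demo_tasks, finish_time) and all release and execution times are hypothetical, chosen so that Task 3 takes the lock at time 1, Task 1 blocks at time 3, and Task 2 arrives at time 4. There is a single lock, and one unit of work takes one tick.

```c
#include <assert.h>

#define NTASKS 3
#define NO_OWNER (-1)

/* Hypothetical task model for this sketch; not taken from the slides. */
typedef struct {
    int priority;  /* larger number = higher priority */
    int release;   /* tick at which the task becomes ready */
    int exec;      /* total execution time, in ticks */
    int cs_start;  /* the lock is taken before work unit cs_start ... */
    int cs_end;    /* ... and released after unit cs_end - 1 (equal = no lock) */
} task_t;

/* Index 0 = Task 1 (highest), 1 = Task 2, 2 = Task 3 (lowest). */
const task_t demo_tasks[NTASKS] = {
    {3, 2, 3, 1, 2},
    {2, 4, 4, 0, 0},
    {1, 0, 6, 1, 5},
};

/* True if the next unit of work must run while holding the lock. */
static int needs_lock(const task_t *t, int done)
{
    return done >= t->cs_start && done < t->cs_end;
}

/* Runs the schedule and returns the tick at which task `watch` finishes. */
int finish_time(const task_t tasks[NTASKS], int use_pip, int watch)
{
    int done[NTASKS] = {0};
    int finish[NTASKS] = {0};
    int owner = NO_OWNER;
    int remaining = NTASKS;

    assert(watch >= 0 && watch < NTASKS);
    for (int t = 0; remaining > 0; t++) {
        int eff[NTASKS], active[NTASKS], blocked[NTASKS];
        for (int i = 0; i < NTASKS; i++) {
            eff[i] = tasks[i].priority;
            active[i] = t >= tasks[i].release && done[i] < tasks[i].exec;
        }
        for (int i = 0; i < NTASKS; i++) {
            /* Blocked: wants the lock while another task holds it. */
            blocked[i] = active[i] && owner != NO_OWNER && owner != i &&
                         needs_lock(&tasks[i], done[i]);
            /* PIP: the lock holder inherits the blocked task's priority. */
            if (blocked[i] && use_pip && tasks[i].priority > eff[owner])
                eff[owner] = tasks[i].priority;
        }
        int run = -1;  /* highest effective priority among runnable tasks */
        for (int i = 0; i < NTASKS; i++)
            if (active[i] && !blocked[i] && (run < 0 || eff[i] > eff[run]))
                run = i;
        if (run < 0)
            continue;  /* idle tick: nothing is ready yet */
        if (needs_lock(&tasks[run], done[run]))
            owner = run;  /* acquire, or keep holding, the lock */
        done[run]++;
        if (owner == run && !needs_lock(&tasks[run], done[run]))
            owner = NO_OWNER;  /* critical section finished: release */
        if (done[run] == tasks[run].exec) {
            finish[run] = t + 1;
            remaining--;
        }
    }
    return finish[watch];
}
```

With demo_tasks, finish_time(demo_tasks, 0, 0) returns 12 and finish_time(demo_tasks, 1, 0) returns 8: without inheritance Task 1 waits for all of Task 2's execution, while with PIP it only waits for the rest of Task 3's critical section.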

Pathfinder after S/W update

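Deadlocks can still happen with priority inheritance: two locks

    #include <pthread.h>

    pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    void* thread_1_function(void* arg) {
        pthread_mutex_lock(&lock_b);      /* task 1 takes b first ... */
        /* ... */
        pthread_mutex_lock(&lock_a);      /* ... then a */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    /* The slide cuts off here; this body follows the description below
       (task 2 takes a first, then b). */
    void* thread_2_function(void* arg) {
        pthread_mutex_lock(&lock_a);
        /* ... */
        pthread_mutex_lock(&lock_b);
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

- Task 2 acquires lock a.
- Task 1 preempts Task 2, acquires lock b, then blocks trying to acquire lock a.
- Task 2 resumes and blocks trying to acquire lock b.
- Deadlock: no further progress is possible.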

PIP Problems: high overhead (OH)
- Managing the priority inheritance mechanism means updating task priorities at run time.
- It needs a complex data structure (not simply a stack) for storing the set of priorities inherited by each task: one list for each task and one for each mutex.
- Dynamic priority management implies dynamic reordering of task lists.
- It does not prevent deadlocks.

Priority Ceiling Protocol (PCP): prevents deadlock
- Each lock has a priority ceiling: the priority of the highest-priority task that can lock it.
- A task can acquire a lock only if its priority is strictly higher than the priority ceilings of all locks currently held by other tasks.
- Locks not held by any task do not affect the task.
- Prevents deadlocks.
- Protocol extensions can support dynamic-priority scheduling.

Note: In this protocol, every lock or semaphore is assigned a priority ceiling equal to the priority of the highest-priority task that can lock it. A task T can acquire a lock “a” only if T's priority is strictly higher than the priority ceilings of all locks currently held by other tasks. Intuitively, if we prevent task T from acquiring lock “a”, then we ensure that T will not hold lock “a” while later trying to acquire other locks held by other tasks. This prevents deadlocks from occurring.
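Assigning the ceilings is a static step that can be done before the system runs. The following C sketch computes them from a table of which task may lock which lock. The function priority_ceilings, the array layout, and the fixed sizes are assumptions made for illustration; priorities are taken to be positive integers where a larger number means a higher priority, so 0 can mark a lock that no task uses.

```c
#include <assert.h>

#define NTASKS 3
#define NLOCKS 2

/* uses[i][k] != 0 means task i may lock lock k (hypothetical usage table).
 * ceiling[k] receives the priority of the highest-priority task that can
 * lock k, or 0 if no task uses that lock. */
void priority_ceilings(const int priority[NTASKS],
                       int uses[NTASKS][NLOCKS],
                       int ceiling[NLOCKS])
{
    for (int k = 0; k < NLOCKS; k++) {
        ceiling[k] = 0;
        for (int i = 0; i < NTASKS; i++) {
            assert(priority[i] > 0);  /* 0 is reserved for "unused lock" */
            if (uses[i][k] && priority[i] > ceiling[k])
                ceiling[k] = priority[i];
        }
    }
}
```

If tasks with priorities 3, 2, and 1 are given and the first two both use locks a and b, both ceilings come out as 3, the priority of the highest-priority task. If only the priority-2 task used lock b, its ceiling would be 2.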

Priority Ceiling Protocol: example (RTOS solution II to priority inversion)
- Locks a and b both have priority ceilings equal to Task 1's priority.
- Task 2 holds lock a when Task 1 preempts it.
- Task 1 attempts to lock b, but cannot: Task 2 currently holds lock a, whose priority ceiling equals Task 1's priority.
- Task 1's priority would have to be strictly higher than that ceiling for it to lock b.
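The acquisition rule in this example can be written as a short check. This is a sketch only: pcp_lock_t, pcp_can_acquire, and the task indices are hypothetical, and priorities are integers where a larger number means a higher priority. The check does not need to look at the requested lock separately: if another task holds it, its ceiling is at least the requester's priority, so the rule already refuses.

```c
#include <assert.h>
#include <stddef.h>

#define NO_HOLDER (-1)

/* Hypothetical lock record for this sketch. */
typedef struct {
    int ceiling;  /* priority of the highest-priority task that may lock it */
    int holder;   /* index of the task holding it, or NO_HOLDER */
} pcp_lock_t;

/* PCP rule: a task may acquire a lock only if its priority is strictly
 * higher than the ceilings of all locks currently held by OTHER tasks.
 * Returns 1 if the acquisition is allowed, 0 if the task must block. */
int pcp_can_acquire(int task, int task_priority,
                    const pcp_lock_t *locks, size_t nlocks)
{
    assert(locks != NULL || nlocks == 0);
    for (size_t k = 0; k < nlocks; k++) {
        if (locks[k].holder != NO_HOLDER && locks[k].holder != task &&
            task_priority <= locks[k].ceiling)
            return 0;
    }
    return 1;
}
```

In the slide's scenario, with Task 1 as index 0 (priority 3) and Task 2 as index 1 (priority 2), locks a and b both have ceiling 3 and Task 2 holds a. The check refuses Task 1's request for b because 3 is not strictly higher than 3, while Task 2 itself could go on to take b. Once Task 2 releases a, Task 1's request is allowed, so the two tasks never end up each holding one lock and waiting for the other.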