RTOS Scheduling 2.0 Problems - Solutions

Presentation on theme: "RTOS Scheduling 2.0 Problems - Solutions"— Presentation transcript:

1 RTOS Scheduling 2.0 Problems - Solutions
Excellent book (PDF available): Giorgio Buttazzo, “Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications”, 3rd edition. 4/16

2 RTOS Scheduling Problems
Causes: mutual exclusion and deadlocks; priority inversion
Two solutions: priority inheritance; priority ceiling

3 Synchronization example (review)
Conveyor belt object recognition using two cameras.
Tasks:
  Acquisition: image_1 and image_2
  Edge detection: edge_1 and edge_2
  Object height
  Shape recognition
Diagram labels: image_1, image_2, edge_1, edge_2, shape, height, recognition.

4 Mutual Exclusion (mutex): review
Locks affect scheduling.
A data structure shared across threads must be accessed atomically; mutexes enforce this.
Critical section: the code executed while holding the lock.
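The review above can be made concrete with a minimal sketch, assuming POSIX threads (the same API the deadlock slide later uses). The names `counter_lock`, `shared_counter`, `worker`, and `run_workers` are hypothetical and chosen only for illustration; the shared data structure is reduced to a single counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { MAX_THREADS = 8, INCREMENTS_PER_THREAD = 10000 };

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;   /* shared data (hypothetical: one counter) */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;     /* critical section: runs while holding the lock */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts n workers, waits for all of them, and returns the final count. */
long run_workers(int n)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    assert(n >= 0 && n <= MAX_THREADS);
    shared_counter = 0;
    while (started < n &&
           pthread_create(&tid[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

Because every increment happens inside the critical section, `run_workers(4)` should return exactly 4 times `INCREMENTS_PER_THREAD`; without the lock, increments from different threads could be lost.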

5 Missed Deadlines?
Preemption: waiting for higher-priority tasks
Execution: the task's own execution time
Blocking (priority inversion): delay caused by lower-priority tasks [major source of missed deadlines]
A task is schedulable if preemption + execution + blocking < deadline.

Note: For a task to meet its deadline, it must accommodate preemption from higher-priority tasks, its own execution time, and delays caused by lower-priority tasks (known as priority inversion or blocking). Remember that higher-priority tasks, from a rate monotonic perspective, are those with higher rates (or shorter periods). These can occur more than once in a task’s period. A task’s execution occurs once during its period. Blocking can occur at most once during the task’s period; it comes from lower-priority tasks, those that have slower rates (or longer periods).

Preemption and execution are unavoidable. If these exceed capacity, one is faced with a classical throughput problem, and the only remedy is to reduce the workload (which means changing the system requirements) or to increase the capacity by using a faster computer. Experience has shown that priority inversion (blocking), the delay from lower-priority tasks, is a major source of missed deadlines. So we focus on identifying sources of priority inversion and try to reduce their blocking effects to enhance schedulability.
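The inequality on this slide can be sketched as a small check. This is an illustrative, conservative test rather than an exact analysis: it assumes each higher-priority task can preempt once per period over the whole deadline window, and it counts blocking once, as the note describes. The names `HpTask` and `meets_deadline` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* A higher-priority task, described by its period and execution time. */
typedef struct {
    long period;
    long exec;
} HpTask;

/* Integer ceiling of a / b for positive operands. */
static long ceil_div(long a, long b)
{
    return (a + b - 1) / b;
}

/*
 * Returns true if preemption + execution + blocking < deadline,
 * where preemption is bounded by counting every release of each
 * higher-priority task that can fall inside the deadline window.
 */
bool meets_deadline(long exec, long blocking, long deadline,
                    const HpTask *hp, int n_hp)
{
    long preemption = 0;

    assert(deadline > 0 && n_hp >= 0);
    for (int j = 0; j < n_hp; j++) {
        assert(hp[j].period > 0);
        preemption += ceil_div(deadline, hp[j].period) * hp[j].exec;
    }
    return preemption + exec + blocking < deadline;  /* strict, as on the slide */
}
```

For example, with higher-priority tasks (period 10, execution 2) and (period 20, execution 3), a task with execution 5 and deadline 40 sees at most 4*2 + 2*3 = 14 units of preemption. It passes the check with blocking 4 (14 + 5 + 4 = 23) and fails with blocking 25 (44), which shows how blocking alone can cause a missed deadline.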

6 Priority inversion example (with a lock)
Priorities: Task 1 (highest), Task 2, Task 3 (lowest).
Task 3 acquires the lock and enters its critical section.
Task 3 is preempted by Task 1, which tries to acquire the lock and blocks, waiting for it.
Task 2 preempts Task 3 at time 4 while the higher-priority Task 1 is still blocked.
In effect, the priorities of Tasks 1 and 2 are inverted: Task 2 keeps Task 1 waiting.

7 Priority Inversion (same example)
Task 1 (highest priority) tries to acquire the lock held by Task 3 and is blocked.
Task 2 (unrelated to the lock) preempts Task 3 before it releases the lock, keeping Task 1 waiting.
Timeline labels: Task 3 acquires lock; Task 1 attempts to lock the data resource (blocked, B); Task 3 releases lock.
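The timeline in this example can be reproduced with a small tick-based simulation of a fixed-priority preemptive scheduler. This is a sketch under stated assumptions: the release times, execution lengths, and critical-section positions are invented for illustration (only "Task 2 preempts at time 4" comes from the slide), and all names are hypothetical. The `inherit` flag switches on the inheritance rule described on slide 11, so both behaviours can be compared.

```c
#include <assert.h>
#include <stdbool.h>

enum { NTASKS = 3, NO_OWNER = -1 };

typedef struct {
    int priority;   /* larger number = higher priority */
    int release;    /* tick at which the task becomes ready */
    int work;       /* total ticks of execution needed */
    int cs_begin;   /* progress at which the lock is taken (-1: no lock) */
    int cs_end;     /* progress at which the lock is released */
} Task;

/* Returns the tick at which task `watch` finishes, or -1 if it does not
 * finish within `horizon` ticks. One lock is shared by all tasks. */
int finish_time(const Task *tasks, int watch, bool inherit, int horizon)
{
    int progress[NTASKS] = {0};
    int eff[NTASKS];                 /* effective (possibly inherited) priority */
    int owner = NO_OWNER;

    for (int i = 0; i < NTASKS; i++)
        eff[i] = tasks[i].priority;

    for (int t = 0; t < horizon; t++) {
        bool blocked[NTASKS] = {false};
        for (;;) {
            int run = -1;            /* highest-priority ready task, if any */
            for (int i = 0; i < NTASKS; i++) {
                bool ready = tasks[i].release <= t &&
                             progress[i] < tasks[i].work && !blocked[i];
                if (ready && (run < 0 || eff[i] > eff[run]))
                    run = i;
            }
            if (run < 0)
                break;               /* nothing can run in this tick */

            bool wants_lock = progress[run] == tasks[run].cs_begin;
            if (wants_lock && owner != NO_OWNER && owner != run) {
                blocked[run] = true; /* lock is held by another task */
                if (inherit && eff[owner] < eff[run])
                    eff[owner] = eff[run];
                continue;            /* pick the next candidate */
            }
            if (wants_lock)
                owner = run;
            progress[run]++;         /* execute for one tick */
            if (owner == run && progress[run] == tasks[run].cs_end) {
                owner = NO_OWNER;    /* leave critical section */
                eff[run] = tasks[run].priority;
            }
            if (run == watch && progress[run] == tasks[run].work)
                return t + 1;
            break;
        }
    }
    return -1;
}

/* Scenario from the slide; timing values are illustrative assumptions. */
int task1_finish(int task2_work, bool inherit)
{
    const Task tasks[NTASKS] = {
        { 3, 2, 3,          1,  2 },   /* Task 1: highest, needs the lock  */
        { 2, 4, task2_work, -1, -1 },  /* Task 2: unrelated to the lock    */
        { 1, 0, 6,          1,  5 },   /* Task 3: lowest, takes lock first */
    };
    return finish_time(tasks, 0, inherit, 1000);
}
```

Without inheritance, Task 1's finish time in this scenario grows with Task 2's execution time (13 ticks when Task 2 runs for 5, 28 when it runs for 20), even though Task 2 never touches the lock. With `inherit` set, Task 1 finishes at tick 8 regardless of how long Task 2 runs.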

8 Mars Rover Pathfinder
Landed July 4, 1997; airbags deployed at 350 m; worked until Sep 97.
A few days into the mission, it started missing deadlines.

The Mars Pathfinder mission was widely proclaimed as "flawless" in the early days after its July 4th, 1997 landing on the Martian surface. Successes included its unconventional "landing" -- bouncing onto the Martian surface surrounded by airbags, deploying the Sojourner rover, and gathering and transmitting voluminous data back to Earth, including the panoramic pictures that were such a hit on the Web. But a few days into the mission, not long after Pathfinder started gathering meteorological data, the spacecraft began experiencing total system resets, each resulting in losses of data. The press reported these failures in terms such as "software glitches" and "the computer was trying to do too many things at once".

This week at the IEEE Real-Time Systems Symposium I heard a fascinating keynote address by David Wilner, Chief Technical Officer of Wind River Systems. Wind River makes VxWorks, the real-time embedded systems kernel that was used in the Mars Pathfinder mission. In his talk, he explained in detail the actual software problems that caused the total system resets of the Pathfinder spacecraft, how they were diagnosed, and how they were solved. I wanted to share his story with each of you.

VxWorks provides preemptive priority scheduling of threads. Tasks on the Pathfinder spacecraft were executed as threads with priorities that were assigned in the usual manner reflecting the relative urgency of these tasks. Pathfinder contained an "information bus", which you can think of as a shared memory area used for passing information between different components of the spacecraft. A bus management task ran frequently with high priority to move certain kinds of data in and out of the information bus. Access to the bus was synchronized with mutual exclusion locks (mutexes).

The meteorological data gathering task ran as an infrequent, low priority thread, and used the information bus to publish its data. When publishing its data, it would acquire a mutex, do writes to the bus, and release the mutex. If an interrupt caused the information bus thread to be scheduled while this mutex was held, and if the information bus thread then attempted to acquire this same mutex in order to retrieve published data, this would cause it to block on the mutex, waiting until the meteorological thread released the mutex before it could continue. The spacecraft also contained a communications task that ran with medium priority.

Most of the time this combination worked fine. However, very infrequently it was possible for an interrupt to occur that caused the (medium priority) communications task to be scheduled during the short interval while the (high priority) information bus thread was blocked waiting for the (low priority) meteorological data thread. In this case, the long-running communications task, having higher priority than the meteorological task, would prevent it from running, consequently preventing the blocked information bus task from running. After some time had passed, a watchdog timer would go off, notice that the data bus task had not been executed for some time, conclude that something had gone drastically wrong, and initiate a total system reset.

9 Pathfinder: deadlines missed due to priority inversion
VxWorks RTOS with preemptive scheduling.
A few days into the mission, Pathfinder was sporadically missing deadlines.
The watchdog timer generated system resets, causing loss of data.
T1: information bus manager (high priority)
T2: communication task (medium priority)
T3: meteorological data collection (low priority)
A mutex controlled access to the shared information bus.
PIP was available but disabled for performance (more on PIP later).
Solution: PIP re-enabled from the ground.
Diagram label: T1 blocked.

10 Pathfinder problem
bc_sched detects that bc_dist has not completed and generates a system reset.

11 Priority Inheritance Protocol (PIP): solution I to priority inversion
Priorities: Task 1 > Task 2 > Task 3.
Task 3 acquires the lock and enters its critical section.
Task 1 preempts Task 3, tries to acquire the lock, and blocks.
Task 3 inherits Task 1's priority, but only after Task 1 blocks; this prevents preemption by Task 2.
PIP: when a task blocks while attempting to acquire a lock, the task holding the lock inherits the priority of the blocked task (Task 1's priority in the example above). The lock holder therefore cannot be preempted by any task whose priority is lower than that of the blocked task (Task 2 in the example above).
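On POSIX systems that support the option, priority inheritance is requested per mutex through its attributes. The sketch below shows one way to do that; `init_pi_mutex` is a hypothetical helper name, and this is the POSIX threads interface, not the VxWorks API used on Pathfinder. Whether inheritance has any visible effect also depends on the threads running under a real-time scheduling policy, which this sketch does not set up.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>

/*
 * Initializes *m as a mutex that uses the priority inheritance protocol.
 * Returns 0 on success or a pthread error number (for example when the
 * platform does not support PTHREAD_PRIO_INHERIT).
 */
int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* While a higher-priority thread is blocked on the mutex, the owner
     * runs at that thread's priority. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A mutex created this way is locked and unlocked with the usual `pthread_mutex_lock` and `pthread_mutex_unlock` calls; the protocol choice is invisible to the code inside the critical section.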

12 Pathfinder after S/W update

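13 Deadlocks can still happen with priority inheritance: two locks

    #include <pthread.h>

    pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    /* Task 1: takes lock b first, then lock a. */
    void* thread_1_function(void* arg) {
        pthread_mutex_lock(&lock_b);
        /* ... */
        pthread_mutex_lock(&lock_a);
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    /* Task 2: the slide's listing is cut off after the opening brace; this
       body is reconstructed from the description below (lock a, then b). */
    void* thread_2_function(void* arg) {
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

Task 2 acquires lock a.
Task 1 preempts Task 2, acquires lock b, and then blocks trying to acquire lock a.
Task 2 resumes and blocks trying to acquire lock b.
Deadlock: no further progress is possible.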

14 PIP problems: high overhead
Managing the priority inheritance mechanism means updating task priorities.
Managing the data structure for inherited task priorities (task and mutex lists): a complex data structure (not simply a stack) is needed to store the set of priorities inherited by each task (one list for each task and one for each mutex).
Dynamic priority management implies dynamic reordering of task lists.
PIP does not prevent deadlocks.

15 Priority Ceiling Protocol (PCP) prevents deadlock
Each lock has a priority ceiling: the priority of the highest-priority task that can lock it.
A task can acquire a lock only if its priority is strictly higher than the priority ceilings of all locks currently held by other tasks.
Locks not held by any task do not affect the task.
Prevents deadlocks.
Protocol extensions can support dynamic priority scheduling.

In this protocol, every lock or semaphore is assigned a priority ceiling equal to the priority of the highest-priority task that can lock it. A task T can acquire a lock “a” only if T's priority is strictly higher than the priority ceilings of all locks currently held by other tasks. Intuitively, if we prevent task T from acquiring lock “a”, then we ensure that T will not hold lock “a” while later trying to acquire other locks held by other tasks. This prevents deadlocks from occurring.
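The first step of the protocol, assigning each lock its ceiling, can be sketched as follows. It assumes the set of (task, lock) usages is known ahead of time, that larger numbers mean higher priority, and that priorities start at 1 so that 0 can mark an unused lock. `LockUse` and `compute_ceilings` are hypothetical names.

```c
#include <assert.h>

/* One entry per (task, lock) pair: a task of this priority may take this lock. */
typedef struct {
    int task_priority;  /* larger number = higher priority, assumed >= 1 */
    int lock_id;        /* index into the ceiling array */
} LockUse;

/*
 * Sets ceiling[l] to the priority of the highest-priority task that can
 * lock l. A lock that no task uses keeps ceiling 0.
 */
void compute_ceilings(const LockUse *uses, int n_uses,
                      int *ceiling, int n_locks)
{
    for (int l = 0; l < n_locks; l++)
        ceiling[l] = 0;

    for (int u = 0; u < n_uses; u++) {
        int l = uses[u].lock_id;
        assert(l >= 0 && l < n_locks);
        if (uses[u].task_priority > ceiling[l])
            ceiling[l] = uses[u].task_priority;
    }
}
```

If a task of priority 3 and a task of priority 2 both use locks 0 and 1, both ceilings come out as 3. Because ceilings depend only on which tasks can use which locks, they can be computed once, before the system runs.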

16 Priority Ceiling Protocol (PCP), solution II to priority inversion: example
Locks a and b both have priority ceilings equal to Task 1's priority.
Task 2 currently holds lock a.
Task 1 preempts Task 2.
Task 1 attempts to lock b but cannot, because lock a, held by Task 2, has a priority ceiling equal to Task 1's priority.
Task 1's priority would have to be strictly higher than that ceiling for it to lock b.
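The acquisition rule applied in this example can be written as a short predicate. This is a minimal sketch of the rule as stated on the slides, not a full PCP implementation (it does not model blocking queues or priority changes); `Lock`, `NO_HOLDER`, and `pcp_can_acquire` are hypothetical names, and larger numbers are assumed to mean higher priority.

```c
#include <assert.h>
#include <stdbool.h>

enum { NO_HOLDER = -1 };

typedef struct {
    int ceiling;  /* priority of the highest-priority task that can lock it */
    int holder;   /* id of the task holding it, or NO_HOLDER */
} Lock;

/*
 * A task may acquire a lock only if its priority is strictly higher than
 * the ceiling of every lock currently held by another task. Locks that
 * are free, or held by the task itself, do not count. A requested lock
 * that another task already holds is covered too: its ceiling is at
 * least the requester's priority, so the check fails.
 */
bool pcp_can_acquire(int task, int task_priority,
                     const Lock *locks, int n_locks)
{
    assert(n_locks >= 0);
    for (int i = 0; i < n_locks; i++) {
        bool held_by_other = locks[i].holder != NO_HOLDER &&
                             locks[i].holder != task;
        if (held_by_other && task_priority <= locks[i].ceiling)
            return false;
    }
    return true;
}
```

With locks a and b both at ceiling 3 and Task 2 (priority 2) holding a, the check for Task 1 (priority 3) fails, because 3 is not strictly higher than a's ceiling, so Task 1 never ends up holding b while waiting for a. Task 2 itself passes the check for b, and once Task 2 releases a, Task 1 passes as well.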

