Real-Time Multi-core Scheduling Moris Behnam

Introduction
Single processor scheduling
– E.g., t1 (P=10, C=5), t2 (10, 6)
– U = 0.5 + 0.6 = 1.1 > 1
– Use a faster processor?
Thermal and power problems impose limits on the performance of single-core processors; use multiple processors (multicore) instead.
Problem formulation
– Given a set of real-time tasks running on a multicore architecture, find a scheduling algorithm that guarantees the schedulability of the task set.

Task model
Periodic task model ti(T, C, D)
– Releases an infinite sequence of jobs, one every period T
– If T = D for all ti: implicit deadlines
– If D < T: constrained deadlines
– Otherwise: arbitrary deadlines
Sporadic task model ti(T, C, D)
– T is the minimum inter-arrival time between two consecutive jobs
A task is not allowed to execute on more than one processor/core at the same time.
(Figure: timeline of task ti releasing jobs ji1, ji2, ji3, each separated by Ti.)

//Control task, period Tci
t = CurrentTime;
LOOP
    S = read_sensor();
    Statement1;
    ...
    Statement2;
    Actuate;
    t = t + Tci;
    WaitUntil(t);
END

//Monitor task Mci
t = CurrentTime;
LOOP
    S = read_sensor();
    Statement1;
    ...
    Statement2;
    Actuate;
    WaitUntil(sensor_signal);
END

Task model
Task utilization: Ui = Ci / Ti
Task density: δi = Ci / min(Ti, Di)
The processor demand bound function h(t) corresponds to the maximum amount of task execution that can be released in, and must complete within, a time interval [0, t).
The processor load is the maximum, over all interval lengths t, of the processor demand bound divided by the length of the interval.
A simple necessary condition for task set feasibility: no single task may need more than one processor (Ui ≤ 1) and the total demand must not exceed the capacity m of the platform.
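As an illustration, a minimal sketch in Python (not from the slides) of the demand bound function for sporadic, constrained-deadline tasks and the resulting load; the standard form dbf(t) = max(0, floor((t − Di)/Ti) + 1) · Ci is evaluated only at absolute deadlines up to a chosen horizon. The task set below is hypothetical.

from math import floor

# Each task: (C, T, D) = worst-case execution time, period / min inter-arrival time, relative deadline.
tasks = [(2, 8, 2), (2, 10, 2), (4, 8, 6), (4, 8, 7)]   # hypothetical example values

def dbf(task, t):
    # Demand bound of one sporadic task in [0, t): execution of jobs that are both
    # released in the interval and have their deadlines inside the interval.
    C, T, D = task
    if t < D:
        return 0
    return (floor((t - D) / T) + 1) * C

def load(tasks, horizon):
    # Processor load: maximum over interval lengths of total demand divided by the length.
    # The demand only changes at absolute deadlines, so those are the only points checked.
    points = sorted({D + k * T for (C, T, D) in tasks
                     for k in range(int((horizon - D) // T) + 1)})
    return max(sum(dbf(tsk, t) for tsk in tasks) / t for t in points)

print(load(tasks, horizon=200))   # a simple necessary feasibility condition on m cores: load <= m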

Multicore platform
Includes several processors (cores) on a single chip
Different cores share either on- or off-chip caches
Cores are identical (homogeneous)
(Figure: four processor cores with L1 caches sharing an L2 cache.)

Design space
Task allocation
– no migration
– task migration
– job migration
Priority
– fixed task priority
– fixed job priority
– dynamic priority
Scheduling constraints
– non-preemptive
– fully preemptive
– limited preemption
(Figure: example schedules of t1, t2, t3 on processors P1 and P2 illustrating the different migration options.)

Multiprocessor scheduling
Partitioned scheduling
Global scheduling
(Figures: in partitioned scheduling each task is assigned to a fixed per-processor queue; in global scheduling all tasks wait in a single queue shared by processors P1, P2, P3, ...)

Partitioned scheduling
Advantages
– Isolation between cores
– No migration overhead
– Simple queue management
– Uniprocessor scheduling and analysis can be reused
Disadvantage
– Task set allocation (bin packing, an NP-hard problem)
Bin-packing heuristics
– First-Fit (FF)
– Next-Fit (NF)
– Best-Fit (BF)
– Worst-Fit (WF)
– Task orderings in Decreasing Utilisation (DU) combined with the above
(Figure: tasks packed into per-processor bins of capacity U = 1, each task occupying Ci/Ti.)
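A minimal sketch in Python (not from the slides) of first-fit allocation with decreasing-utilization ordering (FF-DU). The per-core schedulability check is left as total utilization ≤ 1, which is only valid as-is for EDF with implicit deadlines; a real partitioner would plug in the appropriate uniprocessor test. The task set in the usage line is hypothetical.

def first_fit_decreasing_utilization(tasks, m):
    # Assign tasks (C, T) to m cores: sort by decreasing utilization, place each task
    # on the first core where the total utilization stays <= 1 (EDF, implicit deadlines).
    cores = [[] for _ in range(m)]
    util = [0.0] * m
    for C, T in sorted(tasks, key=lambda ct: ct[0] / ct[1], reverse=True):
        u = C / T
        for p in range(m):
            if util[p] + u <= 1.0:          # first core that still fits
                cores[p].append((C, T))
                util[p] += u
                break
        else:
            raise ValueError("allocation failed: task does not fit on any core")
    return cores

# hypothetical task set, given as (C, T)
print(first_fit_decreasing_utilization([(5, 10), (6, 10), (2, 8), (3, 12)], m=2))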

Partitioned scheduling
The largest worst-case utilization bound for any partitioning algorithm is U = (m+1)/2 (implicit-deadline task sets):
– m+1 tasks with execution time 1+ε and a period of 2 (each Ui > 0.5) cannot be scheduled on m processors, regardless of the scheduling and allocation algorithms.
Utilization bounds are known for RMST ("Small Tasks"), for RM-FFDU, for any fixed task priority assignment, and for EDF-BF and EDF-FF with DU ordering.

Partitioned scheduling
Constrained and arbitrary deadlines
FBB-FFD algorithm (deadline monotonic with decreasing density), assuming
– constrained deadlines
– arbitrary deadlines
EDF-FFD (decreasing density)
– constrained deadlines
– arbitrary deadlines

Global scheduling
Advantages
– Fewer context switches / preemptions
– Unused capacity can be used by all other tasks
– More appropriate for open systems
Disadvantages
– Job migration overhead

Global scheduling
Implicit deadlines and periodic tasks
Global RM, fully preemptive, with migration
Example: n = m+1 tasks, t1,..,tn-1 (C=2ε, T=1), tn (C=1, T=1+ε)
– The m short tasks have the highest RM priority and run first on the m processors, so tn misses its deadline; the utilization bound is ≈ 0
– Fix: increase the priority of tn
(Figures: schedules on P1..Pm showing the deadline miss, and the schedule after raising tn's priority.)
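A small numeric check of this example (Python, not from the slides): the total utilization tends to 1 while tn still misses its deadline under global RM, so the per-processor bound tends to 0.

# n = m+1 tasks: m tasks with (C=2e, T=1) and one task tn with (C=1, T=1+e).
# Under global RM the m short tasks have the shortest period and thus the highest priority,
# so they all start first, one per processor, and tn only gets (1+e) - 2e time before its deadline.
m, e = 4, 0.01
u_total = m * (2 * e / 1.0) + 1.0 / (1.0 + e)
time_left_for_tn = (1.0 + e) - 2 * e           # time remaining for tn after the short tasks finish
print(u_total)                                  # ~1.07: close to 1, i.e. roughly 1/m per processor
print(time_left_for_tn < 1.0)                   # True: tn cannot fit C=1, so it misses its deadline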

Global scheduling
RM-US(m/(3m-2)) algorithm
– Tasks are categorized based on their utilization
– A task ti is considered heavy if Ci/Ti > m/(3m-2); otherwise it is considered light
– Heavy tasks are assigned higher priority than light tasks
– RM is applied on the light tasks to assign their priorities
– Utilization bound: U_RM-US(m/(3m-2)) = m^2/(3m-2)
Example: suppose a system has n=4, m=3 with the following task parameters (C, T): t1(0.4, 4), t2(0.6, 6), t3(0.45, 9), t4(8, 10). The priority assignment according to the algorithm is: t4 highest, since it is a heavy task (U4 = 0.8 > 3/7), and then t1, t2, t3 (lowest), based on RM.
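A minimal sketch in Python (not from the slides) of this priority assignment: heavy tasks (utilization above the m/(3m-2) threshold) first, then the remaining tasks in rate-monotonic order.

def rm_us_priority_order(tasks, m):
    # Return tasks (C, T) from highest to lowest priority under RM-US(m/(3m-2)):
    # heavy tasks (U > m/(3m-2)) first, light tasks afterwards in RM (shortest period first) order.
    threshold = m / (3 * m - 2)
    heavy = [t for t in tasks if t[0] / t[1] > threshold]
    light = [t for t in tasks if t[0] / t[1] <= threshold]
    return heavy + sorted(light, key=lambda ct: ct[1])   # RM: shorter period = higher priority

# the slide's example: n=4, m=3, tasks given as (C, T)
print(rm_us_priority_order([(0.4, 4), (0.6, 6), (0.45, 9), (8, 10)], m=3))
# -> [(8, 10), (0.4, 4), (0.6, 6), (0.45, 9)]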

Global scheduling
Global EDF, fully preemptive, with migration (fixed job priority, dynamic task priority)
Utilization-based bound: U_EDF = m − (m−1)·u_max
Same problem as in global RM
(Figures: schedules on P1..Pm showing the deadline miss and the corrected schedule.)

Global scheduling
EDF-US(m/(2m-1)) algorithm
– Tasks are categorized based on their utilization
– A task ti is considered heavy if Ci/Ti > m/(2m-1); otherwise it is considered light
– Heavy tasks are assigned higher priority than light tasks
– The relative priority order of the light tasks is based on EDF
– Utilization bound: U_EDF-US(m/(2m-1)) = m^2/(2m-1)

Global scheduling
Constrained and arbitrary deadlines
Critical instant
– In uniprocessor scheduling, the critical instant is when all tasks are released simultaneously
– In multiprocessor scheduling this is not the case, as shown in the following example
Example: a system with n=4, m=2, t1(C=2, D=2, T=8), t2(2, 2, 10), t3(4, 6, 8), t4(4, 7, 8)
(Figure: a schedule with a deadline miss that does not occur at the synchronous release.)

Global scheduling
Determining the schedulability of sporadic task sets
– Consider an interval from the release to the deadline of some job of task tk
– Establish a condition necessary for the job to miss its deadline, for example that each processor executes other tasks for more than Dk − Ck within the interval
– Derive an upper bound I_UB on the maximum interference in the interval, from jobs released in the interval and also from jobs released before the interval with remaining execution (carry-in jobs)
– Form a necessary un-schedulability test from I_UB and the necessary condition for a deadline miss; its negation is a sufficient schedulability test

Global scheduling
Based on the previous test and assuming the global EDF algorithm, a job of τk can only miss its deadline if the load in the interval is at least m(1−δk) + δk
A constrained-deadline task set is therefore schedulable under preemptive global EDF scheduling if this bound is not reached for any task
For fixed task priority, a corresponding response-time upper bound can be derived
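A minimal sketch in Python (not from the slides) of the resulting sufficient test: with the load computed as in the earlier demand-bound sketch, a constrained-deadline task set is accepted if, for every task τk, the load stays strictly below m(1−δk) + δk.

def global_edf_load_test(tasks, m, load):
    # Sufficient schedulability test for preemptive global EDF, derived from the condition
    # that a job of task k can only miss its deadline if load >= m*(1 - d_k) + d_k,
    # where d_k = C_k / min(T_k, D_k) is the density of task k.
    # tasks: list of (C, T, D); load: processor load of the task set (e.g. from load()).
    for C, T, D in tasks:
        density = C / min(T, D)
        if load >= m * (1 - density) + density:
            return False       # the deadline-miss condition cannot be excluded for this task
    return True                # no task can satisfy the miss condition -> schedulable

# hypothetical use, together with the earlier load() sketch:
# ok = global_edf_load_test(tasks, m=2, load=load(tasks, horizon=200))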

Global scheduling
Pfair (proportionate fairness) algorithms
Motivations
– All the multiprocessor scheduling algorithms mentioned so far have a maximum utilization bound of about 50%
– Ideally, a utilization bound of 100% is more interesting
Pfair is the only known optimal scheduling approach for periodic, implicit-deadline tasks
It is based on dynamic job priorities
– The timeline is divided into equal-length slots
– Task periods and execution times are multiples of the slot size
– Each task receives an amount of slots proportional to its utilization
Disadvantages of Pfair
– Computational overheads are relatively high
– Too many preemptions (up to one per quantum per processor)
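To make the fairness notion concrete, a small sketch in Python (not from the slides, and not the full PF/PD² rules) of the per-slot lag: the deviation between a task's ideal fluid allocation Ui·t and the slots it has actually received. Pfair schedulers keep this lag strictly between −1 and 1 for every task at every slot boundary. The trace below is hypothetical.

def lag(task_utilization, allocated_slots, t):
    # Lag of a task at slot boundary t: ideal fluid allocation minus actual allocation.
    # Pfair fairness means this stays in (-1, 1) for every task at every slot boundary.
    return task_utilization * t - allocated_slots

# hypothetical trace: a task with U = 0.6 that was scheduled in slots 0, 2 and 3 of [0, 5)
allocated = [1, 1, 2, 3, 3]      # cumulative slots received by the end of slot t, for t = 1..5
for t, a in enumerate(allocated, start=1):
    print(t, round(lag(0.6, a, t), 2))    # all values stay within (-1, 1)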

Hybrid/semi-partitioned scheduling
What if some tasks are allocated to a specific processor and others are scheduled globally?
Example:
– t1, t3 and t5 are assigned to P1
– t2 and t7 are assigned to P2
– t4 and t8 can execute on both P1 and P2
This kind of scheduling is called hybrid or semi-partitioned multiprocessor scheduling

Hybrid/semi-partitioned scheduling
EKG approach
– Assumes the periodic task model with implicit deadlines
– Uses a bin-packing algorithm to allocate tasks to processors
– Tasks that cannot fit onto a processor are split into up to k parts
– A split task can execute on up to k processors out of m

Hybrid/semi-partitioned scheduling
If k = m
– Tasks are assigned using next-fit bin packing
– Processors are filled up to 100%
– Example: see the sketch below
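A minimal sketch in Python (not from the slides) of the k = m allocation idea: tasks are packed with next-fit, each processor is filled to utilization 1, and a task that overflows the current processor is split, with the remainder continuing on the next processor. The utilizations in the usage line are hypothetical.

EPS = 1e-9   # tolerance for floating-point rounding

def ekg_like_next_fit_split(utilizations, m):
    # Next-fit allocation filling each core to utilization 1; an overflowing task is split
    # between the current core and the next one (a simplified view of EKG with k = m).
    cores = [[] for _ in range(m)]
    p, free = 0, 1.0
    for i, u in enumerate(utilizations):
        while u > EPS:
            if p >= m:
                raise ValueError("task set does not fit on m cores")
            part = min(u, free)
            cores[p].append((f"t{i+1}", round(part, 3)))   # (task, utilization share on this core)
            u -= part
            free -= part
            if free <= EPS:                                # core full: continue on the next core
                p, free = p + 1, 1.0
    return cores

# hypothetical utilizations
print(ekg_like_next_fit_split([0.6, 0.7, 0.5, 0.8, 0.4], m=3))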

Hybrid/semi-partitioned scheduling
If k < m
– Tasks are categorized as heavy or light
– A heavy task has Ui > SEP = k/(k+1); otherwise the task is considered light
– First, all heavy tasks are assigned to processors, one per processor
– Light tasks are then assigned to the processors using the remaining utilization
– The utilization bound is equal to m · SEP
Dispatching
– Partitioned tasks are scheduled using EDF
– Reservations are used in each processor to execute the split tasks, and the priority of a reservation is always higher than that of the other tasks
– The reserves of τi on Pp and Pp+1 can never overlap
Overhead
– Each split task may cause up to k migrations every task period

Cluster scheduling
Combines partitioned and global scheduling
– Tasks are grouped into a set of clusters
– Each cluster is allocated a number of cores less than or equal to the total number of cores
– Tasks within a cluster may migrate only between the processors allocated to that cluster
(Figure: tasks mapped to clusters, each cluster spanning a subset of processors P1..P4.)

Cluster scheduling
– Physical clusters are allocated to specific cores
– Virtual clusters can be allocated to any available cores (hierarchical scheduling: a top-level scheduler selects clusters, and inside each cluster a local scheduler selects the tasks to execute)

Multiprocessor synchronization
All the algorithms presented so far assume no resource sharing
In multiprocessor systems there are three general approaches
– Lock-based
– Lock-free
– Wait-free
Lock-based: each task locks a mutex before accessing a shared resource and releases it when it finishes
Resources can be classified as local resources and global resources
When a task is blocked while trying to access a shared resource, it either
– is suspended until the resource becomes available, or
– continues executing in a busy wait

Multiprocessor synchronization
Partitioned scheduling with suspension
Problems:
– Remote blocking: tasks may be blocked by tasks located on other processors (with no direct relation between the tasks)
– Multiple priority inversions due to suspensions (low-priority tasks may execute while higher-priority tasks are suspended waiting for global resources)
(Figure: a high-priority task on P1 suffers remote blocking while a low-priority task on P2 holds the global resource in its critical section.)

Multiprocessor synchronization
MPCP (multiprocessor priority ceiling protocol)
– Reduces and bounds remote blocking
– Applicable to partitioned systems using fixed priorities
– A global mutex is used to protect each global resource
– Priority ceiling = max(all executing task priorities) + max(priorities of the tasks accessing the shared resource)
– A task accessing a global shared resource can be preempted by an awakened waiting task with a higher priority ceiling
– Each global resource has a priority queue
– No nested access to shared resources is allowed
– The blocking factor is made up of five different components
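A small sketch in Python (not from the slides) of the ceiling rule exactly as stated above: a global resource's ceiling is a base above every normal task priority, plus the highest priority among the tasks that use the resource. The priority values are hypothetical.

def mpcp_global_ceiling(all_task_priorities, users_of_resource):
    # Ceiling of a global resource under MPCP as described on the slide:
    # a base exceeding every normal priority, plus the highest priority of the resource's users.
    # Priorities are numeric, larger = higher.
    base = max(all_task_priorities)               # exceeds any normally executing task
    return base + max(users_of_resource)

# hypothetical priorities: 5 tasks, resource R used by the tasks with priorities 2 and 4
print(mpcp_global_ceiling([1, 2, 3, 4, 5], [2, 4]))   # -> 9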

Multiprocessor synchronization
MPCP
(Figure: tasks on processors Pi and Pj waiting in per-resource priority queues for two global shared resources.)

Multiprocessor synchronization
MSRP for partitioned scheduling
– Based on the SRP protocol for single processors
– Can be used with FPS and EDF
– When a task is blocked on a global resource under MSRP, it busy-waits and is not preemptable
– A FIFO queue is used to grant access to the tasks waiting on a global resource when it is unlocked
Comparing MPCP and MSRP
– MSRP removes two of the five contributions to the blocking factor
– MSRP consumes processor time (spinning) that could be used by other tasks
– MSRP is simpler to implement

Multiprocessor synchronization
Lock-free approach
– Tasks access resources concurrently
– A task repeats the access to a shared resource whenever the input data has changed due to a concurrent access by another task
– The lock-free approach increases the worst-case execution times of tasks (retries)
– Typically requires hardware support (e.g., atomic compare-and-swap instructions)
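A minimal sketch in Python (not from the slides) of the retry idea using a sequence counter: the writer bumps a version before and after updating the data, and a reader repeats its read whenever it observes that the data changed underneath it. This relies on the writer and reader steps being observed in order; a real implementation would use hardware atomics or memory barriers.

import threading

class VersionedData:
    # Single-writer shared data with a version counter (odd while an update is in progress).
    def __init__(self, value):
        self.version = 0
        self.value = value

    def write(self, value):            # writer side: make the update visible as a version change
        self.version += 1              # odd: update in progress
        self.value = value
        self.version += 1              # even again: update complete

def lock_free_read(shared):
    # Reader side: retry until a consistent snapshot is observed (the retry loop).
    while True:
        v1 = shared.version
        snapshot = shared.value
        v2 = shared.version
        if v1 == v2 and v1 % 2 == 0:   # no writer was active and nothing changed: done
            return snapshot
        # otherwise the data changed during the read -> repeat the access

shared = VersionedData((0, 0))
threading.Thread(target=lambda: shared.write((1, 2))).start()
print(lock_free_read(shared))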

Multiprocessor synchronization
Wait-free approach
– Multiple buffers are used
– Does not impose blocking on the tasks accessing shared resources, nor does it increase their execution times
– Requires more memory allocation (buffers)

Other related issues
– Parallel task models
– Worst-case execution time (WCET) analysis
– Network / bus scheduling
– Memory architectures
– Scheduling of uniform and heterogeneous processors
– Operating systems
– Power consumption and dissipation
– Scheduling tasks with soft real-time constraints
– Many-core architectures
– Virtualization