EECS 262a Advanced Topics in Computer Systems Lecture 13 M-CBS (Con't) and DRF October 10th, 2012 John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs262

Online Scheduling for Realtime

Schedulability Test
A test to determine whether a feasible schedule exists.
Sufficient test: if the test is passed, the tasks are definitely schedulable; if the test is not passed, the tasks may still be schedulable, but not necessarily.
Necessary test: if the test is passed, the tasks may be schedulable, but not necessarily; if the test is not passed, the tasks are definitely not schedulable.
Exact test (= necessary + sufficient): the task set is schedulable if and only if it passes the test.

Rate Monotonic Analysis: Assumptions
A1: Tasks are periodic (activated at a constant rate). Period = interval between two consecutive activations of the task.
A2: All instances of a periodic task have the same computation time.
A3: All instances of a periodic task have the same relative deadline, which is equal to the period.
A4: All tasks are independent (i.e., no precedence constraints and no resource constraints).
Implicit assumptions:
A5: Tasks are preemptable.
A6: No task can suspend itself.
A7: All tasks are released as soon as they arrive.
A8: All overhead in the kernel is assumed to be zero (or part of the tasks' computation times).

Rate Monotonic Scheduling: Principle
Principle: each process is assigned a (unique) priority based on its period (rate); always execute the active job with the highest priority.
The shorter the period, the higher the priority (1 = lowest priority).
W.l.o.g., number the tasks in reverse order of priority.

Process  Period  Priority  Name
A         25       5        T1
B         60       3        T3
C         42       4        T2
D        105       1        T5
E         75       2        T4
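As a small sketch of this rule (Python; the task table comes from the slide, while the Task type and function name are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    period: int   # shorter period => higher RM priority

def rm_priorities(tasks):
    """Assign RM priorities: rank 1 = lowest, n = highest (as on the slide)."""
    # Sort longest period first, so the shortest period gets the largest rank.
    ordered = sorted(tasks, key=lambda t: -t.period)
    return {t.name: rank for rank, t in enumerate(ordered, start=1)}

tasks = [Task("A", 25), Task("B", 60), Task("C", 42), Task("D", 105), Task("E", 75)]
print(rm_priorities(tasks))  # {'D': 1, 'E': 2, 'B': 3, 'C': 4, 'A': 5}
```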

Example: Rate Monotonic Scheduling
[Gantt chart of an example RMA schedule instance]

Example: Rate Monotonic Scheduling
[Gantt chart showing a deadline miss; the response time of a job is marked on the time axis]

Utilization
[Gantt chart of the example task set annotated with each task's utilization Ci/Ti]

RMS: Schedulability Test
Theorem (utilization-based schedulability test): a periodic task set with total utilization U = Σi Ci/Ti is schedulable by the rate monotonic scheduling algorithm if U ≤ n(2^(1/n) − 1).
This schedulability test is "sufficient"!
For harmonic periods (each period evenly divides every longer period), the utilization bound is 100%.
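A minimal sketch of this sufficient test (the example task set is illustrative, not from the slides):

```python
def rm_utilization_test(tasks):
    """Sufficient (not necessary) RM schedulability test: U <= n(2^(1/n) - 1).

    tasks: list of (C, T) pairs, worst-case computation time and period.
    """
    n = len(tasks)
    U = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return U, bound, U <= bound

# Three tasks: U = 0.2 + 0.25 + 0.3 = 0.75 <= 3*(2^(1/3)-1) ~ 0.7798 -> passes
print(rm_utilization_test([(1, 5), (1, 4), (3, 10)]))
```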

RMS Example
The schedulability test requires U = Σi Ci/Ti ≤ n(2^(1/n) − 1). Computing the total utilization for the example task set yields a value above this bound, so the task set does not satisfy the schedulability condition. (Since the test is only sufficient, the set may still be schedulable.)

EDF: Assumptions
A1: Tasks are periodic or aperiodic. Period = interval between two consecutive activations of the task.
A2: All instances of a periodic task have the same computation time.
A3: All instances of a periodic task have the same relative deadline, which is equal to the period.
A4: All tasks are independent (i.e., no precedence constraints and no resource constraints).
Implicit assumptions:
A5: Tasks are preemptable.
A6: No task can suspend itself.
A7: All tasks are released as soon as they arrive.
A8: All overhead in the kernel is assumed to be zero (or part of the tasks' computation times).

EDF Scheduling: Principle
Preemptive, priority-based dynamic scheduling.
Each task is assigned a (current) priority based on how close its absolute deadline is; the scheduler always runs the active task with the closest absolute deadline.
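A toy simulation of the principle (the job-tuple format and values are illustrative):

```python
import heapq

def edf_schedule(jobs, horizon):
    """Simulate preemptive EDF on one CPU over integer time steps.

    jobs: list of (release, absolute_deadline, work) tuples.
    Returns a list of (time, job_index) showing which job ran each step.
    """
    ready = []                              # min-heap ordered by absolute deadline
    remaining = [w for _, _, w in jobs]
    timeline = []
    for t in range(horizon):
        for i, (r, d, _) in enumerate(jobs):
            if r == t:
                heapq.heappush(ready, (d, i))   # closest deadline first
        while ready and remaining[ready[0][1]] == 0:
            heapq.heappop(ready)                # discard finished jobs
        if ready:
            _, i = ready[0]
            remaining[i] -= 1
            timeline.append((t, i))
    return timeline

print(edf_schedule([(0, 4, 2), (1, 3, 1), (2, 10, 3)], horizon=8))
# [(0, 0), (1, 1), (2, 0), (3, 2), (4, 2), (5, 2)]
```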

EDF: Schedulability Test
Theorem (utilization-based schedulability test): a task set with total utilization U = Σi Ci/Ti is schedulable by the earliest deadline first (EDF) scheduling algorithm if and only if U ≤ 1.
This is an exact schedulability test (necessary + sufficient).
Proof: [Liu and Layland, 1973]

EDF Properties
EDF is optimal with respect to feasibility (i.e., schedulability).
EDF is optimal with respect to minimizing the maximum lateness.

EDF Example: Domino Effect
EDF minimizes the lateness of the "most tardy task" [Dertouzos, 1974].

Constant Bandwidth Server
Intuition: give a fixed share of the CPU to a certain class of jobs; good for tasks with probabilistic resource requirements.
Basic approach: slots (called "servers") are scheduled with EDF, rather than the jobs themselves.
A CBS server is defined by two parameters: a budget Qs and a period Ts.
A mechanism for tracking processor usage ensures that no more than Qs CPU seconds are used every Ts seconds (or whatever measurement you like) when there is demand; otherwise tasks get to use the processor as they like.
Since EDF is used underneath, hard real-time and soft real-time tasks can be mixed.
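A minimal sketch of the budget-tracking idea, simplified from the full CBS rules (deadline-assignment details for newly arriving jobs are omitted; the class and method names are illustrative):

```python
class ConstantBandwidthServer:
    """Simplified CBS: a server with budget Qs replenished every period Ts.

    Served jobs consume the budget; when it is exhausted, the budget is
    recharged and the server's deadline postponed by Ts, so under contention
    the server never uses more than Qs/Ts of the CPU.
    """
    def __init__(self, Qs, Ts, now=0.0):
        self.Qs, self.Ts = Qs, Ts
        self.budget = Qs
        self.deadline = now + Ts      # EDF schedules servers by this deadline

    def consume(self, cpu_time):
        self.budget -= cpu_time
        while self.budget <= 0:       # budget exhausted:
            self.budget += self.Qs    # recharge the budget, and
            self.deadline += self.Ts  # postpone the server deadline
```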

Today’s Papers
Implementing Constant-Bandwidth Servers upon Multiprocessor Platforms. Sanjoy Baruah, Joel Goossens, and Giuseppe Lipari. Appears in Proceedings of the Real-Time and Embedded Technology and Applications Symposium (RTAS), 2002. (From last time!)
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Usenix NSDI 2011, Boston, MA, March 2011.
Thoughts?

CBS on Multiprocessors
Basic problem: EDF is not all that efficient on multiprocessors; the schedulability constraint is considerably weaker than for uniprocessors.
Key idea of the paper: send the highest-utilization jobs to dedicated processors and use (global) EDF for the rest. This minimizes the number of processors required and comes with a new acceptance test.
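As a sketch of the flavor of such acceptance tests, the well-known global-EDF utilization bound can serve; this is an assumption on my part, not necessarily the exact test derived in the paper:

```python
def gedf_acceptance(utilizations, m):
    """Global-EDF style acceptance test on m processors (sketch).

    Accept if total utilization U <= m - (m - 1) * u_max, where u_max is
    the largest per-server utilization. Heavy servers hurt this bound,
    which is why pinning the heaviest ones to dedicated processors helps.
    """
    U, u_max = sum(utilizations), max(utilizations)
    return U <= m - (m - 1) * u_max

print(gedf_acceptance([0.6, 0.3, 0.3, 0.2], m=2))  # U=1.4, bound=2-0.6=1.4 -> True
```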

Is this a good paper?
What were the authors’ goals? What about the evaluation/metrics?
Did they convince you that this was a good system/approach? Were there any red flags? What mistakes did they make?
Does the system/approach meet the “Test of Time” challenge? How would you review this paper today?

What is Fair Sharing?
n users want to share a resource (e.g., CPU). Solution: allocate each user 1/n of the shared resource (33% each for n = 3).
Generalized by max-min fairness: handles the case where a user wants less than its fair share. E.g., if user 1 wants no more than 20%, she gets 20% and the other two users get 40% each.
Generalized by weighted max-min fairness: give weights to users according to importance. E.g., with user 1 at weight 1 and user 2 at weight 2, they get 33% and 66% respectively.
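A sketch of (weighted) max-min water-filling for one resource, reproducing the 20%/40%/40% example above (function and variable names are illustrative):

```python
def weighted_max_min(capacity, demands, weights):
    """Weighted max-min fair shares of one divisible resource (sketch).

    Users whose demand is below their fair share keep only what they want;
    the leftover is split among the rest in proportion to their weights.
    """
    alloc = {}
    active = set(demands)
    while active:
        total_w = sum(weights[u] for u in active)
        share = {u: capacity * weights[u] / total_w for u in active}
        satisfied = {u for u in active if demands[u] <= share[u]}
        if not satisfied:            # everyone wants more than their share
            alloc.update(share)
            break
        for u in satisfied:          # cap satisfied users at their demand
            alloc[u] = demands[u]
            capacity -= demands[u]
        active -= satisfied
    return alloc

# User 1 wants only 20% of the CPU; users 2 and 3 split the remaining 80%.
print(weighted_max_min(1.0, {"u1": 0.2, "u2": 1.0, "u3": 1.0},
                       {"u1": 1, "u2": 1, "u3": 1}))
# {'u1': 0.2, 'u2': 0.4, 'u3': 0.4}
```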

Why is Fair Sharing Useful?
Weighted fair sharing / proportional shares: user 1 gets weight 2, user 2 weight 1, yielding 66% and 33%.
Priorities: give user 1 weight 1000, user 2 weight 1.
Reservations: ensure user 1 gets 10% of a resource by giving user 1 weight 10 and keeping the sum of weights ≤ 100.
Isolation policy: users cannot affect others beyond their fair share.

Properties of Max-Min Fairness
Share guarantee: each user can get at least 1/n of the resource, but will get less if her demand is less.
Strategy-proof: users are not better off by asking for more than they need; users have no reason to lie.
Max-min fairness is the only “reasonable” mechanism with these two properties.

Why Care about Fairness?
Desirable properties of max-min fairness:
Isolation policy: a user gets her fair share irrespective of the demands of other users.
Flexibility: separates mechanism from policy (proportional sharing, priority, reservation, ...).
Many schedulers use max-min fairness:
Datacenters: Hadoop’s fair scheduler, capacity scheduler, Quincy.
OS: round-robin, proportional sharing, lottery, Linux CFS, ...
Networking: WFQ, WF2Q, SFQ, DRR, CSFQ, ...

When is Max-Min Fairness not Enough?
Need to schedule multiple, heterogeneous resources.
Example: task scheduling in datacenters, where tasks consume more than just CPU: CPU, memory, disk, and I/O.
What are today’s datacenter task demands?

Heterogeneous Resource Demands
Some tasks are CPU-intensive; some tasks are memory-intensive.
Most tasks need ~<2 CPU, 2 GB RAM>.
Data from a 2000-node Hadoop cluster at Facebook (Oct 2010).

Problem
Single-resource example: 1 resource (CPU). User 1 wants <1 CPU> per task; user 2 wants <3 CPU> per task. Max-min fairness gives each user 50% of the CPU.
Multi-resource example: 2 resources (CPUs and memory). User 1 wants <1 CPU, 4 GB> per task; user 2 wants <3 CPU, 1 GB> per task.
What is a fair allocation?

Problem definition: how do we fairly share multiple resources when users have heterogeneous demands on them?

Demands at Facebook

Model
Users have tasks according to a demand vector, e.g., <2, 3, 1>: each of the user’s tasks needs 2 units of R1, 3 units of R2, and 1 unit of R3.
Demand vectors are not needed in practice; the scheduler can simply measure actual consumption.
Resources are allocated in multiples of the demand vectors.
Assume divisible resources.

A Natural Policy: Asset Fairness
Asset fairness: equalize each user’s sum of resource shares.
Cluster with 70 CPUs, 70 GB RAM; U1 needs <2 CPU, 2 GB RAM> per task, U2 needs <1 CPU, 2 GB RAM> per task.
Asset fairness yields: U1 gets 15 tasks (30 CPUs, 30 GB, Σ = 60); U2 gets 20 tasks (20 CPUs, 40 GB, Σ = 60).
Problem: user 1 has less than 50% of both CPUs and RAM, so she would be better off in a separate cluster with half of the resources.
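A brute-force sketch that recovers the slide’s asset-fairness numbers (illustrative code, not how a real scheduler would compute this):

```python
def asset_fair(capacity, d1, d2):
    """Find task counts (x, y) equalizing the two users' summed resource
    shares, scanning feasible integer allocations (fine for a toy example)."""
    def score(demand, n):  # sum of resource shares for n tasks
        return sum(n * demand[r] / capacity[r] for r in capacity)
    best = None
    for x in range(200):
        for y in range(200):
            if any(x * d1[r] + y * d2[r] > capacity[r] for r in capacity):
                continue                      # allocation does not fit
            if abs(score(d1, x) - score(d2, y)) < 1e-9:
                if best is None or x + y > sum(best):
                    best = (x, y)             # keep the largest equal pair
    return best

cap = {"cpu": 70, "ram": 70}
print(asset_fair(cap, {"cpu": 2, "ram": 2}, {"cpu": 1, "ram": 2}))  # (15, 20)
```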

Share Guarantee Every user should get 1/n of at least one resource Intuition: “You shouldn’t be worse off than if you ran your own cluster with 1/n of the resources”

Desirable Fair Sharing Properties
Many desirable properties: share guarantee, strategy-proofness, envy-freeness, Pareto efficiency, single-resource fairness, bottleneck fairness, population monotonicity, resource monotonicity.
DRF focuses on these properties.

Cheating the Scheduler
Some users will game the system to get more resources.
Real-life examples:
A cloud provider had quotas on map and reduce slots. Some users found out that the map quota was low, so they implemented maps in the reduce slots!
A search company provided dedicated machines to users who could ensure a certain level of utilization (e.g., 80%), so users ran busy-loops to inflate their utilization.

Two Important Properties
Strategy-proofness: a user should not be able to increase her allocation by lying about her demand vector. Intuition: users are incentivized to state truthful resource requirements.
Envy-freeness: no user would ever strictly prefer another user’s lot in an allocation; no one wants to trade places with any other user.

Challenge
We want a fair sharing policy that provides both strategy-proofness and the share guarantee.
Max-min fairness for a single resource had these properties; the goal is to generalize max-min fairness to multiple resources.

Dominant Resource Fairness
A user’s dominant resource is the resource of which she has the biggest share.
Example: total resources are <10 CPU, 4 GB>; user 1’s allocation is <2 CPU, 1 GB>. Her dominant resource is memory, as 1/4 > 2/10 (= 1/5).
A user’s dominant share is the fraction of her dominant resource that she is allocated. User 1’s dominant share is 25% (1/4).
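A one-function sketch of the definition, using the slide’s example:

```python
def dominant_share(total, alloc):
    """Return (dominant resource, dominant share) for one user's allocation."""
    shares = {r: alloc[r] / total[r] for r in total}
    res = max(shares, key=shares.get)
    return res, shares[res]

total = {"cpu": 10, "mem": 4}
print(dominant_share(total, {"cpu": 2, "mem": 1}))  # ('mem', 0.25)
```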

Dominant Resource Fairness (2)
Apply max-min fairness to dominant shares: equalize the dominant share of the users.
Example: total resources are <9 CPU, 18 GB>. User 1’s demand is <1 CPU, 4 GB> per task (dominant resource: memory); user 2’s demand is <3 CPU, 1 GB> per task (dominant resource: CPU).
DRF gives user 1 three tasks (3 CPUs, 12 GB) and user 2 two tasks (6 CPUs, 2 GB), equalizing both dominant shares at 66%.

DRF is Fair DRF is strategy-proof DRF satisfies the share guarantee DRF allocations are envy-free See DRF paper for proofs

Online DRF Scheduler
Whenever there are available resources and tasks to run: schedule a task to the user with the smallest dominant share.
O(log n) time per decision using binary heaps.
Need to determine demand vectors.
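A sketch in the spirit of the paper’s online algorithm, reproducing the <9 CPU, 18 GB> example above (details such as task completion and demand re-estimation are omitted):

```python
import heapq

def drf_schedule(capacity, demands, max_steps=1000):
    """Online DRF sketch: repeatedly give the next task to the user with
    the smallest dominant share, tracked in a binary heap.

    capacity: {resource: total}; demands: {user: {resource: per-task need}}.
    """
    used = {r: 0.0 for r in capacity}
    alloc = {u: 0 for u in demands}               # tasks launched per user
    heap = [(0.0, u) for u in demands]            # (dominant share, user)
    heapq.heapify(heap)
    while heap and max_steps > 0:
        share, u = heapq.heappop(heap)
        d = demands[u]
        if any(used[r] + d[r] > capacity[r] for r in d):
            continue            # next task won't fit; drop user in this sketch
        for r in d:
            used[r] += d[r]
        alloc[u] += 1
        new_share = max(alloc[u] * d[r] / capacity[r] for r in d)
        heapq.heappush(heap, (new_share, u))
        max_steps -= 1
    return alloc

cap = {"cpu": 9, "mem": 18}
print(drf_schedule(cap, {"u1": {"cpu": 1, "mem": 4},
                         "u2": {"cpu": 3, "mem": 1}}))  # {'u1': 3, 'u2': 2}
```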

Alternative: Use an Economic Model
Approach: set prices for each good and let users buy what they want.
How do we determine the right prices for different goods? Let the market determine them.
Competitive Equilibrium from Equal Incomes (CEEI): give each user 1/n of every resource, then let users trade in a perfectly competitive market.
Not strategy-proof!

Determining Demand Vectors
They can be measured: look at the actual resource consumption of a user.
They can be provided by the user: this is what is done today.
In both cases, strategy-proofness incentivizes users to consume resources wisely.

DRF vs CEEI
Example 1: user 1 demands <1 CPU, 4 GB>, user 2 demands <3 CPU, 1 GB>. DRF is more fair; CEEI achieves better utilization.
Example 2: user 1 demands <1 CPU, 4 GB>, user 2 demands <3 CPU, 2 GB>. By changing her demand, user 2 increased her share of both CPU and memory under CEEI.
[Bar charts comparing DRF and CEEI allocations of CPU and memory for both examples]

Gaming Utilization-Optimal Schedulers
Cluster with <100 CPU, 100 GB>; 2 users, each demanding <1 CPU, 2 GB> per task.
User 1 lies and demands <2 CPU, 2 GB>; a utilization-optimal scheduler then prefers user 1.
[Bar charts comparing the users’ CPU and memory shares before and after user 1 lies]

Example of DRF vs Asset vs CEEI
Resources: <1000 CPUs, 1000 GB>; 2 users with per-task demands A: <2 CPU, 3 GB> and B: <5 CPU, 1 GB>.
Resulting shares <CPU, RAM> per user:
DRF:   A = <2/5, 3/5>,  B = <3/5, 3/25>
Asset: A = <12/37, 18/37> = <0.324, 0.486>,  B = <25/37, 5/37> = <0.675, 0.135>
CEEI:  A = <0.5, 0.75>,  B = <0.5, 0.1>

Max-Min Theorem for DRF
Definition: a user Ui has a bottleneck resource Rj in an allocation A iff Rj is saturated and all users using Rj have a smaller (or equal) dominant share than Ui.
Theorem: an allocation A is max-min fair iff every user has a bottleneck resource.

Desirable Fairness Properties (1)
Recall max-min fairness from networking: maximize the bandwidth of the minimum flow [Bert92].
Progressive filling (PF) algorithm:
1. Allocate ε to every flow until some link is saturated.
2. Freeze the allocation of all flows on the saturated link, and go to 1.
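A sketch of progressive filling for flows over links (the two-flow topology is illustrative):

```python
def progressive_filling(link_capacity, flows, eps=0.001):
    """Progressive filling sketch: grow every unfrozen flow by eps; when a
    link saturates, freeze all flows crossing it.

    flows: {flow: set of links it traverses}.
    """
    rate = {f: 0.0 for f in flows}
    frozen = set()
    while len(frozen) < len(flows):
        for f in flows:
            if f not in frozen:
                rate[f] += eps
        for link, cap in link_capacity.items():
            load = sum(rate[f] for f in flows if link in flows[f])
            if load >= cap:
                frozen |= {f for f in flows if link in flows[f]}
    return rate

# Two flows share link L1 (capacity 1); flow b also crosses L2 (capacity 10).
print(progressive_filling({"L1": 1.0, "L2": 10.0},
                          {"a": {"L1"}, "b": {"L1", "L2"}}))
# both get ~0.5: L1 saturates first and freezes a and b
```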

Desirable Fairness Properties (2)
P1. Pareto efficiency: it should not be possible to allocate more resources to any user without hurting others.
P2. Single-resource fairness: if there is only one resource, it should be allocated according to max-min fairness.
P3. Bottleneck fairness: if all users want most of one resource, that resource should be shared according to max-min fairness.

Desirable Fairness Properties (3)
Assume positive demands (Dij > 0 for all i and j).
DRF will then allocate the same dominant share to all users: as soon as progressive filling saturates a resource, every user’s allocation is frozen.

Desirable Fairness Properties (4)
P4. Population monotonicity: if a user leaves and relinquishes her resources, no other user’s allocation should be hurt. This can happen each time a job finishes.
CEEI violates population monotonicity.
DRF satisfies population monotonicity (assuming positive demands). Intuitively, DRF gives the same dominant share to all users, so if any user’s allocation shrank when another left, all users’ dominant shares would shrink, contradicting Pareto efficiency.

Properties of Policies

Property                   Asset  CEEI  DRF
Share guarantee                    ✔     ✔
Strategy-proofness           ✔           ✔
Pareto efficiency            ✔     ✔     ✔
Envy-freeness                ✔     ✔     ✔
Single-resource fairness     ✔     ✔     ✔
Bottleneck res. fairness           ✔     ✔
Population monotonicity      ✔           ✔
Resource monotonicity

(No policy satisfies resource monotonicity; see Table 1 of the DRF paper.)

Evaluation Methodology
Micro-experiments on EC2: evaluate DRF’s dynamic behavior when demands change; compare DRF with the current Hadoop scheduler.
Macro-benchmark through simulations: simulate a Facebook trace with DRF and with the current Hadoop scheduler.

DRF Inside Mesos on EC2
Setup: 2 jobs on a 48-node extra-large EC2 cluster running Mesos (4 CPUs and 15 GB RAM per node); over 6 minutes, the jobs randomly changed demands. First 2 minutes: J1 <1 CPU, 10 GB>, J2 <1 CPU, 1 GB>; minutes 2 to 4: J1 <2 CPU, 4 GB>, J2 <1 CPU, 3 GB>; minutes 4 to 6: J1 <1 CPU, 7 GB>, J2 <1 CPU, 4 GB>.
[Plots of user 1’s and user 2’s per-resource shares and dominant shares over time]
Dominant shares are equalized, and the share guarantee holds at ~70% dominant share; the dominant resource switches between memory and CPU as demands change.

Fairness in Today’s Datacenters
Hadoop fair scheduler / capacity scheduler / Quincy:
Each machine consists of k slots (e.g., k = 14), with at most one task per slot.
Jobs get an “equal” number of slots, i.e., max-min fairness is applied to slot counts.
This is what the DRF paper compares against.

Experiment: DRF vs Slots
Setup: EC2 nodes with 8 CPUs and 7 GB RAM; two job types, Type 1 <2 CPU, 2 GB> and Type 2 <1 CPU, 0.5 GB>.
[Plots: number of Type 1 and Type 2 jobs finished under DRF vs slot-based fairness; the slot-based scheduler exhibits thrashing and low utilization]

Experiment: DRF vs Slots
[Plots: completion times of Type 1 <2 CPU, 2 GB> and Type 2 <1 CPU, 0.5 GB> jobs under DRF vs slot-based fairness; thrashing and low utilization hurt completion time under slots]

Reduction in Job Completion Time: DRF vs Slots
Simulation of a 1-week Facebook trace on a 2000-node cluster (the plot shows a 3000-second excerpt).
Reduction in completion time is computed as 100 · Δ / old_time.

Utilization of DRF vs Slots
Simulation of the Facebook workload on a 2000-node cluster for a week (the plot shows a 3000-second excerpt).

Summary
DRF provides multiple-resource fairness in the presence of heterogeneous demand: the first generalization of max-min fairness to multiple resources.
DRF’s properties: share guarantee (each user gets at least 1/n of one resource) and strategy-proofness (lying can only hurt you).
DRF performs better than current approaches.

Is this a good paper?
What were the authors’ goals? What about the evaluation/metrics?
Did they convince you that this was a good system/approach? Were there any red flags? What mistakes did they make?
Does the system/approach meet the “Test of Time” challenge? How would you review this paper today?