
1 Quincy: Fair Scheduling for Distributed Computing Clusters. Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg @ Microsoft Research. Presenter: Weiyue Xu. 22nd ACM Symposium on Operating Systems Principles.

2 Credit: modified version of www.sigops.org/sosp/sosp09/slides/quincy/QuincyTestPage.html and www.cs.uiuc.edu/class/sp11/cs525/slides.021711.ppt

3 Outline
- Introduction
- Goal of Quincy
- Baseline: Queue Based Scheduler
- Flow Based Scheduler: Quincy
- Evaluation
- Conclusion

4 Motivation: the popularity of data-intensive cluster computing.
- Fairness: more than 50% of jobs are small (less than 30 minutes), and a large job should not monopolize the cluster. If job X takes t seconds when it runs exclusively on the cluster, X should take no more than Jt seconds when the cluster has J concurrent jobs; equivalently, with N computers and J jobs, each job should get at least N/J computers (a small worked sketch follows this slide).
- Data locality: large disks are directly attached to the computers, and network bandwidth is expensive.
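The fair-share arithmetic stated above can be sketched in a few lines. This is purely illustrative: the function names and example numbers (other than the 240-computer cluster size, which matches the evaluation slide) are assumptions, not from the paper.

```python
# Illustrative sketch of the fairness goal: with N computers and J concurrent
# jobs, each job should get at least N // J computers, and a job that takes
# t seconds alone should take no more than J * t seconds when sharing.

def fair_share(n_computers: int, n_jobs: int) -> int:
    """Minimum number of computers each of n_jobs should receive."""
    return n_computers // n_jobs

def slowdown_bound(t_exclusive: float, n_jobs: int) -> float:
    """Upper bound on a job's running time under fair sharing."""
    return n_jobs * t_exclusive

if __name__ == "__main__":
    # Example: 240 computers (the paper's cluster size) and 4 concurrent jobs.
    print(fair_share(240, 4))        # -> 60 computers per job
    print(slowdown_bound(1800, 4))   # a 30-minute job should finish within 2 hours
```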

5 Problem setting and assumptions: a homogeneous environment running the Dryad distributed execution platform.
- Similar to MapReduce and Hadoop.
- Each job consists of one "root task" and several "worker tasks".
- Tasks are independent of each other.

6 For MPI (message-passing) jobs, coarse-grain scheduling:
- Devote a fixed set of computers to a particular job.
- Static allocation; the allocation rarely changes.
- Tasks have dependencies, so killing a task is costly.
- No direct-attached storage.
For Dryad jobs, fine-grain resource sharing:
- Multiplex all computers in the cluster between all jobs.
- When one task completes, its computer may be reassigned to another job.
- Tasks are independent, so killing a task and restarting it is less costly.
- Large datasets are attached to each computer.

7 Example of Coarse Grain Sharing

8 Example of Fine Grain Sharing: N/J computers are used by a job at any given time, but the set in use varies over the job's lifetime.

9 Data Locality: the data transfer cost depends on the size of the data and where it is stored.

10 Goal of Quincy: fairness + data locality. With N computers and J concurrent jobs:
- Each job gets at least N/J computers.
- Data locality: place tasks near their data to avoid network bottlenecks.
- Joint optimization of fairness and data locality: a multi-constrained optimization problem with trade-offs.

11 Cluster Architecture

12 Baseline: Queue Based Scheduler

13 Baseline: Queue Based Scheduler
- Greedy (G): for each worker task, the root task computes locality as the amount of data that would need to be transferred if computer m were assigned to the task, yielding a preference list (computer C_m > rack R_l > cluster X). Fairness is not considered (a sketch of this preference ordering follows this slide).
- Simple Greedy Fairness (GF): a "blocked" job is not assigned more computers, but pre-existing tasks from now-blocked jobs are allowed to run to completion (similar to the Hadoop Fair Scheduler).
- Fairness with Preemption (GFP): over-quota tasks are killed.
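A minimal sketch of the greedy (G) dispatch described above, assuming the scheduler knows where each task's input data lives; all names and data structures here are illustrative, not Dryad's actual API.

```python
# Illustrative sketch of the greedy (G) policy: when a computer m becomes free,
# prefer a task with data on m itself (C_m), then one with data elsewhere in
# m's rack (R_l), then any remaining task (X). Fairness is not considered.

from typing import Dict, List, Optional

def data_to_transfer(task: str, computer: str,
                     task_data: Dict[str, Dict[str, int]]) -> int:
    """Bytes the task would have to pull from other computers if run on `computer`.
    `task_data[task]` maps computer -> bytes of the task's input stored there."""
    placement = task_data[task]
    return sum(size for host, size in placement.items() if host != computer)

def pick_task(computer: str, rack_of: Dict[str, str],
              waiting: List[str],
              task_data: Dict[str, Dict[str, int]]) -> Optional[str]:
    """Return the waiting task with the least data transfer to `computer`,
    preferring computer-local, then rack-local placement."""
    if not waiting:
        return None

    def key(task):
        placement = task_data[task]
        local = computer in placement                                   # C_m
        rack_local = any(rack_of.get(h) == rack_of.get(computer)
                         for h in placement)                            # R_l
        return (not local, not rack_local,
                data_to_transfer(task, computer, task_data))

    return min(waiting, key=key)

if __name__ == "__main__":
    racks = {"m1": "r1", "m2": "r1", "m3": "r2"}
    data = {"t1": {"m3": 100}, "t2": {"m1": 50, "m2": 10}}
    print(pick_task("m1", racks, ["t1", "t2"], data))  # -> "t2" (computer-local data)
```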

14 Flow Based Scheduler: Quincy. Main idea: matching = scheduling.
- Construct a graph based on the scheduling constraints and the cluster architecture.
- Assign a cost to each possible matching.
- Finding a min-cost flow on the graph is equivalent to finding a feasible schedule.
- Each task is either scheduled on a computer or remains unscheduled.
- Fairness constrains the number of tasks scheduled for each job.

15 New goal: minimize the matching cost while obeying the fairness constraints.
- Instead of making local decisions (as the queue-based schedulers do), solve the problem globally.
- Issues: how to construct the graph, and how to embed the fairness and locality constraints in it?

16 Graph Construction: start with a directed graph representation of the cluster architecture.

17 Graph Construction (2): add an unscheduled node U_j for each job j.
- Each worker task has an edge to U_j.
- There is a single edge from U_j to the sink.
- Edges from tasks to U_j carry a high cost.
- The cost and flow on the edge from U_j to the sink control fairness: fairness is enforced by adjusting the number of tasks allowed to be scheduled for each job.

18 Graph Construction (3): add edges from tasks (T) to computers (C), racks (R), and the cluster aggregator (X).
- Data locality is controlled through the costs: cost(T-C) << cost(T-R) << cost(T-X).
- A zero-cost edge from the root task to its computer, so the root task is never preempted.
A sketch of the resulting flow problem follows this slide.
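To make the construction concrete, here is a minimal sketch of the min-cost-flow formulation for a single toy job, using networkx. The costs, capacities, and node names are made-up assumptions, and Quincy's lower bounds on the U_j-to-sink edge (which force a minimum allocation per job) are omitted because networkx's min_cost_flow does not support edge lower bounds directly.

```python
# Illustrative, simplified single-job instance of the Quincy-style flow graph.
import networkx as nx

G = nx.DiGraph()

tasks = ["t1", "t2"]          # worker tasks of job j
computers = ["c1", "c2"]      # both in rack r1
rack, cluster, unsched, sink = "r1", "X", "U_j", "sink"

# Each task must send exactly one unit of flow; the sink absorbs all of it.
for t in tasks:
    G.add_node(t, demand=-1)
G.add_node(sink, demand=len(tasks))

# Aggregation edges: cluster -> rack -> computer, then computer -> sink
# (capacity 1: at most one task per computer in this toy example).
G.add_edge(cluster, rack, capacity=len(computers), weight=0)
for c in computers:
    G.add_edge(rack, c, capacity=1, weight=0)
    G.add_edge(c, sink, capacity=1, weight=0)

# Task edges, with cost(T-C) << cost(T-R) << cost(T-X) << cost(T-U).
G.add_edge("t1", "c1", capacity=1, weight=1)    # t1's data is on c1
G.add_edge("t2", "c2", capacity=1, weight=1)    # t2's data is on c2
for t in tasks:
    G.add_edge(t, rack, capacity=1, weight=10)
    G.add_edge(t, cluster, capacity=1, weight=50)
    G.add_edge(t, unsched, capacity=1, weight=1000)

# Unscheduled node: its edge to the sink bounds how many of the job's tasks
# may remain unscheduled, which is the handle used to enforce fairness.
G.add_edge(unsched, sink, capacity=len(tasks), weight=0)

flow = nx.min_cost_flow(G)
for t in tasks:
    placed = [dst for dst, f in flow[t].items() if f > 0]
    print(t, "->", placed)   # expected: t1 -> ['c1'], t2 -> ['c2']
```

Solving the flow problem assigns each task either to a computer (directly or through its rack or the cluster aggregator) or to U_j, which is exactly the "matching = scheduling" equivalence from slide 14.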

19 A Feasible Matching: the cost of a task's T-U edge increases the longer the task waits, and once a task is scheduled, a new cost is assigned to its T-C edge, which also increases over time.

20 Final Graph

21 Evaluation
- Typical Dryad jobs: Sort, Join, PageRank, WordCount, Prime. Prime is used as a worst-case job that hogs the cluster if started first.
- Cluster of 240 computers: 8 racks with 29-31 computers per rack.
- More than one metric is used for evaluation.

22 Experiments

23 Experiments (2)

24 Experiments (3)

25 Experiments (4)

26 Experiments (5)

27 Makespan (s) when the network is the bottleneck

28 Data Transfer (TB)

29 Conclusion
- A new computational model for data-intensive computing.
- An elegant mapping of scheduling to a min-cost flow/matching problem.

30 Discussion
- Assumes a homogeneous environment.
- Centralized Quincy controller: a single point of failure.
- No theoretical stability guarantee.
- Cost measure: fairness and the cost of killing tasks.

31 Questions or Comments? Thanks!

