The Packing Server for Real-time Scheduling of MapReduce Workflows
Shen Li, Shaohan Hu, Tarek Abdelzaher
University of Illinois at Urbana Champaign


1 The Packing Server for Real-time Scheduling of MapReduce Workflows. Shen Li, Shaohan Hu, Tarek Abdelzaher. University of Illinois at Urbana Champaign

2 Task Models
Parallel Tasks (a.k.a. Pipeline Model)
Generalized Parallel Tasks (a.k.a. Workflow Model)

3 Significance
1. The main contribution is the notion of a packing server.
2. Packing servers allow graphs of tasks with precedence constraints to be converted to a set of budgets that the underlying scheduler treats as independent. This is achieved by the app-level scheduler (App-Sched) of the workload inside the server.
3. As a result, we are able to convert bounds for independent tasks into equivalent bounds for parallel tasks.
4. This leads to the notion of a conversion bound.
5. Using this approach, we come up with bounds for parallel task models that beat the best known ones.
6. We apply the approach to MapReduce.
[Figure labels: tasks τ1, τ2, …, τn; Packing Server 1, Packing Server 2, …, Packing Server n; App-Sched; Underlying Independent Scheduler; Utilization Bound]

4 Independent vs Parallel Tasks
Comparing utilization bounds [Li et al. ECRTS'14][Davis et al. ACM Computing Surveys'11]
Algorithms compared: G-EDF, G-RM, Federated, EDF-FF, EDF-FFD, RM-ST, EDZL
Parallel tasks: G-EDF 38.2%, G-RM 26.8%, Federated 50%; no bound listed (-) for the others
Independent tasks (max util. of any task: assume β = 5): the values shown are 80%, 63%, 80%, 40%
In MapReduce applications: m >> 1, D >> L

5 The Conversion Bound
Independent Task Set Utilization Bound → Parallel Task Set Utilization Bound
● φ, the stretch: deadline over critical path length
● β: the reciprocal of maximum task utilization

6 An Example of φ = 30, β = 5
Algorithms compared: G-EDF, G-RM, Federated, EDF-FF, EDF-FFD, RM-ST, EDZL
Parallel tasks: G-EDF 38.2%, G-RM 26.8%, Federated 50%; no bound listed (-) for the others
Independent tasks → interdependent tasks using conversion: 80% → 67% (four entries), 40% → 33%, 63% → 52.5%; one entry is -
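The converted values in this example are all consistent with scaling each independent-task bound by the factor (1 - β/φ). That factor is inferred here from the numbers on the slide, not quoted from the paper, so the sketch below is only a sanity check under that assumption; the function name `converted_bound` is hypothetical.

```python
def converted_bound(independent_bound: float, beta: float, phi: float) -> float:
    """Scale an independent-task utilization bound by (1 - beta/phi).

    ASSUMPTION: the factor (1 - beta/phi) is inferred from the slide's
    numbers; the paper's exact conversion bound may contain further terms.
    """
    if not 0 < beta < phi:
        raise ValueError("expected 0 < beta < phi")
    return independent_bound * (1.0 - beta / phi)


# Independent-task bounds that appear on the slide (phi = 30, beta = 5).
slide_pairs = [(0.80, 0.67), (0.40, 0.33), (0.63, 0.525)]

for independent, shown in slide_pairs:
    computed = converted_bound(independent, beta=5, phi=30)
    print(f"{independent:.0%} -> {computed:.1%} (slide shows {shown:.1%})")
```

With φ = 30 and β = 5 the factor is 25/30, so a larger stretch φ (more slack relative to the critical path) loses less of the independent-task bound.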

7 Construct a Packing Server for a Pipeline
Pack to minimum parallelism without violating the deadline D_i.
Two questions:
1. How to schedule the pipeline in its budgets?
2. What is the conversion bound when using this technique?
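A minimal sketch of "pack to minimum parallelism", under simplifying assumptions that are not taken from the paper: every segment of phase j has the same WCET c_j, a phase with m_j segments packed to width p runs in ceil(m_j / p) back-to-back rounds, and phases run strictly one after another. The names `makespan` and `min_parallelism` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Phase = Tuple[int, float]  # (number of segments m_j, per-segment WCET c_j)


def makespan(phases: List[Phase], width: int) -> float:
    """Length of the pipeline when each phase is packed to `width` parallel slots."""
    return sum(math.ceil(m / width) * c for m, c in phases)


def min_parallelism(phases: List[Phase], deadline: float) -> Optional[int]:
    """Smallest width whose packed makespan meets the deadline, or None.

    The makespan never increases as the width grows, so the first width
    that fits is the minimum. Widths beyond the widest phase cannot help.
    """
    if not phases:
        return None
    max_width = max(m for m, _ in phases)
    for width in range(1, max_width + 1):
        if makespan(phases, width) <= deadline:
            return width
    return None  # even full parallelism (the critical path) misses the deadline


# Hypothetical three-phase pipeline with deadline 30.
pipeline = [(6, 5.0), (4, 3.0), (2, 2.0)]
print(min_parallelism(pipeline, deadline=30.0))
```

In this simplified model, a phase with fewer segments than the chosen width leaves budget unused, which is where virtual segments would come from.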

8 The App-Scheduler
[Figure: budget portions before packing and after packing, with the time instance t marked]
1. Find the time instance t such that the cumulative execution time before t equals the total budget size of the first phase.
2. Schedule each phase in its corresponding budget portions using a best-fit-like algorithm.
3. For each phase, process one segment at a time. Lay each segment into the budget portions from right to left, starting from the smallest budget portion. Skip any parallelism conflict.
4. This algorithm guarantees that every phase is scheduled in its own budget portions, using a simulation D_max time ahead. Please refer to the paper for more details.

9 The Conversion Bound for M-R Pipelines
Task τ_i: utilization (u_i)
Workflow Job i: deadline (D_i), critical path length (L_i), stretch (φ_i), budget utilization bound (1/β), # of segments (m_i)
Phase j: # of segments (m_i^j), WCET (c_i^j)
● Lower bound of total WCET:
● # of virtual segments in phase j:
● The conversion bound:

10 Transform Workflow into Pipeline
[Figure: an example workflow of six phases laid out on a time axis t from 0 to 20. Phase 1: m = 2, c = 5. Phase 2: m = 3, c = 7. Phase 3: m = 4, c = 3. Phase 4: m = 3, c = 3. Phase 5: m = 6, c = 5. Phase 6: m = 2, c = 2.]
● Introducing no computational penalty
● Respecting dependencies
● Preserving critical path length

11 Summary
1. The packing operation packs a pipeline into minimum parallelism.
2. The app-scheduler schedules the pipeline into budgets using underlying-scheduler simulations.
3. Prove the conversion bound by analyzing the upper bound of the amount of introduced virtual execution time.
4. Translate the workflow into a pipeline without introducing virtual computation overhead or lengthening the critical path.

12 Evaluation: Algorithms
1. Packing & EDF-FF: The packing server uses EDF First-Fit as the underlying scheduler. Independent tasks are partitioned into the first resource slot that does not violate the 100% utilization bound.
2. Packing & GEDF: The packing server uses GEDF as the underlying scheduler. GEDF assigns the highest priority to the job with the most urgent deadline.
3. GEDF: The workflow with the most urgent deadline gets the highest priority.
4. Federated: Each high-utilization task (u ≥ 1) is assigned a set of dedicated cores, and the remaining low-utilization tasks share the remaining cores.
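The first-fit partitioning step described for EDF-FF can be sketched as follows. This is a generic first-fit illustration, not the evaluation code: the function name `first_fit_partition` and the floating-point tolerance `EPS` are assumptions.

```python
from typing import List, Optional

EPS = 1e-9  # tolerance for floating-point sums (an assumption of this sketch)


def first_fit_partition(utilizations: List[float], num_slots: int) -> Optional[List[int]]:
    """Assign each task to the first resource slot that stays at or below 100%.

    Returns the slot index chosen for every task, in input order, or None
    if some task does not fit in any slot.
    """
    loads = [0.0] * num_slots
    assignment: List[int] = []
    for u in utilizations:
        for slot in range(num_slots):
            if loads[slot] + u <= 1.0 + EPS:
                loads[slot] += u
                assignment.append(slot)
                break
        else:
            return None  # no slot can take this task without exceeding 100%
    return assignment


print(first_fit_partition([0.6, 0.5, 0.4, 0.3], num_slots=2))
```

Each slot is then scheduled by uniprocessor EDF, for which total utilization at or below 100% is sufficient when deadlines equal periods.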

13 Evaluation: Compute β
Packing & EDF-FF: By taking the derivative with respect to β, the highest utilization bound can be achieved at:
Packing & GEDF: Similarly:
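The closed forms on this slide were lost in extraction. The sketch below is a reconstruction under stated assumptions, not the paper's derivation: it takes the independent-task bounds to be β/(β+1) for EDF-FF and 1 - 1/β for GEDF (their large-core limits), and the conversion factor to be (1 - β/φ). Setting the derivative of each product to zero gives β = sqrt(1 + φ) - 1 and β = sqrt(φ), which approximately reproduce the β values and bounds quoted on the Accepted Utilization slides. The function names are hypothetical.

```python
import math
from typing import Tuple


def packing_edf_ff(phi: float) -> Tuple[float, float]:
    """Maximize beta/(beta+1) * (1 - beta/phi) over beta (assumed bound form).

    d/d(beta) = 0  =>  beta^2 + 2*beta - phi = 0  =>  beta = sqrt(1 + phi) - 1.
    """
    beta = math.sqrt(1.0 + phi) - 1.0
    bound = beta / (beta + 1.0) * (1.0 - beta / phi)
    return beta, bound


def packing_gedf(phi: float) -> Tuple[float, float]:
    """Maximize (1 - 1/beta) * (1 - beta/phi) over beta (assumed bound form).

    d/d(beta) = 0  =>  beta^2 = phi  =>  beta = sqrt(phi).
    """
    beta = math.sqrt(phi)
    bound = (1.0 - 1.0 / beta) * (1.0 - beta / phi)
    return beta, bound


for phi in (20, 30):
    b_ff, u_ff = packing_edf_ff(phi)
    b_g, u_g = packing_gedf(phi)
    print(f"phi={phi}: EDF-FF beta={b_ff:.2f} bound={u_ff:.1%}; "
          f"GEDF beta={b_g:.2f} bound={u_g:.1%}")
```

Small differences in the last digit relative to the slides are expected, since the slides may truncate β or include terms that depend on the number of cores.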

14 Evaluation: Accepted Utilization
Workflows are generated based on Yahoo! WebScope data.
Set φ = 20, m = 500 (small granularity).
Compute β = 3.58 for Packing & EDF-FF and β = 4.47 for Packing & GEDF.
Theoretical utilization bounds:
Packing & EDF-FF: 64%
Packing & GEDF: 60.3%
Federated: 50% [Li et al. ECRTS'14]
GEDF: 38.2% [Li et al. ECRTS'14]
[Figure annotation: Domino effect]

15 Evaluation: Accepted Utilization
Workflows are generated based on Yahoo! WebScope data.
Set φ = 30, m = 500 (small granularity).
Compute β = 4.56 for Packing & EDF-FF and β = 5.47 for Packing & GEDF.
Theoretical utilization bounds:
Packing & EDF-FF: 70%
Packing & GEDF: 66.9%
Federated: 50% [Li et al. ECRTS'14]
GEDF: 38.2% [Li et al. ECRTS'14]
[Figure annotation: Domino effect]

16 Evaluation: Admission Control
Workflows are generated based on Yahoo! WebScope data.
Implemented a prototype on WOHA [Li et al., ICDCS'14], a variant of Hadoop.
Set φ = 20, m = 160 (small granularity).
Submitted a set of tasks with a total utilization above 100%.
Admission control is enforced at the theoretical utilization bound.
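Admission control at a utilization bound can be sketched as a running-sum test. This is an illustration only, not the WOHA prototype: it assumes the bound is normalized per core, that workflows are considered in submission order, and that a rejected workflow is simply skipped. The function name `admit` is hypothetical.

```python
from typing import List, Tuple

EPS = 1e-9  # tolerance for floating-point sums (an assumption of this sketch)


def admit(workflows: List[Tuple[str, float]], num_cores: int, bound: float) -> List[str]:
    """Admit workflows in order while total utilization stays within bound * num_cores.

    `workflows` holds (name, utilization) pairs; `bound` is the normalized
    utilization bound, e.g. 0.64 for a 64% bound.
    """
    capacity = bound * num_cores
    used = 0.0
    admitted: List[str] = []
    for name, utilization in workflows:
        if used + utilization <= capacity + EPS:
            used += utilization
            admitted.append(name)
    return admitted


# Hypothetical workload whose total utilization (1.4) exceeds the 2-core capacity
# allowed by a 60% bound (1.2), so the last workflow is rejected.
print(admit([("a", 0.5), ("b", 0.5), ("c", 0.4)], num_cores=2, bound=0.6))
```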

17 Thank You! Q & A

18 The Conversion Bound for M-R Pipelines
Task τ_i: utilization (u_i)
Workflow Job i: deadline (D_i), critical path length (L_i), stretch (φ_i), budget utilization bound (1/β)
Phase j: # of segments (m_i^j), WCET (c_i^j)
Find the minimum concurrency m_i such that converted budgets do not violate the deadline. Then, we have: ●
To cap the max util. of resulting tasks: ●
Together, we have: ●
Moreover: ●
Some phases need to be packed (big); some phases need virtual segments (small). This is a subset of phases.

19 The Packing Server: straightforward strategy
Consider a MapReduce workflow of two phases:
[Figure: Budgets]
The Problem: The straightforward strategy is bad. It introduces too much virtual computational overhead.

20 The Packing Server: fit into Hadoop
[Figure labels: Input Task Set τ, D_max, Schedule, Budget Schedule, RM, AM1, AM2, AM3, Container]
AM: Application Master
RM: Resource Manager
Container: executes segments

21 The Story of Aperiodic Task Servers
● Aperiodic tasks are difficult to analyze.
● There exists a rich set of techniques to analyze periodic tasks.
● Researchers proposed the concept of aperiodic task servers.
[Figure: timeline t from 0 to 20]

