Using Map-reduce to Support MPMD Peng

1 Using Map-reduce to Support MPMD Peng, Yuan

2 Our Motivation The default job scheduler in Hadoop keeps a first-in-first-out queue of jobs for each priority level. The scheduler always assigns task slots to the first job in the highest-priority queue that still needs tasks. Problems: – Difficult to share a MapReduce cluster between users (Multi-tasks) – Difficult to implement a composite task made up of more than one job with inter-dependencies. This is a strong motivation to improve the Hadoop framework to – support Multi-tasks – support Composite-tasks
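The FIFO-per-priority behaviour described on this slide can be sketched as follows. This is a minimal illustration, not Hadoop's scheduler code: the class `FifoPriorityScheduler`, the `Job` record, and the method names are all hypothetical, and a "slot" is reduced to a single counter decrement.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

/** Illustrative FIFO-per-priority slot assignment; names are hypothetical, not Hadoop APIs. */
class FifoPriorityScheduler {
    /** Hypothetical job record: a name plus the number of tasks still waiting for a slot. */
    static final class Job {
        final String name;
        int pendingTasks;

        Job(String name, int pendingTasks) {
            this.name = name;
            this.pendingTasks = pendingTasks;
        }
    }

    // Larger key = higher priority; each level keeps its jobs in submission order.
    private final TreeMap<Integer, Deque<Job>> queues = new TreeMap<>();

    void submit(int priority, Job job) {
        queues.computeIfAbsent(priority, p -> new ArrayDeque<>()).addLast(job);
    }

    /** Returns the name of the job that receives the next free slot, or null if no job needs one. */
    String assignSlot() {
        for (Deque<Job> level : queues.descendingMap().values()) { // highest priority first
            for (Job job : level) {                                // oldest job first
                if (job.pendingTasks > 0) {
                    job.pendingTasks--;
                    return job.name;
                }
            }
        }
        return null;
    }
}
```

In this sketch, a second user's job at the same priority receives no slot until the first job has no pending tasks left, which is the sharing problem the slide points to.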

3 Multi-tasks Problem One solution to this problem is to create separate MapReduce clusters for different user groups with Hadoop On-Demand, but this hurts system utilization because a group's cluster may be mostly idle for long periods of time. More advanced solutions: – Facebook Fair Scheduler – Yahoo Capacity Scheduler

4 Facebook Fair Scheduler Jobs are placed into named “pools”. Each pool can have a “guaranteed capacity” that is specified through a config file, which gives a minimum number of map slots and reduce slots to allocate to the pool. When there are pending jobs in the pool, it gets at least this many slots, but if it has no jobs, the slots can be used by other pools. Excess capacity that is not going toward a pool’s minimum is allocated between jobs using fair sharing. – Fair sharing splits up compute time proportionally between jobs that have been submitted, emulating an "ideal" scheduler that gives each job 1/Nth of the available capacity.
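The two-step allocation described on this slide (guaranteed minimums first, then fair sharing of the excess) can be sketched as below. This is a simplified illustration under stated assumptions, not the Fair Scheduler's implementation: the class `FairShareSketch` and its parameters are hypothetical, map and reduce slots are merged into one slot type, and the excess is shared between pools rather than between individual jobs.

```java
/** Simplified sketch of minimum-share plus fair-share allocation; not the real Fair Scheduler. */
class FairShareSketch {
    /**
     * @param totalSlots slots available in the cluster
     * @param minShare   guaranteed minimum slots per pool
     * @param demand     slots each pool could use right now (0 means the pool has no jobs)
     * @return slots granted to each pool
     */
    static int[] allocate(int totalSlots, int[] minShare, int[] demand) {
        int n = demand.length;
        int[] alloc = new int[n];
        int free = totalSlots;

        // Step 1: grant each pool its guaranteed capacity, but never more than it can use.
        for (int i = 0; i < n; i++) {
            alloc[i] = Math.min(Math.min(minShare[i], demand[i]), free);
            free -= alloc[i];
        }

        // Step 2: hand out the excess one slot at a time to the pool that currently
        // holds the fewest slots and still has unmet demand.
        while (free > 0) {
            int pick = -1;
            for (int i = 0; i < n; i++) {
                if (alloc[i] < demand[i] && (pick < 0 || alloc[i] < alloc[pick])) {
                    pick = i;
                }
            }
            if (pick < 0) {
                break; // every pool is satisfied; remaining slots stay idle
            }
            alloc[pick]++;
            free--;
        }
        return alloc;
    }
}
```

For example, with 10 slots, a pool guaranteed 4 slots that has no pending jobs, and a second pool that wants 10, the second pool receives all 10 slots, matching the slide's statement that unused guaranteed slots can be used by other pools.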

5 Yahoo Capacity Scheduler Define a number of named queues. Each queue has a configurable number of map and reduce slots. The scheduler gives each queue its capacity when it contains jobs, and shares any unused capacity between the queues. Within each queue, however, FIFO scheduling with priorities is used, with one exception: you can place a limit on the percentage of running tasks per user, so that users share a cluster equally.

6 There is still a Problem! Both the Yahoo and the Facebook schedulers assign dedicated map and reduce slots to tasks, so they do not comply with the “moving computation to data” principle. Our solution: – Turn Hadoop into an MPMD (multiple program, multiple data) system with computation resource sharing: different users can submit multiple tasks, which are assigned to different mappers/reducers and run simultaneously. Load balancing is achieved by keeping the computing nodes busy with tasks.

7 Using the traditional Map-reduce to support MPMD [Diagram. Inputs: Data 1, Data 2, Data 3, …, Data n. Stages: RunnerMap, …, RunnerMap (label: "Look up the code for Data"), then RunnerReduce. Invoked code: MapProcedure, ReduceProcedure. Outputs: Output 1, Output 2, …, Output n.]

8 Running WordCount and Hadoop Blast using the extended framework [Diagram. WordCountMapProcedure extends abstract class MapProcedure; WordCountReduceProcedure extends abstract class ReduceProcedure; BlastMapProcedure extends abstract class MapProcedure; runners: RunnerMap, RunnerReduce.] Lookup entries shown: blast_input_1.fa:edu.indiana.cs.b649.BlastMapProcedure wordcount_input_1.txt:edu.indiana.cs.b649.WordCountMapProcedure
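One way a RunnerMap could use entries of the form `inputFile:className` is sketched below. This is an assumption-laden illustration rather than the authors' code: the slides do not show the signature of `MapProcedure` or how the lookup is stored, so the `map(String)` method, the `parseTable` and `procedureFor` helpers, and the toy `UpperCaseProcedure` are all hypothetical. Hadoop types are left out so the sketch runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of per-input procedure dispatch; all names here are hypothetical. */
class MpmdDispatchSketch {
    /** Stand-in for the slides' abstract MapProcedure; the real signature is not shown there. */
    abstract static class MapProcedure {
        abstract String map(String record);
    }

    /** Toy procedure used only to demonstrate the lookup. */
    static class UpperCaseProcedure extends MapProcedure {
        @Override
        String map(String record) {
            return record.toUpperCase();
        }
    }

    /** Parses lines of the form "inputFile:fully.qualified.ClassName" into a lookup table. */
    static Map<String, String> parseTable(String... lines) {
        Map<String, String> table = new HashMap<>();
        for (String line : lines) {
            int sep = line.indexOf(':');
            table.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
        }
        return table;
    }

    /** Looks up the procedure registered for an input file and instantiates it by reflection. */
    static MapProcedure procedureFor(String inputFile, Map<String, String> table) {
        String className = table.get(inputFile);
        if (className == null) {
            throw new IllegalArgumentException("No procedure registered for " + inputFile);
        }
        try {
            return Class.forName(className)
                    .asSubclass(MapProcedure.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load procedure " + className, e);
        }
    }
}
```

With this design a single generic mapper can serve several programs at once: each mapper instance picks its procedure from the name of the input file it was given, so adding a new program only requires a new subclass and a new lookup entry.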

9 Composite task problem The goal is to support a composite task made up of more than one job with inter-dependencies.

10 Support Composite-tasks [Diagram. Directories: in, out, out0. Stages: Map, Reduce, Map. Files: File 1, File 2, File 3, File 4, empty_file, part-r-00000, blast_input_0.fa, blast_input_1.fa, empty_file.out.]

11 Demo Running Hadoop Blast + Advanced WordCount – Single node mode: 2 mappers + 2 reducers – Input files: blast_input_0.fa blast_input_1.fa wordcount_input_0.txt wordcount_input_1.txt empty_file – Output files: blast_input_0.fa.out blast_input_1.fa.out empty_file.out

12 Performance Test Task execution time (ms) = job launch time + job execution time

13 Roles of team members Peng – Implemented the framework to support Multi-tasks Yuan – Improved the framework to support Composite-tasks

14 Q&A Thanks!