Matchmaking: A New MapReduce Scheduling Technique


1 Matchmaking: A New MapReduce Scheduling Technique
Chen He Dr. Ying Lu Dr. David Swanson

2 Problem Statement MapReduce cluster scheduling algorithms are becoming increasingly important. An efficient MapReduce scheduler must avoid unnecessary data transmission. We focus on decreasing data transmission in a MapReduce cluster.

3 Contributions We build a matchmaking algorithm to improve the data locality of Hadoop MapReduce jobs. The MatchMaking algorithm leads to a higher data locality rate and a shorter map task response time. We substitute the MatchMaking algorithm for the Delay algorithm in the Fair-sharing scheduler and also obtain better performance.

4 Outline Background Delay Algorithm MatchMaking algorithm Evaluation
Conclusion Questions

5 Background Hadoop FIFO scheduler
The scheduler searches the first job for local tasks and assigns them. If the first job has no local task, one of its non-local tasks is assigned. Strict FIFO job order is followed.
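The FIFO policy described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Hadoop's implementation: the `Job` class, the `pending` mapping, and the function name `fifo_assign` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    # Hypothetical model: pending map task id -> set of nodes holding its input block.
    pending: dict = field(default_factory=dict)


def fifo_assign(queue, node):
    """Pick a map task for one free slot on `node` under strict FIFO order.

    Returns (job name, task id, is_local), or None if no work is left.
    """
    for job in queue:
        if not job.pending:
            continue  # a job with no pending tasks no longer counts as "first"
        # Prefer a local task of the first job that still has work.
        for task, replicas in job.pending.items():
            if node in replicas:
                del job.pending[task]
                return job.name, task, True
        # No local task in that job: assign one of its non-local tasks anyway.
        task = next(iter(job.pending))
        del job.pending[task]
        return job.name, task, False
    return None
```

With a queue of `j1` (tasks stored on other nodes) followed by `j2` (a task stored on `n1`), a request from `n1` receives a non-local task of `j1` even though `j2` has a local one. That is the locality loss the following slides attribute to strict job order.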

6 Background Hadoop FIFO scheduler

7 Background Hadoop FIFO scheduler

8 Background Hadoop FIFO scheduler

9 Background Hadoop FIFO scheduler

10 Background Hadoop FIFO scheduler

11 Background Hadoop FIFO scheduler deficiencies
On the node side, strict FIFO job order reduces data locality. On the job side, FIFO cannot provide a fair opportunity for each worker node.

12 Delay Algorithm Driven by Facebook event logs saved in their Hadoop data warehouse. Hadoop's default FIFO scheduler results in unnecessarily long job response times and a lack of fairness in resource sharing. Focus on two points: fair sharing and data locality.

13 Delay Algorithm Workload*
Bin | #Maps | %Jobs at Facebook | #Maps in Benchmark | # of jobs in Benchmark
1 | | 39% | | 38
2 | | 16% | | 16
3 | 3-20 | 14% | 10 | 14
4 | 21-60 | 9% | 50 | 8
5 | 61-150 | 6% | 100 | 6
6 | | | 200 |
7 | | 4% | 400 |
8 | | | 800 |
9 | >1501 | 3% | 4800 |
*Matei Zaharia et al., “Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling”

14 Delay Algorithm
Fairness: task execution percentage among jobs, groups, and users. Data locality: in the Map stage, a map task runs on a node that contains its input data. In the Reduce stage?

15 Delay Scheduling
Fairness vs. data locality

16 Delay Algorithm Fair-sharing principle-hierarchical principle

17 Delay Scheduling-including rack locality

18 Delay Algorithm Relax the strict job order
The scheduler can search other jobs in the job queue to find a local task. A Maximum Delay Time (MDT) per job avoids starvation. MDT is a user-defined maximum time for which the scheduler can delay assigning a job's non-local map tasks.
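The relaxed job order and the MDT cutoff can be sketched as follows. This is a simplified, time-based reading of the slide, not the code of the Hadoop Fair scheduler: `Job`, `wait_start`, `delay_assign`, and the choice to start a job's delay clock the first time it is skipped are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    name: str
    # Hypothetical model: pending map task id -> set of nodes holding its input block.
    pending: dict = field(default_factory=dict)
    # Time at which the job was first skipped for lack of a local task (assumption).
    wait_start: Optional[float] = None


def delay_assign(queue, node, now, mdt):
    """Pick a map task for `node`, delaying each job's non-local tasks up to `mdt`."""
    for job in queue:
        if not job.pending:
            continue
        local = next((t for t, r in job.pending.items() if node in r), None)
        if local is not None:
            del job.pending[local]
            job.wait_start = None  # a local launch ends the job's delay
            return job.name, local, True
        if job.wait_start is None:
            job.wait_start = now  # start timing the delay for this job
        if now - job.wait_start >= mdt:
            # The job has waited MDT: allow a non-local task to avoid starvation.
            task = next(iter(job.pending))
            del job.pending[task]
            return job.name, task, False
        # Otherwise skip this job and keep searching further down the queue.
    return None
```

The single `mdt` argument is the tuning knob the later slides discuss: a larger value gives jobs more chances to find a local slot, but lets slots sit idle longer.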

19 Delay Algorithm

20 Delay Algorithm

21 Delay Algorithm

22 Delay algorithm

23 Delay algorithm

24 Delay algorithm

25 Delay Algorithm Properties
MDT determines the data locality rate. Rl is an increasing function of MDT, with a ceiling value of 1. However, average response time

26 Delay Algorithm Deficiency
To achieve the best response time, we need to vary the MDT value for different types of jobs, different cluster sizes, and different job execution orders.

27 Outline Background Delay Algorithm MatchMaking algorithm Evaluation
Conclusion Questions

28 MatchMaking Algorithm
Relax strict job order: search all jobs in the queue for local tasks. Give every node a fair chance to grab its local tasks: when a node fails to find a local task for the first time in a row, no non-local task is assigned to it; when it fails for the second time in a row, a non-local task is assigned to it. A node can be assigned at most one non-local task in every heartbeat interval.
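The rules on this slide can be sketched per heartbeat as below. This is one possible reading of the slide rather than the authors' implementation: `Job`, `matchmaking_heartbeat`, and the `missed` set (standing in for a per-node marker) are hypothetical names, and the sketch assumes that a heartbeat which launches any local task leaves the remaining slots idle rather than filling them with non-local work.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    # Hypothetical model: pending map task id -> set of nodes holding its input block.
    pending: dict = field(default_factory=dict)


def matchmaking_heartbeat(queue, node, free_slots, missed):
    """Assign tasks to `node` for one heartbeat.

    `missed` is the set of nodes that found no local task at their previous
    heartbeat. Returns a list of (job name, task id, is_local).
    """
    if free_slots <= 0:
        return []
    assigned = []
    # Search every job in the queue for local tasks, up to the number of free slots.
    for job in queue:
        local = [t for t, replicas in job.pending.items() if node in replicas]
        for task in local[: free_slots - len(assigned)]:
            del job.pending[task]
            assigned.append((job.name, task, True))
    if assigned:
        missed.discard(node)  # a local hit resets the node's marker
        return assigned
    if node in missed:
        # Second miss in a row: hand out at most one non-local task.
        for job in queue:
            if job.pending:
                task = next(iter(job.pending))
                del job.pending[task]
                assigned.append((job.name, task, False))
                break
        missed.discard(node)
    else:
        missed.add(node)  # first miss: assign nothing this heartbeat
    return assigned
```

Unlike the MDT-based sketch of the Delay algorithm, nothing here needs tuning: the only state is one marker per node, and the wait before a non-local assignment is bounded by one heartbeat interval.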

29 MatchMaking Algorithm

30 MatchMaking Algorithm

31 MatchMaking Algorithm

32 MatchMaking Algorithm

33 MatchMaking Algorithm

34 MatchMaking Algorithm

35 Outline Background Delay Algorithm MatchMaking algorithm Evaluation
Conclusion Questions

36 Evaluation Environment
Hardware: 1 head node with 2 AMD Opteron 2.2 GHz 64-bit CPUs, 8 GB memory, and 1 Gbps Ethernet; 30 worker nodes with the same CPUs and network but 4 GB memory. Software: Hadoop 0.21, Redhat Linux CentOS 5.5. Test cases: Loadgen, Wordcount. Metrics: locality rate, average response time.

37 Evaluation Hadoop Configuration
HDFS: block size is 128 MB; 100 blocks evenly distributed across the 30 worker nodes; replication number is 2. MapReduce: 2 map slots and 1 reduce slot for each worker node; Facebook production workload*. *Matei Zaharia et al., “Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling”

38 Evaluation FIFO Scheduler Fair-sharing Scheduler
Default locality policy Delay policy Matchmaking policy Fair-sharing Scheduler

39 Evaluation FIFO scheduler locality rate loadgen wordcount

40 Evaluation FIFO scheduler MTART loadgen wordcount

41 Evaluation Fair sharing scheduler locality rate

42 Evaluation Fair sharing scheduler response time

43 Conclusion We create the MatchMaking algorithm to improve a MapReduce scheduler's data locality without tuning. It obtains good performance in a medium-size cluster with the Facebook production workload. It can be easily integrated with other schedulers, such as the FIFO or Fair-sharing scheduler.

44 Discussion Data locality in the Reduce stage

45 Discussion Performance in a large cluster and an unevenly distributed environment. A large cluster may have a long heartbeat interval. Large block size: ResponseTime = QueuingTime + DataLoadingTime + DataProcessTime. More replicas. Data blocks may not be evenly distributed. Hotspots.

46 Discussion If the job queue is very long:
Set a parameter MaxJobConsidered. Priorities.
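One way to read the MaxJobConsidered idea is as a cap on how many queued jobs the scheduler scans for a local task on each request. The sketch below assumes that reading; the function name `find_local_task` and the tuple-based queue layout are hypothetical, and only the parameter name comes from the slide.

```python
from itertools import islice


def find_local_task(queue, node, max_job_considered):
    """Search only the first `max_job_considered` jobs for a task local to `node`.

    `queue` is a list of (job name, pending) pairs, where `pending` maps a
    task id to the set of nodes holding its input block (hypothetical model).
    """
    for name, pending in islice(queue, max_job_considered):
        for task, replicas in pending.items():
            if node in replicas:
                return name, task
    return None
```

A small cap bounds the search cost per heartbeat on a long queue, at the price of missing local tasks that belong to jobs further back.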

47 Discussion Anything else?

48 Questions Back Page This picture is adopted from the Internet

