Jockey: Guaranteed Job Latency in Data Parallel Clusters. Andrew Ferguson, Peter Bodik, Srikanth Kandula, Eric Boutin, and Rodrigo Fonseca.


1 Jockey: Guaranteed Job Latency in Data Parallel Clusters. Andrew Ferguson, Peter Bodik, Srikanth Kandula, Eric Boutin, and Rodrigo Fonseca

2 2 DATA PARALLEL CLUSTERS

4 4 Predictability

5 5 DATA PARALLEL CLUSTERS Deadline

7 7 VARIABLE LATENCY

14 14 VARIABLE LATENCY 4.3x

15 15 Why does latency vary? 1. Pipeline complexity 2. Noisy execution environment

20 Cosmos 20 MICROSOFT’S DATA PARALLEL CLUSTERS CosmosStore Dryad SCOPE

21 21 DRYAD’S DAG WORKFLOW Cosmos Cluster

23 23 DRYAD'S DAG WORKFLOW Pipeline Job

24 24 DRYAD'S DAG WORKFLOW Deadline

27 27 DRYAD'S DAG WORKFLOW Job, Stage, Tasks

32 32 EXPRESSING PERFORMANCE TARGETS Priorities? Not expressive enough. Weights? Difficult for users to set. Utility curves? Capture deadline & penalty.

36 36 OUR GOAL Maximize utility while minimizing resources by dynamically adjusting the allocation

40 Jockey 40 Large clusters Many users Prior execution

41 41 JOCKEY – MODEL f(job state, allocation) -> remaining run time

45 45 JOCKEY – CONTROL LOOP

48 48 JOCKEY – MODEL f(job state, allocation) -> remaining run time

49 49 JOCKEY – MODEL f(job state, allocation) -> remaining run time becomes f(progress, allocation) -> remaining run time

50 50 JOCKEY – PROGRESS INDICATOR

57 57 JOCKEY – PROGRESS INDICATOR Per stage: (total running + total queuing) and (# complete / total tasks); Stage 1 + Stage 2 + Stage 3 summed
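The indicator built up on this slide can be sketched as follows. This is a minimal reading of the slide, assuming each stage's time total is weighted by its completed-task fraction; the field names and numbers are illustrative, not Jockey's actual code.

```python
# Sketch of a per-stage progress indicator (illustrative, not Jockey's exact code).
# Each stage contributes its total running + queuing time, scaled by the
# fraction of its tasks that have completed; stages are summed.

def stage_progress(total_running, total_queuing, complete, total_tasks):
    """Contribution of one stage to the job-wide progress indicator."""
    if total_tasks == 0:
        return 0.0
    return (total_running + total_queuing) * (complete / total_tasks)

def job_progress(stages):
    """Sum stage contributions; stages is a list of dicts with the four fields."""
    return sum(stage_progress(s["running"], s["queuing"], s["complete"], s["total"])
               for s in stages)

stages = [
    {"running": 120.0, "queuing": 30.0, "complete": 50, "total": 100},  # Stage 1: half done
    {"running": 60.0,  "queuing": 10.0, "complete": 0,  "total": 40},   # Stage 2: started
    {"running": 0.0,   "queuing": 0.0,  "complete": 0,  "total": 40},   # Stage 3: not started
]
print(job_progress(stages))  # 75.0
```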

58 58 JOCKEY – PROGRESS INDICATOR

64 64 JOCKEY – CONTROL LOOP

67 67 JOCKEY – CONTROL LOOP Job model:
               10 nodes    20 nodes    30 nodes
1% complete    60 minutes  40 minutes  25 minutes
2% complete    59 minutes  39 minutes  24 minutes
3% complete    58 minutes  37 minutes  22 minutes
4% complete    56 minutes  36 minutes  21 minutes
5% complete    54 minutes  34 minutes  20 minutes

68 68 JOCKEY – CONTROL LOOP Job model (table above). Deadline: 50 min. Completion: 1%

70 70 JOCKEY – CONTROL LOOP Job model (table above). Deadline: 40 min. Completion: 3%

71 71 JOCKEY – CONTROL LOOP Job model (table above). Deadline: 30 min. Completion: 5%
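The stepping shown on slides 67–71 can be sketched as a table lookup. The numbers are copied from the slide's example; the policy of picking the cheapest allocation that still meets the deadline is an assumption consistent with the stated goal of maximizing utility while minimizing resources.

```python
# Control-loop sketch: given the model's table of predicted remaining run
# times (minutes, from the slide), pick the smallest allocation that still
# meets the deadline. Illustrative policy, not Jockey's actual code.
model = {
    # progress -> {nodes: predicted remaining minutes}
    0.01: {10: 60, 20: 40, 30: 25},
    0.02: {10: 59, 20: 39, 30: 24},
    0.03: {10: 58, 20: 37, 30: 22},
    0.04: {10: 56, 20: 36, 30: 21},
    0.05: {10: 54, 20: 34, 30: 20},
}

def choose_allocation(progress, minutes_to_deadline):
    """Smallest node count predicted to finish before the deadline."""
    row = model[progress]
    for nodes in sorted(row):
        if row[nodes] <= minutes_to_deadline:
            return nodes
    return max(row)  # deadline unreachable: allocate the maximum

print(choose_allocation(0.01, 50))  # 20 (40 min <= 50 min deadline)
print(choose_allocation(0.03, 40))  # 20 (37 min <= 40 min deadline)
print(choose_allocation(0.05, 30))  # 30 (only 30 nodes meets 30 min)
```

As the deadline tightens relative to progress, the chosen allocation grows, which is exactly the behavior the three annotated slides walk through.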

72 72 JOCKEY – MODEL f(progress, allocation) -> remaining run time

75 75 JOCKEY – MODEL f(progress, allocation) -> remaining run time. Analytic model? Machine learning? Simulator.

80 80 JOCKEY Problem / Solution: Pipeline complexity → Use a simulator. Noisy environment → Dynamic control.

84 Jockey in Action 84 Real job Production cluster CPU load: ~80%

85 Jockey in Action 85

88 Jockey in Action 88 Initial deadline: 140 minutes

89 Jockey in Action 89 New deadline: 70 minutes

90 Jockey in Action 90 New deadline: 70 minutes Release resources due to excess pessimism

91 Jockey in Action 91 “Oracle” allocation: total allocation-hours vs. deadline

92 Jockey in Action 92 “Oracle” allocation: total allocation-hours vs. deadline. Available parallelism less than allocation

93 Jockey in Action 93 “Oracle” allocation: total allocation-hours vs. deadline. Allocation above oracle

98 Evaluation 98 Production cluster 21 jobs SLO met? Cluster impact?

100 Evaluation Jobs which met the SLO 100

102 Evaluation Jobs which met the SLO 102 Missed 1 of 94 deadlines

105 Evaluation Jobs which met the SLO 105 1.4x

108 Evaluation 108 Jobs which met the SLO. Missed 1 of 94 deadlines. Allocated too many resources. Simulator made good predictions: 80% finish before deadline. Control loop is stable and successful.

109 Evaluation 109

115 Conclusion 115

119 119 Data parallel jobs are complex, yet users demand deadlines. Jobs run in shared, noisy clusters, making simple models inaccurate.

120 Jockey 120

121 simulator 121

122 control-loop 122

123 123 Deadline

125 Questions? Andrew Ferguson adf@cs.brown.edu 125

126 Co-authors: Peter Bodík (Microsoft Research), Srikanth Kandula (Microsoft Research), Eric Boutin (Microsoft), Rodrigo Fonseca (Brown). Questions? Andrew Ferguson adf@cs.brown.edu 126

127 Backup Slides 127

128 Utility Curves 128 Deadline marked on curve. For single jobs, scale doesn't matter. For multiple jobs, use financial penalties.

129 129 Jockey Resource allocation control loop: 1. Slack 2. Hysteresis 3. Dead zone. (Diagram labels: Prediction, Run Time, Utility)
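The three refinements named on this slide can be sketched as follows. The constants and function names are invented for the example (they are not Jockey's actual parameters); the point is only how each technique damps the control loop.

```python
# Illustrative sketch of the three control-loop refinements named on this
# slide; all constants are made up for the example.

SLACK = 1.2        # 1. slack: pad predictions so small model errors don't cause a miss
HYSTERESIS = 0.5   # 2. hysteresis: move only partway toward the target each step
DEAD_ZONE = 2      # 3. dead zone: ignore allocation changes too small to act on

def padded_prediction(predicted_remaining_minutes):
    """Slack: treat the job as slightly slower than the model predicts."""
    return SLACK * predicted_remaining_minutes

def next_allocation(current_tokens, target_tokens):
    """Hysteresis + dead zone applied to a proposed token-count change."""
    if abs(target_tokens - current_tokens) < DEAD_ZONE:
        return current_tokens  # dead zone: change too small, keep allocation
    # hysteresis: close only half the gap per control interval
    return round(current_tokens + HYSTERESIS * (target_tokens - current_tokens))

print(padded_prediction(40))     # 48.0
print(next_allocation(20, 21))   # 20 (inside dead zone)
print(next_allocation(20, 30))   # 25 (halfway toward target)
```

Together these keep the allocation from oscillating when predictions jitter, which matches the slide's claim that the loop was "improved with techniques from control theory."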

130 130 Cosmos Resource sharing in Cosmos
Resources are allocated with a form of fair sharing across business groups and their jobs (like Hadoop FairScheduler or CapacityScheduler).
Each job is guaranteed a number of tokens as dictated by cluster policy; each running or initializing task uses one token, released on task completion.
A token is a guaranteed share of CPU and memory.
To increase efficiency, unused tokens are re-allocated to jobs with available work.

131 131 Jockey Progress indicator
Can use many features of the job to build a progress indicator. Earlier work (ParaTimer) concentrated on fraction of tasks completed. Our indicator is very simple, but we found it performs best for Jockey's needs: total vertex initialization time, total vertex run time, fraction of completed vertices.

132 132 Comparison with ARIA ARIA uses analytic models Designed for 3 stages: Map, Shuffle, Reduce Jockey’s control loop is robust due to control-theory improvements ARIA tested on small (66-node) cluster without a network bottleneck We believe Jockey is a better match for production DAG frameworks such as Hive, Pig, etc.

133 133 Jockey Latency prediction: C(p, a)
Event-based simulator
–Same scheduling logic as actual Job Manager
–Captures important features of job progress
–Does not model input size variation or speculative re-execution of stragglers
–Inputs: job algebra, distributions of task timings, probabilities of failures, allocation
Analytic model
–Inspired by Amdahl's Law: T = S + P/N
–S is remaining work on critical path, P is all remaining work, N is number of machines
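The analytic model quoted above is small enough to show directly. The numbers below are illustrative; the key property is that added machines shrink only the P/N term, while the critical-path work S is a floor.

```python
# Sketch of the Amdahl-inspired analytic model from this slide: T = S + P/N,
# where S is remaining work on the critical path, P is all remaining work,
# and N is the number of machines. Example values are illustrative.

def remaining_time(S, P, N):
    """Predicted remaining run time with N machines."""
    return S + P / N

print(remaining_time(10.0, 600.0, 30))   # 30.0 minutes
print(remaining_time(10.0, 600.0, 600))  # 11.0 minutes: nearing the S = 10 floor
```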

134 134 Jockey Resource allocation control loop
Executes in Dryad's Job Manager.
Inputs: fraction of completed tasks in each stage, time job has spent running, utility function, precomputed values (for speedup).
Output: number of tokens to allocate.
Improved with techniques from control theory.

140 Jockey 140 Offline: job profile feeds the simulator, producing latency predictions. During job runtime: the running job reports job stats; latency predictions, job stats, and the utility function drive the resource allocation control loop, which adjusts the running job.

