Presentation is loading. Please wait.

Presentation is loading. Please wait.

Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Similar presentations


Presentation on theme: "Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,"— Presentation transcript:

1 Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, Shivnath Babu Department of Computer Science Duke University

2 The Growth of Data

3 MAD: Features of Ideal Analytics System M agnetism A gility D epth -- accept all data -- allow complex analysis -- adapt with data, real-time processing

4 M agnetism A gility D epth -- accept all data -- adapt with data, real-time processing -- allow complex analysis Hadoop is MAD - Blindly loads data into HDFS. - Fine-grained scheduler - End-to-end data pipeline - Dynamic node addition/ dropping - Well integrated with programming languages

5 Tuning for Good Performance: Challenges - Multiple dimensions of performance -- time, cost, scalability … - Tons of Parameters -- more than 190 parameters in Hadoop. - Multiple levels of abstraction -- job-level, workflow-level, workload-level …

6 Thumb rule Tuning for Good Performance: Challenges

7 Thumb rule Tuning for Good Performance: Challenges

8 Starfish: A Self-tuning System - Builds on Hadoop - Tunes to ‘good’ performance automatically

9 Starfish Architecture

10 The “What-if” Engine Model + simulation based prediction algo. Predicted performance Learning from previous job profiles Analytical models to estimate dataflow Simulating the execution of MR workload Profile of a job (P) + New parameter set (S) [Ref:] A What-if Engine for Cost-based MapReduce Optimization. H. Herodotou et.al.

11 The “What-if” Engine Ground truthEstimated by the What-if engine

12 Starfish Architecture: Job Level

13 Just-in-time optimizer -- Searches the parameter space Profiler -- Collects info. on MapReduce job execution through dynamic instrumentation -- Reports timings, data size, and resource utilization Sampler -- Generates profile statistics from training benchmark jobs

14 Starfish Architecture: Workflow Level

15 Scheduler to balanced distribution of data Block placement policy for data collocation -- deals with skewed data, add/drop of nodes, tradeoff between balanced data v/s data-locality -- Local-write v/s round-robin

16 Starfish Architecture: Workflow Level Producer Consumer Wasted production

17 Starfish Architecture: Workflow Level File level parallelism Block level parallelism

18 Starfish Architecture: Workflow Level What-if simulation Workflow Aware Optimizer Select best data layout and job parameters MR job execution Task scheduling Block placement Compare cost & benefits Running time? Data layout?

19 Starfish Architecture: Workload Level

20 Workload Optimizer Elastisizer Determine best cluster and Hadoop configurations Jumbo operator Cost based estimation for best optimization

21 Starfish: Summary - Optimizes on different granularities -- Workload, workflow, job (procedural & declarative) - Considers different decision points -- Provisioning, optimization, Scheduling, Data layout

22 Starfish: Piazza Discussion 1) Limited evaluation: 10 Top criticisms (till 1:30pm, 17 reviews): 2) Not explained well: 7 3) Profiler overhead/better search algo: 5 * What is the effect of wrong prediction? * What-if engine requires prior knowledge.

23 http://www.cs.duke.edu/starfish/ Thank you. Photo courtesy: Starfish group, Duke University

24 Going MAD with Big Data M agnetic system A gile system and Analytics D eep Analytics D ata Life Cycle Awareness E lasticity R obustness

25 Backup: What-if Engine 1

26 Backup: What-if Engine 2


Download ppt "Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,"

Similar presentations


Ads by Google