Towards Automatic Optimization of MapReduce Programs (Position Paper) Shivnath Babu, Duke University

1 Towards Automatic Optimization of MapReduce Programs (Position Paper) Shivnath Babu, Duke University

2 Roadmap A call to action to improve automatic optimization techniques in MapReduce frameworks; challenges & promising directions. [Diagram: JAQL, Pig, and Hive running atop Hadoop and HDFS]

3 Lifecycle of a MapReduce Job [Figure: an example program, with its map function and reduce function highlighted, to be run as a MapReduce job]
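The transcript names only the map and reduce functions, so as a concrete stand-in for the program on the slide, here is a minimal, self-contained Hadoop job: the canonical WordCount, written against the Hadoop 2.x-style API (class names and paths are illustrative, not taken from the talk).

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      // Map function: emit (word, 1) for every word in the input split.
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            ctx.write(word, ONE);
          }
        }
      }

      // Reduce function: sum the counts emitted for each word.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          ctx.write(key, new IntWritable(sum));
        }
      }

      // Run this program as a MapReduce job.
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.addOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }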

5 Lifecycle of a MapReduce Job [Diagram: input splits feed Map Wave 1 and Map Wave 2, followed by Reduce Wave 1 and Reduce Wave 2, laid out over time] How are the number of splits, the number of map and reduce tasks, the memory allocation to tasks, etc., determined?
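On the first part of that question: in classic Hadoop, the number of splits (and hence map tasks) follows from the input size, the requested map count, the minimum split size, and the HDFS block size, while the reduce-task count is simply whatever mapred.reduce.tasks says. A back-of-the-envelope sketch of the split-size rule, with illustrative constants (not the talk's settings):

    public class SplitSizing {
      // Mirrors the split-size rule used by Hadoop's classic FileInputFormat:
      // splitSize = max(minSize, min(goalSize, blockSize)).
      static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
      }

      public static void main(String[] args) {
        long totalSize = 50L << 30;      // 50 GB of input
        long blockSize = 64L << 20;      // 64 MB HDFS block (old default)
        long minSize = 1;                // mapred.min.split.size default
        long requestedMaps = 100;        // hint via mapred.map.tasks (illustrative)
        long goalSize = totalSize / requestedMaps;
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        // One map task per split; here the block size caps the split size.
        System.out.println("split size = " + splitSize + " bytes, ~"
            + (totalSize + splitSize - 1) / splitSize + " map tasks");
      }
    }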

6 Job Configuration Parameters Hadoop has 190+ configuration parameters, which are set manually or left at their defaults. Are defaults or rules-of-thumb good enough?
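For concreteness, parameters can be set per job in code (overriding the defaults shipped in the *-default.xml files) or on the command line via -D. A minimal sketch, using parameter names that appear in the Terasort experiment later in the talk; the values here are illustrative:

    import org.apache.hadoop.conf.Configuration;

    public class ShowConfig {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Explicit settings override defaults loaded from core/mapred-default.xml.
        conf.setInt("mapred.reduce.tasks", 28);          // number of reduce tasks
        conf.setInt("io.sort.factor", 10);               // streams merged at once during sorts
        conf.setFloat("io.sort.record.percent", 0.15f);  // share of sort buffer for record metadata
        // Equivalent command-line form: hadoop jar job.jar MyJob -Dio.sort.factor=10 ...
        System.out.println("io.sort.factor = " + conf.get("io.sort.factor"));
      }
    }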

7 Experiments On EC2 and local clusters. [Charts: running times, in seconds and minutes, under different configuration settings]

8 Illustrative Result: 50GB Terasort (17-node cluster, 64+32 concurrent map+reduce slots)
Performance at default and rule-of-thumb settings can be poor, and cross-parameter interactions are significant. Settings compared (running times appeared as a bar chart on the slide):

    mapred.reduce.tasks  io.sort.factor  io.sort.record.percent
    10                   10              0.15
    10                   500             0.15
    28                   10              0.15   <- based on the popular rule-of-thumb (about 0.9 × the cluster's 32 reduce slots)
    300                  10              0.15
    300                  500             0.15

9 Problem Space [Chart: complexity of performance objectives plotted against the space of execution choices]
Space of execution choices: job configuration parameters, declarative HiveQL/Pig operations, multi-job workflows
Complexity of performance objectives: cost in pay-as-you-go environments, energy considerations
Current approaches: predominantly manual, post-mortem analysis
Is this where we want to be?

10 Can DB Query Optimization Technology Help?
[Diagram: in a database system, a query goes to an optimizer (enumerate, cost, search) that picks a good plan for the execution engine, which produces results; in Hadoop, a MapReduce job needs a good setting of parameters before execution produces results]
But:
– MapReduce jobs are not declarative
– No schema about the data
– Impact of concurrent jobs & scheduling?
– Space of parameters is huge
Can we:
– Borrow/adapt ideas from the wide spectrum of query optimizers that have been developed over the years
– Or innovate: exploit design & usage properties of MapReduce frameworks
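The enumerate/cost/search loop in the diagram carries over directly if query plans are replaced by parameter settings. A hedged sketch of that loop; the cost model here is a made-up stand-in, since obtaining a trustworthy one is precisely the hard part flagged above:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ParameterOptimizerSketch {
      // Hypothetical cost model: predicts running time for a candidate setting.
      interface CostModel { double predictSeconds(Map<String, Integer> setting); }

      // Enumerate the candidates, cost each one, and search for the minimum.
      static Map<String, Integer> pickBest(List<Map<String, Integer>> candidates, CostModel model) {
        Map<String, Integer> best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map<String, Integer> s : candidates) {            // enumerate
          double cost = model.predictSeconds(s);               // cost
          if (cost < bestCost) { bestCost = cost; best = s; }  // search
        }
        return best;
      }

      public static void main(String[] args) {
        List<Map<String, Integer>> candidates = new ArrayList<>();
        for (int reducers : new int[] {10, 28, 300}) {
          for (int sortFactor : new int[] {10, 500}) {
            Map<String, Integer> s = new HashMap<>();
            s.put("mapred.reduce.tasks", reducers);
            s.put("io.sort.factor", sortFactor);
            candidates.add(s);
          }
        }
        // Toy cost model, for illustration only; not a real Hadoop performance model.
        CostModel toy = s -> 600.0 + 20000.0 / s.get("mapred.reduce.tasks")
            + 3000.0 / s.get("io.sort.factor");
        System.out.println("predicted best setting: " + pickBest(candidates, toy));
      }
    }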

11 Spectrum of Query Optimizers Conventional optimizers range from rule-based to those using cost models + statistics about the data. AT's Conjecture: Rule-based Optimizers (RBOs) will trump Cost-based Optimizers (CBOs) in MapReduce frameworks. Insight: Predictability(RBO) >> Predictability(CBO)

12 Spectrum of Query Optimizers The conventional spectrum (rule-based through cost models + statistics about the data) extends to Learning Optimizers (learn from execution & adapt) and Tuning Optimizers (proactively try different plans). AT's Conjecture: RBOs will trump CBOs in MapReduce frameworks. Insight: Predictability(RBO) >> Predictability(CBO)

13 Spectrum of Query Optimizers Learning and Tuning Optimizers can exploit usage & design properties of MapReduce frameworks (a sketch of the first property follows):
– High ratio of repeated jobs to new jobs
– Schema can be learned (e.g., from Pig scripts)
– Common sort-partition-merge skeleton
– Mechanisms for adaptation stemming from the design for robustness (speculative execution, storing intermediate results)
– Fine-grained and pluggable scheduler
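A minimal sketch of how a learning optimizer could exploit the high ratio of repeated jobs: remember measured runtimes per (job, configuration) pair and reuse the best known configuration when a job recurs. All names and numbers here are hypothetical:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class LearningOptimizerSketch {
      // Observed history: job signature -> (configuration -> best measured seconds).
      private final Map<String, Map<String, Double>> history = new HashMap<>();

      // Record a finished run; keep the best (minimum) time seen per configuration.
      void recordRun(String jobSignature, String config, double seconds) {
        history.computeIfAbsent(jobSignature, k -> new HashMap<>())
               .merge(config, seconds, Math::min);
      }

      // For a repeated job, return the best known configuration; for a new job,
      // return null and fall back to rules-of-thumb (and keep exploring).
      String bestKnownConfig(String jobSignature) {
        Map<String, Double> runs = history.get(jobSignature);
        if (runs == null || runs.isEmpty()) return null;
        return Collections.min(runs.entrySet(), Map.Entry.comparingByValue()).getKey();
      }

      public static void main(String[] args) {
        LearningOptimizerSketch opt = new LearningOptimizerSketch();
        opt.recordRun("nightly-sort", "reduce=28,sort.factor=10", 2100);   // illustrative numbers
        opt.recordRun("nightly-sort", "reduce=300,sort.factor=500", 900);
        System.out.println(opt.bestKnownConfig("nightly-sort"));
      }
    }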

14 Summary A call to action to improve automatic optimization techniques in MapReduce frameworks:
– Automated generation of optimized Hadoop configuration parameter settings, HiveQL/Pig/JAQL query plans, etc.
– A rich history to learn from
– MapReduce execution creates unique opportunities/challenges

