Cloud Computing: MapReduce in Heterogeneous Environments

Eva Kalyvianaki, ek264@cam.ac.uk

Contents. This lecture looks at MapReduce performance in heterogeneous clusters. Material is from the paper "Improving MapReduce Performance in Heterogeneous Environments" by Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz and Ion Stoica, published at the USENIX OSDI conference, 2008, and from their OSDI presentation.

Motivation: MapReduce is becoming popular. Its open-source implementation, Hadoop, is used by Yahoo!, Facebook, Last.fm, and others. Scale: 20 PB/day processed at Google, O(10,000) nodes at Yahoo!, 3,000 jobs/day at Facebook.

Stragglers in MapReduce. A straggler is a node that performs poorly or is not performing at all. The original MapReduce mitigation approach was to run a speculative copy (called a backup task); whichever finishes first, the original or the backup, is used. Without speculative execution, a job would be as slow as its slowest sub-task. Google notes that speculative execution can improve job response times by 44%. Is this approach good enough for modern clusters?

Modern Clusters: Heterogeneity is the norm. Cloud computing providers like Amazon’s Elastic Compute Cloud (EC2) offer cheap on-demand computing. Price: 2 cents / VM / hour. Scale: thousands of VMs. Caveat: less control over performance. The main challenge for Hadoop on EC2 is performance heterogeneity, which breaks the task scheduler’s assumptions. This lecture/paper presents a new scheduler, LATE, that can cut response time in half.

MapReduce Revisited

MapReduce Implementation: Hadoop

Scheduling in MapReduce. When a node has an empty slot, Hadoop chooses a task from three categories, in the following priority order:
1. Failed tasks are given the highest priority.
2. Unscheduled tasks. For maps, tasks with data local to the node are chosen first.
3. Otherwise, Hadoop looks for a task to run speculatively.
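
A minimal sketch of this priority order in Python (Hadoop itself is Java; the task lists and the input_locations field are hypothetical names used only for illustration):

```python
def choose_task(node, failed, unscheduled, speculative_candidates):
    # 1. Failed tasks get the highest priority.
    if failed:
        return failed[0]
    # 2. Unscheduled tasks; for maps, prefer tasks whose input is local to the node.
    if unscheduled:
        local = [t for t in unscheduled if node in t.input_locations]
        return (local or unscheduled)[0]
    # 3. Otherwise, look for a task to run speculatively.
    return speculative_candidates[0] if speculative_candidates else None
```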

Deciding on Speculative Tasks. Which task should be executed speculatively? Hadoop monitors task progress using a progress score, a number from 0 to 1. For mappers, the score is the fraction of input data read. For reducers, the execution is divided into three equal phases, each worth 1/3 of the score:
- Copy phase: percent of map outputs copied so far.
- Sort phase: map outputs are sorted by key; percent of data merged.
- Reduce phase: percent of data passed through the reduce function.
Example: a task halfway through the copy phase has progress score 1/2 × 1/3 = 1/6. Example: a task halfway through the reduce phase has progress score 1/3 + 1/3 + 1/2 × 1/3 = 5/6.
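
As a sketch, the score might be computed as follows (the task fields are hypothetical, not Hadoop’s real API):

```python
def progress_score(task):
    """Progress score in [0, 1], per the rules above (a sketch)."""
    if task.kind == "map":
        return task.input_read / task.input_total  # fraction of input read
    # Reducers: three equal phases (copy, sort, reduce), each worth 1/3.
    phase_index = {"copy": 0, "sort": 1, "reduce": 2}[task.phase]
    return (phase_index + task.phase_fraction) / 3

# Halfway through the copy phase:   (0 + 0.5) / 3 = 1/6
# Halfway through the reduce phase: (2 + 0.5) / 3 = 5/6
```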

Deciding on Speculative Tasks (cont’d). Hadoop looks at the average progress of each category of tasks (maps and reduces) and defines a threshold: when a task’s progress score is less than the average for its category minus 0.2, and the task has run for at least one minute, it is marked as a straggler: threshold = avgProgress – 0.2. All tasks with progress score < threshold are stragglers. Ties are broken by data locality. This approach works reasonably well in homogeneous clusters.
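
A sketch of this rule, reusing the progress_score function above (tasks holds a single category, all maps or all reduces; field names are hypothetical):

```python
MIN_RUNTIME_SECS = 60  # a task must have run for at least one minute

def find_stragglers(tasks, now):
    """Tasks whose progress score is below the category average minus 0.2."""
    avg_progress = sum(progress_score(t) for t in tasks) / len(tasks)
    threshold = avg_progress - 0.2
    # Ties among the stragglers are then broken by data locality (not shown).
    return [t for t in tasks
            if progress_score(t) < threshold
            and now - t.start_time >= MIN_RUNTIME_SECS]
```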

Scheduler’s Assumptions:
- Nodes can perform work at roughly the same rate.
- Tasks progress at a constant rate over time.
- There is no cost to starting a speculative task.
- A task’s progress score is roughly equal to the fraction of its total work done.
- Tasks tend to finish in waves, so a task with a low progress score is likely a slow task.
- Tasks of the same category (map or reduce) require roughly the same amount of work.

Revising the Scheduler’s Assumptions. Assumption: nodes can perform work at roughly the same rate. Assumption: tasks progress at a constant rate over time. In practice: (1) in heterogeneous clusters, some nodes are slower (older) than others; (2) virtualized clusters "suffer" from co-location interference.

Heterogeneity in Virtualized Environments. VM technology isolates CPU and memory, but disk and network are shared: full bandwidth when there is no contention, equal shares when there is contention, giving up to a 2.5x performance difference. The goal is to combine high utilization for the provider with fast finishing times for the client.

Revising the Scheduler’s Assumptions (cont’d). Assumption: there is no cost to starting a speculative task. Assumption: a task’s progress score is roughly equal to the fraction of its total work. Assumption: tasks tend to finish in waves, so a task with a low progress score is likely a slow task. In practice: (3) too many speculative tasks can take resources away from other running tasks; (4) the copy phase of reducers is the slowest part, because it involves all-pairs communication, yet it accounts for only 1/3 of the total reduce score; (5) tasks from different generations run concurrently, so newer, fast tasks are compared against older, slow tasks, and avgProgress changes a lot.

Idea: Progress Rates. Instead of using progress score values, compute progress rates, and back up tasks that are "far enough" below the mean rate. Problem: this can still select the wrong tasks, as the next example shows. How is this different from the progress score approach?

Progress Rate Example. Node 1 runs at 1 task/min; Node 2 is 3x slower; Node 3 is 1.9x slower. [Figure: task progress over time on the three nodes, with a speculative task helping.]

Progress Rate Example (cont’d). What if the job had 5 tasks? After 2 minutes, Node 2’s task has 1 min left and Node 3’s task has 1.8 min left. Node 2 is slowest, but the scheduler should back up Node 3’s task! [Figure: the progress-rate heuristic selecting the wrong task.]

Our Scheduler: LATE. Insight: back up the task with the longest estimated time to finish: "Longest Approximate Time to End" (LATE). Look forward instead of backward. Sanity thresholds:
- Cap the number of backup tasks.
- Launch backups on fast nodes only.
- Only back up tasks that are sufficiently slow.

LATE Details. Estimating finish times:
progress rate = progress score / execution time
estimated time left = (1 – progress score) / progress rate
Example: a task with progress score 0.8 and progress rate 0.02 has estimated time left = (1 – 0.8) / 0.02 = 10 time units. Sensitivity analysis showed these are good estimators, with some flexibility in the range of threshold values.
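
A minimal sketch of this estimator in Python (time units are arbitrary; the example numbers are the ones from the slide):

```python
def progress_rate(score, elapsed):
    """Progress score per unit of execution time."""
    return score / elapsed

def estimated_time_left(score, elapsed):
    """LATE's look-forward estimate: remaining work / observed rate."""
    return (1.0 - score) / progress_rate(score, elapsed)

# The slide's worked example: progress score 0.8 with rate 0.02
# (i.e. 40 time units elapsed): (1 - 0.8) / 0.02 = 10 units left.
print(estimated_time_left(0.8, 40))  # ~10.0
```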

LATE Scheduler. If a task slot becomes available and fewer than SpeculativeCap speculative tasks are running:
1. Ignore the request if the node’s total progress is below SlowNodeThreshold (the 25th percentile).
2. Rank currently running, non-speculatively executed tasks by estimated time left.
3. Launch a copy of the highest-ranked task whose progress rate is below SlowTaskThreshold (the 25th percentile).
Threshold values (a 10% cap on backups, 25th percentiles for slow nodes and tasks) were validated by sensitivity analysis. A sketch of this logic follows.
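
Below is a sketch of that decision logic, assuming hypothetical task and node objects (fields progress_score, elapsed, speculative, total_progress) and a simple nearest-rank percentile; it is an illustration, not the actual Hadoop patch:

```python
SPECULATIVE_CAP = 0.10  # at most 10% of slots may run backup tasks
SLOW_NODE_PCT = 25      # SlowNodeThreshold: 25th percentile of node progress
SLOW_TASK_PCT = 25      # SlowTaskThreshold: 25th percentile of progress rates

def percentile(values, pct):
    """Nearest-rank percentile (sketch)."""
    ordered = sorted(values)
    return ordered[max(0, int(len(ordered) * pct / 100) - 1)]

def rate(task):
    return task.progress_score / task.elapsed

def time_left(task):
    return (1.0 - task.progress_score) / rate(task)

def late_choose_backup(node, all_nodes, running, num_slots):
    """Return a task to back up when `node` has a free slot, or None."""
    # Cap the number of concurrently running backup tasks.
    if sum(t.speculative for t in running) >= SPECULATIVE_CAP * num_slots:
        return None
    # Ignore the request if the node itself is among the slowest 25% of nodes.
    if node.total_progress <= percentile([n.total_progress for n in all_nodes],
                                         SLOW_NODE_PCT):
        return None
    # Rank running, non-speculative tasks by estimated time left, longest first.
    candidates = sorted((t for t in running if not t.speculative),
                        key=time_left, reverse=True)
    if not candidates:
        return None
    slow_cutoff = percentile([rate(t) for t in candidates], SLOW_TASK_PCT)
    # Launch a copy of the highest-ranked task that is sufficiently slow.
    for task in candidates:
        if rate(task) < slow_cutoff:
            return task
    return None
```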

LATE Example. After 2 minutes: Node 2’s task has progress = 66%, estimated time left = (1 – 0.66) / (1/3) = 1 min; Node 3’s task has progress = 5.3%, estimated time left = (1 – 0.053) / (1/1.9) = 1.8 min. LATE correctly picks Node 3. [Figure: LATE backing up Node 3’s task over time.]
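
Plugging the example’s numbers into the estimator shows the two heuristics disagreeing (a sketch; rates are in tasks/min):

```python
tasks = {
    "Node 2": {"score": 0.66,  "rate": 1 / 3},    # 3x slower, almost done
    "Node 3": {"score": 0.053, "rate": 1 / 1.9},  # 1.9x slower, started late
}

def minutes_left(t):
    return (1 - t["score"]) / t["rate"]

# A pure progress-rate heuristic would back up the slowest task:
print(min(tasks, key=lambda n: tasks[n]["rate"]))        # Node 2
# LATE backs up the task with the longest estimated time left:
print(max(tasks, key=lambda n: minutes_left(tasks[n])))  # Node 3 (~1.8 vs ~1.0 min)
```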

Evaluation. Environments: EC2 (3 job types, 200-250 nodes) and a small local testbed. Heterogeneity was induced through self-contention via VM placement, and stragglers through background processes.

EC2 Sort without Stragglers (Sec 5.2.1). 106 machines, 7-8 VMs per machine, for a total of 243 VMs. 128 MB of data per host, 30 GB in total; 486 map tasks and 437 reduce tasks. Average speedup: 27% over native, 31% over no backups.

EC2 Sort with Stragglers (Sec 5.2.2). 8 VMs out of 100 were manually slowed down by running CPU- and disk-intensive background jobs. 256 MB of data per host (25 GB in total); each job had 486 map tasks and 437 reduce tasks. Average speedup: 58% over native, 220% over no backups; maximum speedup over native: 93%. [Figure: response times achieved.]

Conclusion. Heterogeneity is a challenge for parallel applications, and it is growing more important. Lessons: back up the tasks that hurt response time most; a simple algorithm achieves a 2x improvement. The focus is on finishing time, not just reliability.

Summary. MapReduce is a very powerful and expressive model. Performance depends a lot on implementation details. Material is from the paper "Improving MapReduce Performance in Heterogeneous Environments" by Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz and Ion Stoica, published at the USENIX OSDI conference, 2008, and from their OSDI presentation.