
1 CMSC 34702 ML for Cluster Scheduling (1)
Junchen Jiang, October 3, 2019

2 Logistics
Sign up on Piazza
Choose your paper to present
Paper review format:
Paper summary (three sentences or less about the main idea, approach, or contribution)
Why should we accept the paper? (1-3 sentences about the 1-3 strongest things about the paper)
Why should we not accept the paper? (1-3 sentences about the 1-3 things that would most improve the paper)

3 MapReduce: Simplified Data Processing on Large Clusters  CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics

4 Cloud Computing Basics

5 Scaling up vs. Scaling out: Origin of Cloud Computing
Scale-up: high-end servers (Sun Starfire, Enterprise, … at ~$1 million apiece), used by eBay, Amazon, …
Scale-out: many “Commercial Off-The-Shelf” (COTS) computers (Google had 15,000 of them c. 2004)

6 Price/Performance Comparison (c. 2004)
High-end server rack ($758K): 8 x 2GHz Xeon CPUs, 64GB RAM, 8TB disk
Rack of COTS nodes ($278K): 176 x 2GHz Xeon CPUs, 176GB RAM, 7.04TB disk ({dual CPUs, 2GB RAM, 80GB disk} x 88)
Higher performance and cheaper! Too good to be true?

7 Disadvantages of a cluster of COTS nodes?
[Figure: high-end server vs. rack of COTS computers, compared on CPU, RAM, and disk]

8 New problems in distributed/cluster computing
Fault tolerance
Network traffic
Data consistency
Programming complexity

9 Cluster Computing Needs a Software Stack
Typical software analytics stack:
Layer | Google | Hadoop | Berkeley
Data mngt | The Google File System (GFS) | Hadoop File System (HDFS) | Alluxio
Processing | MapReduce | MapReduce | Spark
Database | Bigtable | HBase | Shark
Resource mngt | Borg | YARN | Mesos

10 MapReduce: Simplified Data Processing on Large Clusters
Cluster computing is popular, but it is hard to write complex & high-performance programs. MapReduce is the first system to provide an expressive programming interface that automatically optimizes the low-level system details.

11 Why is parallelization difficult?
If the initial state is x = 6, y = 0, what happens when these threads finish running?
Thread 1: void foo() { x++; y = x; }
Thread 2: void bar() { y++; x += 3; }
Multithreading = unpredictability: the final values depend on how the two threads interleave (see the sketch below).
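Not from the slides: a minimal Python sketch that makes the unpredictability concrete by enumerating every interleaving of the two threads' statements (treating each statement as atomic) and collecting the final (x, y) values.

```python
from itertools import permutations

def run(order):
    """Execute one interleaving; each step reads/writes the shared x, y."""
    x, y = 6, 0
    for step in order:
        if step == "T1a":   x += 1   # Thread 1: x++
        elif step == "T1b": y = x    # Thread 1: y = x
        elif step == "T2a": y += 1   # Thread 2: y++
        elif step == "T2b": x += 3   # Thread 2: x += 3
    return (x, y)

outcomes = set()
for order in permutations(["T1a", "T1b", "T2a", "T2b"]):
    # Keep only interleavings that respect each thread's own program order.
    if order.index("T1a") < order.index("T1b") and order.index("T2a") < order.index("T2b"):
        outcomes.add(run(order))

print(outcomes)   # {(10, 7), (10, 8), (10, 10)} -- y depends on the schedule
```

Even at this coarse granularity the result is not unique; if x++ itself is split into load/add/store, even more outcomes become possible.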

12 Functional Programming
[Figure: the imperative updates (x++, y = x, y++, x += 3) from the previous slide, contrasted with pure functions f mapping inputs X and Y to outputs A and B]
Functional programming: no mutable variables, no changing state, no side effects.
By contrast, imperative shared-state code: states can change (not idempotent), too many variables (interdependency).

13 Key Functional Programming ops: map & fold
[Figure: map applies f to each element independently (X → X’, Y → Y’, Z → Z’); fold combines the elements with f into a single accumulated result]
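Not in the original slides: a small Python illustration of the two primitives, with fold played by functools.reduce.

```python
from functools import reduce

nums = [1, 2, 3, 4]

# map: apply f to every element independently (embarrassingly parallel).
squares = list(map(lambda x: x * x, nums))            # [1, 4, 9, 16]

# fold: combine the elements pairwise into one accumulated result.
total = reduce(lambda acc, x: acc + x, squares, 0)    # 30
```

MapReduce's Map phase corresponds to map; its Reduce phase is a fold applied per key.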

14 MapReduce: An instantiation of “map” & “fold”
[Figure: map turns inputs (key_1, val_1) and (key_2, val_2) into intermediate pairs (key_a, val_11), (key_b, val_12), (key_b, val_21), (key_c, val_22); values sharing a key are grouped and passed to the reduce function R, e.g. (key_b, R([val_12, val_21]))]
Example: count word occurrences.
“MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04

15 Example: Count word occurrences
map:    (URL1, “personal computer”) → (“personal”, 1), (“computer”, 1)
        (URL2, “computer science”) → (“computer”, 1), (“science”, 1)
reduce: (“personal”, 1), (“computer”, 2), (“science”, 1)
“MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04
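A single-machine, hedged sketch of this word-count example (the real system distributes the map tasks, the shuffle, and the reduce tasks across the cluster; the function names here are illustrative, not the paper's C++ API):

```python
from collections import defaultdict

def map_fn(url, text):
    """Map: emit an intermediate (word, 1) pair for every word in the document."""
    for word in text.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Reduce: sum all partial counts for one word."""
    return (word, sum(counts))

def word_count(docs):
    groups = defaultdict(list)          # the "shuffle": group values by key
    for url, text in docs:
        for word, one in map_fn(url, text):
            groups[word].append(one)
    return [reduce_fn(w, cs) for w, cs in groups.items()]

docs = [("URL1", "personal computer"), ("URL2", "computer science")]
print(word_count(docs))
# [('personal', 1), ('computer', 2), ('science', 1)]
```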

16 Rationale behind the MapReduce Interface: A Minimalist Approach
[Figure: a layered stack with applications & data-analytics algorithms (Google Search, machine learning, graph mining, grep, sort, word counting, …) on top, the narrow Map & Reduce interface in the middle, and the cluster computing system (the MapReduce system) underneath]
With this narrow interface, application developers need not master all the intricacies of resource management & communication; they only write Map & Reduce functions.
Can you think of another example of the minimalist approach?

17 What’s the contribution of the MapReduce System?
Make it easier to write parallel programs “MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04

18 What’s the contribution of the MapReduce System?
Make it easier to write parallel programs
An implementation of the interface that achieves high performance:
fault tolerance, data locality, load balancing, straggler mitigation, consistency, data integrity
“MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04

19 System Architecture “MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04

20 Performance: Data locality
Co-locate workers with the data
Co-locate reducers with mappers
“MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04

21 Performance: Speeding up “Reducer” with “Combiner”
When can “Combiner” help? “MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04
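Not in the slides: a combiner pre-aggregates a mapper's own output before it is shuffled over the network, reusing the reduce logic locally. A hedged sketch for word count (names are illustrative):

```python
from collections import defaultdict

def combine(intermediate_pairs):
    """Locally merge (word, count) pairs on the map worker before the shuffle."""
    local = defaultdict(int)
    for word, count in intermediate_pairs:
        local[word] += count
    return list(local.items())

mapper_output = [("the", 1), ("cat", 1), ("the", 1), ("the", 1)]
print(combine(mapper_output))   # [('the', 3), ('cat', 1)] -- 4 pairs shrunk to 2
```

A combiner only helps when the reduce function is commutative and associative (e.g. sum, max) and the map output contains many repeated keys; otherwise local merging either changes the result or saves nothing.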

22 What if a map worker fails?
Fault Tolerance
Re-execute its in-progress and completed map tasks (completed map outputs live on the failed worker’s local disk, so they are lost with it).
“MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04

23 What if a reduce worker fails?
Fault Tolerance
What if a reduce worker fails?
Re-execute in-progress reduce tasks
“MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04

24 What if the master fails?
Fault Tolerance
What if the master fails?
Expose the failure to the user (the user can retry the job)
“MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04

25 MapReduce Summary
A minimalist approach:
Many problems can be easily expressed with the MapReduce primitives
Greatly simplifies fault tolerance & performance optimization
(Almost) completely transparent fault tolerance at large scale
Dramatically eases the burden on programmers
Still needs users to step in in some cases…

26 “Hyperparameters” of a cluster/cloud job
How many physical machines? How much RAM and how many CPUs per machine? How much disk space? How much network bandwidth?
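Not from the slides: one way to picture these "hyperparameters" is as a single point in a configuration search space, which is what CherryPick (next) searches over. A hedged sketch with illustrative field names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CloudConfig:
    """One candidate point in the configuration space (field names are illustrative)."""
    num_machines: int          # cluster size
    cpus_per_machine: int
    ram_gb_per_machine: int
    disk_gb_per_machine: int
    network_gbps: float

candidate = CloudConfig(num_machines=8, cpus_per_machine=16,
                        ram_gb_per_machine=64, disk_gb_per_machine=500,
                        network_gbps=10.0)
```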

27 CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics  
Cloud performance is sensitive to configurations, but there is no existing way to pick configurations optimally, quickly, and adaptively for any cloud job. CherryPick is the first systematic technique to meet these requirements, modeling the performance-configuration relationship with a blackbox ML technique.

28 Large space of cloud configurations
The space is providers × machine types × cluster sizes, with tens of options along each dimension:
Amazon AWS: r3.8xlarge, i2.8xlarge, m4.8xlarge, c4.8xlarge, r4.8xlarge, c3.8xlarge, …
Microsoft Azure: A0, A1, A2, A3, A11, A12, D1, D2, D3, …
Google Cloud: n1-standard-4, n1-highmem-2, n1-highcpu-4, …
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

29 Good configuration  High performance & Low cost
Across 66 configurations:
Application | Avg/min | Max/min
TPC-DS | 3.4X | 9.6X
TPC-H | 2.9X | 12X
Regression (SparkML) | 2.6X | 5.2X
TeraSort | 1.6X | 3X
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

30 Complex performance-configuration relationship
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

31 How to find the best cloud configuration
One that minimizes cost subject to a performance constraint, for a recurring job with a representative workload?
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

32 Key metrics of success “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

33 Strawmen
Exhaustive search: high overhead
Coordinate search (optimize one configuration dimension at a time: CPU, RAM, disk, network, etc.): not accurate, since performance/cost curves are non-convex across resources
Ernest [NSDI’16] (learn a model for each job type): not adaptive
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

34 Why CherryPick “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

35 Basic idea: Blackbox modeling
[Workflow: start with any config → run the config → update the blackbox config-performance model → choose the next config → repeat until a config is returned]
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

36 Insight: No need to be accurate everywhere
How about a model that accurately predicts performance for any given configuration? That is more than we need.
[Same workflow as the previous slide: start with any config → run it → update the blackbox model → choose the next config → return a config]
Insight: all we need is the relative ranking (which configuration is better), not accurate predictions everywhere.
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17
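Not from the slides: a minimal, hedged sketch of this search loop in Python, using scikit-learn's GaussianProcessRegressor as the blackbox model and "lowest predicted cost" as a deliberately naive stand-in for the acquisition rule (CherryPick's actual choice, expected improvement under Bayesian optimization, is sketched after slide 39). The function names and the toy cost curve are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def blackbox_search(configs, run_job, budget=6):
    """configs: (n, d) array of candidate configurations.
    run_job(config) -> measured cost of one run of the representative workload."""
    tried = [0]                                    # start with any config
    costs = [run_job(configs[0])]
    for _ in range(budget - 1):
        gp = GaussianProcessRegressor().fit(configs[tried], costs)
        pred_mean, pred_std = gp.predict(configs, return_std=True)
        pred_mean[tried] = np.inf                  # never re-run a tried config
        nxt = int(np.argmin(pred_mean))            # naive acquisition: lowest predicted cost
        tried.append(nxt)
        costs.append(run_job(configs[nxt]))
    return configs[tried[int(np.argmin(costs))]]   # cheapest configuration actually observed

# Toy usage: a 1-D "configuration" (cluster size) with a synthetic cost curve.
sizes = np.arange(2, 20, dtype=float).reshape(-1, 1)
best = blackbox_search(sizes, run_job=lambda c: (c[0] - 11.0) ** 2 + 5.0)
print(best)    # should end up near a cluster size of 11
```

The paper stops when the model is confident enough; this sketch simply exhausts a fixed budget of runs.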

37 Bayesian Optimization
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

38 Bayesian Optimization
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

39 How to pick the next configuration?
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17
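CherryPick picks the next configuration with an acquisition function over the Bayesian-optimization posterior; the paper uses expected improvement (EI). A hedged numpy/scipy sketch of EI for cost minimization (ignoring the paper's additional performance constraint); the numbers below are made up:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, best_cost, xi=0.01):
    """EI for minimization: expected amount by which each candidate beats the
    best cost observed so far, given the model's posterior mean and std."""
    std = np.maximum(std, 1e-9)            # guard against zero uncertainty
    gap = best_cost - mean - xi
    z = gap / std
    return gap * norm.cdf(z) + std * norm.pdf(z)

# mean/std would come from gp.predict(configs, return_std=True) as in the loop above.
mean = np.array([60.0, 23.0, 21.0])
std  = np.array([ 5.0, 10.0,  0.5])
print(int(np.argmax(expected_improvement(mean, std, best_cost=22.0))))   # -> 1
```

Index 1 wins even though index 2 has the better predicted cost, because index 1's large uncertainty leaves more room for a big improvement; that exploration/exploitation balance is what lets the search settle on a good configuration after only a handful of runs.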

40 Why does CherryPick work? “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

41 Does the “blackbox” behave reasonably?
“CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

42 Conclusion “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17

43 Reminder
Sign up on Piazza (you need to post your paper summaries there)
Project proposal idea due in 12 days

