Optimus: A Dynamic Rewriting Framework for Data-Parallel Execution Plans Qifa Ke, Michael Isard, Yuan Yu Microsoft Research Silicon Valley EuroSys 2013.

Presentation transcript:

Optimus: A Dynamic Rewriting Framework for Data-Parallel Execution Plans
Qifa Ke, Michael Isard, Yuan Yu
Microsoft Research Silicon Valley
EuroSys 2013

Distributed Data-Parallel Computing
- Distributed execution plan generated by the query compiler (DryadLINQ)
- Automatic distributed execution (Dryad)

Execution Plan Graph (EPG)
EPG: a distributed execution plan represented as a DAG
- Represents the computation and dataflow of a data-parallel program
The core data structure in distributed execution engines, used for:
- Task distribution
- Job management
- Fault tolerance
[Figure: the EPG of MapReduce: Map → Distribute → Merge → GroupBy → Reduce]
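To make the EPG concrete, here is a minimal sketch (illustrative Python, not Optimus code) of an execution plan graph as a DAG of named stages:

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    name: str                                   # stage name, e.g. "Map"
    state: str = "INACTIVE"                     # INACTIVE / RUNNING / COMPLETED
    inputs: list = field(default_factory=list)  # upstream vertices

def chain(*names):
    """Build a linear EPG where each stage consumes the previous stage's output."""
    vertices, prev = [], None
    for n in names:
        v = Vertex(n, inputs=[prev] if prev else [])
        vertices.append(v)
        prev = v
    return vertices

# The MapReduce EPG from the slide.
epg = chain("Map", "Distribute", "Merge", "GroupBy", "Reduce")
for v in epg:
    print(v.name, "<-", [u.name for u in v.inputs])
```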

Outline
- Motivational problems
- Optimus system
- Graph rewriters
- Experimental evaluation
- Summary & conclusion

Problem 1: Data Partitioning
The basic operation for achieving data parallelism. Example: MapReduce
- Number of partitions = number of reducers: more reducers give better load balancing but more scheduling and disk I/O overhead
- Data skew, e.g., popular keys
Good partitioning requires statistics of the mapper outputs
- Hard to estimate at compile time
- But available at runtime
We need dynamic data partitioning.
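As a hedged illustration of the decision this slide motivates: given a runtime histogram of mapper output keys, pick a partition count and flag skewed keys. The sizing policy and thresholds below are assumptions for the sketch, not the paper's algorithm.

```python
from collections import Counter

def plan_partitions(key_histogram, target_bytes, avg_record_size=100):
    """Choose a partition count from runtime statistics and flag popular keys."""
    total_bytes = sum(key_histogram.values()) * avg_record_size
    n_partitions = max(1, total_bytes // target_bytes)
    # A key is "popular" if it alone would overflow a single partition.
    skewed = [k for k, c in key_histogram.items()
              if c * avg_record_size > target_bytes]
    return n_partitions, skewed

hist = Counter({"the": 5_000_000, "optimus": 300, "dryad": 200})
print(plan_partitions(hist, target_bytes=64 * 1024 * 1024))
# -> (7, ['the']): seven reducers, with the popular key "the" needing a split
```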

Problem 2: Matrix Computation
Widely used in large-scale data analysis
Data model: sparse or dense matrix?
- At compile time, the density of intermediate matrices is unknown
Alternative algorithms exist for a given matrix computation
- Best chosen based on runtime data statistics of the input matrices
How to dynamically choose the data model and among alternative algorithms?
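The sparse-versus-dense choice can be sketched as a runtime density check; a minimal illustration, assuming SciPy and a crossover threshold that would in practice be tuned empirically:

```python
import numpy as np
from scipy import sparse

def choose_model(m, density_threshold=0.1):
    """Pick the matrix representation from density observed at runtime,
    which is exactly what cannot be known at compile time for
    intermediate matrices."""
    density = np.count_nonzero(m) / m.size
    return sparse.csr_matrix(m) if density < density_threshold else np.asarray(m)

intermediate = np.eye(1000)            # 0.1% non-zero, observed at runtime
print(type(choose_model(intermediate)).__name__)   # csr_matrix
```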

Problem 3: Iterative Computation
Required by machine learning and data analysis
Problem: the stop condition is unknown at compile time
- Each job performs N iterative steps
- Submit multiple jobs (Job 1, Job 2, ...) and check convergence at the client
How to enable iterative computation in one single job?
- Simplifies job monitoring and fault tolerance
- Reduces job-submission overhead
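A sketch of the contrast, with illustrative names: without dynamic rewriting, the client drives a submit-and-check loop where each step stands for a whole submitted job; Optimus instead appends the next iteration to the running job's EPG when the check fails.

```python
def client_driven(state, step, converged, max_rounds=100):
    """The pattern this slide criticizes: one submitted job per round,
    convergence checked back at the client between jobs."""
    for rounds in range(1, max_rounds + 1):
        state = step(state)            # stands in for submitting Job N
        if converged(state):
            return state, rounds
    return state, max_rounds

# Toy fixed point: Newton's iteration for sqrt(2).
final, rounds = client_driven(
    4.0,
    step=lambda x: (x + 2 / x) / 2,
    converged=lambda x: abs(x * x - 2) < 1e-12)
print(final, rounds)   # ~1.4142135, after a handful of rounds
```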

Problem 4: Fault Tolerance
Intermediate results can be regenerated by re-executing vertices
Important intermediate results are expensive to regenerate when lost:
- Outputs of compute-intensive vertices
- Critical chain: a long chain of vertices residing on the same machine due to data locality
How to identify and protect important intermediate results at runtime?
[Figure: a single machine failure (X) loses the whole chain of vertices A, B, C]
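A sketch of how a critical chain might be detected at runtime, given vertex placements; the placement encoding and minimum chain length are assumptions:

```python
def find_critical_chains(stages, min_length=3):
    """stages: list of (vertex, machine) pairs in pipeline order. A long run
    of vertices on one machine means a single failure destroys all of the
    chain's intermediate data, so its outputs are worth protecting."""
    chains, run = [], [stages[0]]
    for cur in stages[1:]:
        if cur[1] == run[-1][1]:
            run.append(cur)
        else:
            if len(run) >= min_length:
                chains.append([v for v, _ in run])
            run = [cur]
    if len(run) >= min_length:
        chains.append([v for v, _ in run])
    return chains

plan = [("A", "m1"), ("B", "m1"), ("C", "m1"), ("D", "m2")]
print(find_critical_chains(plan))   # [['A', 'B', 'C']]
```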

Problem 5: EPG Optimization
Compile-time query optimization:
- Uses data statistics available at compile time
- The EPG is typically unchanged during execution
Problems with compile-time optimization:
- Data statistics of intermediate stages are hard to estimate, further complicated by user-defined functions
How to optimize the EPG at runtime?

Optimus: Dynamic Graph Rewriting
Dynamically rewrites the EPG based on:
- Data statistics collected at runtime
- Compute resources available at runtime
Goal: extensibility
- Rewriters implemented at the language layer, without modifying the execution engine (e.g., Dryad)
- Users can specify their own rewrite logic

Example: MapReduce
- Statistics collected at the data plane
- Rewrite message sent to the graph rewriter at the control plane
- Rewrites: merge small partitions, split popular keys
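A hedged sketch of that rewrite decision: given per-partition sizes reported in the rewrite message, split oversized partitions and first-fit pack undersized ones into merge groups. The packing policy and thresholds are illustrative assumptions, not the paper's exact algorithm.

```python
def rewrite_partitions(sizes_mb, target_mb):
    """sizes_mb: {partition_id: size}. Returns (merge_groups, to_split)."""
    to_split = [p for p, s in sizes_mb.items() if s > 2 * target_mb]
    small = sorted((p for p, s in sizes_mb.items() if s < target_mb // 2),
                   key=sizes_mb.get)
    groups, cur, cur_size = [], [], 0
    for p in small:                      # first-fit pack toward the target
        if cur and cur_size + sizes_mb[p] > target_mb:
            groups.append(cur)
            cur, cur_size = [], 0
        cur.append(p)
        cur_size += sizes_mb[p]
    if len(cur) > 1:
        groups.append(cur)
    return groups, to_split

sizes = {0: 5, 1: 7, 2: 300, 3: 10, 4: 6}      # MB, from the data plane
print(rewrite_partitions(sizes, target_mb=64))  # ([[0, 4, 1, 3]], [2])
```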

Outline
- Motivational problems
- Optimus system
- Graph rewriters
- Experimental evaluation
- Summary & conclusion

Optimus System Architecture
Built on DryadLINQ and Dryad
Modules:
- Statistics collecting
- Rewrite messaging (data plane → control plane)
- Graph rewriting
Extensible:
- Statistics and rewrite logic at the language/user layers
- Rewriting operations at the execution layer
[Figure: architecture. Client computer: user program, user-defined statistics, user-defined rewrite logic, and the DryadLINQ compiler with Optimus extensions. Cluster: the Dryad Job Manager (JM) hosting the core execution engine, rewriter module, statistics, rewrite logic, and EPG; Dryad worker vertices running vertex code inside a harness with statistics and rewrite logic]

Estimate/Collect Data Statistics
Low overhead: piggy-backed onto existing vertices
- Statistics vertex "H" pipelined into mapper "M"
Extensible
- Statistics estimators/collectors defined at the language layer or user level
All at the data plane, to avoid overwhelming the control plane
- "H": distributed statistics estimation/collection
- "MG" and "GH": merge statistics into the rewriting message
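A minimal sketch of the piggy-backing idea ("H" fused into "M"); the wrapper and names are illustrative:

```python
from collections import Counter

def with_histogram(map_fn):
    """Wrap a mapper so it also emits a key histogram. The full histogram
    stays on the data plane; only a small merged summary (the "MG"/"GH"
    role) ever travels to the control plane."""
    def wrapped(records):
        hist, out = Counter(), []
        for rec in records:
            for k, v in map_fn(rec):
                hist[k] += 1
                out.append((k, v))
        return out, hist
    return wrapped

mapper = with_histogram(lambda line: [(w, 1) for w in line.split()])
pairs, hist = mapper(["the quick the lazy"])
print(hist)   # Counter({'the': 2, 'quick': 1, 'lazy': 1})
```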

Graph Rewriting Module
A set of primitives to query and modify the EPG
Which rewriting operations apply depends on the vertex state:
- INACTIVE: all rewriting primitives applicable
- RUNNING: the vertex is killed and transitioned to INACTIVE, discarding partial results
- COMPLETED: only the vertex's I/O can be redirected
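The state rules read naturally as guard logic in front of the primitives; a sketch with illustrative names, not the Optimus API:

```python
class Vertex:
    def __init__(self, name, state="INACTIVE"):
        self.name, self.state = name, state

def allowed_rewrites(v):
    """Decide what a rewriting primitive may do, per the slide's rules."""
    if v.state == "RUNNING":
        v.state = "INACTIVE"        # kill it; partial results are discarded
    if v.state == "INACTIVE":
        return "all primitives"     # safe to restructure freely
    if v.state == "COMPLETED":
        return "redirect I/O only"  # outputs already exist; only rewire edges
    raise ValueError(f"unknown state: {v.state}")

print(allowed_rewrites(Vertex("M", "RUNNING")))   # all primitives
```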

Outline
- Motivational problems
- Optimus system
- Graph rewriters
- Experimental evaluation
- Summary & conclusion

Dynamic Data (Co-)Partitioning
Co-partitioning:
- Uses a common parameter set to partition multiple data sets
- Used by multi-source operators, e.g., Join
[Figure: co-range partitioning in Optimus]
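A sketch of co-range partitioning: boundary keys computed from a runtime sample are applied to both Join inputs, so matching keys land in the same partition pair. The equi-depth boundary choice is an illustrative assumption.

```python
import bisect

def range_boundaries(sample_keys, n_partitions):
    """Pick n-1 boundary keys from a runtime sample (equi-depth)."""
    keys = sorted(sample_keys)
    step = len(keys) / n_partitions
    return [keys[int(i * step)] for i in range(1, n_partitions)]

def co_partition(records, boundaries, key):
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        parts[bisect.bisect_right(boundaries, key(r))].append(r)
    return parts

# The same boundaries partition both Join inputs.
b = range_boundaries([3, 1, 4, 1, 5, 9, 2, 6], n_partitions=2)
left  = co_partition([(1, "a"), (5, "b"), (9, "c")], b, key=lambda r: r[0])
right = co_partition([(5, "x"), (9, "y")], b, key=lambda r: r[0])
print(left, right)   # keys 5 and 9 land in the same partition on both sides
```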

Hybrid Join
[Figure: hybrid join plan with statistics vertices (H), merge vertices (MG, GH), dynamic distribution vertices (DD, D, D1), and join vertices (J)]
- Co-partition to prepare data for a partition-wise Join
- When skew is detected at runtime, re-partition the skewed partition and use a local broadcast join
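A sketch of the per-partition fallback, with an assumed size threshold: a skewed partition is joined by broadcasting its small side rather than hash-partitioning further.

```python
def hash_join(left, right):
    """Plain in-memory equi-join on the first field of each tuple."""
    table = {}
    for k, v in left:
        table.setdefault(k, []).append(v)
    return [(k, lv, rv) for k, rv in right for lv in table.get(k, [])]

def join_skewed_partition(left_part, right_part, broadcast_limit=10_000):
    """If either side fits under the limit, it can be shipped ("broadcast")
    whole to wherever the other side already sits and joined locally."""
    if min(len(left_part), len(right_part)) <= broadcast_limit:
        return hash_join(left_part, right_part)
    raise RuntimeError("both sides large: split this partition further")

print(join_skewed_partition([(1, "a"), (1, "b")], [(1, "x")]))
# [(1, 'a', 'x'), (1, 'b', 'x')]
```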

Iterative Computation
Optimus enables iterative computation in a single job
- "C": vertex that checks the stop condition
- The rewriter constructs another loop iteration if needed

Matrix Multiplication

Matrix Computation
Systems dedicated to matrix computation exist, e.g., MadLINQ
Optimus: extensibility allows integrating matrix computation with general-purpose DryadLINQ computations
Runtime decisions:
- Data partitioning: how to subdivide the matrices
- Data model: sparse or dense
- Implementation: a matrix operation often has many algorithmic implementations
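A sketch of the implementation choice: dispatch a multiply to a sparse or dense kernel based on densities observed at runtime. The 5% crossover is an assumption, and SciPy stands in for the cluster kernels.

```python
import numpy as np
from scipy import sparse

def multiply(a, b, density_threshold=0.05):
    """Choose an algorithmic implementation at runtime, not compile time."""
    density = min(np.count_nonzero(a) / a.size,
                  np.count_nonzero(b) / b.size)
    if density < density_threshold:
        return (sparse.csr_matrix(a) @ sparse.csr_matrix(b)).toarray()
    return a @ b                          # dense BLAS path

a = np.eye(500)                           # very sparse intermediate matrix
print(multiply(a, a).trace())             # 500.0, computed via the CSR kernel
```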

Reliability Enhancer for Fault Tolerance
A replication subgraph protects important data generated by vertex "A":
- "C" vertex: copies the output of "A" to another computer
- "O" vertex: lets "B" consume whichever of its two inputs (the original output or the copy) is available
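A sketch of the C/O pattern using files as stand-ins for vertex outputs; the paths and helper names are illustrative:

```python
import os, shutil

def c_vertex(a_output, replica_path):
    """The 'C' vertex: copy A's output to another computer (here, another path)."""
    shutil.copy(a_output, replica_path)

def o_vertex(primary, replica):
    """The 'O' vertex: hand B whichever of its two inputs is still readable."""
    for path in (primary, replica):
        if os.path.exists(path):
            with open(path) as f:
                return f.read()
    raise FileNotFoundError("both copies lost: re-execute vertex A")
```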

Outline
- Motivational problems
- Optimus system
- Graph rewriters
- Experimental evaluation
- Summary & conclusion

Evaluation: Product-Offer Matching by Join
Input: 5M products + 4M offers
- Matching function: compute-intensive
Algorithms:
- Partition-wise GroupJoin (baseline)
- Broadcast-Join
- CoGroup: a specialized solution
- Optimus
[Charts: aggregated CPU utilization, job completion time, and cluster (machine) utilization for Baseline, CoGroup, Broadcast, and Optimus]

Evaluation: Matrix Multiplication
[Chart: job completion time in seconds]

Related Work
- Dryad: system-level rewriting, without the semantics of code and data
- Databases: dynamic graph rewriting in a single-server environment
  - Eddies: fine-grained (record-level) optimization
  - Eddies + Optimus would combine record-level and vertex-level optimization
- CIEL: a programming/execution model different from DryadLINQ/Dryad
  - Dynamically expands the EPG via scripts running at each worker
  - Some dynamic optimizations are hard to achieve there: replacing a running task with a subgraph; the reliability enhancer
  - CIEL could incorporate Optimus-like components to support such dynamic optimizations
- RoPE: uses statistics of previously executed queries to optimize new jobs that run the same queries

Summary & Conclusion
- A flexible, extensible framework for modifying the EPG at runtime
- Enables runtime optimizations and specializations that are hard to achieve in other systems
- A rich set of graph rewriters, with substantial performance benefits compared to statically generated plans
- A versatile addition to a data-parallel execution framework

Thanks!