Naiad: A Timely Dataflow System


Naiad: A Timely Dataflow System
Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, Martín Abadi
Microsoft Research

Batch processing Stream processing Graph processing Timely dataflow

[Diagram: an example dataflow over a tweet stream: extract hashtags (#x), join (⋈) with mentions (@y), compute max, join with user queries (z?)]

[Same diagram, annotated with latency targets: < 1 ms iterations, < 100 ms interactive queries, < 1 s batch updates]

Outline Revisiting dataflow How to achieve low latency Evaluation

Dataflow
[Diagram: a dataflow graph of stages joined by connectors]

Dataflow: parallelism
[Diagram: each stage expands into parallel vertices (e.g., B, C) joined by edges]

Dataflow: iteration

Batching vs. streaming
Batching (synchronous): requires coordination; supports aggregation
Streaming (asynchronous): no coordination needed; aggregation is difficult

Batch iteration

Streaming iteration

Timely dataflow
[Diagram: dataflow messages tagged with a timestamp]
Supports asynchronous and fine-grained synchronous execution

How to achieve low latency Programming model Distributed progress tracking protocol System performance engineering

Programming model
[Diagram: vertex C between B and D; user code invokes C.Operation(x, y, z) and receives C.OnCallback(u, v)]

Messages
B.SendBy(edge, message, time) → C.OnRecv(edge, message, time)
Messages are delivered asynchronously

Notifications support batching
D.NotifyAt(time) requests a callback once there can be no more messages at time or earlier:
C.SendBy(_, _, time) → D.OnRecv(_, _, time), then D.OnNotify(time)
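These two callbacks (OnRecv, OnNotify) and two calls (SendBy, NotifyAt) make up the vertex interface. A minimal sketch in Python (the real API is C#; the Scheduler class here is a hypothetical stand-in for Naiad's runtime, used only to record calls):

```python
class Scheduler:
    """Hypothetical single-process stand-in for Naiad's distributed
    runtime: it only records sends and notification requests."""
    def __init__(self):
        self.sent = []      # (edge, message, time) triples
        self.pending = []   # (vertex, time) notification requests

    def enqueue_message(self, edge, message, time):
        self.sent.append((edge, message, time))

    def request_notification(self, vertex, time):
        self.pending.append((vertex, time))


class Vertex:
    """Base vertex: two system-provided calls, two user callbacks."""
    def __init__(self, scheduler):
        self.scheduler = scheduler

    # System-provided calls
    def send_by(self, edge, message, time):
        self.scheduler.enqueue_message(edge, message, time)

    def notify_at(self, time):
        self.scheduler.request_notification(self, time)

    # User-overridden callbacks
    def on_recv(self, edge, message, time):
        pass

    def on_notify(self, time):
        pass


class CountingVertex(Vertex):
    """Buffers a per-timestamp count, emits the batch on notification."""
    def __init__(self, scheduler, out_edge):
        super().__init__(scheduler)
        self.out_edge = out_edge
        self.counts = {}

    def on_recv(self, edge, message, time):
        if time not in self.counts:
            self.counts[time] = 0
            self.notify_at(time)   # wake me when `time` is complete
        self.counts[time] += 1

    def on_notify(self, time):
        # Guaranteed: no more messages at `time` or earlier will arrive.
        self.send_by(self.out_edge, self.counts.pop(time), time)
```

The batched emit in on_notify is safe precisely because the runtime calls it only after every on_recv at that timestamp.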

Programming frameworks

input.SelectMany(x => x.Split())
     .Where(x => x.StartsWith("#"))
     .Count(x => x);

Frameworks built above the timely dataflow API and distributed runtime: LINQ, GraphLINQ, BLOOM, AllReduce, BSP (Pregel), differential dataflow
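For comparison, the hashtag count above expressed as plain single-process Python (assuming `lines` is an in-memory list of strings; unlike the LINQ version, this sketch is not distributed):

```python
from collections import Counter

def hashtag_counts(lines):
    """SelectMany -> Where -> Count, as ordinary Python over a list."""
    words = (w for line in lines for w in line.split())   # SelectMany
    hashtags = (w for w in words if w.startswith("#"))    # Where
    return Counter(hashtags)                              # Count per key
```

The point of the frameworks layer is that the same three-operator pipeline compiles down to timely dataflow vertices and runs distributed.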

How to achieve low latency Programming model Asynchronous and fine-grained synchronous execution Distributed progress tracking protocol System performance engineering

Progress tracking
A vertex may only send at times at least as large as those it receives: C.OnRecv(_, _, t) permits C.SendBy(_, _, tʹ) only for tʹ ≥ t
When epoch t is complete, E.NotifyAt(t) can fire

Progress tracking
C.NotifyAt(t)
Problem: in a loop, C depends on its own output

Solution: structured timestamps in loops B.SendBy(_, _, (1, 7)) C.NotifyAt(t) C.NotifyAt((1, 6)) A.SendBy(_, _, 1) E.NotifyAt(1) E.NotifyAt(?) B C A E D F Advances timestamp Advances loop counter D.SendBy(1, 6) Solution: structured timestamps in loops

Graph structure leads to an order on events
[Diagram: outstanding events at timestamps (1, 5) and (1, 6) spread across the loop A, B, C, D, E, F]

Graph structure leads to an order on events
Maintain the set of outstanding events
Sort events by the could-result-in (partial) order
Deliver notifications in the frontier of the set
Guarantee: OnNotify(t) is called after all calls to OnRecv(_, _, t)
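The steps above can be sketched for the simplified case where could-result-in reduces to lexicographic ≤ on (epoch, loop counter) pairs; the real protocol also accounts for each event's location in the dataflow graph:

```python
def frontier(outstanding):
    """Timestamps in `outstanding` that no other outstanding timestamp
    could still result in (the minimal elements of the order).
    could-result-in is modeled here as lexicographic <= on tuples."""
    return {t for t in outstanding
            if not any(u != t and u <= t for u in outstanding)}

def can_deliver(t, outstanding):
    """OnNotify(t) may fire once no outstanding event could still
    produce a message at time t or earlier."""
    return not any(u <= t for u in outstanding)
```

With outstanding events at (1, 5), (1, 6) and (2, 0), only (1, 5) is in the frontier, so a notification for (1, 6) must wait while one for (1, 4) may be delivered.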

Optimizations make doing this practical
[Diagram: C.SendBy(_, _, (1, 5)) → D.OnRecv(_, _, (1, 5)); E.NotifyAt(1) → E.OnNotify(1)]

How to achieve low latency Programming model Asynchronous and fine-grained synchronous execution Distributed progress tracking protocol Enables processes to deliver notifications promptly System performance engineering

Performance engineering
Microstragglers are the primary challenge:
Garbage collection: O(1–10 s)
TCP timeouts: O(10–100 ms)
Data structure contention: O(1 ms)
For detail on how we handled these, see paper (Sec. 3)

How to achieve low latency Programming model Asynchronous and fine-grained synchronous execution Distributed progress tracking protocol Enables processes to deliver notifications promptly System performance engineering Mitigates the effect of microstragglers

Outline Revisiting dataflow How to achieve low latency Evaluation

System design
[Diagram: a data plane connects worker shards; a control plane connects workers to the progress tracker]
Cluster: 64 × 8-core 2.1 GHz AMD Opteron, 16 GB RAM per server, Gigabit Ethernet
Limitation: fault tolerance via checkpointing/logging (see paper)

Iteration latency
Cluster: 64 × 8-core 2.1 GHz AMD Opteron, 16 GB RAM per server, Gigabit Ethernet
Median: 750 μs; 95th percentile: 2.2 ms

Applications
Iterative machine learning, PageRank, word count, interactive graph analysis
Built on the frameworks (LINQ, GraphLINQ, BLOOM, AllReduce, BSP (Pregel), differential dataflow) above the timely dataflow API and distributed runtime

PageRank
Twitter graph: 42 million nodes, 1.5 billion edges
Cluster: 64 × 8-core 2.1 GHz AMD Opteron, 16 GB RAM per server, Gigabit Ethernet
[Chart comparing Pregel (Naiad), GraphLINQ, GAS (PowerGraph), GAS (Naiad)]

Interactive graph analysis
[Diagram: input at 32K tweets/s feeds hashtag (#x) and mention (@y) joins (⋈) with max; user queries (z?) arrive at 10 queries/s]

Query latency
Cluster: 32 × 8-core 2.1 GHz AMD Opteron, 16 GB RAM per server, Gigabit Ethernet
Median: 5.2 ms; 99th percentile: 70 ms; max: 140 ms

Conclusions
Low-latency distributed computation enables Naiad to:
achieve the performance of specialized frameworks
provide the flexibility of a generic framework
The timely dataflow API enables parallel innovation
Now available for download: http://github.com/MicrosoftResearchSVC/naiad/

For more information
Project website: http://research.microsoft.com/naiad/
Blog: http://bigdataatsvc.wordpress.com/
Now available for download: http://github.com/MicrosoftResearchSVC/naiad/

Naiad: now available for download at http://github.com/MicrosoftResearchSVC/naiad