CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 22: Stream Processing, Graph Processing All slides © IG.


Stream Processing: What We'll Cover
- Why stream processing
- Storm

Stream Processing Challenge
- Large amounts of data => need for real-time views of data
  - Social network trends, e.g., Twitter real-time search
  - Website statistics, e.g., Google Analytics
  - Intrusion detection systems, e.g., in most datacenters
- Process large amounts of data
  - With latencies of a few seconds
  - With high throughput

MapReduce?
- Batch processing => need to wait for the entire computation on a large dataset to complete
- Not intended for long-running stream processing

Enter Storm
- Apache project; a highly active JVM project
- Multiple languages supported via its API: Python, Ruby, etc.
- Used by over 30 companies, including
  - Twitter: for personalization, search
  - Flipboard: for generating custom feeds
  - Weather Channel, WebMD, etc.

Storm Components
- Tuples
- Streams
- Spouts
- Bolts
- Topologies

Tuple
- An ordered list of elements
- E.g., a (tweeter, tweet) pair

Stream
- Sequence of tuples, potentially unbounded in number
- Social network example: a stream of (tweeter, tweet) tuples, …
- Website example: a stream of website-visit tuples, …

Spout
- A Storm entity (process) that is a source of streams
- Often reads from a crawler or DB
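To make this concrete, here is a minimal sketch of a spout in Storm's (pre-Apache-namespace) Java API. The (tweeter, tweet) field names and the generated toy data are assumptions for illustration; a real spout would read from a crawler or DB as the slide says.

```java
import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// A toy spout emitting a stream of (tweeter, tweet) tuples.
public class TweetSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private long seq = 0;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector; // save the collector for use in nextTuple()
    }

    @Override
    public void nextTuple() {
        // Storm calls this repeatedly; emit one tuple per call.
        // The second argument is a message id, which enables replay on failure.
        collector.emit(new Values("user" + (seq % 10), "tweet #" + seq), seq);
        seq++;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("tweeter", "tweet"));
    }
}
```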

Bolt
- A Storm entity (process) that
  - Processes input streams
  - Outputs more streams for other bolts

Topology
- A directed graph of spouts and bolts (and output bolts)
- Corresponds to a Storm "application"

Topology
- Can have cycles if the application requires it

Bolts Come in Many Flavors
- Operations that can be performed:
  - Filter: forward only tuples that satisfy a condition
  - Join: when receiving two streams A and B, output all pairs (A, B) that satisfy a condition
  - Apply/transform: modify each tuple according to a function
  - And many others
- But bolts need to process a lot of data
- Need to make them fast
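As a sketch of the filter flavor, a bolt that forwards only tweets containing a keyword might look like the following (the keyword parameter and field names are assumptions carried over from the spout sketch above; the anchored emit and the ack call anticipate the fault-tolerance API covered below):

```java
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// A filter bolt: forward only tuples whose "tweet" field contains a keyword.
public class KeywordFilterBolt extends BaseRichBolt {
    private final String keyword;
    private OutputCollector collector;

    public KeywordFilterBolt(String keyword) {
        this.keyword = keyword;
    }

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        if (input.getStringByField("tweet").contains(keyword)) {
            // Anchor the output on the input tuple so a downstream failure
            // replays this tuple from the spout.
            collector.emit(input, new Values(input.getStringByField("tweeter"),
                                             input.getStringByField("tweet")));
        }
        collector.ack(input); // every input tuple must be acked (or failed)
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("tweeter", "tweet"));
    }
}
```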

Parallelizing Bolts
- Have multiple processes ("tasks") constitute a bolt
- Incoming streams are split among the tasks
- Typically, each incoming tuple goes to one task in the bolt
  - Decided by the "grouping strategy"
- Three types of grouping are popular

Grouping
- Shuffle grouping
  - Streams are distributed randomly to the bolt's tasks
  - Randomly but consistently: use a hash function! (Remember consistent hashing from P2P systems?)
- Fields grouping
  - Group a stream by a subset of its fields
  - E.g., all tweets where the twitter username starts with [A-M, a-m, 0-4] go to task 1, and all tweets starting with [N-Z, n-z, 5-9] go to task 2
- All grouping
  - All tasks of the bolt receive all input tuples
  - Useful for joins
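A sketch of how these groupings are declared when wiring a topology, reusing the TweetSpout and KeywordFilterBolt sketches above (PerUserCountBolt is a hypothetical downstream bolt that would count tweets per user):

```java
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

// Wiring spouts and bolts together with grouping strategies.
public class TopologyWiring {
    public static TopologyBuilder build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("tweets", new TweetSpout(), 2);             // 2 spout tasks
        builder.setBolt("filter", new KeywordFilterBolt("storm"), 4) // 4 parallel tasks
               .shuffleGrouping("tweets");           // random but load-balanced
        builder.setBolt("counts", new PerUserCountBolt(), 4)         // hypothetical bolt
               .fieldsGrouping("filter", new Fields("tweeter"));     // same user -> same task
        return builder;
    }
}
```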

Storm Cluster
- Master node
  - Runs a daemon called Nimbus
  - Responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures of machines
- Worker node
  - Runs on a machine (server)
  - Runs a daemon called Supervisor, which listens for work assigned to its machine
- Zookeeper
  - Coordinates communication between Nimbus and the Supervisors
  - All state of the Supervisors and Nimbus is kept here
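A sketch of submitting the topology built above; LocalCluster runs an in-process test cluster, whereas on a real cluster the same topology would be handed to Nimbus and executed by workers under the Supervisors:

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;

// Submit the topology from the wiring sketch to an in-process test cluster.
public class SubmitTopology {
    public static void main(String[] args) {
        Config conf = new Config();
        conf.setNumWorkers(2); // worker processes to spread tasks across
        new LocalCluster().submitTopology("tweet-topology", conf,
                TopologyWiring.build().createTopology());
    }
}
```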

Failures
- A tuple is considered failed when its topology (graph) of resulting tuples fails to be fully processed within a specified timeout
- Anchoring: anchor an output tuple to one or more input tuples
  - Failure of one tuple then causes one or more tuples to be replayed

API for Fault-Tolerance (OutputCollector)
- emit(tuple, output)
  - Emits an output tuple, perhaps anchored on an input tuple (the first argument)
- ack(tuple)
  - Acknowledge that you (the bolt) finished processing a tuple
- fail(tuple)
  - Immediately fail the spout tuple at the root of the tuple topology, e.g., if there is an exception from the database
- Must remember to ack/fail each tuple
  - Each tuple consumes memory; forgetting to do so results in memory leaks
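A minimal sketch of the ack/fail contract in a bolt that writes each tuple to a database (writeToDatabase() is a hypothetical helper that may throw):

```java
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

// Every tuple is either acked (processed) or failed (replayed from the spout).
public class DbWriterBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        try {
            writeToDatabase(input);  // hypothetical side effect; may throw
            collector.ack(input);    // success: Storm can forget this tuple
        } catch (Exception e) {
            collector.fail(input);   // failure: replay the spout tuple at the root
        }
    }

    private void writeToDatabase(Tuple t) {
        // hypothetical DB write, elided
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt emits no output stream
    }
}
```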

Summary: Stream Processing
- Processing data in real-time is a big requirement today
- Storm
  - And other sister systems, e.g., Spark Streaming
- Parallelism, application topologies, fault-tolerance

Graph Processing: What We'll Cover
- Distributed graph processing
- Google's Pregel system
  - Inspiration for many newer graph processing systems: Piccolo, Giraph, GraphLab, PowerGraph, LFGraph, X-Stream, etc.

Lots of Graphs
- Large graphs are all around us
- Internet graph: vertices are routers/switches and edges are links
- World Wide Web: vertices are webpages, and edges are URL links on a webpage pointing to another webpage
  - Called a "directed" graph, as edges are uni-directional
- Social graphs: Facebook, Twitter, LinkedIn
- Biological graphs: DNA interaction graphs, ecosystem graphs, etc.
(Figure omitted; source: Wikimedia Commons)

Graph Processing Operations
- Need to derive properties from these graphs
- Need to summarize these graphs into statistics
- E.g., find shortest paths between pairs of vertices
  - Internet (for routing)
  - LinkedIn (degrees of separation)
- E.g., do matching
  - Dating graphs in match.com (for better dates)
- PageRank on web graphs
  - Google search, Bing search, Yahoo search: all rely on this
- And many (many) other examples!

Why Hard?
- Because these graphs are large!
  - The human social network has 100s of millions of vertices and billions of edges
  - The WWW has many millions of vertices and edges
- Hard to store the entire graph on one server and process it
- Slow on one server (even if beefy!)
- Instead, use a distributed cluster/cloud!

Typical Graph Processing Application
- Works in iterations
- Each vertex is assigned a value
- In each iteration, each vertex:
  1. Gathers values from its immediate neighbors (vertices joined to it directly by an edge), e.g., B → A, C → A, D → A, …
  2. Does some computation using its own value and its neighbors' values
  3. Updates its new value and sends it out to its neighboring vertices, e.g., A → B, C, D, E
- Graph processing terminates after: (i) a fixed number of iterations, or (ii) vertices stop changing values
(Figure omitted: example graph with vertices A, B, C, D, E)

Hadoop/MapReduce to the Rescue?
- Multi-stage Hadoop
  - Each stage == 1 graph iteration
  - Assign vertex ids as keys in the reduce phase
  - Well-known approach
- But:
  - At the end of every stage, all vertices are transferred over the network (to their neighbor vertices)
  - All vertex values are written to HDFS (file system)
  - Very slow!

Bulk Synchronous Parallel Model
- "Think like a vertex"
- Originally by Valiant (1990)

Basic Distributed Graph Processing
- "Think like a vertex"
- Assign each vertex to one server
  - Each server thus gets a subset of vertices
- In each iteration, each server performs Gather-Apply-Scatter for all its assigned vertices
  - Gather: get all neighboring vertices' values
  - Apply: compute own new value from own old value and the gathered neighbors' values
  - Scatter: send own new value to neighboring vertices
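A single-process sketch of Gather-Apply-Scatter iterations over a toy graph, here running a PageRank-style update (the three-vertex graph and the 0.85 damping factor are illustrative assumptions). A distributed system would partition the vertex map across servers and turn the scatter step into network messages:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Vertex-centric Gather-Apply-Scatter, simulated in one process.
public class GasSketch {
    public static void main(String[] args) {
        // Toy directed graph: vertex -> outgoing neighbors.
        Map<String, List<String>> out = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("C"),
                "C", List.of("A"));
        Map<String, Double> value = new HashMap<>();
        out.keySet().forEach(v -> value.put(v, 1.0));

        for (int iter = 0; iter < 10; iter++) {
            // Scatter/Gather: each vertex sends value/outDegree to neighbors.
            Map<String, Double> gathered = new HashMap<>();
            for (Map.Entry<String, List<String>> e : out.entrySet()) {
                double share = value.get(e.getKey()) / e.getValue().size();
                for (String nbr : e.getValue()) {
                    gathered.merge(nbr, share, Double::sum);
                }
            }
            // Apply: compute each vertex's new value from the gathered inputs.
            for (String v : value.keySet()) {
                value.put(v, 0.15 + 0.85 * gathered.getOrDefault(v, 0.0));
            }
        }
        System.out.println(value); // converged PageRank-style values
    }
}
```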

Assigning Vertices
- How to decide which server a given vertex is assigned to?
- Different options:
  - Hash-based: hash(vertex id) modulo number of servers
    - Remember consistent hashing from P2P systems?!
  - Locality-based: assign vertices with more neighbors to the same server as their neighbors
    - Reduces server-to-server communication volume after each iteration
    - Need to be careful: some "intelligent" locality-based schemes may take up a lot of upfront time and may not give sufficient benefits!
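A minimal sketch of the hash-based option (an assumption of how a system might implement it, not any particular system's code):

```java
// Hash-based vertex assignment: map a vertex id to one of numServers servers.
public class VertexAssignment {
    static int serverFor(long vertexId, int numServers) {
        // Hash the id, then take a non-negative modulo.
        return Math.floorMod(Long.hashCode(vertexId), numServers);
    }

    public static void main(String[] args) {
        System.out.println(serverFor(42L, 8)); // vertex 42 -> one of 8 servers
    }
}
```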

Pregel System by Google
- Pregel uses the master/worker model
- Master (one server)
  - Maintains a list of worker servers
  - Monitors workers; restarts them on failure
  - Provides a Web-UI monitoring tool for job progress
- Worker (the rest of the servers)
  - Processes its vertices
  - Communicates with the other workers
- Persistent data is stored as files on a distributed storage system (such as GFS or BigTable)
- Temporary data is stored on local disk

Pregel Execution
1. Many copies of the program begin executing on a cluster
2. The master assigns a partition of the input (vertices) to each worker
   - Each worker loads its vertices and marks them as active
3. The master instructs each worker to perform an iteration
   - Each worker loops through its active vertices and computes for each vertex
   - Messages can be sent whenever, but need to be delivered before the end of the iteration (i.e., the barrier)
   - When all workers reach the iteration barrier, the master starts the next iteration
4. Computation halts when, in some iteration, no vertices are active and no messages are in transit
5. The master instructs each worker to save its portion of the graph
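To make the per-vertex computation in step 3 concrete, here is a sketch of single-source shortest paths (SSSP, the benchmark measured two slides below) against a hypothetical Pregel-like Java interface; the class, method, and Context names are assumptions, not Google's actual C++ API:

```java
import java.util.List;

// Hypothetical Pregel-like vertex program for single-source shortest paths.
public class ShortestPathVertex {
    long id;
    double dist = Double.POSITIVE_INFINITY; // best-known distance from source
    List<Edge> outEdges;

    record Edge(long target, double weight) {}

    // Hypothetical runtime hooks a Pregel-like framework would provide.
    interface Context {
        long sourceId();
        void sendMessage(long target, double value);
        void voteToHalt();
    }

    // Called once per iteration ("superstep") with last superstep's messages.
    void compute(long superstep, Iterable<Double> messages, Context ctx) {
        double minDist = (superstep == 0 && id == ctx.sourceId())
                ? 0.0 : Double.POSITIVE_INFINITY;
        for (double m : messages) {
            minDist = Math.min(minDist, m); // gather the best offered distance
        }
        if (minDist < dist) {
            dist = minDist; // apply the improved distance...
            for (Edge e : outEdges) {
                ctx.sendMessage(e.target, dist + e.weight); // ...and scatter it
            }
        }
        ctx.voteToHalt(); // inactive until a new message re-activates the vertex
    }
}
```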

Fault-Tolerance in Pregel
- Checkpointing
  - Periodically, the master instructs the workers to save the state of their partitions to persistent storage
  - E.g., vertex values, edge values, incoming messages
- Failure detection
  - Uses periodic "ping" messages from master to worker
- Recovery
  - The master reassigns graph partitions to the currently available workers
  - The workers all reload their partition state from the most recent available checkpoint

How Fast Is It?
- Shortest paths from one vertex to all vertices
  - SSSP: "Single-Source Shortest Paths"
- On a 1-billion-vertex graph (tree):
  - 50 workers: 180 seconds
  - 800 workers: 20 seconds
- 50 billion vertices on 800 workers: 700 seconds (~12 minutes)
- Pretty fast!

Summary: Graph Processing
- Lots of (large) graphs around us
- Need to process these
- MapReduce is not a good match
- Distributed graph processing systems:
  - Pregel by Google
  - Many follow-up systems
    - Piccolo, Giraph: Pregel-like
    - GraphLab, PowerGraph, LFGraph, X-Stream: more advanced