Carnegie Mellon University GraphLab Tutorial Yucheng Low

GraphLab Team Yucheng Low Aapo Kyrola Jay Gu Joseph Gonzalez Danny Bickson Carlos Guestrin

Development History
GraphLab 0.5 (2010): internal experimental code; insanely templatized
GraphLab 1 (2011): nearly everything is templatized; first open-source release (LGPL before June 2011, APL from June 2011 onward)
GraphLab 2 (2012): many things are templatized; shared memory: Jan 2012; distributed: May 2012

GraphLab 2 Technical Design Goals
Improved usability
Decreased compile time
As good or better performance than GraphLab 1
Improved distributed scalability
… other abstraction changes … (come to the talk!)

Development History
Ever since GraphLab 1.0, all active development has been open source (APL): code.google.com/p/graphlabapi/
(Even current experimental code, activated with an --experimental flag on ./configure)

Guaranteed Target Platforms
Any x86 Linux system with gcc >= 4.2
Any x86 Mac system with gcc (OS X 10.5 ??)
Other platforms? … We welcome contributors.

Tutorial Outline
GraphLab in a few slides + PageRank
Checking out GraphLab v2
Implementing PageRank in GraphLab v2
Overview of different GraphLab schedulers
Preview of Distributed GraphLab v2 (may not work in your checkout!)
Ongoing work… (as much as time allows)

Warning
A preview of code still in intensive development! Things may or may not work for you! The interface may still change! GraphLab 1 → GraphLab 2 still has a number of performance regressions we are ironing out.

PageRank Example
Iterate: R[i] = α + (1 − α) · Σ_{j ∈ InNbrs[i]} W_ji · R[j]
Where: α is the random reset probability, L[j] is the number of links on page j, and W_ji = 1/L[j]
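For reference, a minimal sequential sketch of one sweep of this iteration in plain C++ (illustrative only, not GraphLab code; the adjacency-list representation and function name are assumptions):

    #include <cstddef>
    #include <vector>

    // One Jacobi-style sweep of the PageRank iteration above.
    // in_nbrs[i] lists the pages j that link to i; out_degree[j] = L[j].
    std::vector<double> pagerank_sweep(
        const std::vector<std::vector<std::size_t>>& in_nbrs,
        const std::vector<std::size_t>& out_degree,
        const std::vector<double>& R,
        double alpha = 0.15) {
      std::vector<double> R_new(R.size());
      for (std::size_t i = 0; i < R.size(); ++i) {
        double sum = 0;
        for (std::size_t j : in_nbrs[i])
          sum += R[j] / out_degree[j];        // W_ji = 1 / L[j]
        R_new[i] = alpha + (1 - alpha) * sum;
      }
      return R_new;
    }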

The GraphLab Framework: Graph-Based Data Representation · Update Functions (User Computation) · Scheduler · Consistency Model

Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge
Graph: link graph
Vertex data: webpage, webpage features
Edge data: link weight
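For this link-graph example, the per-vertex and per-edge C++ objects might look as follows (an illustrative sketch; the field names are assumptions, not the tutorial's actual types):

    #include <string>
    #include <vector>

    // Vertex data: the webpage and its features.
    struct vertex_data {
      std::string url;               // the webpage
      std::vector<double> features;  // webpage features
      double rank;                   // used by PageRank later
    };

    // Edge data: the weight of the link (e.g., 1 / L[source]).
    struct edge_data {
      double weight;
    };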

The GraphLab Framework: Graph-Based Data Representation · Update Functions (User Computation) · Scheduler · Consistency Model

Update Functions
An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

    pagerank(i, scope) {
      // Get neighborhood data
      (R[i], W_ij, R[j]) ← scope;
      // Update the vertex data
      R[i] ← α + (1 − α) · Σ_{j ∈ InNbrs[i]} W_ji · R[j];
      // Reschedule neighbors if needed
      if R[i] changes then reschedule_neighbors_of(i);
    }

Dynamic Schedule
[diagram: a scheduler queue of vertices (a–k) feeding update tasks to CPU 1 and CPU 2; updates push vertices (e.g., a, b, i) back onto the queue]
The process repeats until the scheduler is empty.
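Conceptually, each CPU pops a vertex off the shared queue, runs the update function on it, and pushes any rescheduled neighbors back on. A single-threaded sketch of that loop (hypothetical names; locking and consistency are ignored here):

    #include <cstddef>
    #include <deque>
    #include <functional>
    #include <vector>

    // Process vertices until the scheduler is empty. 'update' runs the
    // user's update function on v and returns the vertices to reschedule.
    void run_scheduler(
        std::deque<std::size_t> queue,
        const std::function<std::vector<std::size_t>(std::size_t)>& update) {
      while (!queue.empty()) {
        std::size_t v = queue.front();
        queue.pop_front();
        for (std::size_t nbr : update(v))  // e.g., reschedule_neighbors_of(v)
          queue.push_back(nbr);
      }
    }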

Source Code Interjection 1 Graph, update functions, and schedulers

--scope=vertex
--scope=edge

Consistency Trade-off
[plot: “throughput” (# “iterations” per second) vs. consistency]
Goal of an ML algorithm: converge

Ensuring Race-Free Code
How much can computation overlap?

The GraphLab Framework: Graph-Based Data Representation · Update Functions (User Computation) · Scheduler · Consistency Model

Importance of Consistency
Fast ML algorithm development cycle: Build → Test → Debug → Tweak Model
Consistency is necessary for the framework to behave predictably and to avoid problems caused by non-determinism: is the execution wrong, or is the model wrong?

Full Consistency Guaranteed safety for all update functions

Full Consistency
Parallel updates are only allowed two vertices apart → reduced opportunities for parallelism

Obtaining More Parallelism
Not all update functions will modify the entire scope!
Belief propagation: only uses edge data
Gibbs sampling: only needs to read adjacent vertices

Edge Consistency
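Under edge consistency the engine grants write access to the center vertex and its adjacent edges, and read access to adjacent vertices. A toy sketch of how such a scope could be locked with standard C++ shared mutexes (an illustrative assumption, not GraphLab's actual implementation):

    #include <algorithm>
    #include <cstddef>
    #include <shared_mutex>
    #include <vector>

    // Lock an edge-consistent scope for vertex v: exclusive on v, shared
    // on its neighbors, always acquiring in ascending order to avoid deadlock.
    void lock_edge_scope(std::size_t v, std::vector<std::size_t> nbrs,
                         std::vector<std::shared_mutex>& locks) {
      nbrs.push_back(v);
      std::sort(nbrs.begin(), nbrs.end());
      for (std::size_t u : nbrs) {
        if (u == v) locks[u].lock();         // write access: center vertex
        else        locks[u].lock_shared();  // read access: adjacent vertices
      }
      // (matching unlocks after the update function are omitted for brevity)
    }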

Obtaining More Parallelism
“Map” operations, e.g. feature extraction on vertex data

Vertex Consistency

The GraphLab Framework: Graph-Based Data Representation · Update Functions (User Computation) · Scheduler · Consistency Model

Shared Variables
Global aggregation through the sync operation: a global parallel reduction over the graph data. Synced variables are recomputed at defined intervals while update functions are running.
Examples — Sync: highest PageRank; Sync: log-likelihood
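A sync is essentially a fold over all vertex data plus a merge of per-thread partials. A self-contained sketch of what “Sync: highest PageRank” computes, written with standard C++ threads rather than the GraphLab API:

    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Parallel reduction: max rank over all vertices, one partial per thread.
    double highest_rank(const std::vector<double>& ranks, unsigned nthreads = 4) {
      std::vector<double> partial(nthreads, 0.0);  // ranks assumed nonnegative
      std::vector<std::thread> workers;
      std::size_t chunk = (ranks.size() + nthreads - 1) / nthreads;
      for (unsigned t = 0; t < nthreads; ++t)
        workers.emplace_back([&, t] {
          std::size_t lo = t * chunk;
          std::size_t hi = std::min(ranks.size(), lo + chunk);
          for (std::size_t i = lo; i < hi; ++i)
            partial[t] = std::max(partial[t], ranks[i]);       // fold
        });
      for (auto& w : workers) w.join();
      return *std::max_element(partial.begin(), partial.end()); // merge
    }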

Source Code Interjection 2 Shared variables

What can we do with these primitives? … many, many things …

Matrix Factorization
Netflix collaborative filtering: alternating least squares matrix factorization
Model: 0.5 million nodes, 99 million edges
[diagram: bipartite graph of Netflix users and movies, factorized with latent dimension d]

Netflix Speedup Increasing size of the matrix factorization

Video Co-Segmentation
Discover “coherent” segment types across a video (extends Batra et al. ’10)
1. Form super-voxels from the video
2. EM & inference in a Markov random field
Large model: 23 million nodes, 390 million edges
[speedup plot: GraphLab vs. ideal]

Many More
Tensor factorization
Bayesian matrix factorization
Graphical model inference/learning
Linear SVM
EM clustering
Linear solvers using GaBP
SVD
etc.

Distributed Preview

GraphLab 2 Abstraction Changes (an overview of a couple of them) (Come to the talk for the rest!)

Exploiting Update Functors (for the greater good)

Exploiting Update Functors (for the greater good)
1. Update functors store state.
2. The scheduler schedules update functor instances.
3. We can use update functors as a form of controlled asynchronous message passing to communicate between vertices!

Delta-Based Update Functors

    struct pagerank : public iupdate_functor<graph_type, pagerank> {
      double delta;
      pagerank(double d) : delta(d) { }
      void operator+=(pagerank& other) { delta += other.delta; }
      void operator()(icontext_type& context) {
        vertex_data& vdata = context.vertex_data();
        vdata.rank += delta;
        if (fabs(delta) > EPSILON) {
          // Split our delta among the out-neighbors.
          double out_delta = delta * (1 - RESET_PROB) /
                             context.num_out_edges();
          context.schedule_out_neighbors(pagerank(out_delta));
        }
      }
    };
    // Initial rank: R[i] = 0;
    // Initial schedule: pagerank(RESET_PROB);
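What makes the delta pattern work is the merge in the scheduler: if a vertex already has a pending pagerank instance, a newly scheduled one is folded into it with operator+=, so deltas accumulate rather than queueing separate tasks. A toy sketch of that merge step (hypothetical scheduler internals, not GraphLab's code):

    #include <cstddef>
    #include <unordered_map>

    struct pending { double delta = 0; };  // stand-in for the update functor

    // Schedule vertex v with functor f, merging with any pending instance.
    void schedule(std::unordered_map<std::size_t, pending>& queue,
                  std::size_t v, const pending& f) {
      queue[v].delta += f.delta;  // operator+= : combine the deltas
    }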

Asynchronous Message Passing
Obviously not all computation can be written this way. But when it can, it can be extremely fast.

Factorized Updates

PageRank in GraphLab

    struct pagerank : public iupdate_functor<graph_type, pagerank> {
      void operator()(icontext_type& context) {
        vertex_data& vdata = context.vertex_data();
        double sum = 0;
        foreach (edge_type edge, context.in_edges())
          sum += context.const_edge_data(edge).weight *
                 context.const_vertex_data(edge.source()).rank;
        double old_rank = vdata.rank;
        vdata.rank = RESET_PROB + (1 - RESET_PROB) * sum;
        double residual = fabs(vdata.rank - old_rank) /
                          context.num_out_edges();
        if (residual > EPSILON)
          context.reschedule_out_neighbors(pagerank());
      }
    };

PageRank in GraphLab (annotated)
The same update function, phase by phase: the loop over in-edges is a parallel “sum” gather; the rank update is an atomic single-vertex apply; the rescheduling of out-neighbors is a parallel scatter [reschedule].

Decomposable Update Functors
Decompose update functions into 3 phases:
Gather (user defined): Gather(Y) → Δ, computed on each edge in the scope; partial results are combined by a parallel sum, Δ1 + Δ2 → Δ3.
Apply (user defined): Apply(Y, Δ) → Y, apply the accumulated value to the center vertex.
Scatter (user defined): Scatter(Y), update adjacent edges and vertices.

Factorized PageRank

    struct pagerank : public iupdate_functor<graph_type, pagerank> {
      double accum = 0, residual = 0;
      void gather(icontext_type& context, const edge_type& edge) {
        accum += context.const_edge_data(edge).weight *
                 context.const_vertex_data(edge.source()).rank;
      }
      void merge(const pagerank& other) { accum += other.accum; }
      void apply(icontext_type& context) {
        vertex_data& vdata = context.vertex_data();
        double old_value = vdata.rank;
        vdata.rank = RESET_PROB + (1 - RESET_PROB) * accum;
        residual = fabs(vdata.rank - old_value) / context.num_out_edges();
      }
      void scatter(icontext_type& context, const edge_type& edge) {
        if (residual > EPSILON)
          context.schedule(edge.target(), pagerank());
      }
    };
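For one vertex, an engine drives a decomposable functor roughly as follows: gather over the in-edges (parallelizable, with merge() combining partial accumulators), one apply on the center vertex, then scatter over the out-edges. A sequential sketch of that driver loop (hypothetical engine internals, assuming UpdateFunctor exposes the methods above):

    #include <vector>

    // How an engine would drive a decomposable functor for a single vertex.
    template <typename Context, typename Edge, typename UpdateFunctor>
    void run_decomposable(Context& context,
                          const std::vector<Edge>& in_edges,
                          const std::vector<Edge>& out_edges,
                          UpdateFunctor f) {
      for (const Edge& e : in_edges)
        f.gather(context, e);   // parallelizable; partials combined via merge()
      f.apply(context);         // atomic, touches only the center vertex
      for (const Edge& e : out_edges)
        f.scatter(context, e);  // parallelizable
    }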

Demo of *everything* PageRank

Ongoing Work
Extensions to improve performance on large graphs (see the GraphLab talk later!!):
Better distributed graph representation methods
Possibly better graph partitioning
Out-of-core graph storage
Continually changing graphs
An all-new rewrite of distributed GraphLab (come back in May!)