
GraphChi: Big Data – small machine



Presentation on theme: "GraphChi: Big Data – small machine"— Presentation transcript:

1 GraphChi: Big Data – small machine
What is it good for – and what’s new? Aapo Kyrölä Ph.D. CMU I am Aapo Kyrola, beginning my fifth year as a Ph.D. student at Carnegie Mellon, advised by Carlos Guestrin and Guy Blelloch. In this talk I am going to talk about GraphChi, which is part of the GraphLab project – a kind of spin-off. The basic promise of GraphChi is that you can conveniently do graph computation on extremely large graphs on just a Mac Mini or a laptop. And I actually mean very big graphs. This talk is quite high level, more like a marketing talk: I want to encourage you to have a look at GraphChi, but I will also cover some new material. I would also like to use this opportunity to thank Pankaj, who just talked, because we met at this workshop last year and he invited me to work at Twitter last fall. That was a great experience and a unique opportunity, and based on that work I am proud [transition]

2 GraphChi can compute on the full Twitter follow-graph with just a standard laptop.
… that I can say you can use a basic Mac laptop to compute on the actual, whole Twitter graph. This is data that is not normally available to academic research, and I am really proud of this. It is hard to find results in the literature with experiments on anything this big – and this is just on a laptop. Thanks, Pankaj, for the opportunity; I especially appreciate Twitter’s commitment to open source, which let me use and release the code I wrote while at Twitter without trouble. OK – I said you can compute on the Twitter graph, but how fast? Roughly speaking, as fast as you can on a huge Hadoop cluster. It goes without saying that energy-wise and cost-wise GraphChi is incredibly efficient. GraphChi can compute on the actual Twitter graph on a MacBook Pro (Fall 2012) – roughly the same performance as Twitter’s Hadoop cluster; ~ as fast as a very large Hadoop cluster! (Size of the graph in Fall 2013: > 20B edges [Gupta et al. 2013].)

3 What is GraphChi? Both in OSDI’12!
So as a recap, GraphChi is a disk-based GraphLab. While GraphLab 2 is incredibly powerful on big clusters or in the cloud, you can use GraphChi to solve problems just as big on a Mac Mini. Of course, GraphLab can solve them much faster – but I believe GraphChi provides performance that is more than enough for many. Spin-off of the GraphLab project; disk-based GraphLab (OSDI’12). Both in OSDI’12!

4 Parallel Sliding Windows
Details: Kyrola, Blelloch, Guestrin: “Large-scale graph computation on just a PC” (OSDI 2012). Parallel Sliding Windows. So how does GraphChi work? I don’t have time to go into details now. It is based on an algorithm we invented called Parallel Sliding Windows. In this model you split the graph into P shards, and the graph is processed in P parts. For each part you load one shard completely into memory and load contiguous chunks of data from the other shards. All in all, you need a very small number of random accesses, which are the bottleneck of disk-based computing. GraphChi is good on both SSD and hard drive! Only P large reads for each interval (sub-graph); P² reads on one full pass.
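The access pattern can be sketched with a toy read counter (a minimal sketch; the shard layout described in the comments is an assumption for illustration, not GraphChi’s actual on-disk format):

```python
# Toy model of the Parallel Sliding Windows I/O pattern (illustrative
# only; the shard layout here is an assumption, not GraphChi's real
# on-disk format). Vertices are split into P intervals; shard j stores
# the in-edges of interval j, sorted by source vertex.

P = 4  # number of shards

large_reads = 0
for interval in range(P):      # the graph is processed in P parts
    # 1 large read: load shard `interval` completely into memory.
    # P - 1 large reads: slide a window over each of the other shards
    # to fetch the out-edges of this interval's vertices.
    large_reads += 1 + (P - 1)

# One full pass costs P * P large sequential reads, matching the
# "P^2 reads on one full pass" figure on the slide.
print(large_reads)  # 16 for P = 4
```

The point of the count is that every read is a large sequential one, so the algorithm works well even on a hard drive.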

5 GraphLab/Pregel style programming + with hacking + more
Why GraphChi? Performance. Great scalability. GraphLab/Pregel style programming + with hacking + more. Applications. Easy! I am now going into the features of GraphChi, which are its selling points. I am going to talk about performance; about how it is – perhaps surprisingly – a really scalable system; and then mention that you program GraphChi with the familiar vertex-centric model of GraphLab or Pregel, but GraphChi provides some extensions, like support for dynamic graphs.

6 Performance Comparison
See the paper for more comparisons. Performance Comparison: PageRank (WebGraph), Belief Propagation (U Kang et al.), Matrix Factorization (Alternating Least Squares), Triangle Counting. On a Mac Mini, GraphChi can solve problems as big as existing large-scale systems, with comparable performance. Unfortunately the literature is abundant with PageRank experiments but not much more; PageRank is really not that interesting, and quite simple solutions work. Nevertheless, we get some idea. Pegasus is a Hadoop-based graph-mining system that has been used to implement a wide range of algorithms. The best comparable result we got was for a machine learning algorithm, belief propagation: a Mac Mini can roughly match a 100-node Pegasus cluster. This also highlights the inefficiency of MapReduce – that said, the Hadoop ecosystem is pretty solid, and people choose it for its simplicity. Matrix factorization has been one of the core GraphLab applications, and here we show that our performance is pretty good compared to GraphLab running on a slightly older 8-core server. Last, triangle counting, a heavy-duty social network analysis algorithm: a VLDB paper from a couple of years ago introduced a Hadoop algorithm for counting triangles, and this comparison is a bit stunning. But I remind you that these results are prior to PowerGraph – in OSDI, the map changed totally! Still, we are confident in saying that GraphChi is fast enough for many purposes, and indeed it can solve problems as big as the other systems have been shown to execute; it is limited only by disk space. Notes: comparison results do not include the time to transfer data to the cluster, preprocessing, or the time to load the graph from disk. GraphChi computes asynchronously, while all the others except GraphLab compute synchronously.

7 PowerGraph Comparison
OSDI’12 PowerGraph Comparison. PowerGraph / GraphLab 2 outperforms previous systems by a wide margin on natural graphs. With 64× the machines and 512× the CPUs: PageRank 40x faster than GraphChi; triangle counting 30x faster than GraphChi. PowerGraph really resets the speed comparisons. However, the point about ease of use remains, and GraphChi likely provides sufficient performance for most people. But if you need peak performance and have the resources, PowerGraph is the answer. GraphChi still has a role as a development platform for PowerGraph. vs. GraphChi: GraphChi has state-of-the-art performance per CPU.

8 Scalability / Input Size [SSD]
Throughput: number of edges processed per second. Conclusion: throughput remains roughly constant as the graph size increases. In this plot, the x-axis is the size of the graph in number of edges (all the experiment graphs are represented), and the y-axis is performance: how many edges are processed per second. The dots represent individual experiments (averaged), and the red line is a least-squares fit. On SSD, throughput remains very nearly constant as the graph size increases. Note that the structure of the graph also affects performance, but only by a factor of two: the largest graph, yahoo-web, has a challenging structure, and thus its results are comparatively worse. No worries about running out of memory or buying more machines when your data grows.

9 GraphChi^2 Distributed Graph System
Single-computer system (capable of big tasks) vs. a distributed graph system: with 2× the machines, a cluster gets (significantly) less than 2× the throughput, while a fleet of single-machine instances gets exactly 2× the throughput. The fact that GraphChi can scale to very big problems makes it, surprisingly, an interesting candidate for massive production systems. This is true when you can sacrifice latency for throughput and you have many problems to run on the same graph data. For example, Twitter computes recommendations for each user personally – all on the same graph. Say you need to compute millions of new recommendations a day, but you don’t need each one within seconds. Then you have a choice between a distributed, efficient graph system – which needs many machines just to solve this one problem – and using GraphChi to run one task at a time. Note that one task can mean computing recommendations for thousands or even a million users at a time – I will describe such a setting later today. This is a made-up example to illustrate the point. Here we have chosen T to be the time in which the single-machine system, such as GraphChi, solves one task. Assume the cluster system needs 6 machines to solve the problem and does it about 7 times faster than GraphChi: then in time T it solves 7 tasks, while 6 GraphChi machines solve 6 tasks. Now double the cluster to twelve machines: cluster systems never have linear speedup, so assume performance increases by, say, 50%. These are of course fake numbers, but similar behavior appears at some cut-off point anyway. GraphChi, on the other hand, will solve exactly twice the number of tasks in time T.
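The made-up numbers on this slide can be checked with a few lines (every constant below is one of the slide’s illustrative assumptions, not a measurement):

```python
# Illustrative throughput arithmetic from the slide (every number here is
# one of the slide's made-up assumptions, not a measurement).

def graphchi_tasks(machines):
    # Independent single-machine GraphChi instances: throughput scales
    # exactly linearly, one task per machine in time T.
    return machines

def cluster_tasks(machines):
    # Assumed: a 6-machine cluster solves one task about 7x faster than
    # one GraphChi machine, and doubling the cluster adds only ~50%.
    if machines == 6:
        return 7
    if machines == 12:
        return int(7 * 1.5)  # sub-linear speedup: ~10 tasks in time T
    raise ValueError("only the slide's two data points are modeled")

print(graphchi_tasks(6), cluster_tasks(6))    # 6 vs 7: cluster wins
print(graphchi_tasks(12), cluster_tasks(12))  # 12 vs 10: GraphChi wins
```

The crossover is the whole point: once cluster scaling goes sub-linear, a fleet of independent single-machine instances pulls ahead in tasks completed per unit time.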

10 We are not the only ones thinking this way…

11 Applications for GraphChi
Graph Mining: connected components, approximate shortest paths, triangle counting, community detection. SpMV: PageRank, generic recommendations, random walks. Collaborative Filtering (by Danny Bickson): ALS, SGD, Sparse-ALS, SVD, SVD++, Item-CF, and many more. Probabilistic Graphical Models: belief propagation. One important question to evaluate is: is this system any good – can you use it for anything? GraphChi is an early project, but we already have a great variety of algorithms implemented on it, so I think it is safe to say the system can be used for many purposes. I don’t know a better way to evaluate the usability of a system than listing what it has been used for. There are over a thousand downloads of the source code, plus checkouts we cannot track, and we know many people are already using GraphChi’s algorithms and implementing their own. Most of these algorithms are currently available only in the C++ edition, apart from the random walk system, which is only in the Java version.

12 Programming + Special Features
Similar programming model to GraphLab version 1. Dynamic graphs: streaming graphs while computing. Graph contraction algorithms (new): minimum spanning forest.
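To give a feel for the vertex-centric model, here is a plain-Python imitation of a GraphLab/Pregel-style PageRank update (the function and variable names are illustrative, not GraphChi’s actual Java/C++ API):

```python
# Minimal imitation of a GraphLab/Pregel-style vertex-centric program in
# plain Python (names are illustrative, not GraphChi's real API).

def pagerank_update(vertex, in_neighbors, out_degree, values):
    # Each vertex reads its in-neighbors' current values and returns its
    # own new value; the framework schedules these updates per vertex.
    return 0.15 + 0.85 * sum(values[u] / out_degree[u] for u in in_neighbors)

# Tiny 3-vertex cycle: 0 -> 1 -> 2 -> 0.
in_nbrs = {0: [2], 1: [0], 2: [1]}
out_deg = {0: 1, 1: 1, 2: 1}
values = {v: 1.0 for v in in_nbrs}

for _ in range(50):  # synchronous sweeps, for clarity
    values = {v: pagerank_update(v, in_nbrs[v], out_deg, values)
              for v in in_nbrs}

print(values)  # every rank is 1.0 on this symmetric cycle
```

The user writes only the per-vertex update function; the engine (here, the loop) decides when and where each vertex runs, which is what lets the same program run in memory, from disk, or distributed.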

13 Easy to Get Started: Java and C++ versions available
No installation, just run – on any machine, SSD or HD.

14 What’s New

15 Extensions 1. Dynamic Edge and Vertex Values
1. Dynamic Edge and Vertex Values: divide shards into small (4 MB) blocks that can be resized separately (diagram: Shard(j) split into Block 1 … Block N). 2. Integration with Hadoop / Pig. 3. Fast neighborhood queries over shards: sparse indices. 4. DrunkardMob: Random Walks (next…)

16 Random Walk Simulations
Personalized PageRank. Problem: using the power method would require O(V²) memory to compute it for all vertices. It can be approximated by simulating random walks and computing the sample distribution. Other applications: recommender systems – FolkRank (Hotho 2006), finding candidates; knowledge-base inference (Lao, Cohen 2009).
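The approximation described here – estimate personalized PageRank from the empirical visit distribution of restarting random walks – can be sketched as follows (the restart probability, walk counts, and toy graph are illustrative assumptions):

```python
import random
from collections import Counter

# Sketch of the slide's approximation: estimate personalized PageRank
# from the visit distribution of restarting random walks. The restart
# probability, walk count, and toy graph are illustrative assumptions.

def approx_ppr(graph, source, n_walks=10000, restart=0.15, max_hops=20):
    visits = Counter()
    for _ in range(n_walks):
        v = source
        for _ in range(max_hops):
            visits[v] += 1
            if random.random() < restart or not graph[v]:
                v = source                   # restart at the source
            else:
                v = random.choice(graph[v])  # take a random hop
    total = sum(visits.values())
    return {v: count / total for v, count in visits.items()}

graph = {0: [1, 2], 1: [2], 2: [0]}
ppr = approx_ppr(graph, source=0)
# The source vertex collects the largest share of visits.
```

This needs only O(walks) memory per source instead of a full O(V) vector per vertex, which is why sampling scales where the power method does not.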

17 Random walk in an in-memory graph
Compute one walk at a time (multiple in parallel, of course):

parfor walk in walks:
    for i = 1 to numsteps:
        vertex = walk.atVertex()
        walk.takeStep(vertex.randomNeighbor())

So how would we do this if we could fit the graph in memory? This approach is extremely slow in GraphChi / PSW: each hop might require loading a new interval.

18 Random walks in GraphChi
DrunkardMob algorithm: reverse thinking.

parfor vertex in graph:
    mywalks = walkManager.getWalksAtVertex(vertex.id)
    foreach walk in mywalks:
        walkManager.addHop(walk, vertex.randomNeighbor())

We need to encode only the current vertex and the source vertex for each walk: a 4-byte integer is sufficient per walk. With 144 GB RAM, we could run 15 billion walks simultaneously (in Java) – recommendations for 15 million users. CHUNKS! Load a chunk of the graph and a chunk of walks, and move them forward.
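The “reverse thinking” above can be sketched in plain Python (a simplified model; the real DrunkardMob packs each walk into a 4-byte integer and streams both the graph and the walk buckets from disk in chunks):

```python
import random

# Simplified model of DrunkardMob's inversion (illustrative; the real
# system encodes each walk in a 4-byte int and processes the graph and
# the walks in disk-friendly chunks).

def drunkardmob_step(graph, walks_at_vertex):
    """One sweep over the graph: iterate vertices (not walks) and advance
    every walk currently parked at a vertex by one random hop."""
    next_walks = {v: [] for v in graph}
    for vertex in graph:                      # 'parfor vertex in graph'
        for source in walks_at_vertex.get(vertex, []):
            # A walk is represented only by its source id; dead ends
            # restart the walk at its source.
            nxt = random.choice(graph[vertex]) if graph[vertex] else source
            next_walks[nxt].append(source)
    return next_walks

# Three walks from source 0, all parked at vertex 0, on a 3-cycle.
graph = {0: [1], 1: [2], 2: [0]}
walks = {0: [0, 0, 0]}
for _ in range(5):
    walks = drunkardmob_step(graph, walks)
print(walks)  # all three walks are now at vertex (0 + 5) % 3 == 2
```

Because each sweep touches vertices in order, the graph can be loaded interval by interval with PSW, while the walk-at-a-time loop from the previous slide would jump between intervals on every hop.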

19 Keeping track of walks in GraphChi
(Diagram: the Walk Distribution Tracker (DrunkardCompanion) keeps per-source top-N visit counts (Source A top-N visits, Source B top-N visits), while the vertex walks table (WalkManager) holds the walks for the current execution interval.)


21 Application: Twitter’s Who-to-Follow
Based on the WWW’13 paper by Gupta et al. Step 1: compute the Circle of Trust (CoT) for each user. Step 2: build a bipartite graph with the CoT and the CoT’s followees – neighborhood queries over shards. Step 3: compute SALSA and pick the top-scored users as recommendations – DrunkardMob.

22 Conclusion GraphChi can run your favorite graph computation on extremely large graphs – on your laptop Unique features such as random walk simulations and dynamic graphs Most popular: Collaborative Filtering toolkit (by Danny Bickson)

23 Thank you! Aapo Kyrölä Ph.D. candidate @ CMU – soon to graduate!

