X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Presentation transcript:

X-Stream: Edge-Centric Graph Processing using Streaming Partitions. Amitabha Roy, Ivo Mihailovic, Willy Zwaenepoel.

Graphs. Interesting information is encoded as graphs; examples include HyperANF, Pagerank, and ALS.

Big Graphs. Large graphs are a subset of the big data problem: billions of vertices and edges, hundreds of gigabytes. They are normally tackled on large clusters (Pregel, Giraph, GraphLab, ...), at the cost of complexity and power consumption. Can we do large graphs on a single machine?

X-Stream. Process large graphs on a single machine: a 1U server = 64 GB RAM + 2 x 200 GB SSD + 3 x 3 TB drives.

Approach. Problem: graph traversal = random access, and random access is inefficient for storage: disk (500x slower than sequential), SSD (20x slower), RAM (2x slower). Solution: X-Stream makes graph accesses sequential.

Contributions. Edge-centric scatter-gather model; streaming partitions.

Standard Scatter-Gather. Edge-centric scatter-gather is based on standard scatter-gather, a popular graph processing model: Pregel [Google, SIGMOD 2010], PowerGraph [OSDI 2012], and others.

Standard Scatter-Gather. State is stored in vertices. Vertex operations: scatter updates along outgoing edges; gather updates from incoming edges.

Standard Scatter-Gather. [Figure: BFS on an example graph of 8 vertices.]

Vertex-Centric Scatter-Gather. Standard scatter-gather is vertex-centric: the scatter phase iterates over vertices:

    for each vertex v
        if v has update
            for each edge e from v
                scatter update along e

This does not work well with storage.

Vertex-Centric Scatter-Gather. [Figure: vertex-centric BFS needs a lookup index over the edge list (SOURCE, DEST) sorted by source to find each updated vertex's outgoing edges.]

Transformation.

Vertex-centric scatter:

    for each vertex v
        if v has update
            for each edge e from v
                scatter update along e

Edge-centric scatter:

    for each edge e
        if e.src has update
            scatter update along e
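To make the edge-centric scatter concrete, here is a minimal in-memory sketch of edge-centric BFS in Python; the names (edge_centric_bfs, level) are illustrative, not X-Stream's actual API, and the real system streams edges from storage rather than a Python list:

    from collections import namedtuple

    Edge = namedtuple("Edge", ["src", "dst"])

    def edge_centric_bfs(num_vertices, edges, root):
        level = [None] * num_vertices       # per-vertex state
        level[root] = 0
        updated = {root}                    # vertices updated last iteration
        step = 0
        while updated:
            step += 1
            scattered = []
            # Scatter: one sequential pass over the unsorted edge list.
            for e in edges:
                if e.src in updated:
                    scattered.append(e.dst)
            # Gather: apply the scattered updates to vertex state.
            updated = set()
            for v in scattered:
                if level[v] is None:
                    level[v] = step
                    updated.add(v)
        return level

    print(edge_centric_bfs(4, [Edge(0, 1), Edge(0, 2), Edge(1, 3), Edge(2, 3)], root=0))
    # [0, 1, 1, 2]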

Edge-Centric Scatter-Gather. [Figure: edge-centric BFS streams the unsorted edge list (SOURCE, DEST) sequentially; no lookup index is needed.]

[Figure: two different orderings of the same edge list are equivalent: edge-centric scatter-gather needs no index, no clustering, and no sorting.]

Tradeoff.

Vertex-centric scatter-gather cost: Edge Data / Random-Access Bandwidth
Edge-centric scatter-gather cost: (Scatters x Edge Data) / Sequential-Access Bandwidth

Sequential-access bandwidth >> random-access bandwidth, and real-world graphs need few scatter-gather iterations: they are well connected (a variety of datasets is covered in the paper).
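As a back-of-the-envelope check of this tradeoff, the sketch below plugs in illustrative disk numbers (the 500x sequential-over-random ratio from the Approach slide; the absolute bandwidths and iteration count are assumptions, not measurements):

    edge_data_gb = 100     # size of the edge list
    seq_bw_mb_s  = 500     # sequential bandwidth
    rand_bw_mb_s = 1       # random-access bandwidth, about 500x slower
    scatters     = 20      # scatter-gather iterations the algorithm needs

    vertex_centric_s = edge_data_gb * 1024 / rand_bw_mb_s   # one random pass
    edge_centric_s   = scatters * edge_data_gb * 1024 / seq_bw_mb_s
    print(vertex_centric_s, edge_centric_s)                 # 102400.0 vs. 4096.0 seconds

Edge-centric wins whenever the number of scatter iterations is below the sequential-to-random bandwidth ratio (here, 20 < 500).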

Contributions. Edge-centric scatter-gather model (covered); next: streaming partitions.

Streaming Partitions. Problem: we still have random access to the vertex set. Solution: partition the graph into streaming partitions.

Streaming Partitions. A streaming partition is: a subset of the vertices that fits in RAM, plus all edges whose source vertex is in that subset. There is no requirement on the quality of the partition.
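A minimal sketch of building streaming partitions, assuming vertices are integers 0..n-1 and edges are (src, dst) pairs; the even range split is an arbitrary choice, which is fine since partition quality does not matter:

    def make_streaming_partitions(num_vertices, edges, vertices_per_partition):
        num_parts = -(-num_vertices // vertices_per_partition)  # ceiling division
        part_of = lambda v: v // vertices_per_partition
        # Each partition holds a vertex range plus every edge whose SOURCE
        # falls in that range; edges within a partition stay unsorted.
        partitions = [{"vertices": range(p * vertices_per_partition,
                                         min((p + 1) * vertices_per_partition, num_vertices)),
                       "edges": []}
                      for p in range(num_parts)]
        for src, dst in edges:
            partitions[part_of(src)]["edges"].append((src, dst))
        return partitions

    for p in make_streaming_partitions(8, [(0, 4), (4, 6), (7, 0), (1, 2)], 4):
        print(list(p["vertices"]), p["edges"])
    # [0, 1, 2, 3] [(0, 4), (1, 2)]
    # [4, 5, 6, 7] [(4, 6), (7, 0)]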

Partitioning the Graph. [Figure: the vertex set is split into subsets V1 = {1, 2, 3, 4} and V2 = {5, 6, 7, 8}; each partition holds its subset plus the (SOURCE, DEST) edges whose source lies in it.]

Random Accesses for Free. [Figure: with a partition's vertex subset (V1) loaded in RAM, streaming its edges sequentially turns the random accesses to vertex state into cheap RAM accesses.]

Generalization. Keep the partition's vertices in fast storage and stream its edges from slow storage; this applies to any two-level memory hierarchy.

Generally Applicable. Fast/slow storage pairs: CPU cache over RAM, RAM over SSD, or RAM over disk.

Parallelism. Parallelism is simple: state is stored in vertices, and streaming partitions have disjoint vertex sets, so partitions can be processed in parallel.

Gathering Updates. [Figure: an update scattered by partition 1 may target a vertex in partition 100; a shuffler routes updates to their destination partitions.] The shuffler minimizes random access when there are many partitions, using multi-round copying akin to merge sort, but cheaper.
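A minimal single-round sketch of that routing step, assuming updates are (destination vertex, value) pairs; X-Stream's shuffler generalizes this to multiple rounds when there are too many partitions to append to efficiently:

    def shuffle_updates(updates, part_of):
        buckets = {}
        # Appending to per-partition buckets keeps writes sequential within
        # each bucket instead of scattering them across all vertex state.
        for dst_vertex, value in updates:
            buckets.setdefault(part_of(dst_vertex), []).append((dst_vertex, value))
        return buckets

    print(shuffle_updates([(5, 1), (0, 1), (6, 2)], part_of=lambda v: v // 4))
    # {1: [(5, 1), (6, 2)], 0: [(0, 1)]}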

Performance. This talk focuses on SSD results; results with in-memory graphs are similar.

Baseline. GraphChi [OSDI 2012] was the first to show that graph processing on a single machine is viable and competitive; it also targets the larger sequential bandwidth of SSD and disk.

Different Approaches. Fundamentally different approaches to the same goal: GraphChi uses "shards", partitioning edges into sorted shards; X-Stream uses sequential scans over edges partitioned into unsorted streaming partitions.

Comparison to GraphChi. We replicated the OSDI 2012 experiments on our SSD. [Figure: GraphChi pipeline = input -> create shards -> run algorithm -> answer; X-Stream pipeline = input -> run algorithm -> answer.]

X-Stream Speedup over GraphChi. [Figure: per-benchmark speedup.] Mean speedup = 2.3x.

Comparison to GraphChi, now also counting GraphChi's shard-creation time. [Figure: same pipelines as above; GraphChi must create shards before running, while X-Stream runs directly on the input.]

X-Stream Speedup over GraphChi (+ sharding). [Figure: per-benchmark speedup.] Mean speedup: previously 2.3x, now 3.7x.

Preprocessing Impact. X-Stream returns answers before GraphChi finishes sharding.

Sequential Access Bandwidth. A GraphChi shard requires all of its vertices and edges to fit in memory; an X-Stream partition requires only the vertices to fit. There are therefore more GraphChi shards than X-Stream partitions, which makes access more random for GraphChi.
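A rough count makes the argument concrete; the sizes below are illustrative assumptions, not measured values:

    ram_gb    = 16
    vertex_gb = 8      # total vertex state
    edge_gb   = 224    # total edge data

    graphchi_shards    = -(-(vertex_gb + edge_gb) // ram_gb)  # vertices + edges per shard
    xstream_partitions = -(-vertex_gb // ram_gb)              # only vertices per partition
    print(graphchi_shards, xstream_partitions)                # 15 vs. 1

Fewer, larger partitions mean longer sequential runs per pass over the edge data.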

SSD Read Bandwidth (Pagerank on Twitter)

SSD Write Bandwidth (Pagerank on Twitter)

Disk Transfers (Pagerank on Twitter):

Metric          X-Stream       GraphChi
Data moved      224 GB         322 GB
Time taken      398 seconds    2613 seconds
Transfer rate   578 MB/s       126 MB/s

The SSD can sustain reads at 667 MB/s and writes at 576 MB/s: X-Stream uses all the available bandwidth of the storage device.

Scaling Up:

16 GB RAM:   8 million vertices,   128 million edges,  8 seconds
400 GB SSD:  256 million vertices, 4 billion edges,    33 minutes
6 TB disk:   4 billion vertices,   64 billion edges,   26 hours

Conclusion. X-Stream: edge-centric processing + streaming partitions = sequential access. Good performance on RAM, SSD, and disk for big graphs. Download from http://labos.epfl.ch/xstream

BACKUP

API Restrictions. Updates must be commutative, and a vertex cannot access all of its edges in a single step.

Applications. X-Stream can solve a variety of problems: BFS, SSSP, weakly connected components, strongly connected components, maximal independent sets, minimum cost spanning trees, belief propagation, alternating least squares, Pagerank, betweenness centrality, triangle counting, approximate neighborhood function, conductance, and k-cores. Q: What is the average distance between people on a social network? A: Use the approximate neighborhood function.

Edge-Centric Scatter-Gather. Real-world graphs have low diameter. [Figure: two 8-vertex example graphs; a chain-like graph with diameter D=7 needs BFS in 7 steps, while a well-connected graph with D=3 needs only 3; most real-world graphs resemble the latter.]

X-Stream Main Memory Performance

Runtime Impact of GraphChi Sharding

Pre-processing Overhead. Producing streaming partitions has low overhead: it is strictly cheaper than sorting edges by source vertex.