Managing Large Graphs on Multi-Cores With Graph Awareness
Vijayan, Ming, Xuetian, Frank, Lidong, Maya
Microsoft Research

Motivation
Tremendous increase in graph data and applications
- A new class of graph applications requires real-time responses
- Even batch-processed workloads have strict time constraints
Multi-core revolution
- Multi-cores are now the default on most machines
- Large-scale multi-cores come with terabytes of main memory
- They can run workloads that traditionally ran on distributed systems
Existing graph-processing systems do not support both trends

A High-level Description of Grace
Grace is an in-memory graph management and processing system.
- It implements several optimizations, both graph-specific and multi-core-specific
- It supports snapshots and transactional updates on graphs
- The evaluation shows that these optimizations help Grace run several times faster than the alternatives it was compared against

Outline
- Overview
- Details of optimizations
- Details on transactions
- Subset of results

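An Overview of Grace
Grace keeps an entire graph in memory, split into smaller parts, and exposes a C-style API for writing graph workloads, iterative workloads, and updates. Its design is driven by two trends:
- Graph-specific locality
- Partitionable and parallelizable workloads
Example use of the Grace API, visiting every neighbor of a vertex:
    v = GetVertex(Id);
    for (i = 0; i < v.degree; i++)
        neigh = v.GetNeighbor(i);
[Figure: overview of Grace, showing the Grace API, iterative programs (e.g., PageRank), an RPC/network interface, graph and multi-core optimizations, and a graph with vertices A-E split into parts on Core 0 and Core 1.]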
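To make the API loop concrete, here is a minimal C sketch of a neighbor iteration. It assumes a compressed sparse row (CSR) adjacency layout and passes the graph explicitly; the `Graph` and `Vertex` types, the signatures of `GetVertex` and `GetNeighbor`, and the helper `SumNeighborIds` are all hypothetical illustrations, not Grace's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory graph in compressed sparse row (CSR) form:
   the neighbors of vertex u are edges[offsets[u]] .. edges[offsets[u + 1] - 1]. */
typedef struct {
    size_t num_vertices;
    const size_t *offsets; /* num_vertices + 1 entries */
    const int *edges;      /* all adjacency lists, stored back to back */
} Graph;

/* Hypothetical vertex handle, loosely modeled on the slide's GetVertex() result. */
typedef struct {
    const Graph *g;
    int id;
    size_t degree;
} Vertex;

static Vertex GetVertex(const Graph *g, int id) {
    assert(id >= 0 && (size_t)id < g->num_vertices);
    Vertex v = { g, id, g->offsets[id + 1] - g->offsets[id] };
    return v;
}

static int GetNeighbor(const Vertex *v, size_t i) {
    assert(i < v->degree);
    return v->g->edges[v->g->offsets[v->id] + i];
}

/* The slide's loop, used here to sum the IDs of a vertex's neighbors. */
static long SumNeighborIds(const Graph *g, int id) {
    Vertex v = GetVertex(g, id);
    long sum = 0;
    for (size_t i = 0; i < v.degree; i++) {
        int neigh = GetNeighbor(&v, i);
        sum += neigh;
    }
    return sum;
}
```

Because a vertex's neighbors sit contiguously in `edges`, the loop walks memory sequentially. This is the kind of graph-specific locality the design aims to exploit.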

Data Structures in a Partition
[Figure: the per-partition data structures: a Vertex Index (mapping A, B, C to 0, 1, 2), a Vertex Allocation Map (bits 1 1 1 0), an Edge Pointer Array, a Vertex Log, and an Edge Log holding the edges of A, B, and C.]
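The figure names the per-partition structures but not their exact layout. Below is one possible arrangement in C, offered as a sketch under stated assumptions: global vertex IDs are small integers, each vertex is added together with its full adjacency list, and fixed capacities stand in for real memory management. All names (`Partition`, `partition_add_vertex`, `MAX_VERTICES`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VERTICES 8 /* hypothetical capacity; IDs are assumed to lie in [0, MAX_VERTICES) */
#define MAX_EDGES 64

typedef struct {
    int vertex_id; /* global vertex ID stored in this slot */
} VertexRecord;

/* Hypothetical partition layout loosely following the figure. */
typedef struct {
    int index[MAX_VERTICES];         /* Vertex Index: global ID -> slot (-1 if absent) */
    bool allocated[MAX_VERTICES];    /* Vertex Allocation Map: which slots are in use */
    VertexRecord vlog[MAX_VERTICES]; /* Vertex Log: one record per slot */
    size_t edge_start[MAX_VERTICES]; /* Edge Pointer Array: where each slot's edges begin */
    size_t edge_count[MAX_VERTICES];
    int elog[MAX_EDGES];             /* Edge Log: adjacency lists stored back to back */
    size_t elog_len;
} Partition;

static void partition_init(Partition *p) {
    for (int i = 0; i < MAX_VERTICES; i++) {
        p->index[i] = -1;
        p->allocated[i] = false;
    }
    p->elog_len = 0;
}

/* Adds vertex `id` with its full adjacency list in the first free slot.
   Returns the slot, or -1 if no slot or edge-log space is left. */
static int partition_add_vertex(Partition *p, int id, const int *neighbors, size_t n) {
    assert(id >= 0 && id < MAX_VERTICES && p->index[id] == -1);
    if (p->elog_len + n > MAX_EDGES) return -1;
    for (int slot = 0; slot < MAX_VERTICES; slot++) {
        if (p->allocated[slot]) continue;
        p->allocated[slot] = true;
        p->index[id] = slot;
        p->vlog[slot].vertex_id = id;
        p->edge_start[slot] = p->elog_len;
        p->edge_count[slot] = n;
        for (size_t i = 0; i < n; i++) p->elog[p->elog_len++] = neighbors[i];
        return slot;
    }
    return -1;
}

static size_t partition_degree(const Partition *p, int id) {
    assert(id >= 0 && id < MAX_VERTICES);
    int slot = p->index[id];
    return slot < 0 ? 0 : p->edge_count[slot];
}

/* Looks up the i-th neighbor: Vertex Index -> Edge Pointer Array -> Edge Log. */
static int partition_neighbor(const Partition *p, int id, size_t i) {
    assert(id >= 0 && id < MAX_VERTICES);
    int slot = p->index[id];
    assert(slot >= 0 && i < p->edge_count[slot]);
    return p->elog[p->edge_start[slot] + i];
}
```

Keeping edges in an append-only log makes a neighbor lookup two array indirections. A design that also allows edges to be added to existing vertices would need relocation or overflow handling, which this sketch leaves out.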

Graph-Aware Partitioning & Placement
Are partitioning and placement useful on a single machine?
- Yes: they let Grace take advantage of multi-cores and memory hierarchies
Grace solves both with graph partitioning algorithms, which divide a graph into sub-graphs while minimizing edge cuts.
Grace provides an extensible library of partitioners:
- Graph-aware: heuristic-based, spectral partitioning, Metis
- Graph-agnostic: hash partitioning
Grace achieves a better vertex layout through recursive graph partitioning:
- Recursively partition the graph until each sub-graph fits in a cache line
- Recompose all the sub-graphs to obtain the vertex layout
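The sketch below illustrates the two ends of the partitioner library: a graph-agnostic hash partitioner, and the edge-cut count that graph-aware partitioners try to minimize. The multiplicative hash constant is an illustrative choice, not necessarily what Grace uses, and the function names are hypothetical. Recursive partitioning for layout is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A directed edge src -> dst between global vertex IDs. */
typedef struct { uint32_t src, dst; } Edge;

/* Graph-agnostic hash partitioning: the part depends only on the vertex ID.
   2654435761 is a common multiplicative-hash constant (illustrative choice);
   the 32-bit multiplication wraps around, which is intended. */
static unsigned hash_part(uint32_t v, unsigned num_parts) {
    assert(num_parts > 0);
    uint32_t h = v * 2654435761u;
    return h % num_parts;
}

/* Edge cut: the number of edges whose endpoints are assigned to different
   parts. Graph-aware partitioners such as Metis try to minimize this count. */
static size_t edge_cut(const Edge *edges, size_t m, const unsigned *part_of) {
    size_t cut = 0;
    for (size_t i = 0; i < m; i++)
        if (part_of[edges[i].src] != part_of[edges[i].dst]) cut++;
    return cut;
}
```

Take a 4-vertex ring (0-1-2-3-0) split into two parts. Placing {0, 1} and {2, 3} together cuts 2 edges, while alternating parts cuts all 4. A hash partitioner ignores this structure entirely, which is the trade-off between the graph-agnostic and graph-aware options.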

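Platform for Parallel Iterative Computations
The iterative computation platform implements the “bulk synchronous parallel” (BSP) model. In each iteration, the parts run their computations in parallel and propagate updates, and a barrier separates one iteration from the next.
[Figure: iterations 1 and 2, each made of parallel computations, update propagation, and a barrier.]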
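As an illustration of the BSP structure, here is a sequential C sketch of PageRank. It assumes a CSR graph, and it simply drops the rank mass of vertices with no out-edges (a simplification, not necessarily Grace's behavior). Each iteration reads only the previous iteration's ranks and writes into a separate buffer, and copying that buffer back plays the role of the barrier. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_V 16 /* hypothetical size limit for this sketch */

/* Sequential simulation of BSP PageRank over a CSR graph
   (neighbors of u are edges[offsets[u]] .. edges[offsets[u + 1] - 1]). */
static void pagerank_bsp(size_t n, const size_t *offsets, const int *edges,
                         double damping, int iterations, double *rank) {
    assert(n > 0 && n <= MAX_V);
    double next[MAX_V];
    for (size_t v = 0; v < n; v++) rank[v] = 1.0 / n;
    for (int it = 0; it < iterations; it++) {
        /* Computation phase: every vertex reads only last iteration's ranks. */
        for (size_t v = 0; v < n; v++) next[v] = (1.0 - damping) / n;
        for (size_t u = 0; u < n; u++) {
            size_t deg = offsets[u + 1] - offsets[u];
            if (deg == 0) continue; /* dangling vertex: its mass is dropped here */
            double share = damping * rank[u] / deg;
            for (size_t e = offsets[u]; e < offsets[u + 1]; e++)
                next[edges[e]] += share; /* propagate updates */
        }
        /* Barrier: new ranks become visible only after all vertices finish. */
        memcpy(rank, next, n * sizeof(double));
    }
}
```

In a parallel version, each core would run the computation phase for its own parts, and the copy would become a real barrier across cores. The double buffering keeps results independent of the order in which vertices are processed.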

Load Balancing and Updates Batching
Problem 1: overloaded partitions can hurt performance.
Solution 1: load balancing, implemented by sharing a portion of the vertices.
Problem 2: issuing updates in arbitrary order can increase cache misses.
Solution 2: updates batching, implemented by
- grouping updates by their destination part
- issuing updates in a round-robin fashion
[Figure: Part0, Part1, and Part2 on Core0, Core1, and Core2, with vertices A-D, cache lines, and a barrier.]
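A minimal C sketch of updates batching follows, assuming a fixed number of parts and bounded batch sizes. Updates are buffered per destination part and then issued one part at a time. The round-robin order used here, where each sender starts with the part after its own so that senders do not all hit the same destination first, is an assumption about what "round-robin" means. All names are hypothetical, and load balancing is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PARTS 3  /* hypothetical number of parts */
#define MAX_BATCH 16 /* hypothetical per-part buffer size */

/* A contribution destined for vertex `dst` (e.g., a PageRank share). */
typedef struct { int dst; double value; } Update;

/* Per-destination-part buffers: updates are grouped by the part that
   owns the destination vertex instead of being sent in arbitrary order. */
typedef struct {
    Update buf[NUM_PARTS][MAX_BATCH];
    size_t len[NUM_PARTS];
} UpdateBatches;

static void batches_init(UpdateBatches *b) {
    for (int p = 0; p < NUM_PARTS; p++) b->len[p] = 0;
}

static void batches_add(UpdateBatches *b, int dst_part, Update u) {
    assert(dst_part >= 0 && dst_part < NUM_PARTS);
    assert(b->len[dst_part] < MAX_BATCH);
    b->buf[dst_part][b->len[dst_part]++] = u;
}

/* Issues all buffered updates part by part, round-robin starting after the
   sender's own part. Writes them to `out` (and their parts to `out_part`)
   in issue order, clears the buffers, and returns the number issued. */
static size_t batches_flush(UpdateBatches *b, int sender_part,
                            Update *out, int *out_part, size_t cap) {
    assert(sender_part >= 0 && sender_part < NUM_PARTS);
    size_t n = 0;
    for (int k = 1; k <= NUM_PARTS; k++) {
        int part = (sender_part + k) % NUM_PARTS;
        for (size_t i = 0; i < b->len[part]; i++) {
            assert(n < cap);
            out[n] = b->buf[part][i];
            out_part[n] = part;
            n++;
        }
        b->len[part] = 0;
    }
    return n;
}
```

For example, suppose sender 0 buffers updates for parts 2, 0, 1, 2 in that order. The flush issues them grouped as part 1, part 2, part 2, part 0, so each destination's data is touched in one burst rather than interleaved.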

Transactions on Graphs
Grace supports structural changes to a graph:
    BeginTransaction()
    AddVertex(X)
    AddEdge(X, Y)
    EndTransaction()
Transactions use snapshot isolation:
- Snapshots are instantaneous, using copy-on-write (CoW) techniques
- CoW can disrupt a carefully tuned memory layout
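The following single-threaded C sketch illustrates snapshot isolation with copy-on-write. It borrows the slide's operation names, but the signatures, the `Version`/`Graph`/`Txn` types, and `GetSnapshot` are hypothetical. The graph is reduced to per-vertex degrees, and the sketch copies the whole version eagerly at `BeginTransaction`, whereas a real CoW scheme would typically copy only the blocks a transaction touches. It has no concurrency control between writers.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_V 64 /* hypothetical capacity */

/* A committed, immutable version of the graph (degrees stand in for adjacency). */
typedef struct {
    size_t num_vertices;
    int degree[MAX_V];
} Version;

typedef struct { const Version *current; } Graph; /* latest committed version */
typedef struct { Graph *g; Version *working; } Txn;

/* Readers just pin the current version; committed versions are never modified. */
static const Version *GetSnapshot(const Graph *g) { return g->current; }

static Txn BeginTransaction(Graph *g) {
    Txn t = { g, malloc(sizeof(Version)) };
    assert(t.working != NULL);
    *t.working = *g->current; /* private copy: the "copy" in copy-on-write */
    return t;
}

static int AddVertex(Txn *t) {
    assert(t->working->num_vertices < MAX_V);
    int id = (int)t->working->num_vertices++;
    t->working->degree[id] = 0;
    return id;
}

static void AddEdge(Txn *t, int x, int y) {
    assert(x >= 0 && (size_t)x < t->working->num_vertices);
    assert(y >= 0 && (size_t)y < t->working->num_vertices);
    t->working->degree[x]++; /* record the out-edge x -> y */
}

/* Publishes the working copy as the new committed version and returns the
   previous one, which the caller frees once no reader still holds it. */
static const Version *EndTransaction(Txn *t) {
    const Version *old = t->g->current;
    t->g->current = t->working;
    t->working = NULL;
    return old;
}
```

A reader that called `GetSnapshot` before the commit keeps seeing the old version. The new version lives at a different address, which is why CoW can undo a carefully tuned vertex layout.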

Evaluation
Graphs:
- Web (88M vertices, 275M edges), sparse
- Orkut (3M vertices, 223M edges), dense
Workloads:
- N-hop-neighbor queries, BFS, DFS, PageRank, Weakly-Connected Components, Shortest Path
Architecture:
- Intel Xeon: 12 cores, 2 chips with 6 cores each
- AMD Opteron: 48 cores, 4 chips with 12 cores each
Questions:
- How well do partitioning and placement work?
- How useful are load balancing and updates batching?
- How does Grace compare to other systems?

Partitioning and Placement Performance on Intel
[Figure: PageRank speedup vs. number of partitions for the Orkut and Web graphs, with annotations 1-3.]
1. Observation: for a small number of partitions, the choice of partitioning algorithm did not make a big difference. Reason: all the partitions fit within the cores of a single chip, which minimizes communication cost.
2. Observation: placing neighboring vertices close together improves performance significantly. Reason: L1, L2, and L3 cache misses and data-TLB misses are reduced.
3. Observation: for sparse graphs, careful vertex arrangement works better when combined with graph partitioning. Reason: graph partitioning puts neighbors in the same part, which enables better placement.

Load Balancing and Updates Batching on Intel
[Figure: PageRank speedup vs. number of partitions for the Orkut and Web graphs, plus a retired-load panel, with annotations 1-2.]
1. Observation: load balancing and updates batching did not improve performance for the Web graph. Reason: sparse graphs can be partitioned better, and there are fewer updates to send.
2. Observation: batching updates gives a larger performance improvement for the Orkut graph. Reason: updates batching reduces remote cache accesses.

Comparing Grace, BDB, and Neo4j
[Figure: running time (s) of Grace, BDB, and Neo4j.]

Conclusion
Grace explores graph-specific and multi-core-specific optimizations.
What worked and what didn't (in our setup; your mileage might differ):
- Careful vertex placement in memory gave good improvements
- Partitioning and updates batching worked in most cases, but not always
- Load balancing wasn't as useful
