
1 Graph reordering/partitioning with redundancy

2 Motivation
1. Distributed graph processing
– Use redundancy to reduce costly communication.
– Reorder vertices so that vertices with consecutive positions in the order are grouped on the same machine.
2. External storage of graph data
– Use redundancy to reduce the disk I/O caused by cross-page access.
– Reorder vertices so that vertices with consecutive positions in the order are grouped in the same page.
Generalized: both settings combine reordering and redundancy.
Consider a vertex u with degree k. In the worst case, accessing its neighbors costs k remote accesses or disk I/Os. By copying u to each machine (or page) on which one of its remote neighbors resides, we can avoid these remote accesses or disk I/Os.
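The replication argument can be made concrete with a small sketch. It is only an illustration under assumptions not in the slides: `placement` maps each vertex to the set of machines holding a copy of it, and the helper names `remote_accesses` and `replicate_to_neighbors` are hypothetical.

```python
def remote_accesses(adj, placement, u):
    """Count neighbors of u that share no machine with any copy of u."""
    homes = placement[u]
    return sum(1 for v in adj[u] if not (placement[v] & homes))


def replicate_to_neighbors(adj, placement, u):
    """Return a new placement in which u is copied next to each remote neighbor."""
    new = {v: set(ms) for v, ms in placement.items()}
    for v in adj[u]:
        if not (new[u] & new[v]):
            # Assumption: any machine holding v will do; pick the smallest id.
            new[u].add(min(new[v]))
    return new


# Star graph: vertex 0 has degree 3; two of its neighbors live on other machines.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
placement = {0: {0}, 1: {0}, 2: {1}, 3: {2}}

before = remote_accesses(adj, placement, 0)
after_placement = replicate_to_neighbors(adj, placement, 0)
after = remote_accesses(adj, after_placement, 0)
print(before, after, sorted(after_placement[0]))  # prints: 2 0 [0, 1, 2]
```

The two extra copies of vertex 0 are the storage price paid for removing both remote accesses.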

3 Rationale
Why ordering instead of partitioning?
– Ordering provides more information than partitioning, and an ordering implies a partitioning. Suppose we have a vertex sequence V1, V2, ..., Vm and want to partition these vertices into P parts. A simple solution is to cut the sequence into P consecutive parts.
– Disk access: if two vertices u and v are logically close to each other, we want them placed in nearby regions on disk, so that the seek time is reduced when v is accessed after u (or vice versa). Two vertices are said to be logically close to each other if they reside in the same densely connected subgraph.
– In general, vertices are processed in a certain order. We expect to process vertices in the logical order described above.

4 Problem definition 1 (p1): overlapping graph partitioning
Input: a graph G(V, E), an integer k, and a size constraint Z.
Problem: find k subsets of V, each of size Z, that optimize the objective below.
Objective:
– $G(\cdot) = \sum_{e \in E} f(e)$ is minimized,
– where $f(e) = 0$ if both endpoints of e occur together in some subset, and $f(e) = 1$ otherwise.
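As a rough illustration of the p1 objective, the sketch below counts the edges whose endpoints never share a subset. The function name `uncovered_edges` and the 6-cycle example are assumptions made for this example only.

```python
def uncovered_edges(edges, subsets):
    """G = number of edges whose two endpoints never occur in the same subset."""
    return sum(
        1 for u, v in edges
        if not any(u in s and v in s for s in subsets)
    )


# A 6-cycle, split into k = 2 subsets.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

disjoint = [{0, 1, 2}, {3, 4, 5}]           # Z = 3, no redundancy
overlapping = [{0, 1, 2, 3}, {3, 4, 5, 0}]  # Z = 4, vertices 0 and 3 are duplicated

print(uncovered_edges(edges, disjoint))     # prints: 2
print(uncovered_edges(edges, overlapping))  # prints: 0
```

Allowing each subset one more vertex lets the two boundary vertices be stored twice, which covers both cut edges.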

5 Baseline model: Problem definition 2 (p2): overlapping graph partitioning
Input: a graph G(V, E) and an integer m.
Problem: find a sequence S(v1, …, vm) over V.
Objectives:
– Each $v \in V$ appears at least once in the sequence.
– $\sum_{e \in E} f(e)$ is minimized, where for an edge $e = (u, v)$, $f(e)$ is the minimal distance between an occurrence of u and an occurrence of v in the sequence.
Advantages:
– Compared to linearization, our model is independent of k; when a partition is smaller than k, linearization fails.
– If we find a solution under our model, a random partitioning of the sequence is expected to be good enough (the total number of cross-part edges is minimized).
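The p2 objective can be evaluated directly from its definition. The following is a minimal sketch, assuming "distance" means the difference between positions in the sequence; `sequence_cost` is a hypothetical name, and the brute-force minimum over occurrence pairs is chosen for clarity rather than speed.

```python
from collections import defaultdict


def sequence_cost(edges, seq):
    """Sum over edges of the smallest position gap between copies of the endpoints."""
    positions = defaultdict(list)
    for i, v in enumerate(seq):
        positions[v].append(i)
    total = 0
    for u, v in edges:
        if not positions[u] or not positions[v]:
            raise ValueError("every vertex must appear at least once in the sequence")
        total += min(abs(i - j) for i in positions[u] for j in positions[v])
    return total


# A 4-cycle: without copies, the closing edge (3, 0) spans the whole sequence.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

print(sequence_cost(edges, [0, 1, 2, 3]))     # prints: 6
print(sequence_cost(edges, [0, 1, 2, 3, 0]))  # prints: 4
```

Appending one copy of vertex 0 (so m = 5) shortens the edge (3, 0) from distance 3 to distance 1.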

6 Relationship between p1 and p2
Open question: is a sequence an optimal solution under the model of problem 2 if and only if, for a random partitioning P over the sequence, E[G(P)] is optimal (minimal)?

7 Solution to p2
How to generate the order?
– Guiding principle: traverse the vertices inside the same community first, then those outside the community.
– BFS
– DFS
How to select vertices to copy?
– Quantify the benefit and the cost of copying a vertex.
– Set lower and upper bounds on the degree of a copied vertex.
– If the degree of a vertex is 1 or 2, there is obviously no need to store its information multiple times; more generally, if the degree is relatively low, the benefit of copying the vertex is low too.
– If the degree of a vertex is too large, the cost of copying it is large (i.e., that storage might bring more benefit if spent on copies of a few other vertices).
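The degree-bound rule for choosing copy candidates could look like the sketch below. The thresholds `low` and `high` are illustrative parameters, not values given in the slides, and `copy_candidates` is a hypothetical helper.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def copy_candidates(adj, low, high):
    """Vertices whose degree lies in [low, high] are considered worth copying."""
    return sorted(v for v, nbrs in adj.items() if low <= len(nbrs) <= high)


# Vertex 0 is a hub (degree 6), vertex 1 has degree 3, all others have degree 1 or 2.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 2), (1, 3)]
adj = build_adjacency(edges)

print(copy_candidates(adj, low=3, high=5))  # prints: [1]
```

With these bounds the hub is excluded as too expensive to copy and the low-degree vertices are excluded as not beneficial enough, leaving only vertex 1.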

8 17.10.2015 BFS or DFS?
It seems that BFS is not a good choice: the distance between two neighboring vertices in a BFS sequence tends to be larger than in a DFS sequence, especially in a graph with many vertices of large (or relatively large) degree.
Open question: if a back edge is found during DFS, should we copy the information of this vertex, or are some other constraints needed?
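One way to probe the BFS-versus-DFS intuition is to compare the total position gap over all edges for both orders on a small graph. The sketch below does this for a complete binary tree with 7 vertices; it is a single example without copies, chosen for illustration, and does not settle the question in general. The helper names are hypothetical.

```python
from collections import deque


def bfs_order(adj, root):
    """Vertices in breadth-first order, visiting neighbors in list order."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


def dfs_order(adj, root):
    """Vertices in depth-first preorder, visiting neighbors in list order."""
    seen, order, stack = set(), [], [root]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        stack.extend(reversed(adj[u]))
    return order


def total_gap(adj, order):
    """Sum of position differences over all undirected edges (no copies)."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u in adj for v in adj[u] if u < v)


# Complete binary tree: 0 is the root, 1 and 2 its children, 3..6 the leaves.
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6], 3: [1], 4: [1], 5: [2], 6: [2]}

print(bfs_order(adj, 0), total_gap(adj, bfs_order(adj, 0)))
# prints: [0, 1, 2, 3, 4, 5, 6] 15
print(dfs_order(adj, 0), total_gap(adj, dfs_order(adj, 0)))
# prints: [0, 1, 3, 4, 2, 5, 6] 11
```

On this tree the DFS order keeps each parent closer to its children, which matches the observation above.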

9 Solution to problem p1
Based on the solution to p2: partition the sequence into consecutive parts according to the size constraint Z (the naive solution).
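The naive solution can be sketched as follows: cut the sequence into consecutive windows of Z entries and count the edges that no window covers. This assumes that a window containing repeated vertices simply yields a smaller subset; the function names are hypothetical.

```python
def consecutive_parts(seq, z):
    """Cut the sequence into consecutive windows of z entries, as vertex sets."""
    return [set(seq[i:i + z]) for i in range(0, len(seq), z)]


def cross_part_edges(edges, parts):
    """Number of edges whose endpoints never fall into the same part."""
    return sum(
        1 for u, v in edges
        if not any(u in p and v in p for p in parts)
    )


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle
plain = [0, 1, 2, 3]                      # each vertex once
with_copies = [0, 1, 2, 2, 3, 0]          # vertices 0 and 2 appear twice

print(cross_part_edges(edges, consecutive_parts(plain, 3)))        # prints: 2
print(cross_part_edges(edges, consecutive_parts(with_copies, 3)))  # prints: 0
```

With the same Z = 3, the sequence containing copies produces the parts {0, 1, 2} and {0, 2, 3}, so every edge of the cycle stays inside some part.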

