Published by Roman Carvell. Modified about 1 year ago.

1
Reachability Querying: An Independent Permutation Labeling Approach (published in VLDB 2014) Presenter: WEI, Hao

2
Graph Reachability Query
Given a directed graph G = (V, E) and two vertices u and v, u is said to reach v if there exists a path from u to v in G.
Any directed graph can be transformed into a DAG by contracting each strongly connected component; the query is trivial if u and v are in the same strongly connected component.
Query(v1, v8): Reachable
Query(v2, v11): Unreachable
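As a baseline for the definition above, reachability can be answered with a plain depth-first search and no index at all. The sketch below is only an illustration: the function name `reaches` and the toy graph are hypothetical and are not the graph drawn on the slide.

```python
def reaches(adj, u, v):
    """Return True if v is reachable from u, using an iterative DFS.

    adj maps every vertex to the list of its direct successors.
    A vertex is considered to reach itself.
    """
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for w in adj.get(x, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


# Hypothetical toy DAG (not the example graph from the slides).
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["d"]}
```

Each query may touch every vertex and edge, which is the cost the index-based approaches on the following slides try to avoid.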

3
The Issue and the Challenge
The "Big Data" era brings large graphs with millions of nodes and edges.
web-uk dataset: 133 million nodes, 5 billion edges
DAG of web-uk: 22 million nodes, 38 million edges
Traditional approaches are not applicable at this scale.

4
Related Work
Recent work builds an index, label(u), offline for every node u.
Label-Only Approach: answer Query(u, v) using only label(u) and label(v)
  - Hop Labeling: TF-Label, Hierarchy Label, Distribution Label, …
  - Transitive Closure Compression: Chain-Cover, Tree-Cover, …
  - non-linear index construction time and index size; may generate an unacceptably large index
Label+G Approach: answer Query(u, v) using label(u) and label(v), accessing G if needed
  - Interval Labeling: GRIPP, GRAIL, Ferrari, …
  - linear index size, but may need to perform a DFS

5
Main Idea of IP Labeling
If u reaches v, then Out(v) ⊆ Out(u) and In(u) ⊆ In(v).
Both containment checks are time- and space-consuming if an exact answer is needed for large sets.

6
Main Idea of IP Labeling
  - based on Min-wise Independent Permutation
  - high-probability guarantee when answering a query
  - linear index construction time and index size

7
Min-wise Independent Permutation

8

9
K-min-wise Independent Permutation
We propose to use the k smallest numbers instead of only the single smallest number to improve performance.
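The effect of keeping k values rather than one can be checked empirically. The sketch below is an assumption-laden illustration, not the paper's analysis: it keeps the k smallest permuted values of two hypothetical sets A and B, and counts how often those sketches alone prove that B is not a subset of A. The helper names (`min_k`, `proves_not_subset`, `detection_rate`) and the test rule are introduced here for illustration only.

```python
import random


def min_k(values, k):
    """The k smallest values of a collection, as a sorted list."""
    return sorted(values)[:k]


def proves_not_subset(sk_big, sk_small, k):
    """True if the sketches prove that `small` is NOT a subset of `big`.

    A value x of the small sketch is a witness when it is absent from the
    big sketch and either the big sketch holds the whole set (fewer than
    k values) or x is below the largest kept value, so it could not have
    been cut off.
    """
    complete = len(sk_big) < k
    members = set(sk_big)
    return any(x not in members and (complete or x < sk_big[-1])
               for x in sk_small)


def detection_rate(big, small, k, universe, trials=2000, seed=0):
    """Fraction of random permutations under which non-containment is proven."""
    rng = random.Random(seed)
    ids = list(range(universe))
    hits = 0
    for _ in range(trials):
        rng.shuffle(ids)  # ids[x] is the permuted value of element x
        sk_big = min_k([ids[x] for x in big], k)
        sk_small = min_k([ids[x] for x in small], k)
        hits += proves_not_subset(sk_big, sk_small, k)
    return hits / trials


# Hypothetical sets: B is not a subset of A (40..44 lie outside A).
A = set(range(0, 40))
B = set(range(35, 45))
rate_top1 = detection_rate(A, B, k=1, universe=50)
rate_top5 = detection_rate(A, B, k=5, universe=50)
```

With the same seed both calls see the same permutations, and every permutation that succeeds for k = 1 also succeeds for k = 5, so `rate_top5` is never lower than `rate_top1`. A true subset is never flagged, because the test only reports containment violations that are certain.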

10
K-min-wise Independent Permutation

11
Independent Permutation Generation
Knuth Shuffle
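For reference, the Knuth (Fisher-Yates) shuffle produces a uniformly random permutation of 0..n-1 in linear time. This is a minimal sketch; the function name and the optional `seed` parameter are choices made here for reproducibility and do not come from the slides.

```python
import random


def knuth_shuffle(n, seed=None):
    """Return a uniformly random permutation of 0..n-1 (Fisher-Yates)."""
    rng = random.Random(seed)
    perm = list(range(n))
    # Walk from the last position down; swap each position with a
    # uniformly chosen position at or before it.
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        perm[i], perm[j] = perm[j], perm[i]
    return perm


perm = knuth_shuffle(12, seed=1)  # one value per vertex of a 12-vertex graph
```

Here `perm[i]` would serve as the number assigned to vertex i when building labels.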

12
IP Label
The IP label of u consists of two parts:
  L_out(u) = min_k{Out(u)}, the min_k{ } set of Out(u)
  L_in(u) = min_k{In(u)}, the min_k{ } set of In(u)
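One way such labels could be computed is a single pass over a topological order, merging the labels of neighbours and truncating to k values. The sketch below assumes that Out(u) and In(u) contain u itself and that `perm` assigns a distinct number to every vertex; the function names and the toy DAG are hypothetical, and the paper's actual construction may differ in detail.

```python
def topo_order(adj):
    """Topological order of a DAG given as vertex -> list of successors."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for w in adj[u]:
            indeg[w] += 1
    ready = [u for u in adj if indeg[u] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return order


def build_ip_labels(adj, perm, k):
    """Return (l_out, l_in): sorted lists of the k smallest permuted values."""
    order = topo_order(adj)
    rev = {u: [] for u in adj}
    for u in adj:
        for w in adj[u]:
            rev[w].append(u)

    l_out, l_in = {}, {}
    for u in reversed(order):  # successors are labelled before u
        merged = {perm[u]}
        for w in adj[u]:
            merged.update(l_out[w])
        l_out[u] = sorted(merged)[:k]
    for u in order:  # predecessors are labelled before u
        merged = {perm[u]}
        for p in rev[u]:
            merged.update(l_in[p])
        l_in[u] = sorted(merged)[:k]
    return l_out, l_in


# Hypothetical 4-vertex DAG and permutation (not the slide example).
toy_adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
toy_perm = {0: 2, 1: 0, 2: 3, 3: 1}
toy_out, toy_in = build_ip_labels(toy_adj, toy_perm, k=2)
```

Merging truncated labels is safe because the k smallest values of a union are always among the k smallest values of each part. Every edge is visited a constant number of times and each merge handles at most k values per neighbour, which is consistent with the linear construction time claimed on the slides for a fixed k.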

13
IP Label (k = 5)
Vertex  L_out              L_in
v0      {0, 1, 2, 3, 4}    {7}
v1      {0, 1, 2, 3, 4}    {11}
v2      {2, 3, 4, 8, 10}   {7, 8}
v3      {1, 2, 3, 4, 6}    {6, 7}
v4      {2, 3, 4, 10}      {3, 6, 7, 8, 11}
v5      {0, 1, 5, 9, 10}   {0, 7, 11}
v6      {2, 10}            {2, 3, 6, 7, 8}
v7      {1}                {0, 1, 6, 7, 11}
v8      {10}               {0, 2, 3, 6, 7}
v9      {4}                {3, 4, 6, 7, 8}
v10     {9}                {0, 7, 9, 11}
v11     {5}                {0, 5, 7, 11}
[Figure: graph annotated with intermediate min_k sets: {10}, {4}, {2, 10}, {3}, {8}, {2, 3, 4, 10}, {2, 10}, {2, 3, 4, 8, 10}]

14
IP Label (label table as on slide 13)
Q1: Query(v2, v7)
L_out(v2) = {2, 3, 4, 8, 10}
L_out(v7) = {1}

15
IP Label (label table as on slide 13)
Q2: Query(v3, v2)
L_out(v3) = {1, 2, 3, 4, 6}
L_out(v2) = {2, 3, 4, 8, 10}
L_in(v2) = {7, 8}
L_in(v3) = {6, 7}

16
IP Label (label table as on slide 13)
Q3: Query(v1, v8)
Need to perform DFS. Effect?
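The queries Query(v2, v7), Query(v3, v2) and Query(v1, v8) above can be checked mechanically against the label table. The sketch below copies the relevant rows for k = 5 and applies a containment rule that is inferred here rather than quoted from the slides: if u reaches v, then Out(v) ⊆ Out(u) and In(u) ⊆ In(v), so a label value that contradicts either containment proves unreachability. The function names are hypothetical.

```python
K = 5

# Rows copied from the slide's label table (sorted ascending).
L_OUT = {"v1": [0, 1, 2, 3, 4], "v2": [2, 3, 4, 8, 10],
         "v3": [1, 2, 3, 4, 6], "v7": [1], "v8": [10]}
L_IN = {"v1": [11], "v2": [7, 8], "v3": [6, 7],
        "v7": [0, 1, 6, 7, 11], "v8": [0, 2, 3, 6, 7]}


def violates(big, small, k):
    """True if `small` cannot be the label of a subset of the set behind `big`.

    A value of `small` missing from `big` is conclusive when `big` holds
    its whole set (fewer than k values) or the value is smaller than the
    largest value kept in `big`.
    """
    complete = len(big) < k
    members = set(big)
    return any(x not in members and (complete or x < big[-1]) for x in small)


def labels_prove_unreachable(u, v, k=K):
    """True if the labels alone prove that u does not reach v."""
    return (violates(L_OUT[u], L_OUT[v], k)      # Out(v) must fit in Out(u)
            or violates(L_IN[v], L_IN[u], k))    # In(u) must fit in In(v)


q_v2_v7 = labels_prove_unreachable("v2", "v7")  # decided by L_out: 1 is missing
q_v3_v2 = labels_prove_unreachable("v3", "v2")  # decided by L_in: 6 is missing
q_v1_v8 = labels_prove_unreachable("v1", "v8")  # inconclusive, falls back to DFS
```

For Query(v2, v7), the value 1 in L_out(v7) is below the largest value of L_out(v2) yet absent from it. For Query(v3, v2), L_out is inconclusive because 8 and 10 exceed the largest value of L_out(v3), but L_in(v2) has fewer than k values and lacks 6. For Query(v1, v8) neither test fires, which matches the slide's note that a DFS is needed.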

17
IP Label (label table as on slide 13)
Q4: Query(v1, v3)

20
IP Label (label table as on slide 13)
Q4: Query(v1, v3)
The probability increases significantly!

21
IP Label
As the DFS goes deeper, the labels become much more likely to answer unreachability queries, so the search can stop at an early stage.
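A label-guided DFS of this kind might look as follows. This is a sketch under stated assumptions, not the authors' implementation: labels are built by brute force for a small hypothetical DAG, Out(u) and In(u) are taken to include u itself, and a branch is cut whenever the labels prove that the current vertex cannot reach the target. All function names are introduced here for illustration.

```python
def closure(adj, u):
    """All vertices reachable from u (including u) in adj."""
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for w in adj[x]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def brute_force_labels(adj, perm, k):
    """k smallest permuted values of Out(u) and In(u), for a toy graph only."""
    rev = {u: [] for u in adj}
    for u in adj:
        for w in adj[u]:
            rev[w].append(u)
    l_out = {u: sorted(perm[x] for x in closure(adj, u))[:k] for u in adj}
    l_in = {u: sorted(perm[x] for x in closure(rev, u))[:k] for u in adj}
    return l_out, l_in


def violates(big, small, k):
    """True if `small` contradicts containment in the set behind `big`."""
    complete = len(big) < k
    members = set(big)
    return any(x not in members and (complete or x < big[-1]) for x in small)


def pruned_dfs(adj, l_out, l_in, k, u, v):
    """Return (reachable, expanded): the answer and how many vertices were expanded."""
    visited, expanded, stack = set(), 0, [u]
    while stack:
        x = stack.pop()
        if x == v:
            return True, expanded
        if x in visited:
            continue
        visited.add(x)
        # Skip x entirely if its labels prove it cannot reach v.
        if violates(l_out[x], l_out[v], k) or violates(l_in[v], l_in[x], k):
            continue
        expanded += 1
        stack.extend(adj[x])
    return False, expanded


# Hypothetical DAG and permutation (not the slide example).
dag = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: [], 6: [4]}
pi = {0: 4, 1: 6, 2: 0, 3: 2, 4: 5, 5: 1, 6: 3}
out_lbl, in_lbl = brute_force_labels(dag, pi, k=2)
```

Pruning never changes the answer, because a branch is cut only when the labels rule out reachability with certainty. It only reduces the number of vertices expanded.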

22
Two Optimizations
  - Huge-Vertex Label: build an additional index to handle the huge vertices of the graph
  - Level Label: use the topological structure to prune the search space
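The slides do not spell out how the level label is defined, so the following is only one plausible reading: assign each vertex the length of the longest path from any source, and observe that a vertex can never reach a different vertex on the same or a lower level. The names `longest_path_levels` and `level_prunes` and the toy graph are hypothetical.

```python
def longest_path_levels(adj):
    """Level of each vertex = length of the longest path from any source."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for w in adj[u]:
            indeg[w] += 1
    level = {u: 0 for u in adj}
    ready = [u for u in adj if indeg[u] == 0]
    while ready:
        u = ready.pop()  # all predecessors of u are done, level[u] is final
        for w in adj[u]:
            level[w] = max(level[w], level[u] + 1)
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return level


def level_prunes(level, u, v):
    """True if the levels alone prove that u cannot reach v."""
    return u != v and level[u] >= level[v]


# Hypothetical toy DAG (not the slide example).
g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["d"]}
lv = longest_path_levels(g)
```

Under this definition every edge goes from a lower level to a strictly higher one, so the check is sound. It is also inconclusive in many cases, for example `level_prunes(lv, "e", "b")` is False even though e does not reach b.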

23
Level Label

24
Huge-Vertex Label
Vertex  L_hv      Vertex  L_hv
v0      {0}       v6      {0, 4}
v1                v7      {0, 5}
v2      {0}       v8      {0, 5}
v3      {0}       v9      {0, 4}
v4                v10     {0, 5}
v5                v11     {0, 5}

25
Huge-Vertex Label (label table as on slide 24)
Query(v0, v11)

26
Huge-Vertex Label (label table as on slide 24)
Query(v0, v1)

27
Huge-Vertex Label (label table as on slide 24)
Query(v5, v6)

28
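Performance Studies
Real datasets (-- = value missing; for R-ratio only the exponent is available):
Dataset      |V(G)|  |E(G)|  d_avg  R-ratio
uniprotenc   25M     --      --     ?E-7
twitter      18M     --      --     ?E-2
web-uk       22M     38M     --     ?E-1
citeseerx    6.5M    15M     --     ?E-4
go-uniprot   6.9M    34M     --     ?E-6
govwild      8.0M    23M     --     ?E-5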

29
Performance Studies
Index Construction Time (in seconds)
Methods compared: TF-Label, DL, GRAIL, Ferrari, IP+
Datasets: uniprotenc, twitter, web-uk, citeseerx, go-uniprot, govwild
[Table values missing]

30
Performance Studies
Query Time (in milliseconds)
Methods compared: TF-Label, DL, GRAIL, Ferrari, IP+
Datasets: uniprotenc, twitter, web-uk, citeseerx, go-uniprot, govwild
[Table values missing]

31
Performance Studies

32
Distribution of the number of vertices visited

33
Conclusion
  - We propose a new IP labeling approach, the first to explore randomness for answering reachability queries.
  - Our labeling approach has linear index construction time and linear index size.
  - With independent permutations, query performance is guaranteed with high probability.
  - Extensive experimental studies show that our approach is both efficient and scalable.
