Performance Guarantees for Distributed Reachability Queries Wenfei Fan 1,2 Xin Wang 1 Yinghui Wu 1,3 1 University of Edinburgh 2 Harbin Institute of Technology.



1 Performance Guarantees for Distributed Reachability Queries. Wenfei Fan (1,2), Xin Wang (1), Yinghui Wu (1,3). Affiliations: (1) University of Edinburgh; (2) Harbin Institute of Technology; (3) University of California, Santa Barbara.

2 Outline: (1) querying distributed real-life graphs, which are often fragmented or distributed; (2) distributed reachability queries; (3) distributed bounded reachability queries; (4) distributed regular reachability queries; (5) distributed reachability with MapReduce; (6) experimental study; (7) conclusion. Theme: distributed query evaluation with performance guarantees, based on partial evaluation. Yinghui Wu, VLDB 2012.

3 Distributed real-life graphs. Real-life graphs are purposely or naturally distributed: geo-distributed (e.g., data centers); decentralized (e.g., social networks); distributed entity and personal information.

4 Distributed querying methods: federated/centralized graph database. Collect and link the graph fragments, then query the centralized graph. Drawbacks: construction and maintenance cost; centralized querying. [Figure: the fragments are collected into one graph G, on which Q is evaluated to produce Q(G).]

5 Distributed querying methods: graph exploration strategy. A master node and slave nodes, with a predefined graph partition and query execution plan. Drawback: no bounds on the number of visits or on data shipment. [Figure: the master node sends the query plan for Q to the slave nodes and collects intermediate results to produce Q(G).]

6 Querying a distributed social network. Query Q: from Ann ("CTO") to Mark ("FA"), with path constraint (DB* ∪ HR*). The network is stored in three data centers, DC 1, DC 2 and DC 3. [Figure: people labeled with job titles: Ann "CTO", Mark "FA", Fred "HR", Walt "HR", Dan "DB", Bill "DB", Pat "SE", Tom "AI", Ross "HR", Jack "MK", Ben "MK", Emmy "HR", Mat "HR".] Centralized method? Graph exploration? Proposal: use partial evaluation to obtain performance guarantees.

7 Partial evaluation (a.k.a. program specialization). Given a function f(s, d) and a part of its input, e.g., s, partial evaluation specializes f(s, d) with respect to s: it conducts only the part of f's computation that depends on s, and generates a residual function f'. Partial evaluation thus generates a partial answer: f(s, d), with s known, becomes f'(d). For graph queries? Q(Fi, G), with fragment Fi known, becomes Q'(G).
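
The idea of specializing f(s, d) with respect to s can be illustrated with a small sketch. It is not taken from the slides: the function power and the helper specialize_power are hypothetical names chosen for illustration, with the exponent playing the role of the known input s and the base the role of the unknown input d.

```python
from typing import Callable


def power(exponent: int, base: float) -> float:
    """General function f(s, d), with s = exponent and d = base."""
    result = 1.0
    for _ in range(exponent):
        result *= base
    return result


def specialize_power(exponent: int) -> Callable[[float], float]:
    """Partially evaluate power() with respect to the known input s.

    The loop depends only on the exponent, so it is unrolled here, once.
    The generated residual function f'(d) contains multiplications only.
    """
    body = " * ".join(["base"] * exponent) if exponent > 0 else "1.0"
    source = f"def residual(base):\n    return {body}\n"
    namespace: dict = {}
    exec(source, namespace)  # build the residual function from its source
    return namespace["residual"]


cube = specialize_power(3)          # residual function f'(d) for s = 3
assert cube(2.0) == power(3, 2.0)   # f'(d) agrees with f(s, d)
```

The residual function no longer inspects the exponent at all. In the same spirit, a site holding fragment Fi can do all the work that depends only on Fi and return a residual expression over what it does not know.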

8 Distributed graphs and graph queries. Distributed graph: a graph fragmentation F = (F, Gf), with fragment graph Gf. Reachability queries: reachability query qr(s, t); bounded reachability query qbr(s, t, l); regular reachability (path) query qrr(s, t, R), where R ::= ε | a | RR | R ∪ R | R*. Examples: qr(Ann, Mark); qbr(Ann, Mark, 5); qrr(Ann, Mark, (DB* ∪ HR*)). [Figure: fragments F1, F2 and F3 and the fragment graph Gf, marking a virtual node of F1, an in-node of F1, and a cross edge.]

9 Distributed graph querying framework: applying partial evaluation to graph querying. Setting: a coordinating site Sc and a set of graph fragments F1, …, Fn. (1) Distributing, at Sc: post Q to the fragments. (2) Local evaluation: each site partially evaluates Q on its fragment Fi, producing Q(Fi). (3) Assembling, at Sc: combine the partial answers into Q(G).

10 Distributed reachability queries. Performance guarantees: over a fragmentation F = (F, Gf) of a graph G, reachability queries can be evaluated (a) in O(|Vf||Fm|) time, (b) by visiting each site only once, and (c) with the total network traffic bounded by O(|Vf|^2), where Gf = (Vf, Ef) and Fm is the largest fragment in F. The distributed reachability evaluation algorithm DisReach: (1) the coordinator Sc posts qr(s, t) to each fragment site in F; (2) each site locally evaluates qr(s, t) in parallel and produces a partial answer as a set of Boolean equations; (3) Sc collects and assembles the partial results.

11 Distributed reachability: partial evaluation by introducing Boolean variables. Each site evaluates qr(v, t) locally on Fi, in parallel: for each in-node v' of Fi, it decides whether v' reaches t, and introduces a Boolean variable Xv' for v'. The partial answer to qr(v, t) is a set of Boolean formulas, each a disjunction of the variables of the nodes v' that v can reach: Xv = Xv1' or … or Xvn', where Xv' stands for qr(v', t).

12 Distributed reachability: assembling (partial evaluation by introducing Boolean variables). The coordinator Sc collects the Boolean equation sets and solves the resulting Boolean equation system over a dependency graph; qr(s, t) is true iff Xs = true. Example: Xs = Xv; Xv = Xv'' or Xv'; Xv'' = false; Xv' = Xt; Xt = 1. Substituting gives Xs = true. [Figure: the dependency graph of the variables, annotated O(|Vf|).]
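
The three steps (post the query, evaluate locally, assemble) can be sketched in a few lines. This is a minimal single-process sketch under stated assumptions, not the authors' implementation: the fragment layout, its edges, and the names FRAGMENTS, local_eval, assemble and dis_reach are all hypothetical. Each fragment lists its own nodes, its in-nodes, and its outgoing edges, which may point to virtual nodes stored elsewhere.

```python
from collections import deque

# Hypothetical fragmentation; the edges are invented for illustration.
FRAGMENTS = [
    {"nodes": {"Ann", "Walt"}, "in_nodes": set(),
     "edges": {"Ann": ["Walt"], "Walt": ["Mat"]}},
    {"nodes": {"Mat", "Emmy"}, "in_nodes": {"Mat"},
     "edges": {"Mat": ["Emmy"], "Emmy": ["Ross"]}},
    {"nodes": {"Ross", "Mark"}, "in_nodes": {"Ross"},
     "edges": {"Ross": ["Mark"]}},
]


def local_eval(fragment, s, t):
    """Partial evaluation at one site: one Boolean equation per in-node.

    The right-hand side is True if t is reached locally; otherwise it is
    the set of virtual nodes reached, read as a disjunction of variables.
    """
    nodes, edges = fragment["nodes"], fragment["edges"]
    sources = set(fragment["in_nodes"])
    if s in nodes:
        sources.add(s)  # the source node is treated like an in-node
    equations = {}
    for v in sources:
        seen, queue, rhs = {v}, deque([v]), set()
        while queue:
            u = queue.popleft()
            if u == t:
                rhs = True
                break
            if u not in nodes:   # virtual node: its value is unknown here
                rhs.add(u)
                continue
            for w in edges.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        equations[v] = rhs
    return equations


def assemble(equation_sets, s):
    """Solve the equation system: Xs is true iff some variable whose
    equation is True is reachable from Xs in the dependency graph."""
    equations = {}
    for partial in equation_sets:
        equations.update(partial)
    seen, queue = {s}, deque([s])
    while queue:
        v = queue.popleft()
        rhs = equations.get(v, set())
        if rhs is True:
            return True
        for w in rhs:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def dis_reach(fragments, s, t):
    # Each site is visited once; a real system would run these in parallel.
    partial_answers = [local_eval(f, s, t) for f in fragments]
    return assemble(partial_answers, s)


print(dis_reach(FRAGMENTS, "Ann", "Mark"))   # True
print(dis_reach(FRAGMENTS, "Mark", "Ann"))   # False
```

Only the equations travel to the coordinator, and they mention in-nodes and virtual nodes only, which is why the traffic can be bounded in terms of the fragment graph rather than the whole graph.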

13 Distributed reachability queries: example. (1) Dispatch Q to the fragments (at Sc). (2) Partial evaluation: generate Boolean equations (at each Fi). (3) Assembling: solve the equation system (at Sc). [Figure: Sc sends Q, from Ann to Mark, to F1 (Fred "HR", Walt "HR", Bill "DB"), F2 (Jack "MK", Emmy "HR", Mat "HR") and F3 (Pat "SE", Tom "AI", Ross "HR").]

14 Distributed bounded reachability queries. (1) Dispatch Q to the fragments (at Sc). (2) Partial evaluation: generate equations (at each Fi), with variables denoting numeric values. (3) Assembling: solve the equation system (at Sc) over a weighted dependency graph. [Figure: Sc sends Q, from Ann to Mark, to F1 (Fred "HR", Walt "HR", Bill "DB"), F2 (Jack "MK", Emmy "HR", Mat "HR") and F3 (Pat "SE", Tom "AI", Ross "HR").]

15 Distributed bounded reachability queries. Performance guarantees: bounded reachability queries can be evaluated with the same performance guarantees as reachability queries. Number of visits: O(|F|), each site is visited once. Total network traffic: O(|Vf|^2), independent of |G|. Computational cost: O(|Vf||Fm|), with parallel computation. Local evaluation: any strategy (e.g., indexes, graph partitions) can be applied on a fragment.
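
A sketch of how numeric variables and a weighted dependency graph could work is given here. It rests on assumptions not stated in the slides: edges have unit length, each equation has the form Xv = min over targets of (local distance + X of the target), and the coordinator solves the system with Dijkstra's algorithm. The fragment data and the names local_dist, assemble_dist and dis_bounded_reach are hypothetical.

```python
import heapq
from collections import deque

# Hypothetical fragmentation; the edges are invented for illustration.
FRAGMENTS = [
    {"nodes": {"Ann", "Walt"}, "in_nodes": set(),
     "edges": {"Ann": ["Walt", "Emmy"], "Walt": ["Mat"]}},
    {"nodes": {"Mat", "Emmy"}, "in_nodes": {"Mat", "Emmy"},
     "edges": {"Mat": ["Emmy"], "Emmy": ["Ross"]}},
    {"nodes": {"Ross", "Mark"}, "in_nodes": {"Ross"},
     "edges": {"Ross": ["Mark"]}},
]


def local_dist(fragment, s, t):
    """For each in-node v, the local hop distance to t and to each
    virtual node, found by a BFS that never leaves the fragment."""
    nodes, edges = fragment["nodes"], fragment["edges"]
    sources = set(fragment["in_nodes"])
    if s in nodes:
        sources.add(s)
    equations = {}
    for v in sources:
        dist, queue, targets = {v: 0}, deque([v]), {}
        while queue:
            u = queue.popleft()
            if u == t or u not in nodes:   # stop at t and at virtual nodes
                targets[u] = dist[u]
                continue
            for w in edges.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        equations[v] = targets
    return equations


def assemble_dist(equation_sets, s, t):
    """Dijkstra over the weighted dependency graph built from the equations."""
    graph = {}
    for partial in equation_sets:
        graph.update(partial)
    best, heap = {s: 0}, [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == t:
            return d
        if d > best.get(v, float("inf")):
            continue   # stale heap entry
        for w, weight in graph.get(v, {}).items():
            if d + weight < best.get(w, float("inf")):
                best[w] = d + weight
                heapq.heappush(heap, (d + weight, w))
    return float("inf")


def dis_bounded_reach(fragments, s, t, bound):
    partial_answers = [local_dist(f, s, t) for f in fragments]
    return assemble_dist(partial_answers, s, t) <= bound


print(dis_bounded_reach(FRAGMENTS, "Ann", "Mark", 3))   # True
print(dis_bounded_reach(FRAGMENTS, "Ann", "Mark", 2))   # False
```

In this data the shortest path is Ann, Emmy, Ross, Mark, of length 3, so the bound 3 succeeds and the bound 2 fails. The plain BFS inside local_dist could be replaced by any local index, in line with the remark that any strategy can be applied on a fragment.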

16 Distributed regular reachability queries. Performance guarantees: over a fragmentation F = (F, Gf) of a graph G, regular reachability queries qrr(s, t, R) can be evaluated (a) in O((|Vf|^2 + |Fm|)|R|^2) time, (b) by visiting each site only once, and (c) with the total network traffic bounded by O(|R|^2 |Vf|^2), where Gf = (Vf, Ef) and Fm is the largest fragment in F. Queries are represented as automata: the query automaton Gq(R) of R.

17 Query automaton. A node v is a match of a state u_v in Gq(R) iff (1) they have the same label, and (2) there is a path ρ from v to t and a path ρ' from u_v to u_t such that ρ and ρ' induce the same label. Given a graph G, qrr(s, t, R) over G is true if and only if s is a match of u_s in Gq(R). [Figure: an example graph (Fred "HR", Walt "HR", Mark "FA", Tom "AI", Ross "HR", Emmy "HR", Mat "HR") beside the query automaton for Q, with states labeled Ann, DB, HR and FA.]
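
The matching condition can be checked on a single, unfragmented graph by searching pairs (node, state). The sketch below assumes a hand-built automaton for (DB* ∪ HR*) in which the start state u_s matches only s, the final state u_t matches only t, and every other state matches nodes carrying its label. The graph edges, the state names and the function regular_reach are hypothetical.

```python
from collections import deque

# Hand-built query automaton for (DB* ∪ HR*); state names are hypothetical.
AUTOMATON = {
    "start": "u_s",
    "final": "u_t",
    "label": {"u_DB": "DB", "u_HR": "HR"},
    "edges": {
        "u_s": ["u_DB", "u_HR", "u_t"],   # u_s -> u_t covers the empty word
        "u_DB": ["u_DB", "u_t"],
        "u_HR": ["u_HR", "u_t"],
    },
}

LABELS = {"Ann": "CTO", "Mark": "FA", "Walt": "HR", "Fred": "HR", "Bill": "DB"}

# Two invented edge sets: an all-HR path, and a path mixing DB and HR.
EDGES_HR = {"Ann": ["Walt"], "Walt": ["Fred"], "Fred": ["Mark"]}
EDGES_MIXED = {"Ann": ["Bill"], "Bill": ["Walt"], "Walt": ["Mark"]}


def regular_reach(edges, labels, automaton, s, t):
    """True iff (s, start) reaches (t, final) when the graph and the
    automaton are walked in lockstep, i.e. s is a match of the start state."""
    start, final = automaton["start"], automaton["final"]

    def compatible(v, u):
        if u == start:
            return v == s
        if u == final:
            return v == t
        return labels.get(v) == automaton["label"][u]

    goal = (t, final)
    seen, queue = {(s, start)}, deque([(s, start)])
    while queue:
        v, u = queue.popleft()
        for v2 in edges.get(v, ()):
            for u2 in automaton["edges"].get(u, ()):
                pair = (v2, u2)
                if pair not in seen and compatible(v2, u2):
                    if pair == goal:
                        return True
                    seen.add(pair)
                    queue.append(pair)
    return False


print(regular_reach(EDGES_HR, LABELS, AUTOMATON, "Ann", "Mark"))      # True
print(regular_reach(EDGES_MIXED, LABELS, AUTOMATON, "Ann", "Mark"))   # False
```

The second call fails because the only path visits a DB node followed by an HR node, which neither DB* nor HR* accepts.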

18 Distributed regular query evaluation: algorithm. (1) Query distribution: construct the query automaton Gq(R) and post Gq(R) to each graph fragment. (2) Partial evaluation: compute the partial answers as a set of Boolean formula vectors. (3) Assembling: collect the partial answers, construct the dependency graph, and assemble the final answer.

19 Distributed regular query evaluation: partial evaluation by introducing Boolean variables. For each node v in Fi, assign v.rvec, a vector of O(|Vq|) Boolean formulas; each entry v.rvec[u] denotes whether v matches state u. Introduce a Boolean variable X(v', w) for each virtual node v' of Fi and each state w in Vq, denoting whether v' matches w. The partial answer to qrr(s, t, R) is the set of Boolean formula vectors of the in-nodes of Fi. [Figure: in-nodes v1 and v2 with formula vectors (f11, f12, …, f1k) and (f21, f22, …, f2k) over the variables X(v', w).]

20 Distributed regular query evaluation: assembling (partial evaluation by introducing Boolean variables). The coordinator collects the partial results as sets of Boolean formulas. It constructs a dependency graph with a node vd for each in-node and each entry of its formula vector, labeled with the Boolean formula, and an edge for each dependency. It then checks whether vd(s, u_s) can reach vd(t, u_t) in the dependency graph, where vd(t, u_t) = true. [Figure: dependency graph with nodes vd(s, u_s), vd(v1, vq), vd(v2, vq), vd(v', w) and vd(t, u_t).]

21 Distributed regular reachability evaluation: example. (1) Dispatch Q to the fragments (at Sc). (2) Partial evaluation: generate a set of Boolean equations, as vectors of Boolean formulas (at each Fi). (3) Assembling: solve the equation system by testing reachability in the dependency graph (at Sc). [Figure: Sc sends Q to F1 (Fred "HR", Walt "HR", Bill "DB"), F2 (Jack "MK", Emmy "HR", Mat "HR") and F3 (Pat "SE", Tom "AI", Ross "HR").]

22 Distributed reachability with MapReduce. Partial evaluation fits naturally into the MapReduce framework. (1) The coordinator generates the query automaton Gq and partitions the graph G into k fragments, each as a key/value pair (i, …). (2) Map function: each mapper performs local evaluation on its pair (i, …) and emits its partial result, (1, rvset i). (3) Reduce function: the reducer assembles the collected partial results and writes the answer to the distributed file system. Cost annotations: O(|Fm|) and O(|R|^2 |Vf|^2), for a processing path of O(|Fm| + |R|^2 |Vf|^2). [Figure: the coordinator, mappers 1, …, k, and one reducer.]
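
The map and reduce roles can be simulated in one process. The sketch below makes several assumptions: it handles plain reachability rather than regular reachability, the value of each input pair is a fragment dictionary, every mapper emits under the single key 1 so that one reducer sees all partial results, and threads stand in for mappers. The names map_fn, reduce_fn and run_job and the fragment data are hypothetical, and no real MapReduce API is used.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragmentation; the edges are invented for illustration.
FRAGMENTS = [
    {"nodes": {"Ann", "Walt"}, "in_nodes": set(),
     "edges": {"Ann": ["Walt"], "Walt": ["Mat"]}},
    {"nodes": {"Mat", "Emmy"}, "in_nodes": {"Mat"},
     "edges": {"Mat": ["Emmy"], "Emmy": ["Ross"]}},
    {"nodes": {"Ross", "Mark"}, "in_nodes": {"Ross"},
     "edges": {"Ross": ["Mark"]}},
]


def map_fn(record, s, t):
    """Map: local evaluation on one (i, fragment) pair; emits (1, equations)."""
    _, frag = record
    nodes, edges = frag["nodes"], frag["edges"]
    sources = set(frag["in_nodes"]) | ({s} & nodes)
    equations = {}
    for v in sources:
        seen, queue, rhs = {v}, deque([v]), set()
        while queue:
            u = queue.popleft()
            if u == t:
                rhs = True           # t is reached inside this fragment
                break
            if u not in nodes:
                rhs.add(u)           # virtual node: keep it as a variable
                continue
            for w in edges.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        equations[v] = rhs
    return [(1, equations)]


def reduce_fn(key, values, s):
    """Reduce: merge all equation sets and solve for the variable of s."""
    equations = {}
    for partial in values:
        equations.update(partial)
    seen, queue = {s}, deque([s])
    while queue:
        v = queue.popleft()
        rhs = equations.get(v, set())
        if rhs is True:
            return True
        for w in rhs:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def run_job(fragments, s, t):
    records = list(enumerate(fragments, start=1))     # (i, fragment) pairs
    with ThreadPoolExecutor() as pool:                # mappers run in parallel
        outputs = list(pool.map(lambda r: map_fn(r, s, t), records))
    grouped = defaultdict(list)                       # shuffle: group by key
    for pairs in outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return reduce_fn(1, grouped[1], s)


print(run_job(FRAGMENTS, "Ann", "Mark"))   # True
```

Emitting under one key sends every partial result to the same reducer, which mirrors the single coordinator in the non-MapReduce setting.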

23 Experimental evaluation. Experimental setting: real-life datasets; synthetic data (larger random graphs following the densification law). Algorithms: disReach, disReachn and disReachm; disDist and disDistn; disRPQ, disRPQn and disRPQd; MRdRPQ.

24 Distributed reachability: efficiency and scalability. disReach outperforms centralized and message-passing approaches. [Plot annotations: "20% and 6%9% of disReachn"; "three thousand visits over 4 fragments".]

25 Distributed regular reachability: efficiency and network traffic. disRPQ takes much less time and incurs much less communication cost. [Plot annotations: time, 60% of disRPQn; traffic, at most 25% and 3%.]

26 Distributed regular reachability (cont.): scalability. disRPQ scales well with the number of fragments, and takes less time over more fragments.

27 Performance of the MapReduce implementation: efficiency and scalability. It scales well with the size of fragments, takes more time over more complex queries, and takes less time with more mappers. Partial evaluation works well in the MapReduce model.

28 Conclusion. Distributed reachability querying: partial evaluation based distributed evaluation; reachability, bounded reachability and regular reachability queries; performance guarantees; partial evaluation can be naturally conducted as MapReduce. Future work: distributed evaluation for other queries, e.g., graph pattern matching using simulation; combining partial evaluation and incremental computation.

29 Thank you! Performance Guarantees for Distributed Reachability Queries

