Yinghui Wu, LFCS DB talk Database Group Meeting Talk Yinghui Wu 10/11/2010 1 Simulation Revised for Graph Pattern Matching.

1 Yinghui Wu, LFCS DB talk Database Group Meeting Talk Yinghui Wu 10/11/ Simulation Revised for Graph Pattern Matching

2 Outline
Graph Simulation: label equality, edge-to-edge matching relation
Bounded Simulation: node predicates, edge bound, edge-to-path matching relation
Reachability Queries and Graph Pattern Queries: query containment and minimization (cubic time); query evaluation (cubic time)
Conclusion
A first step towards revising simulation for graph pattern matching

3 Graph Pattern Matching: the problem
Given a pattern graph P and a data graph G, decide whether G matches P, and if so, find all the matches of P in G.
Applications: social queries, social matching; biology and chemistry network querying; keyword search, proximity search, …
Widely employed in a variety of emerging real-life applications. How should a match be defined?

4 Graph Simulation
Node label equivalence; edge-to-edge relation
Identical label matching, edge-to-edge relations. Capable enough?
[Figure: data graph G with node labels A, B, D, B, E and marked nodes v1, v2; pattern graph P with node labels A, B, D, E]
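The slide names the two ingredients of graph simulation (label equality and edge-to-edge matching) without spelling out a procedure. A minimal sketch follows, assuming the standard definition: a pattern node u may match a data node v only if their labels agree and every pattern edge (u, u') is mirrored by some data edge (v, v') with v' matching u'. The function name `graph_simulation`, the dictionary encoding, and the toy data are illustrative assumptions, not taken from the talk, and the naive fixpoint loop is not claimed to be the talk's algorithm.

```python
def graph_simulation(p_labels, p_edges, g_labels, g_edges):
    """Naive fixpoint computation of the maximum simulation relation.

    p_labels / g_labels: dict mapping node -> label.
    p_edges / g_edges: iterable of (source, target) pairs.
    Returns dict pattern node -> set of matching data nodes,
    or None if some pattern node ends up with no match.
    """
    # Successor sets of the data graph.
    succ = {v: set() for v in g_labels}
    for a, b in g_edges:
        succ[a].add(b)

    # Start from label equality, then prune until nothing changes.
    sim = {u: {v for v, lab in g_labels.items() if lab == p_labels[u]}
           for u in p_labels}
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # v survives only if one of its successors still matches u2.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    return sim if all(sim.values()) else None


# Hypothetical toy data (not the graphs drawn on the slide).
p_labels = {"a": "A", "b": "B"}
p_edges = {("a", "b")}
g_labels = {1: "A", 2: "B", 3: "A", 4: "B"}
g_edges = {(1, 2)}
print(graph_simulation(p_labels, p_edges, g_labels, g_edges))
```

In the toy data, node 3 carries label A but has no outgoing edge to a B node, so it is pruned from the matches of pattern node a.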

5 An example from real-life social matching
[Figure: pattern P (Alice, biologist, doctors) and data graph G, with edge-to-path mappings]
Graph simulation is too restrictive!

6 Bounded Simulation
Data graph G = (V, E, f_A); pattern graph P = (V_p, E_p, f_v, f_e)
G matches P via bounded simulation if there is a binary relation from V_p to V such that, for every edge of P, there exists a path in G satisfying the constraints of the edge.
Bounded simulation vs. graph simulation (a special case): node matches vs. label equality; edge-to-path matching vs. edge-to-edge matching
Enriched model for capturing meaningful matches
[Figure: pattern P with nodes Id = 'Alice', Job = 'biologist', Job = 'doctors'; data graph G with nodes Id = 'Alice', Job = 'CTO', Job = 'biologist', Job = 'doctors']
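The definition above can be illustrated with a small sketch. It assumes that f_v assigns each pattern node a predicate over node attributes and that f_e assigns each pattern edge a hop bound k (or no bound), so that a pattern edge is satisfied by a nonempty path of at most k edges. The names `nonempty_dist` and `bounded_simulation`, the encoding, and the toy graph are assumptions for illustration; this naive loop precomputes all-pairs distances and is not presented as the cubic-time algorithm from the talk.

```python
from collections import deque


def nonempty_dist(succ, src):
    """Length of the shortest nonempty path from src to each reachable node."""
    dist = {}
    queue = deque((w, 1) for w in succ[src])
    while queue:
        node, d = queue.popleft()
        if node in dist:
            continue  # already reached by a path that is no longer
        dist[node] = d
        queue.extend((w, d + 1) for w in succ[node])
    return dist


def bounded_simulation(p_pred, p_edges, g_attrs, g_edges):
    """p_pred: pattern node -> predicate over an attribute dict.
    p_edges: dict (u, u2) -> hop bound k, or None for an unbounded edge.
    g_attrs: data node -> attribute dict; g_edges: iterable of (src, dst).
    Returns pattern node -> set of data nodes, or None if there is no match.
    """
    succ = {v: [] for v in g_attrs}
    for a, b in g_edges:
        succ[a].append(b)
    dist = {v: nonempty_dist(succ, v) for v in g_attrs}

    # Candidates satisfy the node predicate; prune until a fixpoint.
    sim = {u: {v for v, attrs in g_attrs.items() if pred(attrs)}
           for u, pred in p_pred.items()}
    changed = True
    while changed:
        changed = False
        for (u, u2), k in p_edges.items():
            keep = {v for v in sim[u]
                    if any(w in sim[u2] and (k is None or d <= k)
                           for w, d in dist[v].items())}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim if all(sim.values()) else None


# Hypothetical data loosely modelled on the slide's attribute values.
g_attrs = {"alice": {"Id": "Alice"}, "cto": {"Job": "CTO"},
           "bio": {"Job": "biologist"}, "doc": {"Job": "doctors"}}
g_edges = [("alice", "cto"), ("cto", "bio"), ("bio", "doc")]
p_pred = {"A": lambda a: a.get("Id") == "Alice",
          "B": lambda a: a.get("Job") == "biologist",
          "D": lambda a: a.get("Job") == "doctors"}
p_edges = {("A", "B"): 2, ("B", "D"): 1}
print(bounded_simulation(p_pred, p_edges, g_attrs, g_edges))
```

With a bound of 2 on the edge from A to B, Alice reaches the biologist through the CTO; tightening that bound to 1 leaves pattern node A without a match, so the sketch returns None.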

7 Basic results for bounded simulation
For any graph G and pattern P, if G matches P, then there is a unique maximum match in G for P.
The graph pattern matching problem via bounded simulation can be solved in cubic time.
The incremental bounded simulation problem
Efficient approaches for graph pattern matching
Extension for multiple edge colors?

8 Considering edge types…
Real-life graphs have multiple edge types.
Essembly network: friends-allies, friends-nemeses, strangers-nemeses, strangers-allies

9 Querying the Essembly network: an example
[Figure: Essembly network with edge types fa, fn, sn, sa; pattern P over Alice, biologists supporting cloning, and doctors against cloning, with edge constraints fa<=2, sa<=2, fn, fa<=2, sn, fa+]
Pattern queries with multiple edge types

10 Graph reachability and pattern queries
Real-life graphs usually bear different edge types…
Data graph G = (V, E, f_A, f_C)
Reachability query (RQ): (u_1, u_2, f_u1, f_u2, f_e), where f_e is drawn from a subclass of regular expressions: F ::= c | c^{≤k} | c^+ | FF
Q_r(G): the set of node pairs (v_1, v_2) such that there is a nonempty path from v_1 to v_2 and the edge colors on the path match the pattern specified by f_e.
[Figure: example RQ from a node with Job='biologist', sp='cloning' to a node with Job='doctors', with edge constraint fa<=2 fn]
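To make the RQ semantics concrete, here is a small sketch of a direct evaluation. It assumes that c^{≤k} stands for between 1 and k consecutive edges of color c and that c^+ stands for one or more, and it encodes f_e as a list of (color, bound) atoms with bound 1 for a plain c and None for c^+. The function names, the encoding, and the toy graph are illustrative assumptions, not the evaluation method from the talk.

```python
def step(adj, frontier, color, k):
    """Nodes reachable from `frontier` via 1..k edges of `color` (k=None: 1 or more)."""
    reached = set()
    layer = set(frontier)
    hops = 0
    while layer and (k is None or hops < k):
        # Advance one hop along edges of the requested color only.
        layer = {w for v in layer for (c, w) in adj.get(v, ()) if c == color}
        layer -= reached  # nodes seen at an earlier hop were already expanded
        reached |= layer
        hops += 1
    return reached


def eval_rq(nodes, edges, pred_src, pred_dst, regex):
    """nodes: node -> attribute dict; edges: iterable of (src, color, dst).
    regex: non-empty list of (color, bound) atoms, read left to right.
    Returns the set of (v1, v2) pairs answering the query.
    """
    adj = {}
    for a, c, b in edges:
        adj.setdefault(a, []).append((c, b))

    result = set()
    for v1, attrs in nodes.items():
        if not pred_src(attrs):
            continue
        frontier = {v1}
        for color, k in regex:
            frontier = step(adj, frontier, color, k)
        result |= {(v1, v2) for v2 in frontier if pred_dst(nodes[v2])}
    return result


# Hypothetical graph using the edge colors named in the talk (fa, fn, sa).
nodes = {"b1": {"Job": "biologist", "sp": "cloning"},
         "x": {}, "y": {},
         "d1": {"Job": "doctors"}, "d2": {"Job": "doctors"},
         "d3": {"Job": "doctors"}}
edges = [("b1", "fa", "x"), ("x", "fa", "y"), ("y", "fn", "d1"),
         ("x", "fn", "d2"), ("b1", "sa", "d3")]
is_bio = lambda a: a.get("Job") == "biologist" and a.get("sp") == "cloning"
is_doc = lambda a: a.get("Job") == "doctors"

# f_e = fa<=2 fn, as in the slide's example query.
print(eval_rq(nodes, edges, is_bio, is_doc, [("fa", 2), ("fn", 1)]))
```

In this toy graph the query pairs b1 with d1 and d2 but not with d3, because the only path to d3 uses an sa edge that the expression does not allow.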

11 Graph pattern queries
Graph pattern query (PQ): Q_p = (V_p, E_p, f_v, f_e), where for each edge e = (u, u'), Q_e = (u, u', f_v(u), f_v(u'), f_e(e)) is an RQ.
Q_p(G) is the maximum set of pairs (e, S_e) such that:
for any two edges e_1 = (u_1, u_2) and e_2 = (u_2, u_3), if (v_1, v_2) is in S_e1, then there is a v_3 such that (v_2, v_3) is in S_e2;
for any two edges e_1 = (u_1, u_2) and e_2 = (u_1, u_3), if (v_1, v_2) is in S_e1, then there is a v_3 such that (v_1, v_3) is in S_e2.
PQ vs. simulation and bounded simulation: search conditions on query nodes; mapping edges to paths; constraining the edges on the path with a regular expression
RQ and bounded simulation are special cases of PQ.
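The two closure conditions on the sets S_e can be read as a pruning rule. The sketch below assumes each S_e starts from the answer of the corresponding RQ Q_e on G (supplied here as input) and that the overall result is empty when any S_e becomes empty; both points are assumptions, as are the function name `refine_pq` and the toy data. It implements only the two conditions stated on the slide.

```python
def refine_pq(edge_matches):
    """edge_matches: dict pattern edge (u, u2) -> set of data pairs (v, v2),
    e.g. the per-edge RQ answers. Returns the refined dict, or None if
    some pattern edge is left without any match.
    """
    S = {e: set(pairs) for e, pairs in edge_matches.items()}
    changed = True
    while changed:
        changed = False
        for (u1, u2) in list(S):
            # Data nodes that currently start a match of each pattern edge.
            sources = {e: {v for v, _ in pairs} for e, pairs in S.items()}
            keep = {
                (v1, v2) for (v1, v2) in S[(u1, u2)]
                # Every pattern edge leaving u1 must have a match starting at v1.
                if all(v1 in sources[e] for e in S if e[0] == u1)
                # Every pattern edge leaving u2 must have a match starting at v2.
                and all(v2 in sources[e] for e in S if e[0] == u2)
            }
            if keep != S[(u1, u2)]:
                S[(u1, u2)] = keep
                changed = True
    return S if all(S.values()) else None


# Hypothetical per-edge answers for a pattern A -> B -> D.
edge_matches = {("A", "B"): {("a1", "b1"), ("a1", "b2")},
                ("B", "D"): {("b1", "d1")}}
print(refine_pq(edge_matches))
```

Here the pair (a1, b2) is dropped because b2 starts no match of the pattern edge (B, D), while (a1, b1) survives.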

12 Reachability and graph pattern query: examples
[Figure: Essembly network with edge types fa, fn, sn, sa; an RQ from Job='biologist', sp='cloning' to Job='doctors' with edge constraint fa<=2 fn; a PQ over Id='Alice', Job='biologist', sp='cloning', and Job='doctors', dsp='cloning' with edge constraints fa<=2, sa<=2, fn, fa<=2, sn, fa+]

13 Fundamental problems: query containment
PQ Q_1 = (V_1, E_1, f_v1, f_e1) is contained in Q_2 = (V_2, E_2, f_v2, f_e2) if there exists a mapping λ from E_1 to E_2 such that, for any data graph G and any e in E_1, S_e is a subset of S_λ(e); i.e., λ is a renaming function under which Q_1(G) is mapped to Q_2(G).
Query containment and equivalence problems can both be determined in cubic time.
Query similarity is based on a revision of graph simulation and can be determined in cubic time.
Query containment and equivalence for PQs can be solved efficiently.

14 Query containment: example
[Figure: three queries Q1, Q2, Q3 over nodes B1, B2, B3 and C1 to C6, with edge constraints h<=1, h<=2, h<=3]

15 Fundamental problems: query minimization
Query minimization problem. Input: a PQ Q_p. Output: a minimized PQ Q_m equivalent to Q_p.
The query minimization problem can be solved in cubic time:
compute the maximum node equivalence classes based on a revision of graph simulation;
determine the number of redundant nodes and edges based on the equivalence classes;
remove redundant and isolated nodes and edges.
Query minimization for PQs can be solved efficiently.

16 Query minimization: example
[Figure: queries Q1, Q2, Q3 over nodes labeled R, B, C, with edge constraints f, g, h<=2, g<=3]

17 Evaluating graph pattern queries
PQs can be answered in cubic time.
Join-based algorithm JoinMatch: matrix index vs. distance cache; a join operation for each edge in the PQ until a fixpoint is reached (w.r.t. a reversed topological order)
Split-based algorithm SplitMatch: blocks (treating pattern nodes and data nodes uniformly); partition-relation pair
Graph pattern matching can be solved in polynomial time.

18-21 Example of JoinMatch
[Figure: Essembly network with edge types fa, fn, sn, sa; a PQ over Id='Alice', Job='biologist', sp='cloning', and Job='doctors', dsp='cloning' with edge constraints fa<=2, sa<=2, fn, fa<=2, sn, fa+]

22 Experimental results: effectiveness of PQs
Effectiveness of PQs: edge-to-path relations

23 Experimental results: querying real-life graphs
Evaluation algorithms are sensitive to pattern edges.
[Figure panels: varying |Vp|; varying |Ep|]

24 Experimental results: querying real-life graphs
The algorithms are sensitive to the number of predicates.
[Figure panels: varying |pred|; varying b]

25 Experimental results: querying synthetic graphs
The algorithms scale well over large synthetic graphs.
[Figure panels: varying |V| (x10^5); varying b]

26 Experimental results: querying synthetic graphs
The algorithms scale well over large synthetic graphs.
[Figure panels: varying α; varying cr]

27 Conclusion
Simulation revised for graph pattern matching
Bounded Simulation: node predicates, edge bound, edge-to-path matching relation
Reachability Queries and Graph Pattern Queries: query containment and minimization (cubic time); query evaluation (cubic time)
Future work: extending RQs and PQs by supporting general regular expressions; incremental evaluation of RQs and PQs

28 “Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)
Terrorist Collaboration Network
Thank you!

