Exploiting Local Similarity for Indexing Paths in Graph-Structured Data by Raghav Kaushik, Pradeep Shenoy, Philip Bohannon and Ehud Gudes (presentation by Abdullah Mueen).


1 Exploiting Local Similarity for Indexing Paths in Graph-Structured Data, by Raghav Kaushik, Pradeep Shenoy, Philip Bohannon and Ehud Gudes. Abdullah Mueen.

2 Outline: no outline, no confusing syntax, no pseudocode; examples and results.

3 XML as Data Graph. [Figure: an XML document drawn as a graph, annotated with oid, label(3) and value(13).] Non-tree edges model IDREF relationships in the document.

4 Some Notations. Node path: 1.2.3.7.14. Label path: ROOT.metro.cultural.museum.name. 1.2.3.7 matches ROOT.metro.cultural.museum; 2.3.7 does not match metro.cultural.museum.name; 7 and 6 both match ROOT.metro.cultural.museum. k-path: a label path of length ≤ k.
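The notation above can be made concrete with a minimal sketch. The toy graph below is hypothetical (it is not the graph from the slides), and the helper names are illustrative. The sketch also assumes that the length of a label path is counted in edges, so a k-path has at most k + 1 labels.

```python
# Hypothetical toy data graph: node id -> label, plus directed edges.
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D"}
EDGES = {(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)}

def is_node_path(nodes, edges):
    """True if consecutive nodes are connected by edges of the graph."""
    return all((u, v) in edges for u, v in zip(nodes, nodes[1:]))

def label_path(nodes, labels):
    """Label path spelled out by a node path."""
    return [labels[n] for n in nodes]

def matches(nodes, lpath, labels, edges):
    """A node path matches a label path if it is a real path in the
    graph and its labels equal the label path position by position."""
    return is_node_path(nodes, edges) and label_path(nodes, labels) == list(lpath)

def is_k_path(lpath, k):
    """Assumption: path length is the number of edges (labels - 1)."""
    return len(lpath) - 1 <= k
```

For example, `matches([0, 1, 3], ["R", "A", "C"], LABELS, EDGES)` holds, while the shorter node path `[1, 3]` does not match the same label path.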

5 Path Expression. Operators: a label matches itself; - matches any label; . is sequencing; | is alternation; * is repetition; ? is optional. Examples: ROOT.metro.cultural.museum → 6,7; ROOT.(-.-.-).name → 12,14,16,19,22,24; ROOT.-*.hotel → all hotel nodes; ROOT.metro.neighborhoods.neighborhood.(-|-.-)?.(hotel|museum).name → 12,14,16,19. XPath and other query languages use path expressions: http://saxon.sourceforge.net/saxon6.5.3/expressions.html http://www.w3.org/1999/09/ql/docs/xquery.html
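One way to experiment with this syntax is to translate a path expression into an ordinary regular expression over label paths. The sketch below is only an illustration, not the evaluation method of the paper. It assumes that labels are identifier-like words, and the function names are made up for this example. It tests whether a label path matches an expression; it does not retrieve nodes.

```python
import re

# One token: an identifier-like label or a single operator character.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[-.|()*?])")

def compile_path_expression(expr):
    """Translate a path expression into a compiled Python regex.
    Each label is encoded as 'label/' so that '-' can be written
    as 'any run of non-slash characters followed by a slash'."""
    expr = expr.strip()
    parts, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at {pos}")
        tok, pos = m.group(1), m.end()
        if tok == "-":                  # wildcard: exactly one label
            parts.append(r"(?:[^/]+/)")
        elif tok == ".":                # sequencing is plain concatenation
            continue
        elif tok == "(":
            parts.append("(?:")
        elif tok in ")|*?":             # same meaning as in regex syntax
            parts.append(tok)
        else:                           # a concrete label
            parts.append("(?:" + re.escape(tok) + "/)")
    return re.compile("".join(parts))

def label_path_matches(pattern, labels):
    """True if the whole label path is accepted by the pattern."""
    return pattern.fullmatch("".join(lab + "/" for lab in labels)) is not None

hotel = compile_path_expression("ROOT.-*.hotel")
three = compile_path_expression("ROOT.(-.-.-).name")
```

With these definitions, `label_path_matches(hotel, ["ROOT", "metro", "hotel"])` holds, while a path ending in `museum` is rejected.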

6 The Problem. Given a graph G and a path expression P, which nodes of G match P? One possible solution is to evaluate the path expression directly on the data graph. But the data graph can be too large to fit in main memory, and too large to search completely even if it fits.

7 Indexing Data Graph. There is no schema and there are no keys; only structural information is available, and it can be summarized by a smaller graph I(G). This summary graph serves as an index for the whole data graph.

8 Indexing Data Graph: Example (1). [Figure: data graph G (nodes 0 to 9, labels R, A, B, C, D) and index graph I(G) (index nodes 11 to 15, 17 and 18) with extents {1}, {2,4}, {3}, {6}, {5,7}, {8,9}.] On both G and I(G): R.A.-*.C = {5,7} and R.-.B = {4,2}. Extent: ext(17) = {5,7}, ext(13) = {2,4}. This is a precise index, e.g. DataGuide, 1-index.

9 Indexing Data Graph: Example (2). [Figure: data graph G and index graph I(G) (index nodes 11 to 15) with extents {1}, {2,4}, {3,5,7}, {6,8,9}.] On G: R.A.-*.C = {5,7} and R.-.-*.B = {4}. On I(G): R.A.-*.C = {3,5,7} and R.-.-*.B = {2,4}. This is a safe index: each index answer contains the answer on the data graph.

10 Indexing Data Graph: Example (3). [Figure: data graph G and index graph I(G) (index nodes 11 to 15) with extents {1}, {2,4}, {3,5,7}, {6,8,9}.] On G: R.A.-*.C = {5,7} and R.-.-*.B = {2}. On I(G): R.A.-*.C = {3,5,7} and R.-.-*.B = { }. This is an unsafe index: the index answer misses a node that matches on the data graph.
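The difference between a precise and a merely safe index can be reproduced on a small scale. The sketch below uses a hypothetical toy graph (not the one in the figures) and illustrative function names. It builds an index graph from a given partition of the nodes, evaluates a simple label path on it, and returns the union of the extents of the index nodes that match.

```python
from collections import defaultdict

# Hypothetical toy data graph (not the graph from the slides).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D"}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]

def build_index(partition, labels, edges):
    """One index node per block; an index edge wherever some member of
    one block has an edge to some member of another block."""
    block_of = {n: i for i, block in enumerate(partition) for n in block}
    ext = {i: set(block) for i, block in enumerate(partition)}
    ilabels = {i: labels[min(block)] for i, block in enumerate(partition)}
    iedges = {(block_of[u], block_of[v]) for u, v in edges}
    return ext, ilabels, iedges

def eval_label_path(lpath, labels, edges):
    """Forward evaluation of a simple label path on any labeled graph."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    frontier = {n for n, lab in labels.items() if lab == lpath[0]}
    for lab in lpath[1:]:
        frontier = {v for u in frontier for v in succ[u] if labels[v] == lab}
    return frontier

def eval_on_index(lpath, partition, labels, edges):
    """Evaluate on the index graph and return the union of the extents."""
    ext, ilabels, iedges = build_index(partition, labels, edges)
    hits = eval_label_path(lpath, ilabels, iedges)
    return set().union(*(ext[i] for i in hits))

QUERY = ["R", "A", "C"]
COARSE = [{0}, {1}, {2}, {3, 4}, {5, 6}]   # grouping by label only
FINE = [{0}, {1}, {2}, {3}, {4}, {5, 6}]   # C nodes split by parent label
truth = eval_label_path(QUERY, LABELS, EDGES)
coarse_answer = eval_on_index(QUERY, COARSE, LABELS, EDGES)
fine_answer = eval_on_index(QUERY, FINE, LABELS, EDGES)
```

On this toy graph the coarse partition gives a superset of the true answer (safe but not precise), while the finer partition gives exactly the true answer for this query.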

11 Bisimilarity. Two nodes u and v are called bisimilar (u ≈_b v) if (1) label(u) = label(v), and (2) every incoming label path from ROOT to u matches at least one incoming path from ROOT to v, and vice versa. In the example graph: 2 and 4 are bisimilar; 5 and 7 are bisimilar; 8 and 9 are bisimilar; 6 and 8 are not bisimilar. ≈_b is an equivalence relation on the nodes of G, so it partitions them into equivalence classes. Finding the partition takes O(m log n) time. [Figure: data graph G, with R.A.-*.C = {5,7} and R.-.B = {4,2}.]

12 Equivalence Class ≈_b → The 1-index. [Figure: data graph G and index graph I(G) (index nodes 11 to 15, 17 and 18) with extents {1}, {2,4}, {3}, {6}, {5,7}, {8,9}.] On both G and I(G): R.A.-*.C = {5,7} and R.-.B = {4,2}.

13 Revisiting Bisimilarity. The size of the 1-index (number of nodes) is bounded above by the size of the data graph. For large real documents it is almost 45% of the size of the data graph. Bisimilarity partitions nodes by considering all incoming paths from ROOT, which is a global comparison between nodes.

14 k-bisimilarity. Two nodes u and v are called k-bisimilar (u ≈_k v) if (1) label(u) = label(v), and (2) every incoming label path of length ≤ k to u matches at least one incoming path of length ≤ k to v, and vice versa. In the example graph: 2 and 4 are 0-bisimilar; 5 and 7 are 1-bisimilar; 8 and 9 are 2-bisimilar; 6 and 8 are 1-bisimilar. ≈_k is an equivalence relation on the nodes of G, so it partitions them into equivalence classes. The algorithm for computing k-bisimulation will be shown later.
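The definition on this slide can be checked directly by collecting, for each node, the set of incoming label paths with at most k edges. The sketch below follows the slide's path-based wording literally on a hypothetical toy graph; the function names are illustrative. Note that the usual recursive definition of k-bisimilarity (via parents) can be stricter than this path-set comparison when nodes have several parents.

```python
from collections import defaultdict

# Hypothetical toy data graph (not the graph from the slides).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D"}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]

def parents_of(edges):
    """Map each node to the set of its parents."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    return parents

def incoming_label_paths(node, k, labels, parents):
    """All label paths with at most k edges that end at `node`."""
    start = (labels[node],)
    paths = {start}
    frontier = {(node, start)}
    for _ in range(k):
        # Extend every partial path by one more step towards the root.
        frontier = {(p, (labels[p],) + path)
                    for n, path in frontier
                    for p in parents.get(n, ())}
        paths |= {path for _, path in frontier}
    return paths

def k_bisimilar_by_paths(u, v, k, labels, edges):
    """Slide-style test: same label and same incoming k-paths."""
    parents = parents_of(edges)
    return (incoming_label_paths(u, k, labels, parents)
            == incoming_label_paths(v, k, labels, parents))
```

On the toy graph, nodes 5 and 6 are 1-bisimilar under this test (both are D nodes below a C node) but not 2-bisimilar, because one is reached through A and the other through B.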

15 Equivalence Class ≈_0 → A(0) index. [Figure: data graph G and index graph A(0) (index nodes 11 to 15).] Extents of A(0): {1}, {2,4}, {3,5,7}, {6,8,9}. This is the label grouping (label partition).

16 Equivalence Class ≈_1 → A(1) index. [Figure: data graph G and index graph A(1) (index nodes 11 to 17).] Extents of A(1): {1}, {2}, {4}, {3}, {5,7}, {6,8,9}.

17 A(k) index family. [Figure: data graph G next to the four index graphs.] A(0): {1}, {2,4}, {3,5,7}, {6,8,9}. A(1): {1}, {2}, {4}, {3}, {5,7}, {6,8,9}. A(2): {1}, {2}, {4}, {3}, {5}, {7}, {6}, {8,9}. A(3) = 1-index: {1}, {2}, {4}, {3}, {5}, {7}, {6}, {8}, {9}.

18 Properties of A(k) index. [Figure: data graph G and A(1) with extents {1}, {2}, {4}, {3}, {5,7}, {6,8,9}.]

19 Properties of A(k) index. [Figure: data graph G and A(1) with extents {1}, {2}, {4}, {3}, {5,7}, {6,8,9}.]

20 How to compute A(1) index. Lookup partition (the label partition, unchanged during the pass): {1} {2,4} {3,5,7} {6,8,9}. Refining partition, step by step: {1} {2,4} {3,5,7} {6,8,9} → {1} {2} {4} {3,5,7} {6,8,9} → {1} {2} {4} {3} {5,7} {6,8,9}. Result, the 1-bisimilar partition: {1} {2} {4} {3} {5,7} {6,8,9}.

21 How to compute A(2) index. Lookup partition (the 1-bisimilar partition, unchanged during the pass): {1} {2} {4} {3} {5,7} {6,8,9}. Refining partition, step by step: {1} {2} {4} {3} {5,7} {6,8,9} → {1} {2} {4} {3} {5} {7} {6,8,9} → {1} {2} {4} {3} {5} {7} {6} {8,9}. Result, the 2-bisimilar partition: {1} {2} {4} {3} {5} {7} {6} {8,9}.
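The two slides above compute A(1) from the label partition and A(2) from A(1). A minimal sketch of that round-by-round refinement follows, on a hypothetical toy graph and with illustrative function names. It is a naive version, written for clarity rather than speed, and it uses the parent-based splitting rule (two nodes stay together only if their parents fall into the same set of blocks of the previous round), which is an assumption about how the lookup step works.

```python
from collections import defaultdict

# Hypothetical toy data graph (not the graph from the slides).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D"}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]

def refine_once(block_of, parents):
    """One round: look up the parents' blocks in the previous partition
    and split every block whose members disagree on that set."""
    signature = {n: (b, frozenset(block_of[p] for p in parents.get(n, ())))
                 for n, b in block_of.items()}
    ids = {}
    return {n: ids.setdefault(sig, len(ids)) for n, sig in signature.items()}

def to_blocks(block_of):
    """Turn a node -> block id map into a sorted list of sorted blocks."""
    groups = defaultdict(list)
    for n, b in block_of.items():
        groups[b].append(n)
    return sorted(sorted(g) for g in groups.values())

def a_k_partition(k, labels, edges):
    """Partition behind an A(k)-style index: k refinement rounds
    starting from the label partition (round 0)."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    ids = {}
    block_of = {n: ids.setdefault(lab, len(ids)) for n, lab in labels.items()}
    for _ in range(k):
        block_of = refine_once(block_of, parents)
    return to_blocks(block_of)
```

On the toy graph, `a_k_partition(1, LABELS, EDGES)` separates the two C nodes but keeps the two D nodes together; a second round separates them as well, and further rounds change nothing, which mirrors how the A(k) family converges to the 1-index.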

22 Query Evaluation: Forward or Backward. [Figure: A(1) index with extents {1}, {2}, {4}, {3}, {5,7}, {6,8,9}, and the automaton for the query with states R, A, - and C.] R.A.-*.C = {5,7}. Repeated states are prevented. Cost: O(|A|*m). Backward evaluation uses the label groups.
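Forward evaluation with an automaton can be sketched as a search over (node, automaton state) pairs. The code below is an illustration on a hypothetical toy graph, with a hand-written automaton for the query R.A.-*.D; the representation of the automaton as a list of transitions is an assumption made for this example. Each pair is expanded at most once, which is the idea behind "repeated states are prevented".

```python
from collections import defaultdict, deque

# Hypothetical toy data graph (not the graph from the slides).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D"}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]

ANY = "-"  # wildcard symbol on a transition

# Hand-built automaton for R.A.-*.D as (state, symbol, next state).
NFA = [(0, "R", 1), (1, "A", 2), (2, ANY, 2), (2, "D", 3)]
START, ACCEPTING = 0, {3}

def evaluate(nfa, start, accepting, labels, edges):
    """Forward search over (graph node, automaton state) pairs."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    def step(state, label):
        # States reachable from `state` by reading `label`.
        return {t for s, sym, t in nfa if s == state and sym in (label, ANY)}

    seen, queue = set(), deque()

    def visit(node, state):
        if (node, state) not in seen:   # never expand a pair twice
            seen.add((node, state))
            queue.append((node, state))

    for n, lab in labels.items():       # read the first label of the path
        for t in step(start, lab):
            visit(n, t)
    while queue:
        n, s = queue.popleft()
        for v in succ[n]:
            for t in step(s, labels[v]):
                visit(v, t)
    return {n for n, s in seen if s in accepting}

answer = evaluate(NFA, START, ACCEPTING, LABELS, EDGES)
```

The same function works on an index graph if the index labels and index edges are passed instead of the data graph; the final answer is then the union of the extents of the returned index nodes. Backward evaluation would run the same search on reversed edges with a reversed automaton.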

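23 Query Evaluation: Validation. [Figure: A(1) index with extents {1}, {2}, {4}, {3}, {5,7}, {6,8,9}, and the automaton for the query with states R, A, B, C and D.] On A(1), R.A.B.C.D = {6,8,9}. Because the path is longer than k = 1, these nodes are candidates that have to be validated against the data graph. Repeated states are prevented. Cost: O(|A|*m).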
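Validation can be sketched as a backward walk on the data graph from each candidate node, following parents whose labels spell the query in reverse. The example below uses a hypothetical toy graph and a simple label path; the function name and the candidate set are illustrative (the candidates are what an A(1)-style index of this toy graph would return for R.A.C.D, namely its single D block).

```python
from collections import defaultdict

# Hypothetical toy data graph (not the graph from the slides).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D"}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]

def validate(node, lpath, labels, edges):
    """True if some path ending at `node` in the data graph spells
    exactly the label path `lpath`."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    if labels[node] != lpath[-1]:
        return False
    frontier = {node}
    # Walk the label path backwards, one parent step per label.
    for lab in reversed(lpath[:-1]):
        frontier = {p for n in frontier for p in parents[n] if labels[p] == lab}
        if not frontier:
            return False
    return True

QUERY = ["R", "A", "C", "D"]
candidates = {5, 6}   # index answer: extent of the D block (assumed)
answer = {n for n in candidates if validate(n, QUERY, LABELS, EDGES)}
```

Here node 6 is a false positive of the index: it is a D below a C, but that C hangs below B rather than A, so validation removes it.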

24 Avoiding Validation. For queries of the form R.-*.p, validation on A(k) can safely be avoided if p is a k-path. Example on A(1): R.-*.C.D = {6,8,9}. [Figure: A(1) index with extents {1}, {2}, {4}, {3}, {5,7}, {6,8,9}.]

25 Results

26 Results

27 Conclusion. The A(k) index is smaller than precise indexes and has its own advantages, such as faster execution time with significant accuracy. Future presentations: – Change of the indexes with updates. – Incorporating more complex queries.

