Exploiting Local Similarity for Indexing Paths in Graph-Structured Data, by Raghav Kaushik, Pradeep Shenoy, Philip Bohannon and Ehud Gudes. Presented by Abdullah Mueen.

Outline
– No outline
– No confusing syntax
– No pseudocode
– Examples
– Results

XML as Data Graph
[Figure: an XML document drawn as a data graph; each node is annotated with its oid, its label, and its value.]
Non-tree edges model IDREF relationships in the document.
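A minimal sketch of how an XML document could be turned into such a data graph with Python's standard `xml.etree.ElementTree`. The conventions are assumptions, not taken from the slides: oids are assigned in document order, the label is the element tag, the value is the element's text, and IDREF links are written as `id`/`idref` attributes (in real XML, which attributes act as IDs and IDREFs is declared in the DTD).

```python
import xml.etree.ElementTree as ET

def xml_to_graph(xml_text, id_attr="id", idref_attr="idref"):
    """Build (labels, values, edges) for a data graph; the attribute names are assumptions."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())                      # document order, root first
    oid_of = {id(e): oid for oid, e in enumerate(elements)}
    labels = {oid: e.tag for oid, e in enumerate(elements)}
    values = {oid: (e.text or "").strip() or None for oid, e in enumerate(elements)}
    by_id = {e.get(id_attr): oid_of[id(e)] for e in elements if e.get(id_attr)}

    edges = []
    for e in elements:
        for child in e:                               # tree edges: parent -> child
            edges.append((oid_of[id(e)], oid_of[id(child)]))
    for e in elements:
        ref = e.get(idref_attr)
        if ref in by_id:                              # non-tree edge: referrer -> referenced element
            edges.append((oid_of[id(e)], by_id[ref]))
    return labels, values, edges


# Hypothetical document for illustration.
doc = """<metro>
  <cultural><museum id="m1"><name>City Museum</name></museum></cultural>
  <hotel idref="m1"><name>Grand Hotel</name></hotel>
</metro>"""
labels, values, edges = xml_to_graph(doc)
print(edges)  # [(0, 1), (0, 4), (1, 2), (2, 3), (4, 5), (4, 2)]
```

The last edge, (4, 2), is the non-tree IDREF edge from the hotel to the museum it references.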

Some Notations
– Node path: 1.2.3.7.14
– Label path: ROOT.metro.cultural.museum.name
– 1.2.3.7 matches ROOT.metro.cultural.museum; 2.3.7 does not match metro.cultural.museum.name.
– Nodes 7 and 6 both match ROOT.metro.cultural.museum.
– k-path: a label path of length ≤ k.
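The notation can be made concrete with a few Python helpers. The labels of oids 1, 2, 3, 7 and 14 are read off this slide; measuring path length in edges (so a single label is a 0-path) is an assumption, chosen so that 0-bisimilarity means equal labels.

```python
# Labels for the node path 1.2.3.7.14 on this slide.
LABELS = {1: "ROOT", 2: "metro", 3: "cultural", 7: "museum", 14: "name"}

def label_path(node_path, labels):
    """Map a node path (sequence of oids) to its label path."""
    return ".".join(labels[n] for n in node_path)

def matches(node_path, lpath, labels):
    """A node path matches a label path when the label sequences are identical."""
    return label_path(node_path, labels) == lpath

def path_length(lpath):
    """Length in edges (assumption): 'ROOT.metro' has length 1."""
    return len(lpath.split(".")) - 1

def is_k_path(lpath, k):
    return path_length(lpath) <= k

print(matches([1, 2, 3, 7], "ROOT.metro.cultural.museum", LABELS))  # True
print(matches([2, 3, 7], "metro.cultural.museum.name", LABELS))     # False
```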

Path Expression
– "-" matches any label
– "." sequencing
– "|" alternation
– "*" repetition
– "?" optional
Examples:
– ROOT.metro.cultural.museum → 6, 7
– ROOT.(-.-.-).name → 12, 14, 16, 19, 22, 24
– ROOT.-*.hotel → all hotel nodes
– ROOT.metro.neighborhoods.neighborhood.(-|-.-)?.(hotel|museum).name → 12, 14, 16, 19
XPath and other query languages use path expressions:
http://saxon.sourceforge.net/saxon6.5.3/expressions.html
http://www.w3.org/1999/09/ql/docs/xquery.html
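One way to see that these operators form a regular language over labels: a sketch that translates a path expression into a Python `re` pattern. Encoding a label path as labels each followed by "/" is an assumption of this sketch (labels must not contain "/"); every label and every "-" becomes its own group so that "*" and "?" apply to a whole label.

```python
import re

# Tokens: a label (word characters) or one of the operators - . | * ? ( )
TOKEN = re.compile(r"\w+|[-.|*?()]")

def expr_to_regex(expr):
    """Translate a path expression into a regex over '/'-terminated label sequences."""
    parts = []
    for tok in TOKEN.findall(expr):
        if tok == ".":
            continue                                # sequencing = concatenation
        if tok == "-":
            parts.append(r"(?:[^/]+/)")             # wildcard: exactly one label
        elif tok == "(":
            parts.append("(?:")
        elif tok in {")", "|", "*", "?"}:
            parts.append(tok)                       # group end, alternation, repetition, optional
        else:
            parts.append("(?:" + re.escape(tok) + "/)")  # a concrete label
    return re.compile("".join(parts))

def matches(expr, label_path):
    """Does a label path (list of labels) match the path expression?"""
    encoded = "".join(label + "/" for label in label_path)
    return expr_to_regex(expr).fullmatch(encoded) is not None

print(matches("ROOT.-*.hotel", ["ROOT", "metro", "hotel"]))  # True
```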

The Problem
Given a graph G and a path expression P, which nodes of G (by oid) match P? One possible solution is to evaluate the path expression directly on the data graph. However, the data graph may be too large to fit in main memory, and too large to search exhaustively even when it does fit.

Indexing Data Graph
There is no schema and there are no keys. Only structural information is available, and it can be summarized by a smaller graph I(G). This summary graph serves as an index for the whole data graph.

Indexing Data Graph: Example (1)
[Figure: data graph G with nodes 0–9 labeled R, A, B, C, D, and its index graph I(G), whose nodes carry extents such as ext(17) = {5,7} and ext(13) = {2,4}.]
– On G: R.A.-*.C = {5,7} and R.-.B = {4,2}
– On I(G): R.A.-*.C = {5,7} and R.-.B = {4,2}
– Extent: the set of data nodes represented by an index node
– Precise index (same answers as on G), e.g., DataGuide, 1-index

Indexing Data Graph: Example (2)
[Figure: the same data graph G and an index graph I(G) with extents {1}, {2,4}, {3,5,7}, {6,8,9}.]
– On G: R.A.-*.C = {5,7} and R.-.-*.B = {4}
– On I(G): R.A.-*.C = {3,5,7} and R.-.-*.B = {2,4}
– Safe index: every true answer is returned, possibly together with false positives

Indexing Data Graph: Example (3)
[Figure: the same data graph G and an index graph I(G) with extents {1}, {2,4}, {3,5,7}, {6,8,9}.]
– On G: R.A.-*.C = {5,7} and R.-.-*.B = {2}
– On I(G): R.A.-*.C = {3,5,7} and R.-.-*.B = { }
– Unsafe index: some true answers (here node 2) are missed
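The three examples differ only in how the index answer relates to the true answer. A small Python check of that relation for a single query, using the answer sets from the slides; an index is precise (or safe) only if the relation holds for every query, which a per-query check like this cannot establish on its own.

```python
def classify(index_answer, true_answer):
    """Compare one query's answer on the index with its answer on the data graph."""
    if index_answer == true_answer:
        return "precise"   # exactly the right nodes
    if index_answer >= true_answer:
        return "safe"      # all right nodes, plus false positives
    return "unsafe"        # at least one right node is missing

# Answer sets from Examples (1)-(3).
print(classify({5, 7}, {5, 7}))     # precise
print(classify({3, 5, 7}, {5, 7}))  # safe
print(classify(set(), {2}))         # unsafe
```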

Bisimilarity
[Figure: data graph G.] R.A.-*.C = {5,7}, R.-.B = {4,2}
Two nodes u and v are bisimilar (u ≈b v) if
1. label(u) = label(v), and
2. every incoming label path from ROOT to u matches at least one incoming label path from ROOT to v, and vice versa.
Examples: 2 and 4 are bisimilar; 5 and 7 are bisimilar; 8 and 9 are bisimilar; 6 and 8 are not bisimilar.
≈b is an equivalence relation, so its equivalence classes partition the nodes of G.
Finding the partition takes O(m log n) time for a graph with n nodes and m edges.

Equivalence Classes of ≈b → The 1-index
[Figure: data graph G and its 1-index I(G), with index nodes 11–18 and extents {1}, {2,4}, {3}, {6}, {5,7}, {8,9}.]
Both G and the 1-index give R.A.-*.C = {5,7} and R.-.B = {4,2}.

Revisiting Bisimilarity
– The size (number of nodes) of the 1-index is bounded above by the size of the data graph.
– For large real documents, it is still almost 45% of the size of the data graph.
– Bisimilarity partitions nodes by considering all incoming paths from ROOT, which is a global comparison between nodes.

k-bisimilarity
[Figure: data graph G.]
Two nodes u and v are k-bisimilar (u ≈k v) if
1. label(u) = label(v), and
2. every incoming label path of length ≤ k to u matches at least one incoming label path of length ≤ k to v, and vice versa.
Examples: 2 and 4 are 0-bisimilar; 5 and 7 are 1-bisimilar; 8 and 9 are 2-bisimilar; 6 and 8 are 1-bisimilar.
≈k is an equivalence relation over the nodes of G.
The algorithm for computing k-bisimulation is shown later.
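A literal Python reading of this definition: collect the incoming label paths of length ≤ k (length counted in edges, an assumption) that end at each node and compare the two sets. The graph is hypothetical, not the one on the slides. This enumerates paths explicitly, so it only suits small examples; it is not the efficient algorithm.

```python
def incoming_paths(v, k, parents, labels):
    """All label paths of length <= k (counted in edges) that end at node v."""
    paths = {(labels[v],)}
    if k > 0:
        for p in parents[v]:
            for path in incoming_paths(p, k - 1, parents, labels):
                paths.add(path + (labels[v],))
    return paths

def k_bisimilar(u, v, k, parents, labels):
    """Same incoming label paths of length <= k (length 0 = the node's own label)."""
    return incoming_paths(u, k, parents, labels) == incoming_paths(v, k, parents, labels)

# Hypothetical graph given as parent lists.
PARENTS = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3]}
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D"}
print(k_bisimilar(3, 4, 0, PARENTS, LABELS))  # True: both labeled C
print(k_bisimilar(3, 4, 1, PARENTS, LABELS))  # False: A.C reaches 3, B.C reaches 4
```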

Equivalence Classes of ≈0 → A(0) index
[Figure: data graph G and index graph A(0) with index nodes 11–15 labeled R, A, B, C, D; extents include {1}, {2,4}, {3,5,7}, {6,8,9}.]
A(0) is the label grouping (label partition) of the nodes.
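A sketch of building the A(0) index graph in Python: group nodes by label to get the extents, then add an index edge X → Y whenever some data edge runs from ext(X) into ext(Y). Using the label itself as the index node id works only for A(0) and is a convenience of this sketch; the data graph is hypothetical.

```python
from collections import defaultdict

def label_partition(labels):
    """A(0): group nodes by label; each group is the extent of one index node."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return dict(groups)

def index_edges(extent, edges):
    """Index edge X -> Y whenever some data edge runs from ext(X) into ext(Y)."""
    block_of = {n: x for x, ext in extent.items() for n in ext}
    return {(block_of[u], block_of[v]) for u, v in edges}

# Hypothetical data graph.
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D"}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5)]
A0 = label_partition(LABELS)       # {'R': {0}, 'A': {1}, 'B': {2}, 'C': {3, 4}, 'D': {5}}
A0_EDGES = index_edges(A0, EDGES)  # R->A, R->B, A->C, B->C, C->D
```

Because nodes 3 and 4 share one index node, the index also contains the path R → B → C, which no single data node ending in 3 actually has; this is where false positives come from.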

Equivalence Classes of ≈1 → A(1) index
[Figure: data graph G and index graph A(1) with index nodes 11–17 and extents {1}, {2}, {4}, {3}, {5,7}, {6,8,9}.]

A(k) index family
[Figure: the data graph G and the indexes A(0), A(1), A(2), A(3).]
– A(0): {1}, {2,4}, {3,5,7}, {6,8,9}
– A(1): {1}, {2}, {4}, {3}, {5,7}, {6,8,9}
– A(2): {1}, {2}, {4}, {3}, {5}, {7}, {6}, {8,9}
– A(3): every extent is a single node; A(3) = 1-index

Properties of A(k) index
[Figure: data graph G and its A(1) index with extents {1}, {2}, {4}, {3}, {5,7}, {6,8,9}.]

How to compute A(1) index
[Figure: the data graph G and the partition after each step.]
Start from the label partition {1}, {2,4}, {3,5,7}, {6,8,9}. Lookup: for each node, find the blocks that contain its parents. Refining: nodes of a block stay together only if their parents fall in the same set of blocks. Here {2,4} splits into {2} and {4}, and {3,5,7} into {3} and {5,7}, giving the 1-bisimilar partition {1}, {2}, {4}, {3}, {5,7}, {6,8,9}.

How to compute A(2) index
[Figure: the data graph G and the partition after each step.]
Start from the 1-bisimilar partition {1}, {2}, {4}, {3}, {5,7}, {6,8,9} and repeat the same Lookup and Refining steps: {5,7} splits into {5} and {7}, and {6,8,9} into {6} and {8,9}, giving the 2-bisimilar partition {1}, {2}, {4}, {3}, {5}, {7}, {6}, {8,9}.
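The Lookup and Refining steps can be sketched in Python as signature-based refinement: a node's new block is determined by its old block plus the set of its parents' old blocks. Refining k times from the label partition gives the partition behind A(k); refining until no block splits gives the stable partition used by the 1-index. This naive loop is a stand-in, not the O(m log n) algorithm mentioned earlier, and the graph is hypothetical.

```python
from collections import defaultdict

def refine(block_of, parents):
    """One round: nodes stay together only if they share a block and a set of parent blocks."""
    sig = {v: (block_of[v], frozenset(block_of[p] for p in parents[v])) for v in block_of}
    ids = {}
    return {v: ids.setdefault(s, len(ids)) for v, s in sig.items()}

def a_k(k, labels, parents):
    """Partition behind A(k): start from the label partition and refine k times."""
    ids = {}
    block_of = {v: ids.setdefault(labels[v], len(ids)) for v in labels}  # A(0)
    for _ in range(k):
        block_of = refine(block_of, parents)
    return block_of

def fixpoint(labels, parents):
    """Refine until no block splits (refinement only splits, so equal counts mean stable)."""
    block_of = a_k(0, labels, parents)
    while True:
        nxt = refine(block_of, parents)
        if len(set(nxt.values())) == len(set(block_of.values())):
            return block_of
        block_of = nxt

def extents(block_of):
    """Blocks as sorted lists of nodes, for printing."""
    groups = defaultdict(list)
    for v in sorted(block_of):
        groups[block_of[v]].append(v)
    return sorted(groups.values())

# Hypothetical graph given as parent lists.
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D"}
PARENTS = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4]}
print(extents(a_k(1, LABELS, PARENTS)))  # [[0], [1], [2], [3], [4], [5, 6]]
print(extents(a_k(2, LABELS, PARENTS)))  # [[0], [1], [2], [3], [4], [5], [6]]
```

As on the A(k) family slide, the partitions stop changing after a few rounds; here A(2) already equals the fixpoint.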

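Query Evaluation: Forward or Backward
R.A.-*.C = {5,7}
[Figure: the query automaton (R, A, -, C) run over the A(1) index graph with extents {1}, {2}, {4}, {3}, {5,7}, {6,8,9}.]
– Repeated states are pruned, so evaluation takes O(|A| · m) time, where |A| is the number of automaton states and m the number of index edges.
– Alternatively, evaluate backward, using the label groups.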
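A minimal Python sketch of forward evaluation, assuming queries are plain chains of labels, "-", and starred steps such as "-*" (no alternation or grouping). The search visits each (node, automaton position) pair at most once, which is what keeps the cost near O(|A| · m). The same function runs on an index graph, and the answer is the union of the extents of the matching index nodes. The graph and index below are hypothetical; on this index the answer is safe but not precise.

```python
from collections import deque

def parse(query):
    """'R.A.-*.C' -> [('R', False), ('A', False), ('-', True), ('C', False)]."""
    return [(tok.rstrip("*"), tok.endswith("*")) for tok in query.split(".")]

def closure(steps, i):
    """Positions reachable from i by skipping starred steps (epsilon moves)."""
    out = {i}
    while i < len(steps) and steps[i][1]:
        i += 1
        out.add(i)
    return out

def advance(steps, positions, label):
    """Automaton positions after reading one node label."""
    nxt = set()
    for i in positions:
        if i < len(steps):
            atom, starred = steps[i]
            if atom == "-" or atom == label:
                nxt |= closure(steps, i if starred else i + 1)
    return nxt

def evaluate(query, root, children, labels):
    """Forward search over (node, position) pairs; each pair is visited at most once."""
    steps = parse(query)
    seen = {(root, i) for i in advance(steps, closure(steps, 0), labels[root])}
    queue = deque(seen)
    while queue:
        u, i = queue.popleft()
        for v in children[u]:
            for j in advance(steps, {i}, labels[v]):
                if (v, j) not in seen:
                    seen.add((v, j))
                    queue.append((v, j))
    return {u for u, i in seen if i == len(steps)}

def evaluate_on_index(query, root, children, labels, extent):
    """Run the same search on an index graph and union the extents of the hits."""
    hits = evaluate(query, root, children, labels)
    return set().union(*(extent[x] for x in hits))

# Hypothetical data graph and its label-partition index.
G_CHILDREN = {0: [1, 2], 1: [3], 2: [4], 3: [5], 4: [], 5: []}
G_LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D"}
I_CHILDREN = {10: [11, 12], 11: [13], 12: [13], 13: [14], 14: []}
I_LABELS = {10: "R", 11: "A", 12: "B", 13: "C", 14: "D"}
I_EXTENT = {10: {0}, 11: {1}, 12: {2}, 13: {3, 4}, 14: {5}}

print(evaluate("R.A.-*.C", 0, G_CHILDREN, G_LABELS))                      # {3}
print(evaluate_on_index("R.A.-*.C", 10, I_CHILDREN, I_LABELS, I_EXTENT))  # {3, 4}
```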

Query Evaluation: Validation
On A(1): R.A.B.C.D = {6,8,9}
[Figure: the query automaton (R, A, B, C, D) run over the A(1) index graph.]
– Repeated states are pruned: O(|A| · m).
– R.A.B.C.D is longer than a 1-path, so the answer from A(1) may contain false positives and is validated on the data graph.
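A sketch of validation in Python: for each candidate returned by the index, walk backward on the data graph, one step of the label path per edge, and keep the candidate only if the walk can reach the root. It handles plain label paths with "-" wildcards but no "*"; the graph is hypothetical.

```python
def validate(candidates, label_path, root, parents, labels):
    """Keep candidates reached from root by a node path matching label_path ('-' = any label)."""
    steps = label_path.split(".")

    def ok(node, lab):
        return lab == "-" or labels[node] == lab

    confirmed = set()
    for c in candidates:
        frontier = {c} if ok(c, steps[-1]) else set()
        for lab in reversed(steps[:-1]):      # move one edge toward the root per step
            frontier = {p for v in frontier for p in parents[v] if ok(p, lab)}
        if root in frontier:
            confirmed.add(c)
    return confirmed

# Hypothetical data graph; its label-partition index answers R.A.C with {3, 4}.
PARENTS = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3]}
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D"}
print(validate({3, 4}, "R.A.C", 0, PARENTS, LABELS))  # {3}
```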

Avoiding Validation
[Figure: the A(1) index graph with extents {1}, {2}, {4}, {3}, {5,7}, {6,8,9}.]
On A(1): R.-*.C.D = {6,8,9}
For queries of the form R.-*.p, validation on A(k) can be safely skipped if p is a k-path.
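The rule on this slide is easy to check mechanically. A Python sketch that recognizes queries of the form R.-*.p where p is a plain label path, measuring length in edges as assumed for k-paths. Parsing is simplified: steps are split on ".", so wildcards, alternation, or grouping inside p make the function return False (validation needed).

```python
import re

LABEL = re.compile(r"\w+")

def can_skip_validation(query, k):
    """True for R.-*.p when p is a plain label path of length <= k (in edges)."""
    steps = query.split(".")
    if len(steps) < 3 or steps[1] != "-*":
        return False                     # not of the form R.-*.p
    p = steps[2:]
    if not all(LABEL.fullmatch(s) for s in p):
        return False                     # p contains something other than labels
    return len(p) - 1 <= k

print(can_skip_validation("R.-*.C.D", 1))   # True
print(can_skip_validation("R.A.B.C.D", 1))  # False
```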

Results

Results

Conclusion
A(k) indexes are smaller than precise indexes and have practical advantages, such as faster query execution with high accuracy.
Future presentations:
– Updating the indexes when the data changes.
– Supporting more complex queries.
