Covering Indexes for Branching Path Queries. Raghav Kaushik, Philip Bohannon, Jeffrey F. Naughton and Henry F. Korth. Slides: Abdullah Mueen.


1 Covering Indexes for Branching Path Queries. Raghav Kaushik, Philip Bohannon, Jeffrey F. Naughton and Henry F. Korth. Abdullah Mueen

2 XML as Graph Data [Figure: XML document drawn as a data graph; node legend: oid, label.] Non-tree edges model IDREF relationships in the document. Leaf nodes with attributes are suppressed.

3 Branching Path Expression ROOT/metro/neighborhoods/neighborhood[/business=>cinema-hall]/cultural=>museum

4 Example (1) //hotel[/star][ museum[\art]]]

5 Covering Index A covering index can answer any query from a given set of queries without consulting the original document. The goal of this paper is to find a covering index for “Branching Path Queries”.

6 k-bisimilarity [Figure: example data graph with node labels R, A, B, C, D.]
Two nodes u and v are called k-bisimilar (u ≈k v) if
1. label(u) = label(v), and
2. every incoming label path of length ≤ k to u matches at least one incoming label path of length ≤ k to v, and vice versa.
In the example graph: 2 and 4 are 0-bisimilar; 5 and 7 are 1-bisimilar; 8 and 9 are 2-bisimilar; 6 and 8 are 1-bisimilar.
≈k is an equivalence relation over the set of nodes in G. The algorithm for computing k-bisimulation is shown later.
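
The definition above can be sketched in code. This is a minimal sketch, not the authors' algorithm: it assumes the usual round-by-round refinement formulation (two nodes stay together in round i+1 only if they were together in round i and see the same set of parent blocks), which is at least as fine as the path-based wording on the slide. The graph, the function names `k_bisim_partition` and `extents`, and the node numbering are hypothetical and are not the example from the slide.

```python
from collections import defaultdict

def k_bisim_partition(labels, edges, k):
    """Map each node to a block id; equal ids mean k-bisimilar (incoming paths)."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    block = dict(labels)  # round 0: nodes with equal labels share a block
    for _ in range(k):
        ids, new = {}, {}
        for n in labels:
            # A node stays with its peers only if it sees the same set of parent blocks.
            sig = (block[n], frozenset(block[p] for p in parents[n]))
            new[n] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            break  # no block was split, so further rounds change nothing
        block = new
    return block

def extents(block):
    """Group nodes by block id and return the groups in sorted order."""
    groups = defaultdict(list)
    for n, b in block.items():
        groups[b].append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical graph (not the one on the slide): R -> A -> C -> D and R -> B -> C -> D.
labels = {1: "R", 2: "A", 3: "B", 4: "C", 5: "C", 6: "D", 7: "D"}
edges = [(1, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7)]
for k in range(3):
    print(k, extents(k_bisim_partition(labels, edges, k)))
```

In this toy graph the two C nodes are 0-bisimilar but not 1-bisimilar, and the two D nodes are 1-bisimilar but not 2-bisimilar, because their incoming paths first differ two steps back.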

7 1-index: Covering Index for Simple Path Expressions
[Figure: data graph G and its index graphs A(0), A(1), A(2) and A(3), where A(3) = 1-index; one panel is labeled SuccStable. Extents shown:
A(0): {1}, {2,4}, {3,5,7}, {6,8,9}
A(1): {1}, {2}, {3}, {4}, {5,7}, {6,8,9}
A(2): {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8,9}
A(3): {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}]

8 Inverse edges [Figure: two versions of the example data graph, one captioned "5,7 are not 1-bisimilar" and the other "5,7 are 1-bisimilar".]

9 The F&B index
Repeat until there is no change:
– Reverse all edges.
– Compute the forward bisimilarity partition.
– Reverse all edges again.
– Compute the backward bisimilarity partition.
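
The loop above can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's implementation: instead of physically reversing edges it refines against child sets (forward) and parent sets (backward), it uses naive signature-based refinement rather than an efficient partition-refinement algorithm, and the names `refine`, `fb_partition`, `extents` and the toy graph are hypothetical.

```python
from collections import defaultdict

def refine(block, nbrs):
    """Split blocks until all nodes in a block see the same set of neighbour blocks."""
    while True:
        ids, new = {}, {}
        for n in block:
            sig = (block[n], frozenset(block[m] for m in nbrs[n]))
            new[n] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return new  # same partition as before, so it is stable
        block = new

def fb_partition(labels, edges):
    """Alternate forward and backward refinement until neither splits a block."""
    parents, children = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
        children[src].add(dst)
    block = dict(labels)  # start from the label grouping
    while True:
        before = len(set(block.values()))
        block = refine(block, children)  # forward: outgoing paths
        block = refine(block, parents)   # backward: incoming paths
        if len(set(block.values())) == before:
            return block

def extents(block):
    """Group nodes by block id and return the groups in sorted order."""
    groups = defaultdict(list)
    for n, b in block.items():
        groups[b].append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical data: three hotels under one root; hotels 2 and 5 have a star child.
labels = {1: "R", 2: "hotel", 3: "hotel", 4: "star", 5: "hotel", 6: "star"}
edges = [(1, 2), (1, 3), (1, 5), (2, 4), (5, 6)]
print(extents(fb_partition(labels, edges)))
```

Backward refinement alone would keep all three hotels together, since each is reached by the same incoming path. The forward step separates hotel 3, which has no star child, and that separation is what lets the resulting partition answer a branching condition such as "hotels that have a star".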

10 Forward Bisimulation [Figure: four panels of the example data graph.]

11 Backward Bisimulation [Figure: three panels of the example data graph.]

12 Properties of F&B index The F&B index over a data graph G covers all branching path expressions. It is the smallest index that covers all branching path queries. In general, the F&B index is large for most real documents.

13 1. Tags to be indexed Some tags are never used in queries, for example bold and emph. We specify the set of tags to be indexed. In a 100MB document, the F&B index on all tags has 436,000 nodes, while ignoring formatting tags it has 18,000 nodes.

14 2. IDREF edges to be indexed IDREF edges are not followed by the // operator; they are written explicitly in a path expression with the => operator. We specify the set of IDREF edges to be indexed. For the 100MB document, the index has 1.3 million nodes with all IDREF edges, while it has 18,000 nodes without any IDREF edges and formatting tags.

15 3. Exploiting Local Similarity Long queries are infrequent and rarely of interest. If we restrict the length of the possible queries, we can get a much smaller index than the F&B index. We specify the length of the local path by using k-bisimilarity instead of bisimilarity while computing the F&B index.

16 4. Restricting Tree Depth Deeply nested conditions are less likely to occur. We specify the maximum depth of the conditional path expression by the tree depth (defined next).

17 tree depth //museums/history/museum[/featured and museum[\art]]]

18 Definition of an Index
– A set of tags T
– Sets of IDREF edges for both directions, ref_fwd and ref_bwd
– Two parameters, k_bwd and k_fwd, to restrict the length of the path queries
– One parameter, td, to restrict the depth of nested conditional expressions

19 The BPCI index
Remove all tags not in T such that the removal does not cut out a tag in T.
Start with the label grouping as the current partition P.
For i = 0, while i ≤ td:
– Reverse all edges in G; retain IDREF edges only in ref_fwd.
– P ← forward k_fwd-bisimilar partition of P; increment i.
– Reverse all edges in G again; retain IDREF edges only in ref_bwd.
– P ← backward k_bwd-bisimilar partition of P; increment i.
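
A literal reading of this loop can be sketched as below. Several things here are assumptions rather than details from the slides: tag removal and IDREF selection are taken to have happened already, so the caller passes one edge list per direction (tree edges plus ref_fwd, and tree edges plus ref_bwd); "k-bisimilar partition of P" is read as k rounds of refinement starting from P; and the names `refine_k`, `bpci_partition`, `extents` and the toy graph are hypothetical.

```python
from collections import defaultdict

def refine_k(block, nbrs, k):
    """Run k refinement rounds: split blocks by the set of neighbour blocks seen."""
    for _ in range(k):
        ids, new = {}, {}
        for n in block:
            sig = (block[n], frozenset(block[m] for m in nbrs[n]))
            new[n] = ids.setdefault(sig, len(ids))
        block = new
    return block

def bpci_partition(labels, edges_fwd, edges_bwd, k_fwd, k_bwd, td):
    """Alternate bounded forward and backward refinement while i <= td."""
    children, parents = defaultdict(set), defaultdict(set)
    for src, dst in edges_fwd:   # assumed: tree edges plus IDREF edges in ref_fwd
        children[src].add(dst)
    for src, dst in edges_bwd:   # assumed: tree edges plus IDREF edges in ref_bwd
        parents[dst].add(src)
    block = dict(labels)         # label grouping as the starting partition P
    i = 0
    while i <= td:
        block = refine_k(block, children, k_fwd)  # forward k_fwd-bisimilarity
        i += 1
        block = refine_k(block, parents, k_bwd)   # backward k_bwd-bisimilarity
        i += 1
    return block

def extents(block):
    """Group nodes by block id and return the groups in sorted order."""
    groups = defaultdict(list)
    for n, b in block.items():
        groups[b].append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical graph: R -> A -> C -> D and R -> B -> C -> D, no IDREF edges.
labels = {1: "R", 2: "A", 3: "B", 4: "C", 5: "C", 6: "D", 7: "D"}
edges = [(1, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7)]
print(extents(bpci_partition(labels, edges, edges, k_fwd=1, k_bwd=1, td=0)))
print(extents(bpci_partition(labels, edges, edges, k_fwd=1, k_bwd=2, td=0)))
```

With k_bwd = 1 the two D nodes stay in one index node, because they differ only two steps back along incoming paths; raising k_bwd to 2 separates them. This is the size-versus-coverage trade-off that the parameters are meant to expose.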

20 Variations of BPCI

21 Testing if an Index covers a Query
– Build the query graph.
– Check that all tags in the query are in T and all IDREF edges in the query are in ref_bwd ∪ ref_fwd.
– Check that the tree depth of the query is less than td of the index.
– Check that all paths in the query with even tree depth have length < k_bwd.
– Check that all paths in the query with odd tree depth have length < k_fwd.
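
The checks above translate almost directly into code once the query graph has been summarized. The sketch below assumes that summary is already available: the `IndexDef` and `QuerySummary` types, their field names, and the string encoding of IDREF edges are all hypothetical, and the strict inequalities are copied from the slide as written.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexDef:
    tags: frozenset      # T
    ref_fwd: frozenset   # IDREF edges indexed in the forward direction
    ref_bwd: frozenset   # IDREF edges indexed in the backward direction
    k_fwd: int
    k_bwd: int
    td: int

@dataclass(frozen=True)
class QuerySummary:
    tags: frozenset      # tags mentioned in the query
    idrefs: frozenset    # IDREF edges mentioned in the query
    paths: tuple         # one (tree_depth, length) pair per path in the query graph

def covers(index, query):
    """Return True if every check listed on the slide passes."""
    if not query.tags <= index.tags:
        return False
    if not query.idrefs <= (index.ref_fwd | index.ref_bwd):
        return False
    for depth, length in query.paths:
        if not depth < index.td:
            return False
        # Even tree depth is bounded by k_bwd, odd tree depth by k_fwd.
        bound = index.k_bwd if depth % 2 == 0 else index.k_fwd
        if not length < bound:
            return False
    return True

idx = IndexDef(tags=frozenset({"hotel", "star", "museum"}),
               ref_fwd=frozenset({"cultural=>museum"}), ref_bwd=frozenset(),
               k_fwd=2, k_bwd=3, td=2)
ok = QuerySummary(frozenset({"hotel", "star"}), frozenset(), ((0, 2), (1, 1)))
too_long = QuerySummary(frozenset({"hotel"}), frozenset(), ((0, 3),))
print(covers(idx, ok), covers(idx, too_long))
```

Keeping the test on a precomputed summary separates the cheap parameter comparison from the harder step of building the query graph and assigning a tree depth to each path, which the slides only describe by example.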

22 Results on the XMark benchmark
1. I_all is the F&B index.
2. I_almost-all is the F&B index with k_fwd = 1.
3. I_specific is built on the query.

23 Result

24 Conclusion BPCI is a covering index for branching path queries. By setting appropriate parameters, we can cover a wide range of queries suitable for various applications. Extensions: – Updating and bulk loading – Integration with value indexes

