1 Optimizing Cursor Movement in Holistic Twig Joins Marcus Fontoura, Vanja Josifovski, Eugene Shekita (IBM Almaden Research Center) Beverly Yang (Stanford)


1 Optimizing Cursor Movement in Holistic Twig Joins
Marcus Fontoura, Vanja Josifovski, Eugene Shekita (IBM Almaden Research Center), Beverly Yang (Stanford)
CIKM’2005

2 Motivation
for $a in //article[year = "2005" or keyword = "XML"]
for $s in $a/section
return $s/title
In an index-based method, 7 tags and text elements need to be verified to process this query.
Running time is dominated by the I/O for manipulating these cursors.
Twig join algorithms are not optimized for I/O and do not exploit the query's extraction points.
[Figure: query tree with nodes article, AND, OR, section, title, year, 2005, keyword, XML]

3 Our Contributions
1. TwigOptimal, a new holistic twig join algorithm that supports a large fraction of XQuery (including AND/OR branches)
2. Description of how extraction points improve query performance
3. Experimental evaluation that shows how TwigOptimal outperforms current algorithms

4 Agenda
Background
TwigOptimal algorithm
Experimental results
Conclusions

5 XML Indexing
Begin/End/Level (BEL) encoding
Begin: preorder position of the tag/text
End: preorder position of the last descendant
Level: depth
Containment: X contains Y iff X.begin < Y.begin <= X.end (assuming a well-formed document)
[Figure: example tree with nodes R, A1, B1, B2, C1, D1, B3, C2 and labels (0,7,0), (1,5,1), (2,2,2), (3,5,2), (4,4,3), (5,5,3), (6,7,1), (7,7,2)]
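
The encoding above can be illustrated with a minimal Python sketch. The `Node` class, the `assign_bel` and `contains` helpers, and the tree shape are assumptions made for illustration; the shape was chosen so that the computed labels match the eight (begin, end, level) triples listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    begin: int = -1
    end: int = -1
    level: int = -1


def assign_bel(root):
    """Label every node with (begin, end, level) in one preorder walk."""
    counter = 0

    def visit(node, level):
        nonlocal counter
        node.begin = counter
        node.level = level
        counter += 1
        for child in node.children:
            visit(child, level + 1)
        # Once the subtree is numbered, the last number handed out is the
        # preorder position of the last descendant (or of the node itself
        # when it is a leaf).
        node.end = counter - 1

    visit(root, 0)


def contains(x, y):
    """X contains Y iff X.begin < Y.begin <= X.end."""
    return x.begin < y.begin <= x.end


# Assumed tree shape, chosen to be consistent with the slide's labels.
c1, d1, c2 = Node("C1"), Node("D1"), Node("C2")
b1, b2, b3 = Node("B1"), Node("B2", [c1, d1]), Node("B3", [c2])
a1 = Node("A1", [b1, b2])
r = Node("R", [a1, b3])
assign_bel(r)

print((a1.begin, a1.end, a1.level))        # (1, 5, 1)
print(contains(a1, c1), contains(b3, c1))  # True False
```

Because containment reduces to two integer comparisons, a join never has to walk the document tree itself.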

6 Basic Access Path
Inverted lists
Posting: Token = Location =
Supported method on cursor: C B.fowardTo(Position p)
[Figure: example tree (R, A1, B1, B2, C1, D1, B3, C2) with the inverted lists for B (B1, B2, B3) and C (C1, C2)]
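
A cursor over one inverted list might look like the following sketch. The slide only names the method (`fowardTo(Position p)`); the exact semantics used here (seek to the first posting whose begin is at or after the position, never moving backwards) and the names `Cursor`, `forward_to`, `current`, and `at_end` are assumptions. An in-memory sorted list stands in for the on-disk index.

```python
import bisect


class Cursor:
    """Forward-only cursor over one token's postings, sorted by begin."""

    def __init__(self, postings):
        # Each posting is a (begin, end, level) triple.
        self.postings = sorted(postings)
        self.begins = [p[0] for p in self.postings]
        self.idx = 0

    def at_end(self):
        return self.idx >= len(self.postings)

    def current(self):
        return None if self.at_end() else self.postings[self.idx]

    def forward_to(self, position):
        """Seek to the first posting with begin >= position.

        Searching from self.idx onward guarantees the cursor never
        moves backwards.
        """
        self.idx = bisect.bisect_left(self.begins, position, lo=self.idx)
        return self.current()


# Postings for token B, using the labels from the XML Indexing slide.
cb = Cursor([(2, 2, 2), (3, 5, 2), (6, 7, 1)])
print(cb.forward_to(3))  # (3, 5, 2)
print(cb.forward_to(4))  # (6, 7, 1)
print(cb.forward_to(1))  # (6, 7, 1), a smaller target does not rewind
```

In a disk-based index each `forward_to` call would translate into a B-tree seek, which is the I/O cost the rest of the talk tries to avoid.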

7 Joins in XML
Structural (containment) joins
Twig joins
[Figure: example structural and twig join patterns over nodes A, B, C, and D]

8 LocateExtension
“Extension” (w.r.t. query node q): a solution for the subquery rooted at q
Input: q
Result: the cursors of all descendants of q point to an extension for q
[Figure: query twig over A, B, C, D and document nodes A, B1, B3, C1, C2, D1, D2, X1, X2]

9 LocateExtension
while (not end(q) && not hasExtension(q)) {
    (p, c) = PickBrokenEdge(q);
    ZigZagJoin(p, c);
}
[Figure: query twig over A, B, C, D and document nodes A, B1, B3, C1, C2, D1, D2, X1, X2]
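
The loop above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the edge-selection policy (first broken edge in preorder), the zig-zag step, the in-memory `Cursor`, and the sample postings are all assumptions. Only ancestor-descendant edges are handled.

```python
import bisect
from dataclasses import dataclass, field


class Cursor:
    """Forward-only cursor over (begin, end, level) postings."""

    def __init__(self, postings):
        self.postings = sorted(postings)
        self.begins = [p[0] for p in self.postings]
        self.idx = 0

    def at_end(self):
        return self.idx >= len(self.postings)

    def current(self):
        return None if self.at_end() else self.postings[self.idx]

    def forward_to(self, position):
        self.idx = bisect.bisect_left(self.begins, position, lo=self.idx)


@dataclass
class QNode:
    name: str
    cursor: Cursor
    children: list = field(default_factory=list)


def edges(q):
    for c in q.children:
        yield (q, c)
        yield from edges(c)


def at_end(q):
    """True if any cursor in the subquery rooted at q is exhausted."""
    return q.cursor.at_end() or any(at_end(c) for c in q.children)


def pick_broken_edge(q):
    """Return the first (parent, child) edge whose containment fails."""
    for p, c in edges(q):
        a, d = p.cursor.current(), c.cursor.current()
        if not (a[0] < d[0] <= a[1]):
            return (p, c)
    return None


def zigzag_join(p, c):
    """Alternately advance the two cursors until p contains c."""
    while not p.cursor.at_end() and not c.cursor.at_end():
        a, d = p.cursor.current(), c.cursor.current()
        if d[0] <= a[0]:
            c.cursor.forward_to(a[0] + 1)  # child lies before the parent
        elif d[0] > a[1]:
            p.cursor.forward_to(a[0] + 1)  # parent ends before the child
        else:
            return


def locate_extension(q):
    while not at_end(q):
        edge = pick_broken_edge(q)
        if edge is None:
            return True  # every edge holds: the cursors form an extension
        zigzag_join(*edge)
    return False


# Hypothetical postings for the query //A[.//B and .//C].
qb = QNode("B", Cursor([(2, 2, 2), (5, 5, 2)]))
qc = QNode("C", Cursor([(7, 7, 2)]))
qa = QNode("A", Cursor([(1, 3, 1), (4, 8, 1)]), [qb, qc])
found = locate_extension(qa)
print(found, qa.cursor.current(), qb.cursor.current(), qc.cursor.current())
# True (4, 8, 1) (5, 5, 2) (7, 7, 2)
```

Every cursor move in this sketch is a physical seek, which is the cost that virtual moves are meant to reduce.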

10 TwigOptimal Algorithm
Tests whether the cursor with the minimal location has an extension.
If not, tries to virtually move cursors until they form an extension.
Only moves cursors physically if no more virtual moves are possible.
A virtual move just sets the begin value of the cursor, so no I/O is involved:
    Cq.begin = new begin value for Cq;
    Cq.virtual = true; // indicates that the cursor is virtual
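
The difference between a virtual and a physical move can be sketched as below. The class `VCursor`, its `io_count` counter, the forward-only guard in `virtual_move`, and the way a physical move materializes the virtual position are assumptions made for illustration; the slide only states that a virtual move sets `begin` and the `virtual` flag.

```python
import bisect


class VCursor:
    """Cursor that separates virtual moves from physical (I/O) moves."""

    def __init__(self, postings):
        self.postings = sorted(postings)
        self.begins = [p[0] for p in self.postings]
        self.idx = 0
        self.begin = self.begins[0] if self.begins else float("inf")
        self.virtual = False
        self.io_count = 0  # stand-in for index seeks actually performed

    def virtual_move(self, new_begin):
        """Record a lower bound on the next useful position; no I/O."""
        if new_begin > self.begin:  # assumption: cursors only move forward
            self.begin = new_begin
            self.virtual = True

    def physical_move(self):
        """Seek the index to the first posting with begin >= self.begin."""
        self.io_count += 1
        self.idx = bisect.bisect_left(self.begins, self.begin, lo=self.idx)
        if self.idx < len(self.begins):
            self.begin = self.begins[self.idx]
        else:
            self.begin = float("inf")  # list exhausted
        self.virtual = False


c = VCursor([(2, 2, 2), (10, 10, 2), (50, 50, 2)])
c.virtual_move(5)
c.virtual_move(30)
print(c.begin, c.virtual, c.io_count)  # 30 True 0
c.physical_move()
print(c.begin, c.virtual, c.io_count)  # 50 False 1
```

Two virtual moves collapse into a single seek here, and the posting at begin 10 is never read.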

11 Checking Extension
We have an extension for cursor q if:
All cursors underneath q are properly aligned
All cursors underneath q have physical locations
[Figure: query twig over A, B, C, D and document nodes A, B1, B3, C1, C2, D1, D2, X1, X2; in this cursor state the check returns false]

12 Checking Extension
We have an extension for cursor q if:
All cursors underneath q are properly aligned
All cursors underneath q have physical locations
[Figure: query twig over A, B, C, D and document nodes A, B1, B3, C1, C2, D1, D2, X1, X2; in this cursor state the check returns true]
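
The two conditions can be written as a short recursive check. This is a sketch under assumptions: `QState` is a hypothetical snapshot of each query node's current cursor position, "properly aligned" is taken to mean that every parent contains each of its children, and the sketch also requires q itself to be physical because its end value is needed for the containment test.

```python
from dataclasses import dataclass, field


@dataclass
class QState:
    name: str
    begin: int
    end: int
    virtual: bool = False
    children: list = field(default_factory=list)


def has_extension(q):
    """True if q and every cursor under it is physical and aligned."""
    if q.virtual:
        return False  # a virtual cursor has no real posting behind it
    for c in q.children:
        if c.virtual:
            return False
        if not (q.begin < c.begin <= q.end):
            return False  # broken edge: q does not contain c
        if not has_extension(c):
            return False
    return True


# Hypothetical cursor states for the query //A[.//B and .//C].
aligned = QState("A", 4, 8, children=[QState("B", 5, 5), QState("C", 7, 7)])
one_virtual = QState("A", 4, 8, children=[QState("B", 5, 5, virtual=True),
                                          QState("C", 7, 7)])
broken = QState("A", 1, 3, children=[QState("B", 2, 2), QState("C", 7, 7)])

print(has_extension(aligned), has_extension(one_virtual), has_extension(broken))
# True False False
```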

13 Moving Cursors
Two passes over the query tree:
Bottom-up: move each parent cursor forward so it contains the children cursors
Top-down: move the children cursors forward so they are contained by their parents

14 Move Cursors Example
Query = //x[.//y and .//z]
[Figure: cursor moves numbered 1 to 7 over document nodes x1, x2, y1 to y5, z1, z2; the legend distinguishes virtual moves from physical moves]

15 Comparing with TSGeneric+
Query = //w//x//y//z
[Figure: document with nodes w1, w2, x1 to x50, y1 to y100, z1, z2; legend: current cursor position, virtual move, physical move]

16 Comparing with TSGeneric+
Query = //w//x//y//z
[Figure: the same document with nodes w1, w2, x1 to x50, y1 to y100, z1, z2; legend: current cursor position, physical move]

17 Extraction Points Optimization
If neither q nor its descendants in the query are extraction points, we can virtually move these cursors within q’s parent.
[Figure: query over A, B, C and document nodes A1, A2, B1, B2, B3, C1, C99, C100]

18 Prototype
Implemented over Berkeley DB B-tree
Inverted lists
Posting: Token = Location =
Position is BEL

19 Data Sets
XMark: 10 documents of size ~100MB each
Synthetic: 4 tags (W, X, Y, Z); uncorrelated, no self-nesting, same frequency

20-24 Experimental Results
[Figures: five slides of result charts, not preserved in the transcript]

25 Conclusion
The TwigOptimal algorithm outperforms existing twig join algorithms by more than 40%, especially for larger queries.
It is optimized for I/O, which is the performance bottleneck.
The extraction points optimization improves performance.

