
1 Processing Recursive Xquery over XML Streams: The Raindrop Approach Mingzhu Wei Ming Li Elke A. Rundensteiner Murali Mani Worcester Polytechnic Institute.


1. Processing Recursive XQuery over XML Streams: The Raindrop Approach
Mingzhu Wei, Ming Li, Elke A. Rundensteiner, Murali Mani
Worcester Polytechnic Institute
XSDM Workshop, 2006
Supported by the USA National Science Foundation

2. What's Special for XML Streams
Data arrives token by token along a timeline.
A token is not the counterpart of a self-contained tuple.
Patterns must therefore be retrieved on token streams.
Example fragment (markup lost in extraction): Jack, Brooks
Q1: for $a in stream("persons")//person return $a, $a//name

3. Running Example
D1 (not recursive): 1 2 3 Jack, Brooks 4 5 6 7 8 9 10 Amy 11 12
D2 (recursive): 1 2 Jack, Brooks 4 5 6 7 Will, Brooks 9 10 11 12
(The element tags were lost in extraction; the numbers mark token positions.)
Q1: for $a in stream("persons")//person return $a, $a//name
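The token numbering can be made concrete with a small Python sketch (not Raindrop code) that turns an XML string into a flat list of start, text and end tokens using the standard `xml.sax` parser. The markup for D2 is a hypothetical reconstruction chosen to be consistent with the triples used later in the talk; in particular the wrapper tag `family` is invented, and `TokenCollector` and `tokenize` are illustrative names.

```python
import xml.sax


class TokenCollector(xml.sax.ContentHandler):
    """Records SAX callbacks as a flat list of (kind, value) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._text = []  # SAX may split one text node across several callbacks

    def _flush_text(self):
        text = "".join(self._text).strip()
        if text:
            self.tokens.append(("text", text))
        self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.tokens.append(("start", name))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.tokens.append(("end", name))


def tokenize(xml_text):
    """Hypothetical helper: parse a complete document string into its token list."""
    handler = TokenCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens


# Hypothetical markup for D2; the tag name "family" is invented.
D2_XML = ("<person><name>Jack, Brooks</name><family>"
          "<person><name>Will, Brooks</name></person></family></person>")

tokens = tokenize(D2_XML)
for position, token in enumerate(tokens, start=1):
    print(position, token)
```

This yields 12 tokens, with the text tokens "Jack, Brooks" and "Will, Brooks" at positions 3 and 8, matching the gaps in the numbering of D2. A real stream processor would instead feed chunks to an incremental parser (for example the reader returned by `xml.sax.make_parser()`, which has a `feed()` method) and consume tokens as they arrive.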

4. Retrieving Patterns Using Automata
Q2: for $a in stream("persons")/person return $a, $a/name
Automaton for Q2: s0 --person--> s1 --name--> s2
How is a "/" pattern retrieved with an automaton?
How is a "//" pattern retrieved with an automaton?
Q1: for $a in stream("persons")//person return $a, $a//name
Automaton for Q1: s0 --λ--> s1, s1 --person--> s2, s2 --λ--> s3, s3 --name--> s4, with a "*" self-loop on s1 and on s3.
Figure (lost in extraction): the automaton's runtime stack of active state sets while Q1 is processed, with snapshots such as {s1, s2} and {s1, s3, s4}.
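A minimal Python sketch of stack-based automaton matching, assuming a compact encoding in which state i means "the first i location steps have matched" and a "//" step keeps its state alive for deeper elements (playing the role of the λ edge plus the "*" self-loop). On every start token the next set of active states is pushed; on every end token it is popped. D2 is given as a hypothetical token list (the tag `family` is invented), and `match_path` is an illustrative name.

```python
# Hypothetical token stream for the recursive document D2 (tag "family" is invented).
D2 = [
    ("start", "person"), ("start", "name"), ("text", "Jack, Brooks"), ("end", "name"),
    ("start", "family"), ("start", "person"), ("start", "name"), ("text", "Will, Brooks"),
    ("end", "name"), ("end", "person"), ("end", "family"), ("end", "person"),
]


def match_path(tokens, steps):
    """Return (position, tag) of every start token that completes the path.

    steps: list of (axis, tag) pairs with axis "/" or "//".
    State i means "the first i steps have matched"; the runtime stack
    holds one set of active states per open element.
    """
    accept = len(steps)
    stack = [{0}]                        # active states before the root element
    matches = []
    for position, (kind, value) in enumerate(tokens, start=1):
        if kind == "start":
            active = set()
            for state in stack[-1]:
                if state == accept:
                    continue             # the accepting state has no outgoing edges
                axis, tag = steps[state]
                if axis == "//":
                    active.add(state)    # '*' self-loop: the step may match deeper down
                if tag == value:
                    active.add(state + 1)
            if accept in active:
                matches.append((position, value))
            stack.append(active)
        elif kind == "end":
            stack.pop()                  # restore the parent's active states
    return matches


print(match_path(D2, [("//", "person")]))
print(match_path(D2, [("//", "person"), ("//", "name")]))
print(match_path(D2, [("/", "person")]))
```

The matcher finds both person elements (tokens 1 and 6) and both name elements (tokens 2 and 7), but by itself it does not record which name belongs to which person; that is the problem the following slides address.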

5. Raindrop Algebra Plan
Plan for Q1, fed by the stream data through the automaton: Navigate //person->$a, Navigate $a//name->$b, ExtractUnnest $a, ExtractNest $b, and StructuralJoin $a on top.
Note that the structural join (the in-time structural join) only performs Cartesian products, and the person element is purged after the output is generated.

6. Problems with Recursion
The same plan is run on the recursive document D2: 1 2 Jack, Brooks 4 5 6 7 Will, Brooks 9 10 11 12
Once the second (inner) person and its name have been joined and purged, the correct result for the first (outer) person can no longer be produced.
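The failure can be reproduced with a deliberately simplified in-time join in Python, assuming that the join fires at each person end tag, pairs the person with every buffered name, and then purges the buffer; element IDs are not used. D1 and D2 are hypothetical token lists (the tags `persons` and `family` are invented), not the exact documents on the slides, and `in_time_join` is an illustrative name.

```python
# Hypothetical token streams: D1 is non-recursive, D2 nests one person inside another.
D1 = [
    ("start", "persons"), ("start", "person"), ("start", "name"), ("text", "Jack, Brooks"),
    ("end", "name"), ("end", "person"), ("start", "person"), ("start", "name"),
    ("text", "Amy"), ("end", "name"), ("end", "person"), ("end", "persons"),
]
D2 = [
    ("start", "person"), ("start", "name"), ("text", "Jack, Brooks"), ("end", "name"),
    ("start", "family"), ("start", "person"), ("start", "name"), ("text", "Will, Brooks"),
    ("end", "name"), ("end", "person"), ("end", "family"), ("end", "person"),
]


def in_time_join(tokens):
    """Simplified in-time join for //person (as $a) and $a//name, with no element IDs."""
    open_persons = []   # start positions of persons whose end tag has not arrived
    names = []          # name texts buffered since the last purge
    in_name = False
    results = []
    for position, (kind, value) in enumerate(tokens, start=1):
        if kind == "start" and value == "person":
            open_persons.append(position)
        elif kind == "start" and value == "name":
            in_name = True
        elif kind == "text" and in_name:
            names.append(value)
        elif kind == "end" and value == "name":
            in_name = False
        elif kind == "end" and value == "person":
            # Pair the closing person with everything buffered, then purge.
            results.append((open_persons.pop(), list(names)))
            names.clear()
    return results


print(in_time_join(D1))
print(in_time_join(D2))
```

On D1 each person is paired with its own name. On D2 the inner person (token 6) is paired with both names, and the outer person (token 1) gets nothing because the buffer was already purged, which mirrors the failure described on the slide.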

7. Goals
How can recursive data and recursive queries be processed correctly?
How can we guarantee that data is output as early as possible?
When the data is not recursive, how can the plan be kept as cheap as possible?

8. Recursive-Mode Operators
Each operator has a recursive-mode counterpart.
IDs are associated with elements: each element gets a triple (startID, endID, level).
Given the triples of two elements, we can determine ancestor-descendant and parent-child relationships.
Example: in 1 2 Jack 4 5 6 7 Amy 9 10 11 12, the outermost element has the triple (1, 12, 1) and the element enclosing "Jack" has (2, 4, 2).
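Assuming startID and endID are the token positions of an element's start and end tags and level is its nesting depth (root = 1), the triples and both relationship tests can be sketched in Python as follows. The tokens are a hypothetical markup for the example above with an invented `family` tag; `Triple`, `assign_triples`, `is_ancestor` and `is_parent` are illustrative names.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    start: int   # startID: position of the start tag
    end: int     # endID: position of the end tag
    level: int   # nesting depth, root element = 1


def assign_triples(tokens):
    """Return (tag, Triple) for every element, in end-tag order."""
    stack, triples = [], []
    for position, (kind, value) in enumerate(tokens, start=1):
        if kind == "start":
            stack.append((value, position))
        elif kind == "end":
            tag, start = stack.pop()
            triples.append((tag, Triple(start, position, len(stack) + 1)))
    return triples


def is_ancestor(a, d):
    """a is an ancestor of d iff a's interval strictly encloses d's."""
    return a.start < d.start and d.end < a.end


def is_parent(p, c):
    """Parent-child additionally requires adjacent levels."""
    return is_ancestor(p, c) and c.level == p.level + 1


# Hypothetical tokens shaped like "1 2 Jack 4 5 6 7 Amy 9 10 11 12"; "family" is invented.
TOKENS = [
    ("start", "person"), ("start", "name"), ("text", "Jack"), ("end", "name"),
    ("start", "family"), ("start", "person"), ("start", "name"), ("text", "Amy"),
    ("end", "name"), ("end", "person"), ("end", "family"), ("end", "person"),
]

triples = assign_triples(TOKENS)
for tag, triple in triples:
    print(tag, triple)

outer, inner_name = Triple(1, 12, 1), Triple(7, 9, 4)
print(is_ancestor(outer, inner_name), is_parent(outer, inner_name))  # True False
```

The outer person (1, 12, 1) is an ancestor but not the parent of the inner name (7, 9, 4); the inner person (6, 10, 3) is that name's parent.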

9. Features of Recursive Navigate Operators
Keep track of the triple for each element.
Call the structural join only when all triples in the Navigate operators are complete.
Example (Navigate //person->$a and Navigate $a//name->$b on 1 2 Jack 4 5 6 7 Amy 9 10 11 12): after token 1 the person triple is still incomplete, (1, -, 1), and after token 2 so is the name triple, (2, -, 2). Later the triples (2, 4, 2), (7, 9, 4) and (6, 10, 3) are completed, but the outer person's triple becomes (1, 12, 1) only at token 12.
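A hypothetical Python sketch of this invocation rule: each Navigate operator keeps its incomplete triples (start seen, end not yet seen) on a stack, and the driver signals a structural join only when a person has just closed and neither operator holds an incomplete triple, after which the consumed triples are purged. For brevity the name operator matches every name rather than only those under a bound $a; `RecursiveNavigate` and `join_points` are illustrative names, and the tag `family` is invented.

```python
class RecursiveNavigate:
    """Hypothetical recursive-mode Navigate that matches one tag at any depth."""

    def __init__(self, tag):
        self.tag = tag
        self.depth = 0
        self.open = []   # (startID, level) of matches still waiting for their end tag
        self.done = []   # complete (startID, endID, level) triples

    def on_token(self, position, kind, value):
        if kind == "start":
            self.depth += 1
            if value == self.tag:
                self.open.append((position, self.depth))
        elif kind == "end":
            if value == self.tag:
                start, level = self.open.pop()  # innermost open match closes first
                self.done.append((start, position, level))
            self.depth -= 1


def join_points(tokens):
    """Positions where a person/name structural join may run, with the triples it would see."""
    persons, names = RecursiveNavigate("person"), RecursiveNavigate("name")
    points = []
    for position, (kind, value) in enumerate(tokens, start=1):
        persons.on_token(position, kind, value)
        names.on_token(position, kind, value)
        # Invoke only after a person closes and no triple in either operator is still open.
        if kind == "end" and value == "person" and not persons.open and not names.open:
            points.append((position, sorted(persons.done), sorted(names.done)))
            persons.done.clear()  # purge the triples the join has consumed
            names.done.clear()
    return points


TOKENS = [
    ("start", "person"), ("start", "name"), ("text", "Jack"), ("end", "name"),
    ("start", "family"), ("start", "person"), ("start", "name"), ("text", "Amy"),
    ("end", "name"), ("end", "person"), ("end", "family"), ("end", "person"),
]
print(join_points(TOKENS))
```

The end tag of the inner person (token 10) does not trigger a join because the outer person's triple is still (1, -, 1); the single join happens at token 12 and sees all four complete triples.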

10. Features of Recursive Extract Operators
ExtractUnnest: composes the tokens into tuples and associates ID information with each corresponding element.
ExtractNest: collects the tokens and creates one tuple for the whole collection; its group-by functionality is moved to the top structural join.

11. Changes of Structural Join
In-time structural join: performs a Cartesian product, e.g. a with b1 and b2 gives (a, b1) and (a, b2).
ID-based structural join: replaces the in-time join with an ID comparison.
Ancestor-descendant condition: a.startID < b.startID && b.endID < a.endID
Parent-child condition: a.startID < b.startID && b.endID < a.endID && b.level = a.level + 1
Example: ExtractUnnest $a produces a1 = (1, 12, 1) and a2 = (6, 10, 3); ExtractUnnest $b produces b1 = (2, 4, 2) and b2 = (7, 9, 4). The ID-based ancestor-descendant join outputs (a1, b1), (a1, b2) and (a2, b2).
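Because the ID-based comparison is a plain predicate over the triples, a nested-loop join is enough to reproduce the slide's example; this Python sketch is a hypothetical illustration (`id_structural_join` is an illustrative name). A stack-based algorithm such as Stack-Tree-Anc, cited under related work, would avoid comparing every pair.

```python
from itertools import product


def id_structural_join(ancestors, descendants, parent_child=False):
    """Nested-loop ID-based structural join over (startID, endID, level) triples."""
    pairs = []
    for (name_a, a), (name_b, b) in product(ancestors, descendants):
        contained = a[0] < b[0] and b[1] < a[1]          # ancestor-descendant test
        if contained and (not parent_child or b[2] == a[2] + 1):
            pairs.append((name_a, name_b))
    return pairs


persons = [("a1", (1, 12, 1)), ("a2", (6, 10, 3))]
names = [("b1", (2, 4, 2)), ("b2", (7, 9, 4))]
print(id_structural_join(persons, names))                     # [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b2')]
print(id_structural_join(persons, names, parent_child=True))  # [('a1', 'b1'), ('a2', 'b2')]
```

With the extra level test, the pair (a1, b2) drops out because b2 is a grandchild-level descendant of a1 rather than a child.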

12. Structural Join Invoking Issue
Invoking strategy: the structural join is invoked only when all the triples are complete.
With a1 = (1, 12, 1), a2 = (6, 10, 3), b1 = (2, 4, 2) and b2 = (7, 9, 4), the join outputs (a1, b1), (a1, b2) and (a2, b2), after which the buffered triples are cleaned.

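13. Another Query with ExtractNest Operators
Q3: for $x in //a return $x//b, $x//c
Plan: Navigate //a -> $x reads the stream data; Navigate $x//b -> $y feeds ExtractNest $y; Navigate $x//c -> $z feeds ExtractNest $z; StructuralJoin $x combines them.
ExtractNest = ExtractUnnest + GroupBy
Sample data: an a element (1, 14) whose children are an a (2, 9), a b (10, 11) and a c (12, 13); the inner a (2, 9) contains b (3, 4), b (5, 6) and c (7, 8).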

14. Processing ExtractNest: Push GroupBy Up
Q3: for $x in //a return $x//b, $x//c
Plan: as on the previous slide, with the GroupBy pushed up into StructuralJoin $x.
Triples (startID, endID, level): b (3, 4, 3), (5, 6, 3), (10, 11, 2); c (7, 8, 3), (12, 13, 2); a (2, 9, 2), (1, 14, 1).
Grouped output: the inner a (2, 9, 2) gets two b's and one c, while the outer a (1, 14, 1) gets all three b's and both c's.
It is better to do the group-by in the structural join here.
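A hypothetical Python sketch of doing the GroupBy inside the structural join: the ExtractNest operators emit ungrouped intervals, and the join collects, for each $x, the b and c intervals nested inside it. The intervals are the ones from this slide; levels are omitted because Q3 uses only "//". `nested_structural_join` is an illustrative name.

```python
def nested_structural_join(xs, branches):
    """Hypothetical ID-based structural join that also performs the GroupBy.

    xs:       (startID, endID) intervals bound to $x.
    branches: branch name -> ungrouped (startID, endID) intervals from ExtractNest.
    Returns one row per $x holding, for each branch, the items nested inside it.
    """
    rows = []
    for x_start, x_end in sorted(xs):
        row = {"x": (x_start, x_end)}
        for branch, items in branches.items():
            row[branch] = [(s, e) for s, e in sorted(items) if x_start < s and e < x_end]
        rows.append(row)
    return rows


# Intervals from the Q3 example: two nested a elements.
a_elements = [(1, 14), (2, 9)]
b_elements = [(3, 4), (5, 6), (10, 11)]
c_elements = [(7, 8), (12, 13)]

for row in nested_structural_join(a_elements, {"b": b_elements, "c": c_elements}):
    print(row)
```

The b intervals (3, 4) and (5, 6) and the c interval (7, 8) land in both groups. Grouping inside each ExtractNest would have to be decided without comparing against the a intervals; deferring it to the join lets one item belong to several groups.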

15. Further Optimization: Context-Aware Structural Join
A context-check automaton (Navigate) determines whether the data is recursive.
If the data is recursive, the recursive (ID-based) structural join is used; if not, the in-time structural join is used. Either way, output tuples are produced and then purged.
This allows run-time switching from the ID-based structural join to the more efficient in-time structural join strategy.
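One way to sketch the context check in Python, under the assumption that the decision is made per top-level person fragment: if a person start tag arrives while another person is still open, the fragment is treated as recursive and joined by ID comparison; otherwise every buffered name belongs to the single closed person and an in-time join suffices. D1, D2, the invented tags `persons` and `family`, and `context_aware_join` are all hypothetical.

```python
D1 = [
    ("start", "persons"), ("start", "person"), ("start", "name"), ("text", "Jack, Brooks"),
    ("end", "name"), ("end", "person"), ("start", "person"), ("start", "name"),
    ("text", "Amy"), ("end", "name"), ("end", "person"), ("end", "persons"),
]
D2 = [
    ("start", "person"), ("start", "name"), ("text", "Jack, Brooks"), ("end", "name"),
    ("start", "family"), ("start", "person"), ("start", "name"), ("text", "Will, Brooks"),
    ("end", "name"), ("end", "person"), ("end", "family"), ("end", "person"),
]


def context_aware_join(tokens, anchor="person", item="name"):
    """Hypothetical context-aware join; returns (strategy, anchor startID, item texts)."""
    open_anchors, anchors, items = [], [], []
    open_item, recursive = None, False
    results = []
    for pos, (kind, value) in enumerate(tokens, start=1):
        if kind == "start" and value == anchor:
            if open_anchors:
                recursive = True            # context check: anchor nested in anchor
            open_anchors.append(pos)
        elif kind == "start" and value == item and open_anchors:
            open_item = [pos, None, ""]
        elif kind == "text" and open_item is not None:
            open_item[2] = value
        elif kind == "end" and value == item and open_item is not None:
            open_item[1] = pos
            items.append(tuple(open_item))
            open_item = None
        elif kind == "end" and value == anchor:
            anchors.append((open_anchors.pop(), pos))
            if open_anchors:
                continue                    # an enclosing anchor is still open
            if recursive:
                # Recursive fragment: ID-based containment test per anchor.
                for a_start, a_end in sorted(anchors):
                    texts = [t for s, e, t in items if a_start < s and e < a_end]
                    results.append(("id-based", a_start, texts))
            else:
                # Non-recursive fragment: all buffered items belong to this anchor.
                results.append(("in-time", anchors[0][0], [t for _, _, t in items]))
            anchors, items, recursive = [], [], False   # purge
    return results


print(context_aware_join(D1))
print(context_aware_join(D2))
```

On D1 both persons take the cheap in-time path; on D2 the nested person triggers the ID-based path, and each person receives exactly the names nested inside it.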

16. Plan Optimization with Multiple Structural Joins
Query:
for $a in stream("s")//a return {
  for $b in $a//b return {
    for $c in $b//c return { $c//d, $c//e, $c//f },
    $b//f },
  $a//g }
Plan: StructuralJoin $a combines ExtractNest $g (over Navigate $a//g->$g) with StructuralJoin $b; StructuralJoin $b combines ExtractNest $f (over Navigate $b//f->$f) with StructuralJoin $c; StructuralJoin $c combines ExtractNest $d (over Navigate $c//d->$d) and ExtractNest $e (over Navigate $c//e->$e). Navigate //a->$a, Navigate $a//b->$b and Navigate $b//c->$c bind the loop variables.
Goal: generate as many non-recursive operators as possible.
The query plan is traversed top-down. When a structural join that corresponds to a path expression with "//" is encountered, that structural join and its descendants are instantiated as recursive-mode operators.
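The top-down traversal can be sketched as a recursive pass over a plan tree in Python. The plan below is a hypothetical variant of the slide's query in which only the `$b//c` step uses "//", so that the rule's effect is visible; Navigate operators are omitted for brevity, and `Op`, `assign_modes` and `walk` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    path: str = ""                       # path expression the operator evaluates, if any
    inputs: list = field(default_factory=list)
    recursive: bool = False


def assign_modes(op, inherited=False):
    """Top-down pass: a StructuralJoin over a '//' path puts itself and its subtree in recursive mode."""
    is_trigger = op.name.startswith("StructuralJoin") and "//" in op.path
    op.recursive = inherited or is_trigger
    for child in op.inputs:
        assign_modes(child, op.recursive)


def walk(op):
    """Pre-order traversal of the plan tree."""
    yield op
    for child in op.inputs:
        yield from walk(child)


plan = Op("StructuralJoin $a", "/a", [
    Op("ExtractNest $g", "$a/g"),
    Op("StructuralJoin $b", "$a/b", [
        Op("ExtractNest $f", "$b/f"),
        Op("StructuralJoin $c", "$b//c", [
            Op("ExtractNest $d", "$c/d"),
            Op("ExtractNest $e", "$c/e"),
        ]),
    ]),
])
assign_modes(plan)
print([op.name for op in walk(plan) if op.recursive])
# ['StructuralJoin $c', 'ExtractNest $d', 'ExtractNest $e']
```

Only the subtree rooted at the "//" join pays for recursive-mode operators; everything above it keeps the cheaper recursion-free mode.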

17. Experiments
Advantages of invoking the structural join early
Context-aware structural join vs. recursive structural join

18. Recursion-Free Mode vs. Recursive Mode

19. Related Work
Stack-Tree-Anc [AJK02]: uses a stack to store the chain of ancestor candidates; it can be combined with our system.
Transducer-based XML query processor [LPY02]: finite-state automata without a stack are not sufficient for handling recursion.
YFilter, NFA-based path navigation [DF03]: does not guarantee that the structural join is processed at the first possible moment.

20. Conclusions
We propose a new class of stream operators for recursive XQuery stream processing.
We propose a context-aware structural join.
Plan generation uses cheaper algebra operators whenever possible.
Experiments illustrate the performance benefits, with little overhead.

21. http://davis.wpi.edu/dsrg/raindrop/
samanwei@cs.wpi.edu


