Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams Hong Su, Elke Rundensteiner, Murali Mani, Ming Li Worcester Polytechnic Institute.

1 Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams Hong Su, Elke Rundensteiner, Murali Mani, Ming Li Worcester Polytechnic Institute Worcester, MA VLDB 2004

2 Stream Processing
[Figure: data sources and data requesters connected through networks]

3 What’s Special for XML Stream Processing
Token-by-token access manner (along the timeline)
Pattern retrieval + Filtering + Restructuring
    FOR $a in stream(bids)//auction,
        $b in $a/seller[homepage],
        $c in $a/bidder[sameAddr]
    WHERE $b/*/phone = “508”
    Return $b, $c
Token: not a counterpart of a self-contained tuple
Pattern Retrieval on Token Streams
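A minimal sketch of what "token-by-token access" means, using Python's standard expat parser. The function name `tokenize`, the `(kind, value)` token shape, and the sample document are assumptions made for illustration; they are not taken from Raindrop.

```python
import xml.parsers.expat


def tokenize(xml_text):
    """Return the document as a flat list of (kind, value) tokens."""
    tokens = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each text run as one callback
    parser.StartElementHandler = lambda name, attrs: tokens.append(("start", name))
    parser.EndElementHandler = lambda name: tokens.append(("end", name))

    def on_text(data):
        if data.strip():  # ignore whitespace-only text between tags
            tokens.append(("text", data.strip()))

    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return tokens


# Hypothetical sample input, not from the slides.
SAMPLE = "<bids><auction><seller><homepage>h1</homepage></seller></auction></bids>"
TOKENS = tokenize(SAMPLE)
```

No single token here carries a whole seller; the element only exists as the run of tokens between its start tag and its end tag, which is why pattern retrieval has to happen on the token stream itself.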

4 Two Computation Paradigms
Automata-based [yfilter, xscan, xsm, xsq, xpush…]
Algebraic [niagara00, …]
    FOR $a in stream(bids)//auction,
        $b in $a/seller[homepage],
        $c in $a/bidder[sameAddr]
    WHERE $b/*/phone = “508”
    Return $b, $c
[Figure: Automata. States 1 to 9 with transitions labeled auction, *, seller, bidder, homepage, sameAddr, phone, bid, …]
[Figure: Algebra. Navigate stream(bids), //auction->$a; Navigate $a, /seller->$b; Navigate $a, /bidder->$c; Tagger]
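The automata paradigm can be sketched as a small stack-based matcher for one path expression. This is an illustrative reconstruction, not the automaton from the slide: the function `match_path`, the `(axis, name)` step encoding, and the `(kind, value)` token tuples are all assumptions.

```python
def match_path(tokens, steps):
    """Yield the index of every start tag that completes the path.

    steps is a list of (axis, name) pairs, where axis is "/" or "//"
    and name is an element name or "*".
    """
    final = len(steps)
    stack = [{0}]  # one set of active automaton states per open element
    for i, (kind, value) in enumerate(tokens):
        if kind == "start":
            nxt = set()
            for state in stack[-1]:
                if state == final:
                    continue  # the final state has no outgoing transitions
                axis, name = steps[state]
                if name in ("*", value):
                    nxt.add(state + 1)
                if axis == "//":
                    nxt.add(state)  # a descendant step may skip this element
            stack.append(nxt)
            if final in nxt:
                yield i
        elif kind == "end":
            stack.pop()  # return to the states active in the parent


# Hypothetical token stream for one auction with a seller and a bidder.
DEMO_TOKENS = [
    ("start", "bids"), ("start", "auction"),
    ("start", "seller"), ("end", "seller"),
    ("start", "bid"), ("start", "bidder"), ("end", "bidder"), ("end", "bid"),
    ("end", "auction"), ("end", "bids"),
]
AUCTIONS = list(match_path(DEMO_TOKENS, [("//", "auction")]))
SELLERS = list(match_path(DEMO_TOKENS, [("//", "auction"), ("/", "seller")]))
```

The matcher only reports where a pattern is found. Filtering and restructuring would have to be bolted on separately, which is the kind of "patch" the comparison of the two paradigms refers to.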

5 Comparison of Two Paradigms
Each paradigm has deficiencies
The two paradigms complement each other

Automata Paradigm | Algebra Paradigm
Good for pattern retrieval on tokens | Does not support token inputs
Need patches for filtering and restructuring | Good for filtering and restructuring
Present all details on same low level | Support multiple descriptive levels (e.g., logical plan, physical plan)
Little studied as query processing paradigm | Well studied as query processing paradigm

6 Four-Level Algebraic Framework
Abstraction level, from high (declarative) to low (procedural):
Semantics-Focused Plan: express the semantics of the query regardless of input sources
Stream Logical Plan: accommodate tokenized streams / automata computation
Stream Physical Plan: describe implementation details of operators
Stream Execution Plan: decide how an operator is invoked (scheduling)
The Raindrop framework intends to integrate both paradigms into one

7 Level I: Semantics-Focused Plan
Express query semantics regardless of stored or stream input sources [Rainbow-ZPR02]
Reuse existing general optimization techniques
- Decorrelation
- Cancel duplicate navigation operators
- …

8 Example Semantics-Focused Plan
Query:
    FOR $a in stream(bids)//auction,
        $b in $a/seller[homepage],
        $c in $a/bidder[sameAddr]
    WHERE $b/*/phone = “508”
    Return $b, $c
Plan:
    NavUnnest stream(bids), //auction->$a
    NavUnnest $a, /seller->$b
    NavUnnest $a, /bid/bidder->$c
[Figure: plan with input/output data. The stream data enters as a "source" column; the operators add the columns $a, $b and $c in turn.]

9 Level II: Stream Logical Plan
Extend semantics-focused plan to accommodate tokenized stream inputs
- New input data format: Tokens
- New operators: StreamSource, TokenNavigate, ExtractUnnest, ExtractNest, StructuralJoin
- New rewrite rules: Push-into/Pull-out-of Automata
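One way to picture how the token-level operators cooperate is the sketch below. It is a loose illustration only: the function `structural_join_plan` is hypothetical, path matching is done with a plain stack of element names instead of an automaton, and it assumes that auction elements do not nest.

```python
from itertools import product


def structural_join_plan(tokens):
    """Yield (seller, bidder) pairs that share the same enclosing auction.

    Each seller or bidder is returned as the list of tokens it spans.
    """
    path = []                     # names of the currently open elements
    sellers, bidders = [], []     # elements extracted in the current auction
    buffer, buffer_depth = None, 0
    for kind, value in tokens:
        if kind == "start":
            path.append(value)
            wanted = (path[-2:] == ["auction", "seller"]
                      or path[-3:] == ["auction", "bid", "bidder"])
            if buffer is None and wanted:
                buffer, buffer_depth = [], len(path)  # extraction begins
        if buffer is not None:
            buffer.append((kind, value))
        if kind == "end":
            if buffer is not None and len(path) == buffer_depth:
                # extraction ends: the buffered tokens form one element
                (sellers if value == "seller" else bidders).append(buffer)
                buffer = None
            path.pop()
            if value == "auction":
                # join on $a: combine what was extracted under this auction
                yield from product(sellers, bidders)
                sellers, bidders = [], []


# Hypothetical token stream: one auction, one seller, two bidders.
PLAN_TOKENS = [
    ("start", "bids"), ("start", "auction"),
    ("start", "seller"), ("text", "s1"), ("end", "seller"),
    ("start", "bid"), ("start", "bidder"), ("text", "b1"),
    ("end", "bidder"), ("end", "bid"),
    ("start", "bid"), ("start", "bidder"), ("text", "b2"),
    ("end", "bidder"), ("end", "bid"),
    ("end", "auction"), ("end", "bids"),
]
PAIRS = list(structural_join_plan(PLAN_TOKENS))
```

Below the join everything is tokens; above it everything is tuples, so conventional tuple-based operators can take over from that point.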

10 One Uniform Algebraic View
[Figure: Algebraic Stream Logical Plan. Labels: XML data stream, token-based plan (automata plan), tuple stream, tuple-based plan, query answer]

11 Modeling Automata in Algebraic Plan: Black Box [XScan01] vs. White Box
    FOR $a in stream(bids)//auction,
        $b in $a/seller[homepage],
        $c in $a/bid/bidder[sameAddr]
    WHERE $b/*/phone = “508”
    Return $b, $c
Black Box: XScan
    $a := stream(bids)//auction
    $b := $a/seller
    $c := $a/bid/bidder
White Box:
    StructuralJoin $a
    ExtractUnnest $a, $b
    ExtractUnnest $a, $c
    TokenNavigate $a, /seller->$b
    TokenNavigate $a, /bid/bidder->$c
    TokenNavigate stream(bids), //auction->$a

12 Data Model in Algebraic Plan
Modeling Automata
    StructuralJoin $a
    ExtractUnnest $a, $b
    ExtractUnnest $a, $c
    TokenNavigate $a, /seller->$b
    TokenNavigate $a, /bid/bidder->$c
    TokenNavigate stream(bids), //auction->$a
    StreamSource
[Figure: data passed between the operators of the plan]

13 For details of Levels III and IV, please refer to:
“Automaton Meets Query Algebra: Towards a Unified Model for XQuery Evaluation over XML Data Streams”, ER 2003
“Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, CIKM 2003
“Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, Journal Submission 2004

14 Optimization I: Computation Into or Out of Automata?
Three alternative plans, related by the "Out of Automata" and "Into Automata" rewrites:
Automata Plan:
    StructuralJoin $a
    ExtractUnnest $a, $b
    ExtractUnnest $a, $c
    TokenNavigate $a, /seller->$b
    TokenNavigate $a, /bid/bidder->$c
    TokenNavigate stream(bids), //auction->$a
Plan with only //auction in the automata:
    NavigateUnnest $a, /seller->$b
    NavigateUnnest $a, /bid/bidder->$c
    ExtractUnnest stream(bids), $a
    TokenNavigate stream(bids), //auction->$a
Plan without automata:
    NavUnnest stream(bids), //auction->$a
    NavigateUnnest $a, /seller->$b
    NavigateUnnest $a, /bid/bidder->$c
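As a rough analogy for pulling navigation out of the automata, the sketch below locates each auction while parsing, materializes it as a tree, and then runs the remaining navigation steps on that tree. It uses Python's standard `xml.etree.ElementTree`, not Raindrop operators; the function name and the sample data are assumptions.

```python
import io
import xml.etree.ElementTree as ET


def navigate_out_of_automata(xml_bytes):
    """Return (seller, bidder) element pairs, one auction at a time."""
    results = []
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "auction":                 # //auction -> $a, fully extracted
            sellers = elem.findall("seller")      # tree navigation: /seller -> $b
            bidders = elem.findall("bid/bidder")  # tree navigation: /bid/bidder -> $c
            results.extend((b, c) for b in sellers for c in bidders)
    return results


# Hypothetical sample input, not from the slides.
DOC = (b"<bids><auction><seller>s1</seller>"
       b"<bid><bidder>b1</bidder></bid>"
       b"<bid><bidder>b2</bidder></bid></auction></bids>")
OUT = [(b.text, c.text) for b, c in navigate_out_of_automata(DOC)]
```

The trade-off this hints at: keeping navigation at the token level avoids building elements that are never needed, while tree navigation is simpler once an element has to be materialized anyway.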

15 Experimentation Results

16 Optimization II: Semantic Query Optimization
General schema-based optimizations
- Eliminate predicate/join, …
- Focus on operators manipulating flat values
XML-specific schema-based optimizations
- Focus on pattern retrieval
- Fall into two categories:
  General XML SQO: minimize query tree [YCL+-AT&T 01]
  Stream XML SQO (our focus)

17 Stream-Specific XML SQO
Observations
- Pattern retrieval over tokens relies solely on document-order traversal
- Schema constraints help expedite document-order traversal
State of the art
- [XPush03] covers a limited class of queries (boolean XPath match) and one type of constraint
Our goals
- Support more powerful queries (XQuery)
- Support more types of constraints (XSchema)

18 Step I: Construct Query Graph
(a) Example Query
    FOR $a in stream(bids)//auction,
        $b in $a/seller[homepage],
        $c in $a/bid/bidder[sameAddr]
    WHERE $b/*/phone = “508”
    Return $b, $c
(b) Query Tree
[Figure: query tree for the example query]

19 Example XML Schema

20 Step II: Apply Optimization Rules
Offer optimization rules utilizing
- occurrence constraints
- exclusive constraints
- order constraints
Apply rules in an order ensuring
- no beneficial rule is missed
- no redundant rule is introduced
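To make the idea of an occurrence constraint concrete, here is a hedged sketch. It assumes a hypothetical schema rule that an auction contains at most one seller, and a query that needs a seller with a homepage child. Once the only possible seller has closed without a homepage, pattern matching for the rest of that auction can be suspended. The function name and the token format are illustrative, not Raindrop's.

```python
def count_matched_tokens(tokens, seller_at_most_once):
    """Count the tokens that are handed to pattern matching."""
    matched = 0
    path = []             # nesting is always tracked, even while suspended
    suspended = False     # True while the rest of the current auction is skipped
    has_homepage = False
    for kind, value in tokens:
        if kind == "start":
            path.append(value)
        if not suspended:
            matched += 1
            if kind == "start" and path[-3:] == ["auction", "seller", "homepage"]:
                has_homepage = True
            if kind == "end" and path[-2:] == ["auction", "seller"]:
                # Assumed constraint: no second seller can follow, so a seller
                # without a homepage rules out the whole auction.
                if seller_at_most_once and not has_homepage:
                    suspended = True
                has_homepage = False
        if kind == "end":
            path.pop()
            if value == "auction":
                suspended = False  # resume matching for the next auction
    return matched


# Hypothetical stream: the seller has no homepage, so the bid is irrelevant.
SQO_TOKENS = [
    ("start", "bids"), ("start", "auction"),
    ("start", "seller"), ("end", "seller"),
    ("start", "bid"), ("start", "bidder"), ("end", "bidder"), ("end", "bid"),
    ("end", "auction"), ("end", "bids"),
]
WITH_RULE = count_matched_tokens(SQO_TOKENS, seller_at_most_once=True)
WITHOUT_RULE = count_matched_tokens(SQO_TOKENS, seller_at_most_once=False)
```

Without the constraint, every token must be matched because a later seller might still qualify; with it, the bid tokens of the failed auction are skipped.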

21 Step III: Translate Rewritten Query Graph Back to Plan (I)
Utilize Occurrence Constraints
When … is encountered twice, check /*/phone: if it fails the predicate, suspend states s2 and s3

22 Step III: Translate Rewritten Query Graph Back to Plan (II)
Utilize Exclusive Constraints
When … or … is encountered once: suspend states s2 and s9

23 Step III: Translate Rewritten Query Graph Back to Plan (III)
Utilize Order Constraints
When … is encountered once, check /homepage: if it is not present, suspend states s10, s3 and s2

24 Thanks to the WPI DSRG Rainbow Team for XAT Algebra support
