Schema-Based Query Optimization for XQuery over XML Streams Hong Su Elke A. Rundensteiner Murali Mani Worcester Polytechnic Institute, Massachusetts, USA.

1 Schema-Based Query Optimization for XQuery over XML Streams Hong Su Elke A. Rundensteiner Murali Mani Worcester Polytechnic Institute, Massachusetts, USA VLDB 2005

2 Schema-Based Query Optimization (SQO) Schema knowledge can be utilized to optimize queries. Well studied in deductive/relational databases: • Join elimination • Predicate elimination • Detection of empty answer set … Equally applicable to XML for flat value filtering

3 SQO for XML Pattern Retrieval General XML SQO • Applicable to both static and streaming XML • E.g.: Query tree minimization [Amer-Yahia+02] Static XML Specific SQO • Focus on expediting random access of data • E.g.: Query rewrite using “extents” (indices built on element types) [Fernandez+98], … Stream Specific XML SQO • Focus on expediting token-by-token sequential access of data

4 Stream Specific SQO Example Query: /seller[shipTo] Without schema: buffer seller element; retrieve /shipTo. With schema: buffer seller element; retrieve /shipTo; retrieve /sameAddr; when /sameAddr is retrieved, skip the remaining computation.
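The saving on this slide can be illustrated with a minimal Python sketch. It assumes a hypothetical schema in which shipTo, if present, must appear before sameAddr inside seller, so that seeing sameAddr first proves the predicate [shipTo] has failed. The function name, the child element names other than shipTo and sameAddr, and the token representation are illustrative assumptions, not taken from the presentation.

```python
def seller_has_ship_to(child_tags, ending_mark=None):
    """Evaluate seller[shipTo] over the start tags of a seller's children.

    Returns (matched, tags_examined). `ending_mark` is a tag that, per the
    assumed schema, can only appear after any shipTo; None means no schema.
    """
    examined = 0
    for tag in child_tags:
        examined += 1
        if tag == "shipTo":
            return True, examined
        if ending_mark is not None and tag == ending_mark:
            # The pattern can no longer appear: stop early.
            return False, examined
    return False, examined


# Hypothetical seller element without a shipTo child.
children = ["name", "sameAddr", "primary", "secondary", "rating"]
print(seller_has_ship_to(children))              # (False, 5): whole element scanned
print(seller_has_ship_to(children, "sameAddr"))  # (False, 2): stopped at the ending mark
```

Without the ending mark the whole seller element has to be consumed (and buffered) before the failure is known; with it, the failure is detected at the second child.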

5 Related Work YFilter [Diao02] and XSM [Ludäscher 03] • Use schema to decide whether pattern results are recursive or types of child elements • Essentially propose general XML SQO FluXQuery [Koch+04] • Use schema to minimize buffer size • Is complementary to our focus (aim to skip unnecessary computations) SIX [Gupta+03] • Use indices interleaved with XML data to reduce parsing • Could be combined with our techniques

6 Challenge: Constraint Useful? /seller/shipTo Retrieve /shipTo Retrieve /sameAddr When retrieved Nothing to save: /shipTo is the only pattern retrieval /seller[shipTo]/billTo Retrieve /shipTo Retrieve /sameAddr When retrieved Retrieve /billTo Nothing to save: /billTo has already been retrieved

7 Challenge: Benefits/Overhead? Maximal benefits: no beneficial optimization should be missed • Any failed patterns should be detected as early as possible Minimal overhead: no redundant optimization should be introduced • Whether a particular pattern fails should not be repeatedly checked

8 Challenge: Plan Execution Optimization at lower level than query rewrite Specific physical implementations are needed /seller[shipTo] Buffer seller element Retrieve /shipTo Retrieve /sameAddr When retrieved No query can capture this optimization

9 Outline SQO Technique Design SQO Application Execution of Optimized Plan Experimentations

10 Physical Implementation of Pattern Retrieval Note: • Important to understand physical stream engine implementation for designing effective SQO Our implementation: • Widely used automata implementation [e.g., Tukwila, YFilter]

11 Example Query and its Automata Query: for $a in /auctions/auction, $b in $a/seller[shipTo] where $b/*/phone="508-123-4567" return for $c in $a/item where $c//keyword="auto" return $b/*/phone [Figure: automaton with states including 0, 1, 2, 3, 9, 10, 11, 12 and transitions labeled auctions, auction, seller, shipTo, primary, secondary, phone, * and λ, shown with stack snapshots (e.g., [0], [1], [2,3], [11], [12#]) as input tokens arrive; #: buffering flag]

12 Example Query and its Automata [Figure: same automaton and stack snapshots as on the previous slide; #: buffering flag] Opt. opportunities: 1. avoid transitions as much as possible 2. revoke buffering flag as soon as possible
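The stack-based automaton execution shown on these slides can be approximated by the following Python sketch. It is a simplified illustration, not the Raindrop implementation: it handles only a linear path of child steps with "*" wildcards, omits λ transitions, "//" and buffering flags, and the function name and token format are assumptions.

```python
def run_automaton(path, tokens):
    """Count matches of a linear path over a stream of tag tokens.

    path:   list of step names; '*' matches any element name.
    tokens: iterable of ('start' | 'end', name) pairs.
    State i means "the first i steps of the path have been matched".
    """
    stack = [{0}]   # one set of active states per open element
    matches = 0
    for kind, name in tokens:
        if kind == "start":
            # Follow every transition that the incoming start tag enables.
            active = {s + 1 for s in stack[-1]
                      if s < len(path) and path[s] in (name, "*")}
            if len(path) in active:
                matches += 1
            stack.append(active)
        else:
            # An end tag restores the state set of the parent element.
            stack.pop()
    return matches


def element(name, *children):
    """Helper: flatten a nested element into start/end tokens."""
    out = [("start", name)]
    for child in children:
        out.extend(child)
    out.append(("end", name))
    return out


stream = element("auctions",
                 element("auction",
                         element("seller",
                                 element("primary", element("phone")),
                                 element("secondary", element("phone")))))
print(run_automaton(["auctions", "auction", "seller", "*", "phone"], stream))  # 2
```

Every start tag triggers a transition attempt and a push, which is why the slide lists avoiding transitions as the first optimization opportunity.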

13 Is Constraint Useful for Opt.? Constraints used to find “ending marks” of a pattern within a context element is ending mark of /shipTo within seller element context

14 Is Constraint Useful for Opt.? Ending mark helpful if • Context element can be filtered out earlier:

15 Is Constraint Useful for Opt.? Ending mark helpful if • Context element can be filtered out earlier: Pattern may fail to appear Ending mark for $a/seller is not helpful for $a in /auctions/auction, $b in $a/seller … + Ending mark for $a/seller is helpful

16 Is Constraint Useful for Opt.? Ending mark for $a/seller is not helpful for $a in /auctions/auction, $b in $a/seller … + Ending mark for $a/seller is helpful Ending mark helpful if • Context element can be filtered out earlier: Pattern may fail to appear; Pattern is required

17 Is Constraint Useful for Opt.? Ending mark helpful if • Context element can be filtered out earlier: Pattern may fail to appear; Pattern is required for $c in $a/item return $a/category <!element item (category?, desc, …)> + Ending mark for $a/category is not helpful for $c in $a/item[category] return $a/category Ending mark for $a/category is helpful

18 Is Constraint Useful for Opt.? Ending mark helpful if • Context element can be filtered out earlier: Pattern may fail to appear; Pattern is required and • The early filtering can be beneficial: Transitions may happen after ending marks; Buffering flags may be raised before ending marks
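The criteria collected on this slide can be written as a small predicate. The sketch below is one reading of the slide, under two labeled assumptions: that both conditions of the first bullet must hold (as the category example on the previous slide suggests), and that either condition of the second bullet is enough for early filtering to pay off. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatternInfo:
    may_fail: bool                # schema allows the pattern to be absent
    required: bool                # query discards the context if the pattern is absent
    transitions_after_mark: bool  # automaton work may still occur after the ending mark
    buffering_before_mark: bool   # a buffering flag may be raised before the ending mark


def ending_mark_is_helpful(p: PatternInfo) -> bool:
    # Assumption: both parts of the first bullet are needed.
    can_filter_early = p.may_fail and p.required
    # Assumption: either part of the second bullet is sufficient.
    filtering_pays_off = p.transitions_after_mark or p.buffering_before_mark
    return can_filter_early and filtering_pays_off


# "for $c in $a/item return $a/category": category is optional but not required.
optional_only = PatternInfo(may_fail=True, required=False,
                            transitions_after_mark=True, buffering_before_mark=True)
# "for $c in $a/item[category] ...": category is optional and required.
optional_and_required = PatternInfo(may_fail=True, required=True,
                                    transitions_after_mark=True, buffering_before_mark=False)
print(ending_mark_is_helpful(optional_only))          # False
print(ending_mark_is_helpful(optional_and_required))  # True
```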

19 SQO Design Helpful ending marks identified by our SQO Three SQO rules designed using • Occurrence constraints • Exclusive constraints • Order constraints

20 Example SQO Rule Use occurrence constraint Event-condition-action output by rule for $a in /auctions/auction, $b in $a/seller where $b/*/phone = “508-123-4567” … + Event: second is encountered in a seller Condition: $b/*/phone = “508-123-4567” not satisfied yet Action: skip the remaining computations within current seller element
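An event-condition-action rule of this kind could be represented as in the Python sketch below. The tag that triggers the event did not survive in the transcript, so the sketch uses a hypothetical end tag "/secondary" as the ending mark; the token format, the element names in the sample stream, and all class and function names are likewise assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ECARule:
    event_tag: str                     # token whose arrival fires the rule
    condition: Callable[[dict], bool]  # checked against the current context
    action: str                        # description only; the skip is done by the caller


def process_seller(tokens, rule, target_phone):
    """Scan one seller element; return (phone_matched, tags_actually_processed)."""
    ctx = {"phone_matched": False}
    processed = []
    for tag, text in tokens:
        if tag == rule.event_tag and rule.condition(ctx):
            # Action: skip the remaining computations within this seller.
            return ctx["phone_matched"], processed
        processed.append(tag)
        if tag == "phone" and text == target_phone:
            ctx["phone_matched"] = True
    return ctx["phone_matched"], processed


rule = ECARule(event_tag="/secondary",  # hypothetical ending mark
               condition=lambda ctx: not ctx["phone_matched"],
               action="skip rest of current seller")

seller = [("primary", None), ("phone", "508-000-0000"), ("/primary", None),
          ("secondary", None), ("phone", "508-111-1111"), ("/secondary", None),
          ("profile", None), ("/profile", None)]
matched, processed = process_seller(seller, rule, "508-123-4567")
print(matched, len(processed))  # False 5
```

In this run the rule fires at the ending mark because no phone has matched yet, so the last three tokens of the seller element are never processed.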

21 Outline SQO Technique Design SQO Application Execution of Optimized Plan Experimentations

22 Properties of SQO Application Maximal benefits Minimal overhead

23 Maximal Benefit • Definition of “rule independence” • Proof of “maximal benefits” given If rules are all independent, as long as each rule is applied on each pattern, maximal benefits are ensured

24 Minimal Overhead: Redundancy Same pattern redundancy : Multiple ending marks adopted for same pattern for $a in /auctions/auction, $b in $a/seller[shipTo] … Query Schema Constraints Ending mark for $b/shipTo guarantees to capture failure of /shipTo Ending mark for $b/shipTo Redundant

25 Minimal Overhead: Redundancy? Parent-child pattern redundancy: ending marks of child patterns early filter parent pattern for $a in /auctions/auction, $b in $a/seller[shipTo] … optional QueryConstraints for $b/shipTo for $a/seller required Can be used to capture failure of $a/seller[shipTo] Redundant

26 SQO Application Algorithm Input: • XQuery represented as a tree • XML Schema represented as a graph Processing: • Query tree traversed top-down; “maximal benefits” ensured • Tree node applied by local/regional appliers: same pattern redundancy excluded by local applier; parent-child pattern redundancy excluded by regional applier Output: • Event-condition-actions attached to tree nodes

27 Outline SQO Technique Design Guideline SQO Application Execution of Optimized Plan Experimentations

28 Encoding ECAs in Automata E: push-in or pop-out of state C: pattern result buffer checked A: actions include: • Suspend computations by removing automata transitions • Clean up result generated within current context element • Prepare for recovering computation for next context element (e.g., backup transitions)
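The "suspend by removing transitions" and "backup transitions" actions can be sketched in Python as follows. This is an assumed, simplified data structure (a transition table keyed by state and tag), not the engine's actual encoding; the state numbers and tags in the usage example are hypothetical.

```python
class Automaton:
    def __init__(self, transitions):
        # (state, tag) -> next state
        self.transitions = dict(transitions)
        self.backup = {}

    def suspend_from(self, state):
        """Action: cut all transitions leaving `state`, keeping a backup copy."""
        cut = {key: nxt for key, nxt in self.transitions.items() if key[0] == state}
        for key in cut:
            del self.transitions[key]
        self.backup.update(cut)

    def restore(self):
        """Recover the cut transitions before the next context element starts."""
        self.transitions.update(self.backup)
        self.backup.clear()

    def step(self, state, tag):
        """Return the next state, or None if no transition exists."""
        return self.transitions.get((state, tag))


fa = Automaton({(0, "auctions"): 1, (1, "auction"): 2,
                (2, "seller"): 9, (2, "item"): 5})
print(fa.step(2, "seller"))  # 9
fa.suspend_from(2)           # e.g. triggered when an ending mark is seen
print(fa.step(2, "seller"))  # None: computation under state 2 is suspended
fa.restore()                 # prepare for the next context element
print(fa.step(2, "seller"))  # 9
```

Keeping the cut transitions in a backup table makes the recovery step a single dictionary update when the next context element begins.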

29 Example: ECAs in Automata Query: for $a in /auctions/auction, $b in $a/seller[shipTo] where $b/*/phone="508-123-4567" return for $c in $a/item … [Figure: automaton with states including 0, 1, 2, 3, 5, 9, 10, 11, 12, 13 and transitions labeled auctions, auction, seller, shipTo, sameAddr, item, primary, secondary, phone; ECA entries attached to states, e.g., (1, startTag, none, state 2) … (…, state 3); input tokens <sameAddr> </sameAddr> <item> </item> <primary> </primary> …] Event: 1st encountered Condition: none Action: cut all transitions from 1. q2 2. States reachable via : q3 3. States between q2 and q13: q9 …

30 Outline SQO technique design guideline SQO application Execution of optimized plan Experimentations

31 Optimization Affected by? How often pattern fails (pattern selectivity) How much gain each early filtering brings (unit gain)

32 Necessity of Design Guideline [Chart: axis label "Selectivity of Pattern with the Only Useful Ending Mark"; series: Plan without SQO, Plan with SQO (1 ending mark), Plan with SQO but no guideline considered (30 ending marks)]

33 Conclusion • First SQO on streaming XML • Support SQO on nested XQuery with “*” or “//” • Offer criteria of “useful” constraints • Ensure maximal benefits and minimal overhead in SQO application • Provide execution strategy in widely used automata-based model • Implement SQO optimizer in Raindrop system (VLDB’04 demo) • Experimentally demonstrate SQO brings significant improvement with little overhead

34 Visit our XQuery engine over XML stream project (RAINDROP) website http://davis.wpi.edu/dsrg/raindrop/ Supported by USA National Science Foundation and IBM PhD Fellowship

