RAINDROP: XML Stream Processing Engine Murali Mani, DB seminar June 08, 2006 Partially Supported by NSF grant IIS 0414567.


1 RAINDROP: XML Stream Processing Engine Murali Mani, WPI @UPenn, DB seminar June 08, 2006 Partially Supported by NSF grant IIS 0414567

2 Acknowledgements (June 08, 2006; DSRG, WPI). NSF for the financial support. Joint work with several others: Prof. Elke A. Rundensteiner; graduate students Hong Su, Ming Li, Mingzhu Wei, Shoushen Wang, Jinhui Jian; undergraduate students Drew Ditto, Bogomil Tselkov, …

3 Applications. Need for efficient stream data processing: monitoring patient data in real time; sensor networks (fire detection, battlefield deployment, traffic congestion); others (news delivery, network traffic monitoring, …).

4 XML Stream Processing. The stream is accessed token by token along a timeline. Sample stream text (element tags lost in extraction): No; Calendar of French Impressionism by Monet; $20; … Processing combines pattern retrieval + filtering + restructuring. Query: for $a in open_auctions/auction[privacy] let $e := $a/description/emph where $e = "French Impressionism" return $a, $e
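To make the token-by-token access manner concrete, here is a minimal sketch that turns an XML document into the kind of start/text/end token sequence a stream engine would consume. It uses Python's standard xml.sax module; the TokenCollector class, the tokenize function, and the tuple layout are illustrative assumptions, not part of Raindrop. A real engine would act on each token as it arrives instead of collecting a list.

```python
import xml.sax


class TokenCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records SAX callbacks as (kind, value) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def startElement(self, name, attrs):
        self.tokens.append(("start", name))

    def characters(self, content):
        # Skip whitespace-only text between elements.
        if content.strip():
            self.tokens.append(("text", content.strip()))

    def endElement(self, name):
        self.tokens.append(("end", name))


def tokenize(xml_text):
    """Return the token sequence for a small, well-formed XML string."""
    handler = TokenCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens


tokens = tokenize("<auction><privacy>No</privacy></auction>")
```

Each token carries only local information, which is why the later slides need extra machinery to recognize patterns and to decide what must be buffered.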

5 Option 1: Automata-Based Pattern Retrieval. Query: for $a in open_auctions/auction[privacy] let $e := $a/description/emph where $e = "French Impressionism" return $a, $e. [Figure: automaton with states 0 to 5 and transitions labeled auctions, auction, privacy, description, emph.] When patterns are retrieved depends on the data. Additional data structures are needed for buffering, filtering, restructuring, …
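The automaton idea can be sketched as a stack of open element names that is compared against the target path on every token. This is a simplified illustration under stated assumptions: tokens are (kind, value) tuples, the function name match_path is hypothetical, and only simple child paths are handled. The buffer it keeps is one example of the "additional data structures" the slide mentions.

```python
def match_path(tokens, path):
    """Yield the token list of every element reached by the child path `path`."""
    path = list(path)
    stack = []      # names of currently open elements (the automaton state)
    buffer = None   # tokens of the element being collected, if any
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            if buffer is None and stack == path:
                buffer = []  # pattern recognized: start buffering
        if buffer is not None:
            buffer.append((kind, value))
        if kind == "end":
            if buffer is not None and stack == path:
                yield buffer  # element complete: hand it downstream
                buffer = None
            stack.pop()


stream = [
    ("start", "open_auctions"),
    ("start", "auction"), ("start", "privacy"), ("text", "No"),
    ("end", "privacy"), ("end", "auction"),
    ("start", "auction"), ("end", "auction"),
    ("end", "open_auctions"),
]
matches = list(match_path(stream, ["open_auctions", "auction"]))
```

Note that the moment a match is reported is dictated by where the end tag appears in the stream, which is the sense in which retrieval time "depends on the data".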

6 Option 2: "DOM"-Based Pattern Retrieval. Query: for $a in open_auctions/auction[privacy] let $e := $a/description/emph where $e = "French Impressionism" return $a, $e. Plan diagrams (tree layout lost in extraction). Logic plan operators: Navigate $a, /description/emph -> $e; Navigate $a, /privacy -> $p; Select $e = "French Impressionism"; Tagger. Rewritten logic plan (rewrite by "pushing down selection"): Navigate $a, /description/emph -> $e; Select $e = "French Impressionism"; Navigate $a, /privacy -> $p; Tagger. Physical plan (choose low-level implementation alternatives): Navigate-Index $a, /description/emph -> $e; Select $e = "French Impressionism"; Navigate-Scan $a, /privacy -> $p; Tagger. When patterns are retrieved depends on other patterns.
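As a rough illustration of the "DOM" style, the sketch below evaluates the example query over an in-memory tree using Python's standard xml.etree.ElementTree. The function name evaluate_dom and the operator ordering are assumptions: the selection on $e is applied before the privacy navigation, which is one plausible reading of "pushing down selection", not a reproduction of the Raindrop operators.

```python
import xml.etree.ElementTree as ET


def evaluate_dom(xml_text):
    """Return (auction element, emph texts) pairs that satisfy the query."""
    root = ET.fromstring(xml_text)  # root is assumed to be <open_auctions>
    results = []
    for auction in root.findall("auction"):
        # Navigate $a, /description/emph -> $e
        emph_texts = [e.text for e in auction.findall("description/emph")]
        # Select $e = "French Impressionism" (true if any item matches)
        if "French Impressionism" not in emph_texts:
            continue  # the privacy navigation is skipped for this auction
        # Navigate $a, /privacy -> $p, kept only if non-empty
        if auction.find("privacy") is None:
            continue
        results.append((auction, emph_texts))
    return results


doc = (
    "<open_auctions>"
    "<auction><privacy>No</privacy>"
    "<description><emph>French Impressionism</emph></description></auction>"
    "<auction><description><emph>French Impressionism</emph></description></auction>"
    "<auction><privacy>Yes</privacy>"
    "<description><emph>Cubism</emph></description></auction>"
    "</open_auctions>"
)
results = evaluate_dom(doc)
```

Because the whole auction subtree is available, the order of the two navigations is a free choice, which is what makes the retrieval of one pattern depend on the outcome of another.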

7 Which paradigm is better? Minimal pushdown plans win over maximal pushdown when selectivity < 50%.

8 Problem. How to provide a framework for choosing between these paradigms? Model both paradigms uniformly as algebraic operators, and use a cost model to choose the optimal plan given data statistics.

9 Automaton as TokenNav. Query: for $a in open_auctions/auction[privacy] let $e := $a/description/emph where $e = "French Impressionism" return $a, $e. [Figure: automaton with states 0 to 5 and transitions labeled auctions, auction, privacy, description, emph.] Plan operators (tree layout lost in extraction): StructuralJoin $a; Extract $a; TokenNav $s, /auctions/auction -> $a; Select non-empty($b); Select $e = "French …"; Extract $b; Extract $e; TokenNav $a, /privacy -> $b; TokenNav $a, /desc/emph -> $e.

10 DOM Navigation as NodeNav. Query: for $a in open_auctions/auction[privacy] let $e := $a/description/emph where $e = "French Impressionism" return $a, $e. [Figure: automaton with states 0 to 2 and transitions labeled auctions, auction.] Plan operators (tree layout lost in extraction): Extract $a; TokenNav $s, /auctions/auction -> $a; Select non-empty($b); Select $e = "French …"; NodeNav $a, /privacy -> $b; NodeNav $a, /desc/emph -> $e.

11 Exploring the Search Space. A pattern can be retrieved inside the automaton or outside the automaton. However, there are dependencies. For the bindings for $a in …/a, $b in $a/…, $c in $b/…: NodeNav for $b => NodeNav for $c; TokenNav for $b => TokenNav or NodeNav for $c.
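The dependency rule prunes the search space, which a small enumeration makes visible. The sketch below assumes a simple chain of variables where each one is bound relative to the previous one; the function name valid_assignments is hypothetical.

```python
from itertools import product


def valid_assignments(chain):
    """Enumerate TokenNav/NodeNav choices for a chain of dependent variables.

    Rule from the slide: once a variable is retrieved by NodeNav, every
    variable bound beneath it must also be retrieved by NodeNav.
    """
    plans = []
    for choice in product(("TokenNav", "NodeNav"), repeat=len(chain)):
        pairs = zip(choice, choice[1:])  # (parent, child) operator pairs
        if all(not (p == "NodeNav" and c == "TokenNav") for p, c in pairs):
            plans.append(dict(zip(chain, choice)))
    return plans


plans = valid_assignments(["$b", "$c"])
```

For the two variables $b and $c this leaves three of the four combinations; in general a chain of n variables has n + 1 valid assignments instead of 2^n.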

12 Run-time Optimization. Statistics are unknown before data arrives and could change over time. We need techniques for efficient statistics monitoring, search space exploration, and plan migration (safe points for migration).

13 Run-time Optimization. Steps: create an initial plan; run the initial plan and collect statistics at the same time; generate a new plan using the statistics collected; pause receiving the stream; migrate to the new plan; resume receiving the stream. [Figure: components Stream, Query plan executor, statistics, Initial Query plan, Query Optimizer, New Query plan, Plan Migrator.]
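The steps above can be sketched as a small control loop. Everything here is a hypothetical stand-in: plans are plain labels, the only statistic is the observed selectivity of a predicate, the boundary between two elements is assumed to be a safe point, and the re-optimization interval window is an invented parameter.

```python
def run_adaptive(elements, predicate, choose_plan, window=4):
    """Process elements, re-optimizing every `window` elements.

    Returns the plan label that was active for each element.
    """
    plan = choose_plan(None)  # initial plan: no statistics yet
    seen = passed = 0
    active = []
    for element in elements:
        # Between two elements nothing is half-processed (assumed safe point).
        if seen and seen % window == 0:
            new_plan = choose_plan(passed / seen)
            if new_plan != plan:
                # Pause the stream, migrate, resume (trivial in this sketch).
                plan = new_plan
        seen += 1
        passed += bool(predicate(element))  # statistics collected while running
        active.append(plan)
    return active


def choose_plan(selectivity):
    # Hypothetical optimizer: a fixed threshold stands in for a cost model.
    if selectivity is None or selectivity < 0.5:
        return "minimal-pushdown"
    return "maximal-pushdown"


active = run_adaptive([1] * 4 + [0] * 12, bool, choose_plan)
```

In this run the plan switches after the first four elements, when the observed selectivity is 1.0, and switches back once it has dropped to one third.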

14 Executing a Raindrop Plan

15 Key Ideas. Minimum memory requirements; discard data early; output data early.

16 In-Time Structural Join. Query: for $a in open_auctions/auction[privacy] let $e := $a/description/emph where $e = "French Impressionism" return $a, $e. [Figure: automaton with states 0 to 5 and transitions labeled auctions, auction, privacy, description, emph.] Plan operators (tree layout lost in extraction): StructuralJoin $a; Extract $a; TokenNav $s, /auctions/auction -> $a; Select non-empty($b); Select $e = "French …"; Extract $b; Extract $e; TokenNav $a, /privacy -> $b; TokenNav $a, /desc/emph -> $e.

17 Better than In-Time Structural Join. Query: for $r in /root return $r/a, $r/b. [Figure: automaton with states 0 to 3 and transitions labeled root, a, b.] Plan operators (tree layout lost in extraction): StructuralJoin $r; Extract $a; Extract $b; TokenNav $s, /root -> $r; TokenNav $r, /a -> $a; TokenNav $r, /b -> $b. "a" tokens need not be stored.
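Why the "a" tokens need not be stored can be seen in a small sketch. It assumes the children of one root element arrive as (tag, text) pairs; the function name join_a_then_b is hypothetical. Since the result lists every a before any b, each a can be emitted on arrival, while b values must wait until the root ends and no further a can appear.

```python
def join_a_then_b(children):
    """Emit the a children immediately and the b children at the end of root."""
    buffered_b = []
    for tag, text in children:
        if tag == "a":
            yield (tag, text)  # output early: never stored
        elif tag == "b":
            buffered_b.append((tag, text))  # must wait for the last a
    # End of the root element: no more a's can arrive, so flush the b's.
    yield from buffered_b


output = list(join_a_then_b([("b", "1"), ("a", "2"), ("b", "3"), ("a", "4")]))
```

Only the b values occupy memory, compared with a join that collects both inputs before producing output.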

18 Evaluating Predicates. Query: for $r in /root where $r/a = "value" return $r/b. [Figure: automaton with states 0 to 3 and transitions labeled root, a, b.] Plan operators (tree layout lost in extraction): StructuralJoin $r; Select $a = "value"; Extract $a; Extract $b; TokenNav $s, /root -> $r; TokenNav $r, /a -> $a; TokenNav $r, /b -> $b. Once $a = "value" is satisfied, "b" tokens need not be stored.
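A minimal sketch of this predicate handling, again assuming the children of one root arrive as (tag, text) pairs and using the hypothetical name filter_b: b values are buffered only while the predicate is undecided, and are passed straight through once some a equals the wanted value.

```python
def filter_b(children, wanted="value"):
    """Yield the b values of a root that has some a child equal to `wanted`."""
    satisfied = False
    pending = []  # b values seen before the predicate was decided
    for tag, text in children:
        if tag == "a" and text == wanted and not satisfied:
            satisfied = True
            yield from pending  # release everything buffered so far
            pending.clear()
        elif tag == "b":
            if satisfied:
                yield text  # no storage needed any more
            else:
                pending.append(text)
    # End of root with the predicate unsatisfied: pending is simply dropped.


kept = list(filter_b([("b", "1"), ("a", "x"), ("a", "value"), ("b", "2")]))
dropped = list(filter_b([("b", "1"), ("a", "x")]))
```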

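19 Using Schema Knowledge. Query: for $r in /root return $r/a, $r/b. Schema: root -> (a*, b*). [Figure: automaton with states 0 to 3 and transitions labeled root, a, b.] Plan operators (tree layout lost in extraction): StructuralJoin $r; Extract $a; Extract $b; TokenNav $s, /root -> $r; TokenNav $r, /a -> $a; TokenNav $r, /b -> $b. Because the schema guarantees that every a precedes every b, neither "a" nor "b" tokens need to be stored.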

20 Using Schema Knowledge for Predicates. Query: for $r in /root where $r/a = "value" return $r/b. Schema: root -> (b*, a*, c). [Figure: automaton with states 0 to 3 and transitions labeled root, a, b.] Plan operators (tree layout lost in extraction): StructuralJoin $r; Select $a = "value"; Extract $a; Extract $b; TokenNav $s, /root -> $r; TokenNav $r, /a -> $a; TokenNav $r, /b -> $b. Once "c" is seen and $a = "value" is not yet satisfied, "b" tokens can be discarded.
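The schema-based discard can be sketched as follows, assuming the children of one root arrive as (tag, text) pairs in the order allowed by root -> (b*, a*, c). The function name filter_b_with_schema and the discarded counter are illustrative assumptions. The key step is the branch for c: the schema rules out any later a, so an unsatisfied predicate can never become true and the buffered b values are released.

```python
def filter_b_with_schema(children, wanted="value"):
    """Return (results, discarded) for one root under root -> (b*, a*, c)."""
    satisfied = False
    pending, results = [], []
    discarded = 0
    for tag, text in children:
        if tag == "b":
            (results if satisfied else pending).append(text)
        elif tag == "a" and text == wanted and not satisfied:
            satisfied = True
            results.extend(pending)  # predicate holds: buffered b's qualify
            pending.clear()
        elif tag == "c" and not satisfied:
            # No a can follow c, so the buffer is freed without waiting
            # for the end of the root element.
            discarded += len(pending)
            pending.clear()
    return results, discarded


miss = filter_b_with_schema([("b", "1"), ("b", "2"), ("a", "x"), ("c", "")])
hit = filter_b_with_schema([("b", "1"), ("a", "value"), ("c", "")])
```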

21 Conclusions. Raindrop integrates automaton and "DOM" navigation into one algebraic framework. Cost-based optimization is possible. Execution minimizes memory requirements.

22 Ongoing Work. Load shedding in XML stream processing. Utilizing dynamic schema changes for optimization.

23 Fragment of XQuery Supported. FLWR expressions (no conditionals or user-defined functions). Path expressions use only forward axes (child, descendant, descendant-or-self, attribute). Supported predicates are of the form: pathExpr relOp constant.

24 Issues with Correlated Queries. Example: for $r in /root return for $a in $r/a return $r/b

25 Visit the website of RAINDROP, our project on an XQuery engine over XML streams: http://davis.wpi.edu/dsrg/raindrop/

