A Unified Model for XQuery Evaluation over XML Data Streams. Jinhui Jian, Hong Su, Elke A. Rundensteiner. Worcester Polytechnic Institute. ER 2003.

1 A Unified Model for XQuery Evaluation over XML Data Streams. Jinhui Jian, Hong Su, Elke A. Rundensteiner. Worcester Polytechnic Institute. ER 2003.

2 Need for Stream Processing. New environment: data sources are everywhere; data requests are everywhere. New applications: sensor networks; analysis of XML web logs; selective dissemination of XML information (e.g., news).

3 Specific Challenges for XML Streams. Token-by-token access manner along a timeline (sample stream content: Dream Catcher, King S., Bt Bound, 20, …). A token is not a direct counterpart of a tuple. Queries need pattern retrieval plus filtering/restructuring. Example query: FOR $b in doc("biditems.xml")//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN $t
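
To make the token/tuple mismatch concrete, here is a minimal Python sketch. The `tokenize` helper, the `(kind, value)` token shape, and the element names in the sample are assumptions for illustration (the slide's tags did not survive in this transcript). The regular expression only handles plain tags and text, with no attributes, comments, or self-closing tags.

```python
import re

def tokenize(xml_text):
    """Yield (kind, value) tokens in document order."""
    for piece in re.findall(r"<[^>]+>|[^<]+", xml_text):
        if piece.startswith("</"):
            yield ("end", piece[2:-1])
        elif piece.startswith("<"):
            yield ("start", piece[1:-1])
        elif piece.strip():
            yield ("text", piece.strip())

# Hypothetical sample: element names are assumed, not taken from the slides.
SAMPLE = "<book><title>Dream Catcher</title><price>20</price></book>"
tokens = list(tokenize(SAMPLE))

# A single tuple for the query's bindings only exists after several
# tokens have arrived, so one token is not one tuple.
book_tuple = {"title": tokens[2][1], "price": float(tokens[5][1])}
```

Eight tokens are consumed here before one `(title, price)` tuple can be formed, which is the gap the rest of the talk addresses.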

4 Two Computation Paradigms. Automata-based [yfilter02, x-scan01, xsm02, xsq03, xpush03, …]. Algebraic [niagara00, …]. This project intends to integrate both paradigms into one.

5 Automata Paradigm. Query: FOR $b in stream("biditems.xml")//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN $t. [Figure: automaton with states 1 to 5 and transitions labeled *, book, title, price, and text(), recognizing the patterns //book, //book/title, and //book/price/text().] Auxiliary structures are needed for: 1. buffering data; 2. evaluating predicates; 3. restructuring buffered data; …
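
The pattern-retrieval part of this slide can be sketched as a small nondeterministic automaton run over tokens with a stack of active state sets. This is only an illustration: the state numbering is one reading of the figure labels, and the `TRANSITIONS` table, the `(kind, value)` token shape, and the `biditems` root element are assumptions. The sketch reports matches only; it has none of the auxiliary buffering, predicate, or restructuring machinery the slide mentions.

```python
# state -> {label: next state}; "*" on state 1 is the self-loop that models "//".
TRANSITIONS = {
    1: {"*": 1, "book": 2},
    2: {"title": 4, "price": 3},
    3: {"text()": 5},
}
ACCEPTING = {2: "//book", 4: "//book/title", 5: "//book/price/text()"}

def step(states, label):
    """Return the set of states reachable from `states` on `label`."""
    reached = set()
    for state in states:
        edges = TRANSITIONS.get(state, {})
        if label in edges:
            reached.add(edges[label])
        if "*" in edges and label != "text()":
            reached.add(edges["*"])
    return reached

def run(tokens):
    stack = [{1}]          # one set of active states per open element
    matches = []
    for kind, value in tokens:
        if kind == "start":
            current = step(stack[-1], value)
            stack.append(current)
            matches += [(ACCEPTING[s], value) for s in sorted(current) if s in ACCEPTING]
        elif kind == "text":
            for s in sorted(step(stack[-1], "text()")):
                if s in ACCEPTING:
                    matches.append((ACCEPTING[s], value))
        else:              # end tag: return to the parent's state set
            stack.pop()
    return matches

# Hypothetical token stream; element names are assumed.
TOKENS = [
    ("start", "biditems"), ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"), ("end", "biditems"),
]
matches = run(TOKENS)
```

Popping the stack on each end tag is what lets the automaton resume matching siblings after leaving a subtree.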

6 Algebraic Computation. Query: FOR $b in doc("biditems.xml")//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN $t. [Figure: sample book element with title, author (last, first), publisher, and price text.] Plan without push-down: Navigate //book, price; Navigate //book, title; Select price < 30; Tagger. Plan with selection push-down enabled: Navigate //book, price; Select price < 30; Navigate //book, title; Tagger.
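
As a rough illustration of why selection push-down matters on sets of tuples, the following Python sketch runs both operator orderings and counts navigation steps. The operator functions, the dictionary-based tuples, and all books other than "Dream Catcher" at price 20 are hypothetical; this is not the system's actual operator implementation.

```python
# Hypothetical input: only "Dream Catcher" and the price 20 appear in the slides.
BOOKS = [
    {"title": "Dream Catcher", "price": 20},
    {"title": "Hypothetical Book B", "price": 45},
    {"title": "Hypothetical Book C", "price": 60},
]

def navigate(tuples, field, counter):
    """Add one field of each book to its tuple, counting every navigation."""
    out = []
    for t in tuples:
        counter[0] += 1
        out.append({**t, field: t["book"][field]})
    return out

def select(tuples, predicate):
    return [t for t in tuples if predicate(t)]

def tagger(tuples):
    return [f"<title>{t['title']}</title>" for t in tuples]

def plan_without_pushdown(books):
    counter = [0]
    tuples = [{"book": b} for b in books]
    tuples = navigate(tuples, "price", counter)
    tuples = navigate(tuples, "title", counter)
    tuples = select(tuples, lambda t: t["price"] < 30)
    return tagger(tuples), counter[0]

def plan_with_pushdown(books):
    counter = [0]
    tuples = [{"book": b} for b in books]
    tuples = navigate(tuples, "price", counter)
    tuples = select(tuples, lambda t: t["price"] < 30)   # filter before the second navigate
    tuples = navigate(tuples, "title", counter)
    return tagger(tuples), counter[0]

result_a, cost_a = plan_without_pushdown(BOOKS)
result_b, cost_b = plan_with_pushdown(BOOKS)
```

Both plans return the same answer, but the pushed-down plan navigates to the title only for books that pass the price filter.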

7 Observations. Automata paradigm: good and long studied for pattern retrieval on tokens; patches needed for complex filtering and restructuring. Algebraic paradigm: good and long studied for expressing and optimizing query plans on sets of tuples; tokenized inputs not accommodated yet. Each paradigm has deficiencies. The two paradigms complement each other.

8 Research Challenges. How to integrate the two models? How to optimize a query within the integrated query model?

9 Raindrop Approach: Uniform Modeling in an Algebraic Framework

10 Uniform Algebraic Plan. [Figure: XML data stream -> algebraic plan -> query answer.]

11 Uniform Algebraic Plan. [Figure: XML data stream -> token-based plan (automata plan) -> tuple stream -> tuple-based plan -> query answer.]

12 Modeling the Automata in Algebraic Plan: Black Box [xscan] vs. White Box. Query: FOR $b in stream("biditems.xml")//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN $t. Black box: a single XScan operator binds $b := //book, $p := $b/price, $t := $b/title. White box: Extract //book/price and Extract //book/title feed SJoin //book.

13 A Unified Process at the Logical View. Query: FOR $b in doc("biditems.xml")//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN $t. [Figure: token-based plan (automata plan) and tuple-based plan.]

14 A Unified Process at the Logical View. Query: FOR $b in doc("biditems.xml")//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN $t. [Figure: tuple-based plan on top of SJoin //book, which is fed by Extract $p, //book/price and Extract $t, //book/title.]

15 A Unified Process at the Logical View. Query: FOR $b in doc("biditems.xml")//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN $t. [Figure: plan with the operators SJoin //book, Extract //book/price, Extract //book/title, Select //book/price > 50, and Navigate //book, //book/title.]

16 The Algebra Core: operators and their semantics.
Relational-like:
  Selection: filter tuples based on the predicate pred.
  Projection: filter columns in the input tuples based on the variable list v.
  Join: join input tuples based on the predicate pred.
  Aggregate: aggregate over input tuples with the aggregate function f, e.g., sum and average.
XML-specific:
  Tagger: format outputs based on the pattern pt, i.e., reconstruct XML tags.
  Navigate: take input elements of path p1 and output ancestor elements of path p2.
  Extract: identify elements of path p from the input stream.
  Structural Join (SJ): join input tuples on their structural relationship, e.g., the common parent relationship p.

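17 Extract Operator. [Figure: automaton fragment (states 1 and 2; labels *, book, title) feeding Extract //book/title, which picks the title element "Dream Catcher" out of the token stream.]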
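
A minimal sketch of what an Extract-style operator could do over a token stream: track the path of open elements, start buffering when the path ends with the requested pattern, and emit the buffered tokens when that element closes. The `extract` function, the list-based pattern, and the token shape are assumptions for illustration; nested matches of the same pattern are not handled.

```python
def extract(tokens, pattern):
    """Collect the token runs of elements whose path ends with `pattern`."""
    path = []          # names of currently open elements
    buffer = None      # tokens of the element being collected, if any
    match_depth = None
    results = []
    for kind, value in tokens:
        if kind == "start":
            path.append(value)
            if buffer is None and path[-len(pattern):] == pattern:
                buffer, match_depth = [], len(path)
        if buffer is not None:
            buffer.append((kind, value))
        if kind == "end":
            if buffer is not None and len(path) == match_depth:
                results.append(buffer)   # matched element is complete
                buffer = None
            path.pop()
    return results

# Hypothetical token stream; element names are assumed.
TOKENS = [
    ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"),
]
titles = extract(TOKENS, ["book", "title"])   # stands in for Extract //book/title
```

The operator turns a run of tokens into one unit per matched element, which is the form the tuple-based operators expect.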

18 Structural Join Operator. Query: FOR $b in doc("biditems.xml")//book LET $p := $b/price/text(), $t := $b/title WHERE $p < 30 RETURN $t. [Figure: automaton (states 1 to 4; labels *, book, title, price) feeding Extract //book/title and Extract //book/price, whose outputs are combined by SJoin //book; sample value: Dream Catcher.]
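
The join on a common parent can be sketched as follows, assuming each extracted title and price carries an identifier of its enclosing book. The identifiers, the `structural_join` function, and every value other than "Dream Catcher" and 20 are hypothetical; how the actual operator tracks the parent relationship is not stated in the slides.

```python
from collections import defaultdict

def structural_join(titles, prices):
    """Pair titles and prices that share the same enclosing book."""
    prices_by_book = defaultdict(list)
    for book_id, price in prices:
        prices_by_book[book_id].append(price)
    joined = []
    for book_id, title in titles:
        for price in prices_by_book.get(book_id, []):
            joined.append({"book": book_id, "title": title, "price": price})
    return joined

# Outputs of two Extract operators, tagged with a hypothetical book id.
titles = [(1, "Dream Catcher"), (2, "Hypothetical Book B"), (3, "Hypothetical Book C")]
prices = [(1, 20), (2, 45)]          # book 3 has no price in this sample

rows = structural_join(titles, prices)
cheap = [r["title"] for r in rows if r["price"] < 30]   # WHERE $p < 30
```

Book 3 produces no row because it has no matching price, mirroring an inner join on the parent.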

19 Optimization via Query Rewriting

20 In or Out? [Figure: XML data stream -> token-based plan (automata plan) -> tuple stream -> tuple-based plan -> query answer.] Pattern retrieval: inside or outside the automata plan?

21 Plan Alternatives. The pull-out plan: Extract //book; Navigate /price; Select price < 30; Navigate book/title. The push-in plan: Extract //book/price; Extract //book/title; SJoin //book; Select price < 30. Tagger.
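
The two operator orderings can be compared with a small Python sketch. It runs over an in-memory tree built with the standard `xml.etree.ElementTree` module rather than a token stream, so it shows only the logical difference between the plans, not their streaming cost. The document content (apart from "Dream Catcher" and 20) and both function names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only "Dream Catcher" and 20 come from the slides.
DOC = """<biditems>
  <book><title>Dream Catcher</title><price>20</price></book>
  <book><title>Hypothetical Book B</title><price>45</price></book>
</biditems>"""

def pull_out_plan(root):
    books = root.findall(".//book")                                   # Extract //book
    with_price = [(b, float(b.findtext("price"))) for b in books]     # Navigate /price
    cheap = [b for b, price in with_price if price < 30]              # Select price < 30
    return [b.findtext("title") for b in cheap]                       # Navigate book/title

def push_in_plan(root):
    titles, prices = {}, {}
    for i, book in enumerate(root.iter("book")):    # stand-in for the automaton
        titles[i] = book.findtext("title")          # Extract //book/title
        prices[i] = float(book.findtext("price"))   # Extract //book/price
    joined = [(titles[i], prices[i]) for i in titles if i in prices]  # SJoin //book
    return [title for title, price in joined if price < 30]           # Select price < 30

root = ET.fromstring(DOC)
pull_out_result = pull_out_plan(root)
push_in_result = push_in_plan(root)
```

In the pull-out plan the automaton side finds only whole books and the remaining navigation happens on tuples; in the push-in plan all three patterns are retrieved up front and recombined by the structural join.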

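22 Pattern Retrieval Alternatives. In automata (/title, /price): [Figure: automaton with states 1 to 4 and transitions labeled *, book, title, price, over the stream … Dream Catcher, King S., Bt Bound, 20 ….] Out of automata (/title, /price): [Figure: automaton with states 1 and 2 and transitions labeled *, book; labels t2, t10, and SJ.]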

23 Experiment. [Figure: two chart panels, Selectivity = 5% and Selectivity = 90%.]

24 Related Work

25 Camp 1: Complete Automata Model [XSQ, XSM, XPush]. Example query: for $x in $R/a return for $y in $x/b return $y, $x. [Figure: automaton for this query with states labeled (0,0,0) through (2,2,2) and low-level transition labels over r, x, and o.]

26 Camp 1: Complete Automata Model [XSQ, XSM, XPush]. All details are presented on the same level (and a low level!): hard to understand; not suitable for optimizing at different levels. Little has been studied on using automata as a query processing paradigm.

27 Camp 2: Automata-Algebra Loosely Coupled Model [Tukwila, YFilter]. Fixed interface for automata computation (all pattern retrieval pushed down): no opportunity for pushing computation into, or pulling it out of, the automata. Bloated, black-box operator: algebraic rewriting impossible for internal optimization. [Figure: automata plan binding $b := //book, $p := //book/price, $t := //book/title, with outputs $b, $p, $t.]

28 Contributions. Combining automata and algebra leads to a powerful query processing model. Modeling: uniform, simple logical view, hence better understandability. Optimization: uniform rewriting, hence more optimization opportunities (e.g., push-in/pull-out). The need for optimization is verified by experiments.

29 Email: suhong@cs.wpi.edu

30 Experiment 2. [Figure: two chart panels, Number of patterns = 2 and Number of patterns = 20.]

