1 A Unified Model for XQuery Evaluation over XML Data Streams Jinhui Jian Hong Su Elke A. Rundensteiner Worcester Polytechnic Institute ER 2003.


1 A Unified Model for XQuery Evaluation over XML Data Streams
Jinhui Jian, Hong Su, Elke A. Rundensteiner
Worcester Polytechnic Institute
ER 2003

2 Need for Stream Processing
New environment
- Data sources are everywhere
- Data requests are everywhere
New applications
- Sensor networks
- Analysis of XML web logs
- Selective dissemination of XML information (e.g., news)
New features
- On-line arriving data
- Potentially unstable data
- Real-time response requirement
- Scalability requirement

3 Specific Challenges for XML Streams
Pattern retrieval on nested data + filtering/restructuring:
  FOR $b in doc("bib.xml")//book
  LET $p := $b/price, $t := $b/title
  WHERE $p > 50
  RETURN $t
Input data:
  <book>
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  …
Token-by-token access along a timeline: <book> <title> TCP/IP Illustrated </title> …
A token can be an open tag, a close tag, or PCDATA; it is not a direct counterpart of a tuple.
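To make the token-versus-tuple distinction concrete, here is a minimal Python sketch that turns an XML fragment into the kind of token sequence the slide describes. The `(kind, value)` token format and the function name `tokenize` are assumptions made for illustration; they are not taken from the Raindrop system.

```python
import xml.parsers.expat

def tokenize(xml_text):
    """Return a list of (kind, value) tokens: open tag, close tag, or text."""
    tokens = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # merge adjacent character data into one callback

    def on_open(name, attrs):
        tokens.append(("open", name))

    def on_close(name):
        tokens.append(("close", name))

    def on_text(data):
        # Skip whitespace-only text between tags.
        if data.strip():
            tokens.append(("text", data.strip()))

    parser.StartElementHandler = on_open
    parser.EndElementHandler = on_close
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return tokens

if __name__ == "__main__":
    for token in tokenize("<book><title>TCP/IP Illustrated</title></book>"):
        print(token)
```

No single token carries a complete book; a tuple such as (title, price) only exists once several tokens have been combined, which is the gap the following slides address.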

4 Observations and Questions
Observations
- Pattern retrieval: the automata model has long been studied for pattern retrieval on tokens
- Filtering/restructuring: the algebraic model has long been studied for optimizing query plans on tuples
Questions
- How to integrate the two models?
- How to optimize a query within the integrated query model?

5 Uniform Modeling in an Algebraic Framework

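6 A Running Example
Give me the titles of books whose price is greater than 50:
  FOR $b in doc("bib.xml")//book
  WHERE $b/price > 50
  RETURN $b/title
Input XML stream:
  <book>
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book>
    <title>Languages and Machines</title>
    <author><last>Sudkamp</last><first>T.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>39.95</price>
  </book>
  …
Tokens arrive along a timeline: … <title> TCP/IP Illustrated </title> <author> <last> Stevens </last> … …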
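As a reference point for what the running query should return, the following Python sketch evaluates it over a small sample document with the standard library's incremental parser. It is only a baseline under stated assumptions: the sample data `BIB`, the wrapping `<bib>` root, and the function name `titles_over` are illustrative, and this approach buffers each whole book element rather than using the automata technique the slides go on to describe.

```python
import io
import xml.etree.ElementTree as ET

# Sample data; the <bib> root element is an assumption.
BIB = b"""<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher><price>65.95</price></book>
  <book><title>Languages and Machines</title>
    <author><last>Sudkamp</last><first>T.</first></author>
    <publisher>Addison-Wesley</publisher><price>39.95</price></book>
</bib>"""

def titles_over(stream, threshold=50.0):
    """Return titles of books whose price exceeds the threshold."""
    results = []
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "book":
            price = elem.findtext("price")
            if price is not None and float(price) > threshold:
                results.append(elem.findtext("title"))
            elem.clear()  # release the finished book's children
    return results

if __name__ == "__main__":
    print(titles_over(io.BytesIO(BIB)))
```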

7 Automata Computation: NFAs + Buffers
  FOR $b in doc("bib.xml")//book
  WHERE $b/price > 50
  RETURN $b/title
[Figure: NFA with states 1 to 4 and transitions labeled *, book, title and price; a buffer for title holding "TCP/IP Illustrated" and a buffer for price holding "65.95"]
Trace over time steps t0 to t7 for the input tokens … TCP/IP Illustrated … 65.95 …:
  active states    stack (bottom to top)
  +0               [0]
  +1               [0] [1]
  +1,2             [0] [1] [1,2]
  +1,4             [0] [1] [1,2] [1,4]
  -1,4             [0] [1] [1,2]
  +1,3             [0] [1] [1,2] [1,3]
  …                …
- No materialization needed
- Multiple patterns resolved in one pass
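The stack-based NFA run on this slide can be sketched in a few lines of Python. The transition table below is one reading of the slide's trace (state 0 as the start state, state 1 carrying the `*` self-loop) and should be treated as an assumption, as should the `(kind, value)` token format and the names `NFA`, `FINAL`, `step` and `run`.

```python
# Assumed transition table for //book, //book/title and //book/price.
NFA = {
    0: {"*": {1}},
    1: {"*": {1}, "book": {2}},
    2: {"title": {4}, "price": {3}},
}
FINAL = {4: "title", 3: "price"}  # accepting states and the buffer they feed

def step(states, tag):
    """Follow every transition labeled with the tag or with the wildcard."""
    nxt = set()
    for s in states:
        edges = NFA.get(s, {})
        nxt |= edges.get(tag, set())
        nxt |= edges.get("*", set())
    return nxt

def run(tokens):
    """Push on open tags, pop on close tags, buffer text in accepting states."""
    stack = [{0}]
    buffers = {"title": [], "price": []}
    trace = []
    for kind, value in tokens:
        if kind == "open":
            stack.append(step(stack[-1], value))
        elif kind == "close":
            stack.pop()
        else:  # text token
            for s in stack[-1]:
                if s in FINAL:
                    buffers[FINAL[s]].append(value)
        trace.append(sorted(stack[-1]))
    return buffers, trace

TOKENS = [
    ("open", "bib"), ("open", "book"),
    ("open", "title"), ("text", "TCP/IP Illustrated"), ("close", "title"),
    ("open", "price"), ("text", "65.95"), ("close", "price"),
    ("close", "book"), ("close", "bib"),
]

if __name__ == "__main__":
    buffers, trace = run(TOKENS)
    print(buffers)
    print(trace)
```

Because the stack remembers the active states of every open ancestor, a close tag restores the previous state set with a single pop, and both patterns are matched in one pass over the tokens.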

8 Algebraic Computation
  FOR $b in doc("bib.xml")//book
  WHERE $b/price > 50
  RETURN $b/title
Plan operators, from the input upward:
- Extract //book
- Navigate //book, price
- Select price > 50
- Navigate //book, title
- Tagger
Element structure: book contains title, author (last, first), publisher and price; the leaves hold text.
Selection push-down enabled
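The operator pipeline on this slide can be mimicked with Python generators, where each operator consumes and produces a stream of tuples (here, dictionaries). This is a rough sketch only: the function names, the dictionary-based tuple representation, and the sample data are assumptions, and Extract is approximated with the standard library's incremental parser rather than an automaton.

```python
import io
import xml.etree.ElementTree as ET

def extract(stream, tag):
    """Extract: emit one tuple per completed element with the given tag."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            yield {tag: elem}

def navigate(tuples, source, child, target):
    """Navigate: bind target to the text of a child of the source element."""
    for t in tuples:
        yield {**t, target: t[source].findtext(child)}

def select(tuples, predicate):
    """Select: keep only the tuples that satisfy the predicate."""
    for t in tuples:
        if predicate(t):
            yield t

def tagger(tuples, column, tag):
    """Tagger: wrap one column of each tuple in XML tags."""
    for t in tuples:
        yield f"<{tag}>{t[column]}</{tag}>"

def plan(stream):
    books = extract(stream, "book")
    priced = navigate(books, "book", "price", "p")
    # Selection sits below the title navigation, so cheap books are dropped early.
    kept = select(priced, lambda t: t["p"] is not None and float(t["p"]) > 50)
    titled = navigate(kept, "book", "title", "t")
    return list(tagger(titled, "t", "title"))

SAMPLE = (b"<bib><book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
          b"<book><title>Languages and Machines</title><price>39.95</price></book></bib>")

if __name__ == "__main__":
    print(plan(io.BytesIO(SAMPLE)))
```

Placing `select` before the second `navigate` mirrors the slide's note that selection push-down is enabled: the title is only looked up for books that pass the price filter.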

9 The Raindrop Approach
- Uniform: automata computation is modeled in an algebraic manner
- Tight coupling: automata and regular tuple-based computation are interchangeable

10 Path Bindings in XQuery
  FOR $b in doc("bib.xml")//book
  LET $p := $b/price, $t := $b/title
  WHERE $p > 50
  RETURN $t
FLWR expression: FOR…LET…WHERE…RETURN…
- FOR and LET: path bindings
- WHERE and RETURN: filtering and restructuring
“The purpose of path bindings is to produce a tuple stream in which each tuple consists of one or more bound variables” [W3C]

11 Data Flow
XML data stream -> Automata plan -> Tuple stream -> Regular algebraic plan -> Query answer

12 Modeling the Automata Plan: Black Box [xscan] vs. White Box
Black box: a single Automata Plan operator computing Q1 := //book, Q2 := //book/price, Q3 := //book/title
White box: SJoin //book over Extract //book/price and Extract //book/title

13 A Unified Process at the Logical View
Operators in the plan:
- Select //book/price > 50
- Navigate //book, //book/title
- SJoin //book
- Extract //book/price
- Extract //book/title

14 The Algebra Core
Relational-like operators
- Selection: filter tuples based on the predicate pred
- Projection: filter columns in the input tuples based on the variable list v
- Join: join input tuples based on the predicate pred
- Aggregate: aggregate over input tuples with the aggregate function f, e.g., sum and average
XML-specific operators
- Tagger: format outputs based on the pattern pt, i.e., reconstruct XML tags
- Navigate: take input elements of path p1 and output ancestor elements of path p2
- Extract: identify elements of path p from the input stream
- Structural Join: join input tuples on their structural relationship, e.g., the common parent relationship p

15 The Extract Operator
[Figure: automaton with transitions labeled *, book and title feeding Extract //book/title over the input stream; the extracted titles are "TCP/IP Illustrated", "Data on the Web" and "Advanced Programming in the Unix environment"]

16 The Structural Join Operator
  FOR $b in doc("bib.xml")//book
  LET $p := $b/price, $t := $b/title
  WHERE $p > 50
  RETURN $t
[Figure: automaton with states 1 to 4 and transitions labeled *, book, title and price; Extract //book/title and Extract //book/price feed SJoin //book, which outputs tuples such as … TCP/IP Illustrated …]
Tight coupling
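One way to picture the structural join is as a step that fires when a book element closes and pairs up whatever the two Extract operators buffered inside it. The Python sketch below follows that reading; the trigger on the close tag, the cross-product pairing, the `(kind, value)` token format and the name `sjoin_book` are all assumptions for illustration, and nested book elements are not handled.

```python
from itertools import product

def sjoin_book(tokens):
    """Yield one tuple per (title, price) pair that shares a book ancestor."""
    path = []  # tags of the currently open elements
    buffers = {"title": [], "price": []}
    for kind, value in tokens:
        if kind == "open":
            path.append(value)
        elif kind == "text":
            # Plays the role of Extract //book/title and Extract //book/price.
            if len(path) >= 2 and path[-2] == "book" and path[-1] in buffers:
                buffers[path[-1]].append(value)
        else:  # close tag
            path.pop()
            if value == "book":
                # Join the buffered items on their common book parent.
                for title, price in product(buffers["title"], buffers["price"]):
                    yield {"t": title, "p": price}
                buffers = {"title": [], "price": []}

BOOK_TOKENS = [
    ("open", "bib"),
    ("open", "book"),
    ("open", "title"), ("text", "TCP/IP Illustrated"), ("close", "title"),
    ("open", "price"), ("text", "65.95"), ("close", "price"),
    ("close", "book"),
    ("open", "book"),
    ("open", "title"), ("text", "Languages and Machines"), ("close", "title"),
    ("open", "price"), ("text", "39.95"), ("close", "price"),
    ("close", "book"),
    ("close", "bib"),
]

if __name__ == "__main__":
    for row in sjoin_book(BOOK_TOKENS):
        print(row)
```

The output is an ordinary tuple stream, so a tuple-based operator such as a selection on `p` can consume it directly.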

17 The Navigate Operator
[Figure: Navigate //book, title takes each book element (e.g., TCP/IP Illustrated, Stevens W., Addison-Wesley, 65.95) and outputs its title]

18 Optimization

19 In or Out?
XML data stream -> Automata plan -> Tuple stream -> Regular algebraic plan -> Query answer
Pattern retrieval: inside the automata plan, or outside it in the regular algebraic plan?

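20 Pattern Retrieval Alternatives
- In automata (/title, /price): NFA with states 1 to 4 and transitions labeled *, book, title and price
- Out of automata (/title, /price): NFA with states 1 and 2 and transitions labeled * and book
[Figure: both alternatives applied to the book element TCP/IP Illustrated, Stevens W., Addison-Wesley, 65.95 …]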

21 Plan Alternatives
The pull-out plan (automaton with states 1 and 2, transitions * and book):
- Extract //book
- Navigate //book, price
- Select price > 50
- Navigate //book, title
The push-in plan (automaton with states 1 to 4, transitions *, book, title and price):
- Extract //book/price
- Extract //book/title
- SJoin //book
- Select //book/price > 50
Output operator: Tagger

22 Experiment 1

23 Experiment 2

24 Camp 1: Complete Automata Model [XSQ, XSM, XPush]
All details on the same level
- Hard to understand
- Not suitable for optimizing at different levels
Automata have been little studied as a query processing paradigm
Example query:
  for $x in $R/a return for $y in $x/b return $y, $x
[Figure: automaton for this query, with states labeled 0,0,0 through 1,1,0 and transition labels such as *r=er|r++ and *r=sr|r++]

25 Camp 2: Automata-Algebra Loosely Coupled Model [Tukwila, YFilter]
Fixed interface for automata computation (all pattern retrieval pushed down)
- No opportunity to push computation into, or pull it out of, the automata
Bloated, black-box operator
- Algebraic rewriting is impossible for internal optimization
[Figure: a single Automata Plan operator binding $b := //book, $p := //book/price, $t := //book/title and outputting $b, $p, $t]

26 Contribution
- Automata and algebra are modeled in one framework, allowing a uniform logical view
- Query rewriting provides the opportunity to push computation into the automata and to pull it out of the automata
- The necessity of optimization is verified by experiments

27 http://davis.wpi.edu/dsrg/raindrop/

