1 Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II) Xin Zhang.



2 Outline
- XAT Decorrelation.
- Optimization:
  - XAT Computation Pushdown.
  - XAT Data Model Cleanup.
  - XAT Cutting.
- Conclusion & Future Works.

3 XAT Decorrelation
- XQuery queries are correlated.
- Decorrelation is required for optimization:
  - XAT Computation Pushdown.
  - XAT Data Model Cleanup.
  - XAT Cutting.

4 Three kinds of Decorrelation
- Simple Decorrelation:
  - No additional sources.
  - No aggregate functions.
- Complex Decorrelation with Additional Sources.
- Complex Decorrelation with Aggregate Functions.

5 Example* of XML Use Cases
Book data (title, price):
- TCP/IP Illustrated, 65.95
- TCP/IP Illustrated, 65.95
- Data on the Web, 34.95
- Data on the Web, 39.95

6 Simple Query Example
Task: In the document "prices.xml", find the book titles.
Query:
{ for $t in distinct(document("prices.xml")/book/title) return $t }
XAT operators:
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- FOR($t)
- Agg()
- T([$t]):col1

7 Simple Decorrelation
Linearize the tree: T[FOR(CB, T2[])[T1[S1]]] → T[T2[T1[S1]]]
XAT operators before:
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- FOR($t)
- Agg()
- T([$t]):col1
XAT operators after (FOR($t) removed):
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- Agg()
- T([$t]):col1

8 Is Simple Decorrelation Right?
Every operator except Groupby already has "for each tuple" semantics over its input table. Hence, the FOR operator can be omitted in the simple decorrelation scenario.

9 Two types of Navigate
- Navigate Unnesting (Navigate_U): unnests the parent-children relationship and duplicates the parent value for each child.
- Navigate Collection (Navigate_C): nests the parent-children relationship, creating a collection of the children while keeping a single parent.
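The difference between the two Navigate types can be sketched in Python. This is only an illustration, not the Rainbow implementation: tuples are modeled as dictionaries, and the helper names `navigate_unnest` and `navigate_collection` as well as the `children_of` callback are hypothetical.

```python
def navigate_unnest(rows, src, children_of, out):
    """Navigate Unnesting: one output tuple per child; parent values are duplicated."""
    return [{**row, out: child} for row in rows for child in children_of(row[src])]


def navigate_collection(rows, src, children_of, out):
    """Navigate Collection: one output tuple per parent; children are kept as a list."""
    return [{**row, out: list(children_of(row[src]))} for row in rows]


# Assumed stand-in for the document: a list of book records (title, price).
doc = [
    {"title": "TCP/IP Illustrated", "price": 65.95},
    {"title": "TCP/IP Illustrated", "price": 65.95},
    {"title": "Data on the Web", "price": 34.95},
    {"title": "Data on the Web", "price": 39.95},
]
rows = [{"R2": doc}]          # a single input tuple holding the document
books = lambda d: d           # hypothetical stand-in for the path /book

unnested = navigate_unnest(rows, "R2", books, "$b")       # FOR-style binding
collected = navigate_collection(rows, "R2", books, "$b")  # LET-style binding
print(len(unnested), len(collected))
```

With unnesting, the single input tuple becomes one tuple per book. With collection, it stays a single tuple whose `$b` column holds all the books.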

10 Where to use the two types
- Navigate Unnesting (Navigate_U): FOR binding.
- Navigate Collection (Navigate_C): LET binding.

11 Complex Query Example
Task: In the document "prices.xml", find the book title and its prices.
Query:
{ for $t in distinct(document("prices.xml")/book/title), let $b := document("prices.xml")/book[title = $t] return $t, $b/price }
XAT operators:
- Navigate_C($b, price):col4
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- FOR($t)
- Agg()
- T([$t], [col4]):col1
- S("prices.xml"):R2
- Navigate_C(R2, /book):$b
- Select(col3=$t)
- Navigate_C($b, title):col3

12 Complex Decorrelation with Additional Source
Rule: T[FOR(CB, T2[S2])[T1[S1]]] → T[T2[  [T1[S1],S2]]]
XAT operators before:
- Navigate_C($b, price):col4
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- FOR($t)
- Agg()
- T([$t], [col4]):col1
- S("prices.xml"):R2
- Navigate_C(R2, /book):$b
- Select(col3=$t)
- Navigate_C($b, title):col3
XAT operators after:
- Navigate_C($b, price):col4
- T([$t], [col4]):col1
- S("prices.xml"):R2
- Navigate_C(R2, /book):$b
- Select(col3=$t)
- Navigate_C($b, title):col3
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- Agg()

13 Full Query Example
Task: In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element.
Query:
{ for $t in distinct(document("prices.xml")/book/title), let $b := document("prices.xml")/book[title = $t] return $t, min($b/price/text()) }
XAT operators:
- Navigate_C($b, price/text()):col4
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- FOR($t)
- Agg()
- T([$t], [col5]):col1
- S("prices.xml"):R2
- Navigate_C(R2, /book):$b
- Select(col3=$t)
- Navigate_C($b, title):col3
- min(col4):col5

14 Complex Query Decorrelation with one Aggregation Function
Rule: T[FOR(CB, T2[Agg(T3[])])[T1[S1]]] → T[  (DM(T1))[T1,T2[Groupby(DM(T1),Agg(T3[  [Distinct(T1[S1]), S2]))]]]
DM(T1) is the data model computed from T1.
Diagram labels before: S2, Agg(), T1, S1, T3, FOR($rate), T2, T.
Diagram labels after: S1, Groupby(DM(T1), Agg()), S2, T3, T, T2, T1, Distinct.

15 The Query after Decorrelation
XAT operators before:
- Navigate_C($b, price/text()):col4
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- FOR($t)
- Agg()
- T([$t], [col5]):col1
- S("prices.xml"):R2
- Navigate_C(R2, /book):$b
- Select(col3=$t)
- Navigate_C($b, title):col3
- min(col4):col5
XAT operators after:
- Navigate_C($b, price/text()):col4
- T([$t], [col4]):col1
- S("prices.xml"):R2
- Navigate_C(R2, /book):$b
- Select(col3=$t)
- Navigate_C($b, title):col3
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- Agg()
- GB(DM, min(col4):col5)
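To make the effect of decorrelation with an aggregate concrete, here is a minimal Python sketch over the four book entries from the use-case data. It is not the XAT algebra itself: the correlated form re-scans the books once per title (the FOR semantics), while the decorrelated form combines the distinct titles with the books once, applies the selection, and then groups by title. The variable names are illustrative assumptions.

```python
# (title, price) pairs, as listed on the use-case slide.
books = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 65.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]

# distinct(/book/title), preserving first-seen order
titles = list(dict.fromkeys(title for title, _ in books))

# Correlated evaluation: the inner block is re-evaluated for every $t.
correlated = [(t, min(p for bt, p in books if bt == t)) for t in titles]

# Decorrelated evaluation: combine titles with books once, select on
# title = $t, then group by the title column and take min(price).
combined = [(t, p) for t in titles for bt, p in books if bt == t]
groups = {}
for t, p in combined:
    groups[t] = min(p, groups.get(t, p))
decorrelated = list(groups.items())

print(decorrelated)
```

Both evaluations yield the same (title, minimum price) pairs, which is what the group-by on the outer data model is meant to guarantee.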

16 Where are we?
- XAT Decorrelation.
- Optimization:
  - XAT Computation Pushdown.
  - XAT Data Model Cleanup.
  - XAT Cutting.
- Conclusion & Future Works.

17 XAT Computation Pushdown
Goal: push the execution into the relational database.
Steps:
- Push Navigation down.
- Cancel out Navigation and Tagger.
- Generate the SQL statement.

18 Navigation Pushdown
A Navigate operator can be pushed down through any operator until it reaches a child operator it depends on.
Example rewriting rules:
- Navigate(x1, path):x2[Navigate(y1, path):y2[T]] → Navigate(y1, path):y2[Navigate(x1, path):x2[T]] (x1 != y2)
- Navigate(x1, path):x2[Select(c)[T]] → Select(c)[Navigate(x1, path):x2[T]]
- Navigate(x1, path):x2[  [T1, T2]] →   [T1, Navigate(x1, path):x2[T2]] (if x1 in DM(T2))
- Navigate(x1, path):x2[  [T1, T2]] →   [Navigate(x1, path):x2[T1], T2] (if x1 in DM(T1))
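The first two rewriting rules (swapping a Navigate with an independent Navigate, and with a Select) can be sketched in Python on a linear operator chain. This is a simplified illustration under stated assumptions: the `Op` class, its fields, and `push_navigate_down` are hypothetical names, and binary operators are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    label: str
    kind: str                       # "navigate", "select", or "source"
    reads: Optional[str] = None     # column a navigate starts from
    produces: Optional[str] = None  # column the operator adds
    child: Optional["Op"] = None


def push_navigate_down(op: Op) -> Op:
    """Push a navigate below its child for as long as it does not depend on it."""
    if op.kind != "navigate" or op.child is None:
        return op
    child = op.child
    independent = child.kind == "select" or (
        child.kind == "navigate" and op.reads != child.produces
    )
    if not independent:
        return op                   # stop: child is a source or produces our input
    op.child = child.child          # swap the two operators
    child.child = push_navigate_down(op)
    return child


def chain(op: Optional[Op]) -> list:
    """List operator labels from the top of the chain to the bottom."""
    labels = []
    while op is not None:
        labels.append(op.label)
        op = op.child
    return labels


source = Op("S:R2", "source", produces="R2")
nav_b = Op("Navigate_C(R2, /book):$b", "navigate", "R2", "$b", source)
nav_title = Op("Navigate_C($b, title):col3", "navigate", "$b", "col3", nav_b)
select = Op("Select(col3=$t)", "select", child=nav_title)
nav_price = Op("Navigate_C($b, price/text()):col4", "navigate", "$b", "col4", select)

rewritten = push_navigate_down(nav_price)
print(chain(rewritten))
```

In this sketch the price navigation sinks below the Select and the title navigation, and stops directly above the operator that produces `$b`, the column it reads.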

19 Navigation Pushdown Example
XAT operators (the same set is listed before and after the pushdown):
- Navigate_C($b, price/text()):col4
- T([$t], [col4]):col1
- S("prices.xml"):R2
- Navigate_C(R2, /book):$b
- Select(col3=$t)
- Navigate_C($b, title):col3
- T([col1]):col0
- distinct(col2):$t
- S("prices.xml"):R1
- Navigate(R1, /book/title):col2
- Agg()
- GB(DM, min(col4):col5)

20 Navigation/Tagger Cancel Out
Used to simplify a composite XAT tree.
Transformation rule: Navigate(x, /):y[T([z]):x[s]] → s
Note: type analysis is also used for the cancel-out.

21 View Query Example
Book data (title, price):
- TCP/IP Illustrated, 65.95
- TCP/IP Illustrated, 65.95
- Data on the Web, 34.95
- Data on the Web, 39.95
View query:
{ for $row in distinct(DXV/book/row), return $row/title, $row/price }
XAT operators:
- T([col6]):col5
- T([col7], [col8]):col6
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Agg()
- Navigate($row, title):col7
- Navigate($row, price):col8

22 Cancel Out Example (1)
Rule applied:   (x, y)[op():x[s]] → op():y[s]
XAT operators before:
- Navigate_C($b, price/text()):col4
- S("prices.xml"):R2
- Navigate_C(R2, /book):$b
- Navigate_C($b, title):col3
- ...
- T([col6]):col5
- T([col7], [col8]):col6
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Agg()
- Navigate($row, title):col7
- Navigate($row, price):col8
XAT operators after:
- Navigate_C($b, price/text()):col4
- Navigate_C(R2, /book):$b
- Navigate_C($b, title):col3
- ...
- T([col6]):R2
- T([col7], [col8]):col6
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Agg()
- Navigate($row, title):col7
- Navigate($row, price):col8

23 Cancel Out Example (2)
XAT operators before:
- Navigate_C($b, price/text()):col4
- Navigate_C(R2, /book):$b
- Navigate_C($b, title):col3
- ...
- T([col6]):R2
- T([col7], [col8]):col6
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Agg()
- Navigate($row, title):col7
- Navigate($row, price):col8
XAT operators after:
- Navigate_C($b, price/text()):col4
- Navigate_C($b, title):col3
- ...
- T([col7], [col8]):$b
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Navigate($row, title):col7
- Navigate($row, price):col8

24 Cancel Out Example (3)
XAT operators before:
- Navigate_C($b, price/text()):col4
- Navigate_C($b, title):col3
- ...
- T([col7], [col8]):$b
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Navigate($row, title):col7
- Navigate($row, price):col8
XAT operators after:
- Navigate_C($b, price/text()):col4
- ...
- T([col7], [col8]):$b
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Navigate($row, title):col3
- Navigate($row, price):col8

25 Cancel Out Example (4)
XAT operators before:
- Navigate_C($b, price):temp1
- ...
- T([col7], [col8]):$b
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Navigate($row, title):col3
- Navigate($row, price):col8
- Navigate_C(temp1, text()):col4
XAT operators after:
- ...
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Navigate($row, title):col3
- Navigate($row, price):temp1
- Navigate_C(temp1, text()):col4

26 SQL Generation
- Find a pattern in the XAT.
- Translate that pattern into a SQL operator that accesses the relational database.

27 SQL Generation Example
XAT operators before:
- ...
- S(DXV):R3
- Navigate(R3, /book/row):$row
- Navigate($row, title):col3
- Navigate($row, price):temp1
- Navigate_C(temp1, text()):col4
XAT operators after:
- ...
- SQL(select title as col3, price as temp1 from book):{col3, temp1}
- Navigate_C(temp1, text()):col4
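The generated statement can be tried against a small relational table. The following Python sketch uses the standard `sqlite3` module with an in-memory database; the `book` table schema (a text `title` and a numeric `price`) is an assumption made for illustration, populated with the four entries from the use-case data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed relational layout behind the default XML view (DXV).
conn.execute("create table book (title text, price real)")
conn.executemany(
    "insert into book values (?, ?)",
    [
        ("TCP/IP Illustrated", 65.95),
        ("TCP/IP Illustrated", 65.95),
        ("Data on the Web", 34.95),
        ("Data on the Web", 39.95),
    ],
)

# The statement from the SQL operator in the slide.
cursor = conn.execute("select title as col3, price as temp1 from book")
columns = [description[0] for description in cursor.description]
result = cursor.fetchall()
conn.close()

print(columns)
print(result)
```

The column aliases in the statement are what let the remaining XAT operators keep referring to `col3` and `temp1` after the source and navigation operators have been replaced.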

28 Where are we?
- XAT Decorrelation.
- Optimization:
  - XAT Computation Pushdown.
  - XAT Data Model Cleanup.
  - XAT Cutting.
- Conclusion & Future Works.

29 XAT Data Model Cleanup
By default, each operator appends one additional column to the data model.
The cleanup helps with:
- Execution: it optimizes data storage during execution.
- Cutting: it helps get rid of the unused operators in the XQuery.
Equation for Data Model Cleanup (only keep the columns required by ancestors):
DM := (DM_p - P_p) ∪ C_p ∪ (P - C)
Here P and C are the columns the operator produces and consumes, and the subscript p refers to its parent operator.

30 Data Model Example
Query:
for $b in document("prices.xml")/book let $prices := $b/price return $b
XAT operators:
- S("prices.xml"):R1
- Navigate(R1, /book):$b
- Agg()
- Navigate($b,):col1
- Navigate_C($b, price):$prices

Node  Produce    Consume  DM before                 DM after
1     {}                  {$prices, R1, $b, col1}   {}
2     {col1}     {$b}     {$prices, R1, $b, col1}   {col1}
3     {$prices}  {$b}     {$prices, R1, $b}         {$b, $prices}
4     {$b}       {R1}     {R1, $b}                  {$b}
5     {R1}       {}       {R1}

DM := (DM_p - P_p) ∪ C_p ∪ (P - C)
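The cleanup equation can be checked against the table with a few lines of Python. This sketch assumes the five nodes form a single chain from node 1 (top) to node 5 (bottom), that the set operators lost in the transcript are unions, and that node 1 consumes nothing; the function name `cleanup` is hypothetical.

```python
def cleanup(ops):
    """Compute the cleaned-up data model for each operator, top to bottom.

    ops: list of (node, produce, consume), ordered from the root down.
    Applies DM := (DM_p - P_p) | C_p | (P - C), where p is the parent.
    """
    result = {}
    dm_p, p_p, c_p = set(), set(), set()   # the root has no parent
    for node, produce, consume in ops:
        dm = (dm_p - p_p) | c_p | (produce - consume)
        result[node] = dm
        dm_p, p_p, c_p = dm, produce, consume
    return result


# Produce and Consume columns from the table; node 1's Consume is assumed empty.
ops = [
    (1, set(), set()),
    (2, {"col1"}, {"$b"}),
    (3, {"$prices"}, {"$b"}),
    (4, {"$b"}, {"R1"}),
    (5, {"R1"}, set()),
]

dm_after = cleanup(ops)
for node, dm in dm_after.items():
    print(node, sorted(dm))
```

Under these assumptions the computed sets for nodes 1 to 4 agree with the "DM after" column of the table.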

31 Where are we?
- XAT Decorrelation.
- Optimization:
  - XAT Computation Pushdown.
  - XAT Data Model Cleanup.
  - XAT Cutting.
- Conclusion & Future Works.

32 XAT Cutting
General idea: get rid of the operators that produce useless data.
Equations:
- R := (R_p - P) ∪ C
- An operator is cut when (P ∪ M) ∩ (R_p ∪ M_p) = NULL

33 XAT Cutting Example
R := (R_p - P) ∪ C
(P ∪ M) ∩ (R_p ∪ M_p) = NULL
Query:
for $b in document("prices.xml")/book let $prices := $b/price return $b
XAT operators:
- S("prices.xml"):R1
- Navigate(R1, /book):$b
- Agg()
- Navigate($b,):col1
- Navigate_C($b, price):$prices

Node  Produce    Consume  Modified  Required  Cut?
1     {}                  {*}       {}        N/A
2     {col1}     {$b}     {}        {$b}      {col1}
3     {$prices}  {$b}     {}        {$b}      {}
4     {$b}       {R1}     {}        {R1}      {$b}
5     {R1}       {}                 {R1}
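The cutting equations can be sketched in Python as follows. Several points are assumptions rather than part of the slides: the nodes are treated as one chain from node 1 down to node 5, `{*}` is read as a wildcard that overlaps any non-empty set, the lost set operators are taken to be union and intersection, and a cut node is skipped so that its children see the Required and Modified sets of the nearest surviving ancestor. The names `overlaps` and `cutting` are hypothetical.

```python
def overlaps(a, b):
    """True if the two column sets intersect; "*" matches any column."""
    if not a or not b:
        return False
    return "*" in a or "*" in b or bool(a & b)


def cutting(ops):
    """ops: list of (node, produce, consume, modified), ordered from the root down.

    Returns (required, cut): the Required set per node and the list of cut nodes.
    """
    required, cut = {}, []
    root, _, consume, modified = ops[0]
    required[root] = set(consume)          # R_p is empty for the root
    r_p, m_p = required[root], modified
    for node, produce, consume, modified in ops[1:]:
        required[node] = (r_p - produce) | consume
        if not overlaps(produce | modified, r_p | m_p):
            cut.append(node)               # nothing above uses this operator's output
            continue                       # assumed: children skip over a cut node
        r_p, m_p = required[node], modified
    return required, cut


ops = [
    (1, set(), set(), {"*"}),
    (2, {"col1"}, {"$b"}, set()),
    (3, {"$prices"}, {"$b"}, set()),
    (4, {"$b"}, {"R1"}, set()),
    (5, {"R1"}, set(), set()),
]

required, cut = cutting(ops)
print(cut)
```

Under these assumptions only node 3, the operator that produces the unused `$prices` column, is cut, and the Required sets for nodes 2 to 4 match the table.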

34 Conclusions
- XQuery queries are heavily correlated and hence need to be decorrelated for better optimization.
- After decorrelation, more optimization techniques can be applied:
  - Computation Pushdown.
  - Data Model Cleanup.
  - Cutting.

35 Future Works
- Write a TR to formalize the XAT. Compare with ORDB, ODB, and also XQA operators.
- Wrap up:
  - Finalize the uncertain operators that deal with collections: Union, Navigate.
  - Formalize the Pushdown Rewriting Rules by Type (Reg. Exp. Type) Analysis.
  - Finalize the XAT Rewriting Rules for Order Handling and Update Propagation.
  - Translation from XAT back to Query.
- Next step: generate the search space and optimization algorithm for XAT, ready for Schema Generation.

