1 XQuery to SQL by XAT
Xin Zhang
Thanks: Brian, Mukesh, Maged, Lily, Elke

2 Outline
- Merged algebra, proposed based on Niagara and XPERANTO
- One thorough example of XQuery → SQL

3 Data Model
- An ordered table in two dimensions: tuple order and column order.
- Every cell has its own domain.
- Every column binds to one variable.
- The domain can be:
  - SQL domains.
  - XML fragment; can be a list of XML elements.
- Comparisons are done by value.
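A minimal sketch of the data model described on this slide, assuming a hypothetical `OrderedTable` class (the name, and the choice to compare XML fragments by their serialized form, are illustrative assumptions rather than part of the slides). It keeps both tuple order and column order, and lets a cell hold either a SQL-style scalar or a list of XML elements.

```python
from dataclasses import dataclass, field
from typing import List, Union
import xml.etree.ElementTree as ET

# A cell holds either a SQL-style scalar or a list of XML elements (an XML fragment).
Cell = Union[str, int, float, None, List[ET.Element]]

@dataclass
class OrderedTable:
    """Hypothetical ordered table: column order and tuple order are both kept."""
    columns: List[str]                                     # each column binds to one variable
    rows: List[List[Cell]] = field(default_factory=list)   # tuple order

    def column(self, name: str) -> List[Cell]:
        i = self.columns.index(name)
        return [row[i] for row in self.rows]

def cell_value(cell: Cell):
    """Value used for comparison; XML fragments compare by serialized content (assumption)."""
    if isinstance(cell, list):
        return tuple(ET.tostring(e, encoding="unicode") for e in cell)
    return cell

table = OrderedTable(columns=["$rate", "$itemized_call"])
table.rows.append(["NIGHT", [ET.fromstring('<itemized_call no="1" rate="NIGHT"/>')]])
table.rows.append(["DAY", [ET.fromstring('<itemized_call no="2" rate="DAY"/>')]])
```

Two separately parsed fragments with the same content compare equal through `cell_value`, which is one way to read "comparisons are done by value".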

4 Data Model Examples
- Table of XML fragments.
- Explicit naming, e.g. variable bindings.
- Implicit naming, e.g. XPath notations. Reduces the complexity of many internal variables.
[Figure: example tables whose cells are XML fragments; the markup was lost in extraction. Surviving labels: $carrier, invoice_id, carrier, carrier_entry, carriers, //invoice/invoice/account_number, $rate.]

5 Naming of Columns
- Implicit:
  - SQL operators
  - Navigate
- Explicit (“name”):
  - Variable binding: holds a set of values. The variable name ($name) is the name of a column.
  - Rename:
    - Distinguish, within one operator, the same “names” coming from different sources.
    - Abbreviate a very long “name”.
  - Create a new name for creation operators. It needs to be used with those operators, e.g. Tagger.

6 Operators
- SQL-like (9): Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), COp.
- XML-like (4): Tagger, Navigate, is(Element, Text), Aggregate.
- Special: SQL, Function, Source, Name, FOR.

7 SQL-like Operators (9)
Operator    | Niagara | XPERANTO
Project     | Expose  | Project
Select      |         |
Theta Join  | Join    | Theta Join
Outer Join  | N/A     | Outer Join
Semi Join   | N/A     |
Groupby     | Group   | Groupby
Orderby     | N/A     | Orderby
Union       |         |
Outer Union | Union   | Outer Union
COp         | N/A     | Correlated Join

8 XML-like Operators
Operator              | Niagara | XPERANTO
Tagger* (pattern)     | Vertex  | Project: cr8(Elem, AttList, Att, XMLFragList)
Navigate (from, path) | Follow  | Project: get(TagName, Attributes, Contents, AttName, AttValue), Unnest
Is                    | N/A     | Select: is(Element, Text)
Aggregate             | Group   | AggXMLFrags

9 Special Operators
Operator | Niagara | XPERANTO    | Description
SQL      | N/A     | Input       | Denotes a SQL query.
Function | N/A     | Function    | Used to represent a recursive query.
Source   |         | Table, View | Identifies a data source.
Name     | Rename  | N/A         | Naming of columns.
FOR      | N/A     |             | FOR iteration.

10 Operator Specification
- Description
- Input specification
- Output specification
- Logic description
- Illustrative example

11 Naming Operator
- Syntax: Name(“from_name”, “to_name”)
- Simplified syntax: to_name := from_name
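The Name operator can be pictured as a pure relabelling of one column. The sketch below assumes a hypothetical table representation (a dict from column name to values) and a hypothetical helper `name`; neither comes from the slides.

```python
from typing import Dict, List

Table = Dict[str, List[object]]  # hypothetical representation: column name -> values

def name(table: Table, from_name: str, to_name: str) -> Table:
    """Name("from_name", "to_name"): relabel one column, keeping column order and values."""
    if from_name not in table:
        raise KeyError(from_name)
    return {(to_name if col == from_name else col): vals for col, vals in table.items()}

# Implicit XPath-style column name, as produced by a Navigate step.
t = {"invoice/itemized_call:/": ["<itemized_call no='1'/>", "<itemized_call no='2'/>"]}

# Simplified syntax on the slide: $itemized_call := invoice/itemized_call:/
renamed = name(t, "invoice/itemized_call:/", "$itemized_call")
```

Only the label changes; the values and their order are untouched, which is why the slides can treat `to_name := from_name` as syntactic sugar.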

12 Steps in Translation
- XQuery → XML Algebra Tree
- User View → XML Algebra Tree
- View Composition
- Computation Pushdown
- Optimization

13 Example of Telephone Bill
<!DOCTYPE invoice [
  <!ELEMENT invoice (account_number, bill_period, carrier+, itemized_call*, total)>
  <!ATTLIST itemized_call
    no            ID          #REQUIRED
    date          CDATA       #REQUIRED
    number_called CDATA       #REQUIRED
    time          CDATA       #REQUIRED
    rate          (NIGHT|DAY) #REQUIRED
    min           CDATA       #REQUIRED
    amount        CDATA       #REQUIRED>
]>
Instance document (element tags lost in extraction; values in DTD order): account_number 555 777-3158 573 234 3; bill_period Jun 9 - Jul 8, 2000; carrier Sprint; total $0.35.

14 Example XQuery
User XQuery (the element constructors in the RETURN clause were lost in extraction):
{
  FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call/@rate)
  LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE ‘973%’
  RETURN $rate count($itemized_call)
}
Counts the number of itemized_calls in calling area 973, grouped by the calling rate.
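To make the intent of the query concrete, here is a rough Python rendering using the standard library's `xml.etree.ElementTree`. It follows the slide's stated intent (count calls to area 973 per rate) rather than any particular XQuery draft semantics. The sample document is trimmed to the attributes the query touches, with values taken from the itemized_call table shown later in the deck; the function name is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed sample document; only the attributes used by the query are kept.
INVOICE = """
<invoice>
  <itemized_call no="1" number_called="973 555-8888" rate="NIGHT"/>
  <itemized_call no="2" number_called="973 650-2222" rate="DAY"/>
  <itemized_call no="3" number_called="206 365-9999" rate="NIGHT"/>
</invoice>
"""

def count_973_calls_by_rate(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)  # root is the <invoice> element
    calls = root.findall("itemized_call")
    # FOR $rate IN distinct(.../itemized_call/@rate), in document order
    rates = list(dict.fromkeys(c.get("rate") for c in calls))
    result = {}
    for rate in rates:
        # LET $itemized_call := .../itemized_call[@rate=$rate]
        same_rate = [c for c in calls if c.get("rate") == rate]
        # WHERE @number_called LIKE '973%', applied per call
        matching = [c for c in same_rate if c.get("number_called").startswith("973")]
        if matching:  # rates without any matching call produce no output
            result[rate] = len(matching)
    return result

counts = count_973_calls_by_rate(INVOICE)
```

The nested shape (an outer loop over rates, an inner scan filtered by the current rate) is the correlation that the later decorrelation step removes.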

15 XQuery → XML Algebra Tree
- Divide into query blocks.
- Convert each query block into an XML Algebra Tree (XAT).
- Identify correlated operators.
- Combine into one XML Algebra Tree.
- Query decorrelation.

16 Query Blocks
User XQuery:
{
  FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call/@rate)
  LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE ‘973%’
  RETURN $rate count($itemized_call)
}
- B1: Construct the summary from the result of B2.
- B2: Get all the distinct rates and iterate through them.
- B3: Count the itemized calls for a given rate.
The block identification is arbitrary (wrong).

17 XAT of B1
Query block B1 encloses B2.
XAT operators (tree layout lost in extraction):
- Tagger( [V1] )
- B2
- Name(“Tagger( [V1] )”, “V2”)
[V2] is a name, not part of the pattern.

18 XAT of B2
Query block (B3 marks the nested block):
{ FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call/@rate) B3 }
XAT operators (tree layout lost in extraction):
- Select(distinct(“invoice/itemized_call/@rate:/”))
- B3
- Source(“invoice.xml”)
- Navigate(“/”, invoice/itemized_call/@rate)
- Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”)
- FOR($rate)
- Aggregate

19 XAT of B3
Query block (labelled B4 on the slide):
  LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE ‘973%’
  RETURN $rate count($itemized_call)
XAT operators (tree layout lost in extraction):
- Navigate(“$itemized_call”, @rate)
- Source(“invoice.xml”)
- Navigate(“/”, invoice/itemized_call)
- B2
- Select(“@rate:$itemized_call” = “$rate”)
- Name(“invoice/itemized_call:/”, “$itemized_call”)

20 XAT of B3 (Cont.)
Query block (labelled B4 on the slide):
  LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE ‘973%’
  RETURN $rate count($itemized_call)
XAT operators (tree layout lost in extraction):
- Navigate(“$itemized_call”, @number_called)
- Select(“@number_called:$itemized_call” like ‘973%’)

21 XAT of B3 (Cont.)
Query block (labelled B4 on the slide):
  LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE ‘973%’
  RETURN $rate count($itemized_call)
XAT operators (tree layout lost in extraction):
- Tagger( [$rate] [count($itemized_call)] )
- Select(count(“$itemized_call”))
- B2
- Name(“Tagger( [$rate] [count($itemized_call)] )”, “V1”)

22 Put it Together
Block labels on the slide: B1, B2, B3.
XAT operators (tree layout lost in extraction):
- Select(count(“$itemized_call”))
- Navigate(“$itemized_call”, @number_called)
- Select(“@number_called:$itemized_call” like ‘973%’)
- Source(“invoice.xml”)
- Navigate(“/”, invoice/itemized_call)
- Select(“@rate:$itemized_call” = “$rate”)
- Name(“Tagger( [V1] )”, “V2”)
- Select(distinct(“invoice/itemized_call/@rate:/”))
- Source(“invoice.xml”)
- Navigate(“/”, invoice/itemized_call/@rate)
- Name(“invoice/itemized_call:/”, “$itemized_call”)
- Navigate(“$itemized_call”, @rate)
- FOR($rate)
- Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”)
- Aggregate()
- Tagger( [$rate] [count($itemized_call)] )
- Name(“Tagger( [$rate] [count($itemized_call)] )”, “V1”)
- Tagger( [V1] )

23 Syntactic Sugar
Block labels on the slide: B1, B2, B3.
XAT operators, with Name written as := (tree layout lost in extraction):
- Select(count(“$itemized_call”))
- Navigate(“$itemized_call”, @number_called)
- Select(“@number_called:$itemized_call” like ‘973%’)
- Source(“invoice.xml”)
- $itemized_call := Navigate(“/”, invoice/itemized_call)
- Select(“@rate:$itemized_call” = “$rate”)
- V2 := Tagger( [V1] )
- $rate := Select(distinct(“invoice/itemized_call/@rate:/”))
- Source(“invoice.xml”)
- Navigate(“/”, invoice/itemized_call/@rate)
- Navigate(“$itemized_call”, @rate)
- FOR($rate)
- Aggregate()
- V1 := Tagger( [$rate] [count($itemized_call)] )

24 Query Decorrelation for COp
- Top-down approach over the XAT tree.
- Approach, with Correlated Binding (CB):
  Op1[COp(CB, Op2)[Op3[Correlated Operator[A],B]]] → Op1[ROJ(CB)[Op2[Groupby(CB, Op3[]) [Operator[Cartesian[A,B]]]], B]]
- For example: Correlated Join → Outer Join with Groupby with Cartesian.

25 Query Decorrelation for FOR
- Top-down approach over the XAT tree.
- Approach, with Correlated Binding (CB):
  Op1[FOR(CB)[Op2[Correlated Operator[A],B]]] → Op1[Groupby(CB, Op2[]) [Operator[Cartesian[A,B]]]]
- Differences:
  - SQL decorrelation: returns the outer query.
  - XQuery decorrelation: returns the inner query.
  - CO: returns both the outer and the inner query.
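The FOR rewrite above can be sketched on plain Python lists: the correlated form re-evaluates the inner block once per binding of `$rate`, while the decorrelated form takes a Cartesian product, applies the formerly correlated Select, and then groups by the correlated binding. The function names and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

rates = ["NIGHT", "DAY"]  # input A: bindings of the correlated variable $rate
calls = [                 # input B: (rate, number_called)
    ("NIGHT", "973 555-8888"),
    ("DAY", "973 650-2222"),
    ("NIGHT", "206 365-9999"),
]

def correlated(rates, calls):
    """FOR($rate): evaluate the inner block once per binding of $rate."""
    out = {}
    for rate in rates:
        out[rate] = [c for c in calls if c[0] == rate]
    return out

def decorrelated(rates, calls):
    """Cartesian[A, B], then Select(@rate = $rate), then Groupby($rate, ...)."""
    groups = defaultdict(list)
    for rate, call in product(rates, calls):
        if call[0] == rate:
            groups[rate].append(call)
    return dict(groups)
```

The two forms agree here because every rate has at least one call. A binding with no matching inner tuple would vanish from the grouped form, which is one reason the COp rule on the previous slide keeps an outer join (ROJ) around the Groupby.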

26 FOR Decorrelation Example
Before (tree layout lost in extraction; block labels B2, B3):
- Source(“invoice.xml”)
- Select(“@rate:$itemized_call” = “$rate”)
- …1
- Source(“invoice.xml”)
- …3
- FOR($rate)
- …2
After (block labels B1, B2, B3):
- Source(“invoice.xml”)
- Select(“@rate:$itemized_call” = “$rate”)
- Groupby(“$rate”, )
- Cartesian
- Source(“invoice.xml”)
- …3
- Aggregate
- …2
- …1

27 Default XML View
Default XML view (tags lost in extraction): 1; 555 777-3158 573 234 3; Jun 9 – Jun 8, 2000; $0.35; 1; Sprint; ...

invoice
id | account_number         | bill_period         | total
1  | 555 777-3158 573 234 3 | Jun 9 – Jun 8, 2000 | $0.35

carrier
invoice_id | carrier
1          | Sprint

itemized_call
invoice_id | no | date   | number_called | time    | rate  | min | amount
1          | 1  | JUN 10 | 973 555-8888  | 10:17pm | NIGHT | 1   | 0.05
1          | 2  | JUN 13 | 973 650-2222  | 10:19am | DAY   | 1   | 0.15
1          | 3  | JUN 15 | 206 365-9999  | 10:25pm | NIGHT | 3   | 0.15
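One way to picture a default XML view is as a mechanical mapping from relational tables to nested elements: one element per table, one `row` element per tuple, and one child element per column. The `table/row/column` shape matches the paths used later in the deck, such as view(“default”)/invoice/row; the root element name `default` and the helper `default_view` are assumptions.

```python
import xml.etree.ElementTree as ET

def default_view(tables: dict) -> ET.Element:
    """Map each table to an element, each tuple to <row>, each column to a child element."""
    root = ET.Element("default")  # root element name is an assumption
    for table_name, (columns, rows) in tables.items():
        table_el = ET.SubElement(root, table_name)
        for row in rows:
            row_el = ET.SubElement(table_el, "row")
            for col, value in zip(columns, row):
                ET.SubElement(row_el, col).text = str(value)
    return root

# Two of the tables from the slide; values copied from it.
tables = {
    "invoice": (["id", "account_number", "total"],
                [(1, "555 777-3158 573 234 3", "$0.35")]),
    "carrier": (["invoice_id", "carrier"], [(1, "Sprint")]),
}
view = default_view(tables)
```

With this shape, a path like `invoice/row/account_number` in the view corresponds directly to the `account_number` column of the `invoice` table, which is what later lets navigation over the default view be replaced by SQL operators.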

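28 User Defined XML View
[Two XML documents were shown; their tags were lost in extraction.]
User-defined view values: 555 777-3158 573 234 3; Jun 9 - Jul 8, 2000; Sprint; $0.35
Default view values: 1; 555 777-3158 573 234 3; Jun 9 – Jun 8, 2000; $0.35; 1; Sprint; 1; … …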

29 User Defined XML View Cont.
(Element constructors were lost in extraction.)
Create view invoice as (
  FOR $invoice IN view(“default”)/invoice/row
  RETURN
    $invoice/account_number/text()
    $invoice/bill_period/text()
    FOR $carrier in view(“default”)/carrier/row
    WHERE $carrier/invoice_id = $invoice/id
    RETURN $carrier/carrier/text()
    FOR $itemized_call in view(“default”)/itemized_call/row
    WHERE $itemized_call/invoice_id = $invoice/id
    RETURN SORTBY (@no)
    $invoice/total/text()
)
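The view definition reads naturally as nested loops over the default-view rows. The sketch below assumes the lost element constructors follow the DTD from the telephone bill example (invoice, account_number, bill_period, carrier, itemized_call, total); that assumption, the dict-based rows, and the function name are illustrative only.

```python
import xml.etree.ElementTree as ET

# Rows of the default view, as plain dicts (hypothetical representation).
invoices = [{"id": 1, "account_number": "555 777-3158 573 234 3",
             "bill_period": "Jun 9 - Jul 8, 2000", "total": "$0.35"}]
carriers = [{"invoice_id": 1, "carrier": "Sprint"}]
itemized_calls = [
    {"invoice_id": 1, "no": 2, "rate": "DAY", "number_called": "973 650-2222"},
    {"invoice_id": 1, "no": 1, "rate": "NIGHT", "number_called": "973 555-8888"},
]

def build_invoice_view():
    out = []
    for inv in invoices:  # FOR $invoice IN view("default")/invoice/row
        el = ET.Element("invoice")
        ET.SubElement(el, "account_number").text = inv["account_number"]
        ET.SubElement(el, "bill_period").text = inv["bill_period"]
        for c in carriers:  # FOR $carrier ... WHERE invoice_id = $invoice/id
            if c["invoice_id"] == inv["id"]:
                ET.SubElement(el, "carrier").text = c["carrier"]
        matching = [i for i in itemized_calls if i["invoice_id"] == inv["id"]]
        for i in sorted(matching, key=lambda r: r["no"]):  # SORTBY (@no)
            ET.SubElement(el, "itemized_call", no=str(i["no"]), rate=i["rate"],
                          number_called=i["number_called"])
        ET.SubElement(el, "total").text = inv["total"]
        out.append(el)
    return out

invoice_elements = build_invoice_view()
```

Both inner loops depend on the current `$invoice`, which is the correlation that the later slides remove with joins and group-bys.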

30 User Defined XML View Block
Query blocks marked on the slide: B4, B5, B6.
Create view invoice as (
  FOR $invoice IN view(“default”)/invoice/row
  RETURN
    $invoice/account_number/text()
    $invoice/bill_period/text()
    FOR $carrier in view(“default”)/carrier/row
    WHERE $carrier/invoice_id = $invoice/id
    RETURN $carrier/carrier/text()
    FOR $itemized_call in view(“default”)/itemized_call/row
    WHERE $itemized_call/invoice_id = $invoice/id
    RETURN SORTBY (@no)
    $invoice/total/text()
)

31 XML View XAT
XAT operators (tree layout and Tagger markup lost in extraction):
- V4 := Tagger( [$invoice/account_number/text()] [$invoice/bill_period/text() …[V3] [$invoice/total/text()] )
- V3 := Tagger(
- Aggregate()
- Source(“default.xml”)
- $invoice := Navigate(“/”, invoice/row)
- FOR($invoice/id)
- Source(“default.xml”)
- $itemized_call := Navigate(“/”, itemized_call/row)
- Navigate($itemized_call, no/text())
- Navigate($itemized_call, invoice_id)
- Select(“$itemized_call/invoice_id”=“$invoice/id”)
- Navigate(“$invoice”, id)
- B5
- Navigate($itemized_call, amount/text())
- …

32 3-Way Correlation
[Diagram; layout lost in extraction. Labels: …2, Source(“invoice.xml”), B4, FOR($invoice/id), …1, B5, B6.]

33 3-Way Decorrelation
[Diagram; layout lost in extraction. Labels: …2, Source(“default.xml”), B4, JOIN($invoice/id), …1, B5 with Cartesian, B6 with Cartesian, GB($invoice/id, …), …2, Source(“default.xml”).]

34 View XAT After Decorrelation
XAT operators (tree layout and Tagger markup lost in extraction):
- V4 := Tagger( [$invoice/account_number/text()] [$invoice/bill_period/text() …[V3] [$invoice/total/text()] )
- V3 := Tagger(
- Aggregate()
- Groupby($invoice/id, Aggregate())
- Source(“default.xml”)
- $invoice := Navigate(“/”, invoice/row)
- Join($invoice/id)
- Source(“default.xml”)
- $itemized_call := Navigate(“/”, itemized_call/row)
- Navigate($itemized_call, no/text())
- Navigate($itemized_call, invoice_id)
- Join(“$itemized_call/invoice_id”=“$invoice/id”)
- Navigate(“$invoice”, id)
- B5
- Navigate($itemized_call, amount/text())
- …
- Groupby($invoice/id…)
- Source(“default.xml”)
- $invoice := Navigate(“/”, invoice/row)
- Navigate(“$invoice”, id)

35 View Composition
- Input: User Query XAT + User View XAT
- Output: Simplified composite XAT
- Approach:
  - XAT cutting: remove unreferenced columns and operators.
  - Push down navigation, by using the commutative rules.
  - Cancel out the navigation operators, by using the composition rules.

36 XAT Cutting
- Cut query blocks: the user query only requires itemized_call.
  - B5 is cut; invoice is cut.
  - B4 is simplified.
  - B6 is simplified.
- Cut columns: the user query only uses itemized_call@rate.
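XAT cutting can be sketched as a pruning pass: keep only the columns the user query references, plus the columns needed to join the blocks, and drop any block left with nothing. The flat block-to-columns description below is a hypothetical simplification of the view XAT, and the assignment of the carrier column to B5 is an assumption read off the view definition.

```python
from typing import Dict, List, Set

# Hypothetical flat description of the view: block -> columns it produces.
VIEW_BLOCKS: Dict[str, List[str]] = {
    "B4": ["account_number", "bill_period", "total", "id"],
    "B5": ["carrier"],
    "B6": ["no", "date", "number_called", "time", "rate", "min", "amount", "invoice_id"],
}

def cut(blocks: Dict[str, List[str]], referenced: Set[str],
        keep_for_joins: Set[str]) -> Dict[str, List[str]]:
    """Drop unreferenced columns; drop a block entirely when nothing in it is needed."""
    needed = referenced | keep_for_joins
    result = {}
    for block, cols in blocks.items():
        kept = [c for c in cols if c in needed]
        if kept:
            result[block] = kept
    return result

# The example query reads @rate and @number_called; id and invoice_id stay for the join.
cut_view = cut(VIEW_BLOCKS, referenced={"rate", "number_called"},
               keep_for_joins={"id", "invoice_id"})
```

In this sketch B5 disappears, B4 shrinks to the join column, and B6 keeps only the columns the query reads, which mirrors "B5 is cut, B4 is simplified, B6 is simplified".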

37 View XAT After B5 is Cut
XAT operators (tree layout and Tagger markup lost in extraction):
- V4 := Tagger( [$invoice/account_number/text()] [$invoice/bill_period/text()</bill_period[V3] [$invoice/total/text()] )
- V3 := Tagger(
- Aggregate()
- Groupby($invoice/id, Aggregate())
- Source(“default.xml”)
- $invoice := Navigate(“/”, invoice/row)
- Source(“default.xml”)
- $itemized_call := Navigate(“/”, itemized_call/row)
- Navigate($itemized_call, no/text())
- Navigate($itemized_call, invoice_id)
- Join(“$itemized_call/invoice_id”=“$invoice/id”)
- Navigate(“$invoice”, id)
- Navigate($itemized_call, amount/text())
- …

38 View After Columns are Cut
XAT operators (tree layout and Tagger markup lost in extraction):
- V4 := Tagger( [V3] )
- V3 := Tagger(
- Aggregate()
- Groupby($invoice/id, Aggregate())
- Source(“default.xml”)
- $invoice := Navigate(“/”, invoice/row)
- Source(“default.xml”)
- $itemized_call := Navigate(“/”, itemized_call/row)
- Navigate($itemized_call, number_called/text())
- Navigate($itemized_call, invoice_id)
- Join(“$itemized_call/invoice_id”=“$invoice/id”)
- Navigate(“$invoice”, id)
- Navigate($itemized_call, rate/text())

39 Navigation Cancel Out
- Navigation pushdown: based on some transformation rules, e.g. commutativity of navigation with other operators.
- Navigation + Tagger cancel out: composition rules. The result of the cancellation is “renaming”.
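The cancel-out idea is that navigating into an element a Tagger has just built only retrieves what the Tagger put there, so the pair collapses into a renaming of the Tagger's input. The sketch below assumes a hypothetical, much simplified Tagger description (a dict from created attribute to source path); the function name and tuple encoding are illustrative.

```python
from typing import Dict, Optional, Tuple

# Hypothetical Tagger description: attribute created on the element -> source path.
# The real pattern was lost from the slides; this stands in for the view's
# itemized_call constructor.
TAGGER_PATTERN: Dict[str, str] = {
    "@rate": "rate/text()",
    "@number_called": "number_called/text()",
}

def cancel_out(var: str, path: str,
               pattern: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Compose Navigate(var, path) with the Tagger that built var.

    If the Tagger created `path` from a source path, the pair collapses into a
    renaming, encoded here as (new_name, "Navigate", source).
    Otherwise nothing cancels and None is returned.
    """
    source = pattern.get(path)
    if source is None:
        return None
    return (f"{var}{path}", "Navigate", source)

rewritten = cancel_out("$itemized_call", "@rate", TAGGER_PATTERN)
```

The returned triple corresponds to a line of the form `$itemized_call@rate := Navigate($itemized_call, rate/text())`: the query-side navigation and the view-side tagging are gone, and only a named navigation over the default view remains.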

40 Query XAT Navi. Pushdown
Before, block B3 (tree layout lost in extraction):
- Select(count(“$itemized_call”))
- Navigate(“$itemized_call”, @number_called)
- Select(“@number_called:$itemized_call” like ‘973%’)
- Source(“invoice.xml”)
- $itemized_call := Navigate(“/”, invoice/itemized_call)
- Select(“@rate:$itemized_call” = “$rate”)
- Navigate(“$itemized_call”, @rate)
- V1 := Tagger( [$rate] [count($itemized_call)] )
After (same operators; their new positions in the tree were lost in extraction):
- Select(count(“$itemized_call”))
- Navigate(“$itemized_call”, @number_called)
- Select(“@number_called:$itemized_call” like ‘973%’)
- Source(“invoice.xml”)
- $itemized_call := Navigate(“/”, invoice/itemized_call)
- Select(“@rate:$itemized_call” = “$rate”)
- Navigate(“$itemized_call”, @rate)
- V1 := Tagger( [$rate] [count($itemized_call)] )

41 Navi. Tagger Cancel Out
Query side, block B3 (tree layout lost in extraction):
- Navigate(“$itemized_call”, @number_called)
- Source(“invoice.xml”)
- $itemized_call := Navigate(“/”, invoice/itemized_call)
- Navigate(“$itemized_call”, @rate)
- …1
View side (Tagger markup lost in extraction):
- V4 := Tagger( [V3] )
- V3 := Tagger(
- Aggregate()
- Groupby($invoice/id, Aggregate())
- Navigate($itemized_call, number_called/text())
- Navigate($itemized_call, rate/text())
- …2

42 The Result of Cancel Out
- …1
- $itemized_call@rate := Navigate($itemized_call, rate/text())
- $itemized_call@number_called := Navigate($itemized_call, number_called/text())
- …2

43 Computation Pushdown
- Goal: XAT → SQL operators + XML operators
- Step 0: Navigation pushdown.
- Step 1: XML default view → SQL operators; renaming columns.
- Step 2: SQL computation pushdown, by commutative and composition rules, e.g. predicate pushdown.
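Predicate pushdown, named above as an example of Step 2, can be sketched on a tiny plan representation. The tuple encodings `("source", name, columns)`, `("select", columns_read, predicate, child)` and `("join", condition, left, right)` are hypothetical, as are the function names; the rule shown is the usual one of moving a Select below a Join when the predicate reads columns of only one input.

```python
def columns(plan):
    """Columns available at the output of a plan node."""
    op = plan[0]
    if op == "source":
        return set(plan[2])
    if op == "select":
        return columns(plan[3])
    return columns(plan[2]) | columns(plan[3])  # join

def push_select(plan):
    """Push Select below Join when its predicate only reads columns of one input."""
    op = plan[0]
    if op == "select":
        _, cols, pred, child = plan
        child = push_select(child)
        if child[0] == "join":
            _, cond, left, right = child
            if cols <= columns(left):
                return ("join", cond, push_select(("select", cols, pred, left)), right)
            if cols <= columns(right):
                return ("join", cond, left, push_select(("select", cols, pred, right)))
        return ("select", cols, pred, child)
    if op == "join":
        _, cond, left, right = plan
        return ("join", cond, push_select(left), push_select(right))
    return plan  # source

invoice = ("source", "invoice", ["id"])
calls = ("source", "itemized_call", ["invoice_id", "rate", "number_called"])
join_cond = "invoice.id = itemized_call.invoice_id"
plan = ("select", {"number_called"}, "number_called LIKE '973%'",
        ("join", join_cond, invoice, calls))
pushed = push_select(plan)
```

After the rewrite the LIKE predicate sits directly on the `itemized_call` source, so it can be folded into the same SQL block as the join.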

44 Navigation Pushdown
Before (tree layout lost in extraction):
- Source(“default.xml”)
- $invoice := Navigate(“/”, invoice/row)
- Source(“default.xml”)
- $itemized_call := Navigate(“/”, itemized_call/row)
- Navigate($itemized_call, invoice_id)
- Join(“$itemized_call/invoice_id”=“$invoice/id”)
- Navigate(“$invoice”, id)
- $itemized_call@rate := Navigate($itemized_call, rate/text())
- $itemized_call@number_called := Navigate($itemized_call, number_called/text())
After:
- Source(“default.xml”)
- $itemized_call := Navigate(“/”, itemized_call/row)
- $itemized_call@rate := Navigate($itemized_call, rate/text())
- $itemized_call@number_called := Navigate($itemized_call, number_called/text())
- Join(“$itemized_call/invoice_id”=“$invoice/id”)

45 XML Default View → SQL
Before (tree layout lost in extraction):
- Source(“default.xml”)
- $itemized_call := Navigate(“/”, itemized_call/row)
- $itemized_call@rate := Navigate($itemized_call, rate/text())
- $itemized_call@number_called := Navigate($itemized_call, number_called/text())
After:
- Source(“itemized_call”)
- Project(rate, number_called)
- $itemized_call@rate := rate
- $itemized_call@number_called := number_called
- … …

46 Computation Pushdown
Before, block B3 (tree layout lost in extraction; part of the tree is marked “A SQL Block”):
- Select(count(“$itemized_call”))
- Select(“@number_called:$itemized_call” like ‘973%’)
- Select(“@rate:$itemized_call” = “$rate”)
- V1 := Tagger( [$rate] [count($itemized_call)] )
After (part of the tree is marked “A SQL Block”):
- Select(“@number_called:$itemized_call” like ‘973%’)
- Select(count(“$itemized_call”))
- Select(“@rate:$itemized_call” = “$rate”)
- V1 := Tagger( [$rate] [count($itemized_call)] )

47 Result of the Transformation
XAT (tree layout lost in extraction):
- Tagger( [V1] )
- V1 := Aggregate Tagger( [rate] [count(*)] )
SQL:
SELECT rate, count(*)
FROM itemized_call, invoice
WHERE number_called LIKE '973%'
  AND invoice.id = itemized_call.invoice_id
GROUP BY rate
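The generated SQL can be tried against the sample rows from the default-view slide using Python's built-in `sqlite3` module. The table definitions and column types are assumptions, `"no"` is quoted defensively, and the `ORDER BY rate` clause is added here only to make the row order deterministic; it is not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (id INTEGER, account_number TEXT, bill_period TEXT, total TEXT);
CREATE TABLE itemized_call (invoice_id INTEGER, "no" INTEGER, date TEXT,
    number_called TEXT, time TEXT, rate TEXT, min INTEGER, amount REAL);
INSERT INTO invoice VALUES (1, '555 777-3158 573 234 3', 'Jun 9 - Jul 8, 2000', '$0.35');
INSERT INTO itemized_call VALUES
    (1, 1, 'JUN 10', '973 555-8888', '10:17pm', 'NIGHT', 1, 0.05),
    (1, 2, 'JUN 13', '973 650-2222', '10:19am', 'DAY',   1, 0.15),
    (1, 3, 'JUN 15', '206 365-9999', '10:25pm', 'NIGHT', 3, 0.15);
""")

# The query from the slide, plus ORDER BY for a stable result order.
rows = conn.execute("""
    SELECT rate, count(*)
    FROM itemized_call, invoice
    WHERE number_called LIKE '973%'
      AND invoice.id = itemized_call.invoice_id
    GROUP BY rate
    ORDER BY rate
""").fetchall()
```

On this data the result has one row per rate, each counting a single call to area 973; the remaining Tagger operators would then wrap these rows in XML.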

48 Optimization
- Efficient publishing of XML views:
  - Sorted Outer Union.
  - Special Tagger implementation.
- A lot more!

49 Summary
- XQuery → XAT
  - Query block identification
  - Query decorrelation
- View composition
  - XAT cutting
  - Navigation pushdown
  - Navigation cancel out
- Computation pushdown
  - Navigation pushdown
  - XML default view → SQL operators
  - Computation pushdown
- Optimization

