1 XML Algebra Comparison between: XPERANTO NIAGARA.


2 Part I
- NIAGARA: XML query optimization
  - XML algebra: data model, operators
  - Query plan
  - Equivalence rules
- XPERANTO: XML query to SQL
  - XML algebra: data model, operators
  - Query plan
  - Composition rules
  - Translation example

3 Example of Telephone Bill
<!DOCTYPE invoice [
  <!ELEMENT invoice (account_number, bill_period, carrier+, itemized_call*, total)>
  <!ATTLIST itemized_call
    no            ID          #REQUIRED
    date          CDATA       #REQUIRED
    number_called CDATA       #REQUIRED
    time          CDATA       #REQUIRED
    rate          (NIGHT|DAY) #REQUIRED
    min           CDATA       #REQUIRED
    amount        CDATA       #REQUIRED>
]>
<invoice>
  <account_number>555 777-3158 573 234 3</account_number>
  <bill_period>Jun 9 - Jul 8, 2000</bill_period>
  <carrier>Sprint</carrier>
  <total>$0.35</total>
</invoice>

4 Example XQuery
User XQuery:
{ FOR $rate IN distinct(document("invoice")/invoice/itemized_call/@rate)
  LET $itemized_call := document("invoice")/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE '973%'
  RETURN $rate count($itemized_call) }
Counts the itemized_calls in calling area 973, grouped by calling rate.

5 NIAGARA
Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation
By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier.

6 Goals
- Be independent of schema information
- Query on both structure and content
- Generate simple, flexible, yet powerful algebraic expressions
- Allow re-use of traditional optimization techniques

7 Data Model
- A collection of bags of vertices.
- The vertices in a bag have no order.
Example: [Root "invoice.xml", invoice, invoice.account_number]
[Figure: a bag holding the vertices Root invoice.xml, invoice, and invoice.account_number, each pointing to its element content]

8 Data Model
- Bag elements are reachable by path expressions.
- A path expression consists of two parts: an entry point and a relative forward part.
Example: account_number:invoice (entry point invoice, relative forward part account_number)
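A minimal Python sketch of the data model on these two slides, assuming a bag can be represented as a dict keyed by entry-point name. The `Vertex` class, the dict layout, and the `resolve` helper are illustrative assumptions, not NIAGARA's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    """One node of an XML document: a tag, optional text, and child vertices."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)

account = Vertex("account_number", text="555 777-3158 573 234 3")
invoice = Vertex("invoice", children=[account])

# A bag is unordered, so it is keyed by entry-point name rather than by position.
bag = {"invoice": invoice, "invoice.account_number": account}
collection = [bag]  # a collection of bags

def resolve(bag, path_expr):
    """Evaluate 'relative:entry' (e.g. 'account_number:invoice') against one bag."""
    relative, entry = path_expr.split(":")
    current = [bag[entry]]
    for step in relative.split("."):
        # Walk forward one step: keep the children whose tag matches.
        current = [c for v in current for c in v.children if c.tag == step]
    return current

found = resolve(bag, "account_number:invoice")
```

Here `found` holds the single `account_number` vertex reached from the `invoice` entry point.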

9 Operators
Source S, Follow Φ, Select σ, Join ⋈, Rename ρ, Expose ε, Vertex υ, Group γ, Union ∪, Intersection ∩, Difference −, Cartesian Product ×.

10 Source Operator S
Input: a list of documents
Output: a collection of singleton bags
Examples:
- S(*): all known XML documents
- S(invoice*.xml): all XML documents whose filename matches "invoice*.xml"
- S(*, schema.dtd): all known XML documents that conform to schema.dtd

11 Follow Operator Φ
Input: a path expression in entry point notation
Functionality: extracts the vertices reachable by the path expression
Output: a new bag that consists of the extracted vertex plus all the contents of the original bag (in the case of unnesting follow)

12 Follow Operator (Example*)
Φμ(carrier:invoice)
Input: {[Root invoice.xml, invoice]}
Output: {[Root invoice.xml, invoice, invoice.carrier]}
*Unnesting follow
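The unnesting follow in this example can be sketched in Python as below. This is only an illustration under simplifying assumptions: vertices are plain dicts, bags are dicts keyed by entry point, the relative part is a single step, and the second carrier name is made up for the demonstration.

```python
def follow_unnest(collection, path_expr):
    """Unnesting follow: emit one output bag per vertex reached from the entry point."""
    relative, entry = path_expr.split(":")
    result = []
    for bag in collection:
        for child in bag[entry]["children"]:
            if child["tag"] == relative:
                # The new bag keeps everything from the original bag
                # and adds the extracted vertex under "entry.relative".
                result.append({**bag, f"{entry}.{relative}": child})
    return result

invoice = {"tag": "invoice", "children": [
    {"tag": "carrier", "text": "Sprint", "children": []},
    {"tag": "carrier", "text": "OtherCarrier", "children": []},  # hypothetical second carrier
    {"tag": "total", "text": "$0.35", "children": []},
]}

bags = follow_unnest([{"invoice": invoice}], "carrier:invoice")
```

With two carrier children, the single input bag becomes two output bags, each containing `invoice` and one `invoice.carrier` vertex.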

13 Select Operator σ
Input: a set of bags
Functionality: filters the bags of a collection using a predicate
Output: the set of bags that satisfy the predicate
Predicate: logical operators (∧, ∨, ¬) or simple qualifications (<, >, ≤, ≥, =, ≠)

14 Select Operator (Example)
σ invoice.carrier = Sprint
Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], ...}
Output: {[Root invoice.xml, invoice], ...}

15 Join Operator ⋈
Input: two collections of bags
Functionality: joins the two collections based on a predicate
Output: the concatenation of the pairs of bags that satisfy the predicate

16 Join Operator (Example)
Join predicate: account_number:invoice = number:customer
Inputs: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}
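A small Python sketch of this join, assuming bags are dicts whose keys are entry-point names and whose values are already-extracted text. The customer records and the `customer.name` field are hypothetical demonstration data.

```python
def join(left, right, predicate):
    """Concatenate every (left, right) pair of bags that satisfies the predicate."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]

invoices = [{"invoice.account_number": "555 777-3158 573 234 3"}]
customers = [
    {"customer.number": "555 777-3158 573 234 3", "customer.name": "A"},
    {"customer.number": "000 000-0000", "customer.name": "B"},
]

joined = join(
    invoices,
    customers,
    lambda inv, cust: inv["invoice.account_number"] == cust["customer.number"],
)
```

Only the customer whose number matches the invoice's account number survives, and the output bag carries the elements of both inputs.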

17 Expose Operator ε
Input: a list of path expressions of the vertices to be exposed
Output: a set of bags that contain the vertices in the parameter list, in the same order

18 Expose Operator (Example)
ε(bill_period, carrier)
Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}
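Expose behaves much like a relational projection that also fixes the output order. A rough Python sketch, assuming dict-based bags (Python dicts preserve insertion order) and omitting the document root for brevity:

```python
def expose(collection, paths):
    """Keep only the listed bag elements, in the order given by `paths`."""
    return [{path: bag[path] for path in paths} for bag in collection]

bags = [{
    "invoice": "<invoice element>",          # placeholder for the invoice vertex
    "invoice.carrier": "Sprint",
    "invoice.bill_period": "Jun 9 - Jul 8, 2000",
}]

exposed = expose(bags, ["invoice.bill_period", "invoice.carrier"])
```

The output bag drops `invoice` and lists `invoice.bill_period` before `invoice.carrier`, matching the parameter list rather than the input order.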

19 Vertex Operator υ
Creates the actual XML vertex that will encompass everything created by an expose operator.
Example: υ(Customer_invoice)[ε(υ(account)[invoice.account_number], υ(inv_total)[invoice.total])]

20 Other Operators
- Group γ: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g. average) can be used with the group operator.
- Rename ρ: changes the entry point annotation of the elements of a bag. Example: ρ(invoice.bill_period, date)

21 Example XQuery
User XQuery:
{ FOR $rate IN distinct(document("invoice")/invoice/itemized_call/@rate)
  LET $itemized_call := document("invoice")/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE '973%'
  RETURN $rate count($itemized_call) }
Counts the itemized_calls in calling area 973, grouped by calling rate.

22 Query Plan: Algebra
υ(summary)[
  ε(υ(rate)[rate], υ(number_of_calls)[number])[
    ρ(rate:invoice.itemized_call, rate), ρ(count(invoice.itemized_call), number)[
      γ(rate:invoice.itemized_call, count(invoice.itemized_call))[
        σ number_called:invoice.itemized_call ► "973%" [
          Φμ(invoice.itemized_call)[s(invoice.xml)]]]]]]
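Read from the inside out, the plan is a pipeline: source, follow, select, group, rename, expose, vertex. The Python sketch below mimics that pipeline with `xml.etree.ElementTree`. It is an illustration, not NIAGARA code: the function names are made up, group and rename are merged into one step, and the three calls come from the itemized_call table shown later in the deck.

```python
import xml.etree.ElementTree as ET
from collections import Counter

INVOICE_XML = """
<invoice>
  <itemized_call no="1" number_called="973 555-8888" rate="NIGHT"/>
  <itemized_call no="2" number_called="973 650-2222" rate="DAY"/>
  <itemized_call no="3" number_called="206 365-9999" rate="NIGHT"/>
</invoice>
"""

def source(xml_text):
    """s: one singleton bag per document."""
    return [{"invoice": ET.fromstring(xml_text.strip())}]

def follow(collection, entry, step):
    """Unnesting follow: one output bag per child vertex reached."""
    return [{**bag, f"{entry}.{step}": child}
            for bag in collection for child in bag[entry].findall(step)]

def select(collection, predicate):
    """Select: keep the bags that satisfy the predicate."""
    return [bag for bag in collection if predicate(bag)]

def group_count(collection, key):
    """Group + rename: count bags per key value, exposed as 'rate' and 'number'."""
    counts = Counter(key(bag) for bag in collection)
    return [{"rate": rate, "number": n} for rate, n in counts.items()]

def vertex(tag, collection):
    """Expose + vertex: wrap the rate/number pairs in a new XML element."""
    root = ET.Element(tag)
    for bag in collection:
        ET.SubElement(root, "rate").text = bag["rate"]
        ET.SubElement(root, "number_of_calls").text = str(bag["number"])
    return root

calls = follow(source(INVOICE_XML), "invoice", "itemized_call")
in_973 = select(calls, lambda b: b["invoice.itemized_call"].get("number_called").startswith("973"))
grouped = group_count(in_973, lambda b: b["invoice.itemized_call"].get("rate"))
summary = vertex("summary", grouped)
```

For this data, two of the three calls fall in area 973, one at the NIGHT rate and one at the DAY rate.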

23 Equivalence Rules
14 equivalence rules so far.
Auxiliary operators used in the rules:
- A > B: path expression A is a prefix of B
- ┴: the null path expression
- A∏B: the greatest common prefix of path expressions A and B

24 Equivalence Rules: Examples
Rule application: follow ordering
Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)] iff C < A, C < B: C = A∏B, or A∏B = ┴.
[Figure: path trees over A, B, and C illustrating the two cases]

25 Equivalence Rules: Examples
Rule application: join commutativity and associativity
(A ⋈ B) ⋈ C = (C ⋈ B) ⋈ A

26 Equivalence Rules: Examples
Rule application: selection distribution and interchangeability
σc[A ⋈ B] = σc1[A] ⋈ σc2[B], where c is the conjunction of the conditions c1 and c2, each of which refers to only one of the join inputs
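The selection rule is the familiar predicate pushdown from relational optimization. A quick Python check on toy data, assuming dict-based bags and made-up conditions c1 (on A only) and c2 (on B only):

```python
def join(left, right, pred):
    """Concatenate every pair of bags that satisfies the join predicate."""
    return [{**l, **r} for l in left for r in right if pred(l, r)]

def select(collection, pred):
    """Keep the bags that satisfy the predicate."""
    return [bag for bag in collection if pred(bag)]

A = [{"a": 1}, {"a": 2}, {"a": 3}]
B = [{"b": 1}, {"b": 2}, {"b": 3}]

c1 = lambda bag: bag["a"] >= 2      # refers only to the left input
c2 = lambda bag: bag["b"] <= 2      # refers only to the right input
on = lambda l, r: l["a"] == r["b"]  # join predicate

# Left-hand side of the rule: select after the join, with c = c1 AND c2.
late = select(join(A, B, on), lambda bag: c1(bag) and c2(bag))

# Right-hand side: push each condition below the join.
early = join(select(A, c1), select(B, c2), on)
```

Both plans produce the same bags, but the pushed-down version joins two elements per side instead of three.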

27 Equivalence Rules: Examples
Rule application: elimination of unused bag elements
ε(P)(J[A]) = J(ε(P[A])) iff J uses only elements exposed by P

28 XPERANTO
Goal: XQuery → SQL
References:
- J. Shanmugasundaram, et al. Querying XML Views of Relational Data, VLDB 2001.
- J. Shanmugasundaram, et al. Efficiently Publishing Relational Data as XML Documents, VLDB 2000.
- J. Shanmugasundaram, Ph.D. Dissertation, July 2001.

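29 Query Processing Architecture
[Figure: an XQuery user poses an XQuery over an XML view to the XPERANTO query engine. Inside the engine, the XQuery Parser produces XQGM, which passes through Query Rewrite & View Composition and then Computation Pushdown. Computation Pushdown sends SQL queries to the RDBMS and a tagger graph to the Tagger Runtime, which turns the returned tuples into the query results.]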

30 Data Model
Tables of a list of XML fragments.
[Figure: XQGM fragment, bottom to top]
- Table: Carrier ($invoice_id, $carrier)
- Select: $invoice_id = $id
- Project: $carrier_entry = <carrier>$carrier</carrier>
- Groupby: $carrier = aggXMLFrags($carrier_entry), output column $carriers

31 Operators
Table, Project, Select, Join, Groupby, Orderby, Union, Unnest, View, Function
- Select, Project, Join, Groupby, Orderby, and Union have the same semantics as their relational counterparts.
- Project: invokes the defined XML functions
- Table/View: refers to a relational table or an XML view
- Unnest: unnests an XML list
- Function: invokes XQuery-valued functions
- Groupby: creates XML fragments

32 XML Functions & Operators
No. | XML Function | Description | Operators
1 | cr8Elem(Tag, Atts, Clist) | Creates an element with tag name Tag, attribute list Atts, and contents Clist | Project
2 | cr8AttList(A1, ..., An) | Creates a list of attributes from the attributes passed as parameters | Project
3 | cr8Att(Name, Val) | Creates an attribute with name Name and value Val | Project
4 | cr8XMLFragList(C1, ..., Cn) | Creates an XML fragment list from the content parameters | Project
5 | aggXMLFrags(C) | Aggregate XML function that creates an XML fragment list | Groupby
6 | getTagName(Elem) | Returns the element name of Elem | Project, Select
7 | getAttributes(Elem) | Returns the list of attributes of Elem | Project, Select
8 | getContents(Elem) | Returns the XML fragment list of contents of Elem | Project, Select
9 | getAttName(Att) | Returns the name of attribute Att | Project, Select
10 | getAttValue(Att) | Returns the value of attribute Att | Project, Select
11 | isElement(E) | Returns true if E is an element, false otherwise | Select
12 | isText(T) | Returns true if T is text, false otherwise | Select
13 | Unnest(List) | Superscalar function that unnests a list | Unnest

33 Operators: Examples
- Project: $elems = getContents($invoice)
  [Figure: an $invoice row holding 508-753-2352, 24 july - 23 august, 2001, ... maps to an $elems row holding the same contents as a list]
- Groupby: $count = count($itemized_call)
  [Figure: the $itemized_call rows map to a single $count row with value 3]

34 Operators: Examples
- Groupby: $entries = aggXMLFrags($entry)
  [Figure: the $entry rows (DAY, 20) and (NIGHT, 23) are aggregated into one $entries fragment list]
- Project: $result = cr8Elem(summary, Att, $entries)
  [Figure: $entries is wrapped into one $result element containing DAY 20 and NIGHT 23]

35 Operators: Examples
- Unnest: $elem = unnest($elems)
  [Figure: one $elems list holding DAY 20 and NIGHT 23 is unnested into one $elem row per list item]

36 XML Query
User XQuery:
{ FOR $rate IN distinct(document("invoice")/invoice/itemized_call/@rate)
  LET $itemized_call := document("invoice")/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE '973%'
  RETURN $rate count($itemized_call) }
XQGM:
[Figure: XQGM graph with the following operators, from the document view up to the result]
- View: $doc = document("invoice.xml")
- Navigate: $rate = $doc/invoice/itemized_call/@rate
- Select: distinct($rate)
- Navigate: $irate = $doc/invoice/itemized_call/@rate, $number = $doc/invoice/itemized_call/@number_called
- Select: $number LIKE '973%'
- Select: $rate = $irate
- Groupby: $count = count($itemized_call)
- Join (Correlated): $rate, $count
- Project: $entry = $rate $count
- Groupby: $entries = aggXMLFrags($entry)
- Project: $result = $entries

37 Navigation in XQGM
Navigate: $invoice/account_number (output $account_number) expands into:
- Project: $elems = getContents($invoice)
- Unnest: $elem = unnest($elems)
- Select: getTagName($elem) = "account_number" (output $account_number)
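The three-step expansion can be imitated in Python with `xml.etree.ElementTree`. The helpers `get_contents`, `unnest`, and `get_tag_name` are stand-ins named after the XQGM functions, assumed here to cover child elements only (no text nodes):

```python
import xml.etree.ElementTree as ET

def get_contents(elem):
    """getContents: the list of child fragments of an element."""
    return list(elem)

def unnest(fragment_list):
    """unnest: turn one list value into one row per item."""
    yield from fragment_list

def get_tag_name(elem):
    """getTagName: the element name."""
    return elem.tag

invoice = ET.fromstring(
    "<invoice>"
    "<account_number>555 777-3158 573 234 3</account_number>"
    "<bill_period>Jun 9 - Jul 8, 2000</bill_period>"
    "</invoice>"
)

elems = get_contents(invoice)                       # Project
account_number = [e for e in unnest(elems)          # Unnest
                  if get_tag_name(e) == "account_number"]  # Select
```

The result matches what a direct path step `invoice.findall("account_number")` returns, which is the point of the expansion: navigation is expressed with ordinary Project, Unnest, and Select operators.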

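38 Default XML View
<invoice>
  <row> <id>1</id> <account_number>555 777-3158 573 234 3</account_number> <bill_period>Jun 9 - Jul 8, 2000</bill_period> <total>$0.35</total> </row>
</invoice>
<carrier>
  <row> <invoice_id>1</invoice_id> <carrier>Sprint</carrier> </row>
</carrier>
...

Table invoice:
id | account_number | bill_period | total
1 | 555 777-3158 573 234 3 | Jun 9 - Jul 8, 2000 | $0.35

Table carrier:
invoice_id | carrier
1 | Sprint

Table itemized_call:
invoice_id | no | date | number_called | time | rate | min | amount
1 | 1 | JUN 10 | 973 555-8888 | 10:17pm | NIGHT | 1 | 0.05
1 | 2 | JUN 13 | 973 650-2222 | 10:19am | DAY | 1 | 0.15
1 | 3 | JUN 15 | 206 365-9999 | 10:25pm | NIGHT | 3 | 0.15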
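A short Python sketch of how such a default view could be generated, assuming one element per table and one `row` element per tuple (the layout that the later view definition navigates with `view("default")/invoice/row`). The root element name `db` and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

def default_view(tables):
    """Map each table to an element, each tuple to <row>, each column to a child element."""
    root = ET.Element("db")  # hypothetical root name
    for table_name, rows in tables.items():
        table_elem = ET.SubElement(root, table_name)
        for row in rows:
            row_elem = ET.SubElement(table_elem, "row")
            for column, value in row.items():
                ET.SubElement(row_elem, column).text = str(value)
    return root

tables = {
    "invoice": [{"id": 1, "account_number": "555 777-3158 573 234 3",
                 "bill_period": "Jun 9 - Jul 8, 2000", "total": "$0.35"}],
    "carrier": [{"invoice_id": 1, "carrier": "Sprint"}],
    "itemized_call": [
        {"invoice_id": 1, "no": 1, "rate": "NIGHT", "amount": "0.05"},
        {"invoice_id": 1, "no": 2, "rate": "DAY", "amount": "0.15"},
        {"invoice_id": 1, "no": 3, "rate": "NIGHT", "amount": "0.15"},
    ],
}

view = default_view(tables)
```

Only a subset of the itemized_call columns is included to keep the example short.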

39 User Defined XML View
[Figure: the Invoice, Carrier, and Itemized_call tables of the default view, mapped to the user-defined invoice document below]
<invoice>
  <account_number>555 777-3158 573 234 3</account_number>
  <bill_period>Jun 9 - Jul 8, 2000</bill_period>
  <carrier>Sprint</carrier>
  <total>$0.35</total>
</invoice>

40 User Defined XML View Cont.
Create view invoice as (
  FOR $invoice IN view("default")/invoice/row
  RETURN
    $invoice/account_number
    $invoice/bill_period
    FOR $carrier IN view("default")/carrier/row
    WHERE $carrier/invoice_id = $invoice/id
    RETURN $carrier
    FOR $itemized_call IN view("default")/itemized_call/row
    WHERE $itemized_call/invoice_id = $invoice/id
    RETURN SORTBY (@no)
    $invoice/total
)

41 XML View XQGM
[Figure: the "Create view invoice" definition shown next to its XQGM graph, which has the following operators]
- Table: Invoice ($id, $account_number, $bill_period, $total)
- Table: Carrier ($invoice_id, $carrier)
- Select: $invoice_id = $id
- Project: $carrier_entry = $carrier
- Groupby: $carrier = aggXMLFrags($carrier_entry), output column $carriers
- Subquery producing $items
- Join (Correlated): $account_number, $bill_period, $total, $carriers, $items
- Project: $doc = $account_number $bill_period $carriers $itemized_calls $total

42 View Composition
User Query XQGM + User View XQGM: use the composition rules to cancel out the Navigation operators.
View side:
cr8Elem(invoice, cr8AttList(), cr8XMLFragList(
  cr8Elem(account_number, cr8AttList(), cr8XMLFragList($account_number)),
  cr8Elem(bill_period, cr8AttList(), cr8XMLFragList($bill_period)),
  $carriers,
  $items,
  cr8Elem(total, cr8AttList(), cr8XMLFragList($total))))
Query side (navigation over $invoice):
- Project: $elems = getContents($invoice)
- Unnest: $elem = unnest($elems)
- Select: getTagName($elem) = "account_number" (output $account_number)

43 12 Composition Rules
No. | Function | Composes with | Reduction
1 | getTagName | cr8Elem(Tag, Atts, Clist) | Tag
2 | getAttributes | cr8Elem(Tag, Atts, Clist) | Atts
3 | getContents | cr8Elem(Tag, Atts, Clist) | Clist
4 | getAttName | cr8Att(Name, Val) | Name
5 | getAttValue | cr8Att(Name, Val) | Val
6 | isElement | cr8Elem(Tag, Atts, Clist) | True
7 | isElement | Other than cr8Elem | False
8 | isText | PCDATA | True
9 | isText | Other than PCDATA | False
10 | Unnest | aggXMLFrags(C) | C
11 | Unnest | cr8XMLFragList(C1, ..., Cn) | C1 U ... U Cn
12 | Unnest | cr8AttList(A1, ..., An) | A1 U ... U An
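Several of these rules can be tried out as a tiny term rewriter. The Python sketch below is an assumption-laden illustration: terms are nested tuples such as `("getTagName", ("cr8Elem", ...))`, unions are modelled as Python lists, and only rules 1 to 6 and 10 to 12 are covered. It is not XPERANTO's rewrite engine.

```python
def reduce_term(term):
    """Apply the composition rules bottom-up to a nested-tuple term."""
    if not isinstance(term, tuple):
        return term  # variables and constants are left alone
    head, *args = term
    args = [reduce_term(a) for a in args]
    inner = args[0] if args and isinstance(args[0], tuple) else None
    if inner and inner[0] == "cr8Elem":
        _, tag, atts, clist = inner
        if head == "getTagName":
            return tag        # rule 1
        if head == "getAttributes":
            return atts       # rule 2
        if head == "getContents":
            return clist      # rule 3
        if head == "isElement":
            return True       # rule 6
    if inner and inner[0] == "cr8Att":
        _, name, val = inner
        if head == "getAttName":
            return name       # rule 4
        if head == "getAttValue":
            return val        # rule 5
    if inner and head == "unnest":
        if inner[0] == "aggXMLFrags":
            return inner[1]   # rule 10
        if inner[0] in ("cr8XMLFragList", "cr8AttList"):
            return list(inner[1:])  # rules 11 and 12, union modelled as a list
    return (head, *args)      # no rule applies

view_elem = ("cr8Elem", "invoice", ("cr8AttList",),
             ("cr8XMLFragList", "$account_number", "$total"))

tag = reduce_term(("getTagName", view_elem))
items = reduce_term(("unnest", ("getContents", view_elem)))
```

Composing `unnest(getContents(...))` with the view's `cr8Elem` term removes both the constructor and the navigation functions, leaving only the view's own variables.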

44 View Composition Example
[Figure: before composition, the navigation operators (Project: $elems = getContents($invoice); Unnest: $elem = unnest($elems); Select: getTagName($elem) = "account_number") sit on top of the view's Project: $invoice = $account_number $bill_period $carriers $itemized_calls $total and its Join (Correlated) over $account_number, $bill_period, $total, $carriers, and $items. After composition, $account_number is read directly from the Join (Correlated).]

45 Computation Pushdown
Goal: XQGM → SQL queries + Tagger Graph
Step 1: Query decorrelation
- Correlated join → outer unions
- Reference: P. Seshadri, et al. "Complex Query Decorrelation", ICDE 1996.
Step 2: Tagger pull-up
- XQGM → tagger run-time graph
- Uses "Sorted Outer Union"
- Reference: J. Shanmugasundaram, et al. "Efficiently Publishing Relational Data as XML Documents".
- Separation of SQL and tagger operations
- Semantically equivalent fragment by pattern

46 Comparison
Aspect | XPERANTO | NIAGARA
Goal | XQuery → SQL | XQuery → Algebra
Algebra | XQGM and Tagger Graph | XML Algebra
Data model | Tables of a list of XML fragments | A collection of bags of vertices
Operators * | 10 operators with 13 functions | 12 operators
Variable binding | Lot of temporary variables | No variables
Order | Sensitive | Semi-sensitive (missing orderby)
Regular expression | No support at operator level | Support at operator level
Text-in-context | No support | Support
Level of abstraction | Function level (lower) | Logical level (higher)
Transition rules | Composition rules (ad-hoc) & 1 semantically equivalent pattern (ad-hoc) | Equivalence rules
Operation history | Not maintained | Maintained

47 Conclusions and Future Work
WE NEED OUR OWN ALGEBRA.
More reading:
- David Beech, et al. A Formal Data Model and Algebra for XML.
- Mary Fernandez, et al. An Algebra for XML Query.

