1 Introduction to XML Algebra Based on talk prepared for CS561 by Wan Liu and Bintou Kane.


1 Introduction to XML Algebra
Based on talk prepared for CS561 by Wan Liu and Bintou Kane

2 Data Model
A data model comprises the core data structures and data types supported by a DBMS.
A relational database uses a table-based (set-oriented) data model.
XML uses a tree-structured, hierarchical data model.

3 Why XML Algebra?
It is common to translate a query language into an algebra.
First, the algebra is used to give a semantics for the query language.
Second, the algebra is used to support query optimization.

4 XML Algebra History
Lore Algebra (August 1999): Stanford University
IBM Algebra (September 1999): Oracle; IBM; Microsoft Corp
YAT Algebra (May 2000)
AT&T Algebra (June 2000): AT&T; Bell Labs
Niagara Algebra (2001): University of Wisconsin-Madison

5 NIAGARA
Title: Following the paths of XML Data: An algebraic framework for XML query evaluation
By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier. Univ. of Wisconsin

6 Outline
Concepts of Niagara Algebra
Operations
Optimization

7 Goals of Niagara Algebra
Be independent of schema information
Query on both structure and content
Generate simple, flexible, yet powerful algebraic expressions
Allow re-use of traditional optimization techniques

8 Example: XML Source Documents
Invoice.xml (account number, carrier, total per invoice)
  2  AT&T  $0.25
  1  Sprint  $1.20
  1  AT&T  $0.75
Customer.xml (account number, name per customer)
  1  Tom
  2  George

9 XML Data Model and Tree Graph
Example: Invoice_Document
[Figure: ordered tree graph. The Invoice_Document root has Invoice children; each Invoice has the children number, carrier, and total, with leaf values such as 2, AT&T, $0.25 and 1, Sprint, $1.20.]
Ordered tree graph, semi-structured data

10 XML Data Model [GVDNM01]
A collection of bags of vertices. Vertices in a bag have no order.
Example: the root of invoice.xml, the invoice vertex (with its element content), and the invoice.account_number vertex (with its element content) form the bag
[Root "invoice.xml", invoice, invoice.account_number]

11 Data Model
Bag elements are reachable by path expressions.
A path expression consists of two parts:
An entry point
A relative forward part
Example: account_number:invoice (entry point invoice, relative forward part account_number)

12 Operators
Source S, Follow Φ, Select σ, Join ⋈, Rename ρ, Expose ε, Vertex ν, Group γ, Union ∪, Intersection ∩, Difference -, Cartesian Product ×.

13 Source Operator S
Input: a list of documents
Output: a collection of singleton bags
Examples:
S(*): all known XML documents
S(invoice*.xml): all XML documents whose filename matches "invoice*.xml"
S(*, schema.dtd): all known XML documents that conform to schema.dtd
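A rough sketch of how Source might pick documents by filename pattern. The `KNOWN_DOCUMENTS` catalog and the `source` function are hypothetical names used only for illustration, each document is represented just by its name, and the schema-based form S(*, schema.dtd) is not covered.

```python
from fnmatch import fnmatchcase

# Hypothetical catalog of documents known to the system (an assumption).
KNOWN_DOCUMENTS = ["invoice1.xml", "invoice2.xml", "customer.xml"]

def source(pattern, documents=KNOWN_DOCUMENTS):
    """Return a collection of singleton bags, one per matching document."""
    return [[name] for name in documents if fnmatchcase(name, pattern)]

all_docs = source("*")                 # every known document
invoice_docs = source("invoice*.xml")  # only the invoice files
```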

14 Follow operator Φ
Input: a path expression in entry point notation
Functionality: extracts the vertices reachable by the path expression
Output: a new bag that consists of the extracted vertex plus all contents of the original bag (in the case of unnesting follow)

15 Follow operator (Example*)
Φ(carrier:invoice)
Input: {[Root invoice.xml, invoice]}
Output: {[Root invoice.xml, invoice, invoice.carrier]}
The output bag keeps the invoice vertex (with its element content) and adds the invoice.carrier vertex (with the carrier element content).
*Unnesting Follow
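The unnesting follow can be sketched in Python as below. This is an illustration under assumptions, not the Niagara implementation: the XML tags, the dict-based bag representation, and the `follow` function are all hypothetical, since the slides show only values and labels.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the slides only show the values, not the tags.
INVOICE_XML = (
    "<invoices>"
    "<invoice><carrier>AT&amp;T</carrier></invoice>"
    "<invoice><carrier>Sprint</carrier></invoice>"
    "<invoice></invoice>"  # an invoice without a carrier
    "</invoices>"
)

def follow(bags, child, entry):
    """Unnesting follow for the path `child:entry`.

    For every bag, emit one new bag per `child` element found directly
    under the vertex labeled `entry`. The new bag keeps all original
    vertices and adds the extracted one.
    """
    # Labels mirror the slides: "invoice", then "invoice.carrier".
    label = child if entry == "Root" else f"{entry}.{child}"
    out = []
    for bag in bags:
        for vertex in bag[entry].findall(child):
            out.append({**bag, label: vertex})
    return out

collection = [{"Root": ET.fromstring(INVOICE_XML)}]
invoices = follow(collection, "invoice", "Root")
with_carrier = follow(invoices, "carrier", "invoice")
```

In this sketch an invoice with no carrier element yields no output bag, so `with_carrier` is smaller than `invoices`.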

16 Select operator σ
Input: a set of bags
Functionality: filters the bags of a collection using a predicate
Output: a set of bags that conform to the predicate
Predicate: logical operators (∧, ∨, ¬) or simple qualifications (>, <, ≥, ≤, =, ≠)

17 Select operator (Example)
σ invoice.carrier = Sprint
Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}
Output: {[Root invoice.xml, invoice], …}
Only the bags whose invoice has carrier Sprint remain in the output.
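A minimal sketch of Select as a filter over a collection of bags. The dict-based bags, with labels mapped directly to text values, and the `select` function are assumptions made for brevity.

```python
def select(bags, predicate):
    """Keep only the bags that satisfy the predicate."""
    return [bag for bag in bags if predicate(bag)]

# Hypothetical bags after following account_number, carrier, and total.
bags = [
    {"invoice.account_number": "2", "invoice.carrier": "AT&T", "invoice.total": "$0.25"},
    {"invoice.account_number": "1", "invoice.carrier": "Sprint", "invoice.total": "$1.20"},
    {"invoice.account_number": "1", "invoice.carrier": "AT&T", "invoice.total": "$0.75"},
]

# Simple qualification: invoice.carrier = Sprint
sprint = select(bags, lambda bag: bag["invoice.carrier"] == "Sprint")

# Logical combination of two qualifications.
att_account_1 = select(
    bags,
    lambda bag: bag["invoice.carrier"] == "AT&T"
    and bag["invoice.account_number"] == "1",
)
```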

18 Join operator ⋈
Input: two collections of bags
Functionality: joins the two collections based on a predicate
Output: the concatenation of the pairs of bags that satisfy the predicate

19 Join operator (Example)
Join predicate: account_number:invoice = number:customer
Inputs: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}
Each output bag carries both the invoice vertex and the customer vertex, with their element contents.
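Join can be sketched as a nested loop that concatenates every pair of bags satisfying the predicate. The dict-based bags and the `join` function are illustrative assumptions, and a real engine would likely use a more efficient join algorithm.

```python
def join(left, right, predicate):
    """Concatenate each pair of bags (l, r) for which predicate(l, r) holds."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]

# Hypothetical bags; labels follow the entry point notation of the slides.
invoices = [
    {"invoice.account_number": "2", "invoice.total": "$0.25"},
    {"invoice.account_number": "1", "invoice.total": "$1.20"},
    {"invoice.account_number": "1", "invoice.total": "$0.75"},
]
customers = [
    {"customer.number": "1", "customer.name": "Tom"},
    {"customer.number": "2", "customer.name": "George"},
]

# account_number:invoice = number:customer
joined = join(
    invoices,
    customers,
    lambda i, c: i["invoice.account_number"] == c["customer.number"],
)
names = [bag["customer.name"] for bag in joined]
```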

20 Expose operator ε
Input: a list of path expressions of the vertices to be exposed
Output: a set of bags that contain the vertices in the parameter list, in the same order

21 Expose operator (Example)
ε(bill_period, carrier)
Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}
The output keeps only the bill_period and carrier vertices (with their element contents), in the order given in the parameter list.
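Expose behaves much like a projection that also fixes the output order. The sketch below assumes dict-based bags and a hypothetical `expose` function, and the bill_period value is made up for illustration. It relies on Python dicts preserving insertion order.

```python
def expose(bags, labels):
    """Keep only the listed vertices, in the order of the parameter list."""
    return [{label: bag[label] for label in labels} for bag in bags]

# Hypothetical input bag with placeholder values.
bags = [
    {
        "invoice": "<invoice element>",
        "invoice.carrier": "Sprint",
        "invoice.bill_period": "2001-06",  # made-up value
    }
]

exposed = expose(bags, ["invoice.bill_period", "invoice.carrier"])
order = list(exposed[0])
```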

22 Vertex operator ν
Creates the actual XML vertex that will encompass everything created by an expose operator
Example: ν(Customer_invoice)[ε(ν(account)[invoice.account_number], ν(inv_total)[invoice.total])]
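The vertex example can be approximated with Python's standard `xml.etree.ElementTree`. The `vertex` helper and the text-valued bag are assumptions for illustration, and the values are taken from the Sprint invoice in the example documents.

```python
import xml.etree.ElementTree as ET

def vertex(name, content):
    """Create a new element `name` wrapping text or a list of child elements."""
    element = ET.Element(name)
    if isinstance(content, str):
        element.text = content
    else:
        element.extend(content)
    return element

# Hypothetical bag holding the text of two exposed vertices.
bag = {"invoice.account_number": "1", "invoice.total": "$1.20"}

customer_invoice = vertex("Customer_invoice", [
    vertex("account", bag["invoice.account_number"]),
    vertex("inv_total", bag["invoice.total"]),
])
xml_text = ET.tostring(customer_invoice, encoding="unicode")
```

Here `xml_text` holds a Customer_invoice element with account and inv_total children, which is the shape the vertex expression describes.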

23 Other operators
Group γ: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g., average) can be used with the group operator.
Rename ρ: changes the entry point annotation of the elements of a bag.
Example: ρ(invoice.bill_period, date)

24 Example: XML Source Documents
Invoice.xml
  2  AT&T  $0.25
  1  Sprint  $1.20
  1  $0.75  maria
Customer.xml
  1  Tom
  2  George

25 XQuery Example
List the account number, customer name, and invoice total for all invoices that have carrier = "Sprint".
FOR $i in (invoices.xml)//invoice,
    $c in (customers.xml)//customer
WHERE $i/carrier = "Sprint" and $i/account_number = $c/account
RETURN $i/account_number, $c/name, $i/total

26 Example: XQuery output
1  Tom  $1.20

27 Algebra Tree Execution
Invoice branch:
  Source (invoices.xml)
  Follow (*.invoice) yields invoice (1), invoice (2), invoice (3)
  Select (carrier = "Sprint") yields invoice (2)
Customer branch:
  Source (customers.xml)
  Follow (*.customer) yields customer (1), customer (2)
Join (*.invoice.account_number = *.customer.account) yields invoice (2) customer (1)
Expose (*.account_number, *.name, *.total) yields account_number, name, total
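The execution plan can be traced end to end with a small Python sketch. Everything here is a simplification under assumptions: the XML markup is guessed from the query's paths (account_number, carrier, total, account, name), the third invoice is given no carrier, and `source`, `follow`, `select`, and `join` are hypothetical helpers rather than Niagara's operators.

```python
import xml.etree.ElementTree as ET

# Assumed markup, reconstructed from the element names used in the query.
INVOICES = (
    "<invoices>"
    "<invoice><account_number>2</account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>"
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier><total>$1.20</total></invoice>"
    "<invoice><account_number>1</account_number><total>$0.75</total></invoice>"
    "</invoices>"
)
CUSTOMERS = (
    "<customers>"
    "<customer><account>1</account><name>Tom</name></customer>"
    "<customer><account>2</account><name>George</name></customer>"
    "</customers>"
)

def source(name, xml_text):
    """One singleton bag holding the document root."""
    return [{name: ET.fromstring(xml_text)}]

def follow(bags, entry, tag):
    """Unnesting follow: one output bag per `tag` element below `entry`."""
    return [{**bag, tag: v} for bag in bags for v in bag[entry].iter(tag)]

def select(bags, predicate):
    return [bag for bag in bags if predicate(bag)]

def join(left, right, predicate):
    return [{**l, **r} for l in left for r in right if predicate(l, r)]

invoices = follow(source("invoices.xml", INVOICES), "invoices.xml", "invoice")
sprint = select(invoices, lambda b: b["invoice"].findtext("carrier") == "Sprint")
customers = follow(source("customers.xml", CUSTOMERS), "customers.xml", "customer")
joined = join(
    sprint,
    customers,
    lambda i, c: i["invoice"].findtext("account_number")
    == c["customer"].findtext("account"),
)
# Expose account_number, name, total in that order.
result = [
    (
        b["invoice"].findtext("account_number"),
        b["customer"].findtext("name"),
        b["invoice"].findtext("total"),
    )
    for b in joined
]
```

With these assumed documents, `result` contains the single row 1, Tom, $1.20.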

28 Optimization with Niagara
Optimizer based on the Niagara algebra:
Use operations more efficiently
Produce simpler expressions by combining operations

29 Language Convention
A and B are path expressions
A < B: path expression A is a prefix of B
AnB: common prefix of paths A and B
AńB: greatest common prefix of paths A and B
┴: null path expression

30 Heuristics using Rewrite Rules
Allow optimization based on path selectivity
Used when applying the unnesting follow operation Φμ

31 Interchangeability of Follow operation
Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]
TRUE when there exists C such that C < A and C < B and C = AńB, or when AnB = ┴

32 Application of Rule on Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]   (*)
=?=
Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]   (**)

33 Application of Rule on Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] ?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]
Equivalent, because both paths share the common prefix "invoice".
Case: AńB = invoice

34 Benefit of Rule Application
NOTE: assume that acc_Num is required for each invoice element, while carrier is not required.
THEN: Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] (*) ?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] (**)
Which algebra tree do we prefer?
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] (*) makes more sense than (**). Why?

35 Discussion
Reduction of input size on the first sub-operation: Φμ(carrier:invoice)
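The effect of operation order on intermediate size can be checked with a toy example. It assumes that the bracketed inner follow runs first, that an unnesting follow emits nothing for an invoice lacking the element, and that invoices can be modeled as plain dicts of value lists. The `follow` helper and the data are hypothetical.

```python
# Each invoice maps a child name to the list of values found for it.
# acc_Num is always present; carrier is optional (assumed data).
invoices = [
    {"acc_Num": ["2"], "carrier": ["AT&T"]},
    {"acc_Num": ["1"], "carrier": ["Sprint"]},
    {"acc_Num": ["1"], "carrier": []},
]

def follow(bags, child):
    """Unnesting follow of `child:invoice` over dict-based bags."""
    out = []
    for bag in bags:
        for value in bag["invoice"][child]:
            out.append({**bag, f"invoice.{child}": value})
    return out

start = [{"invoice": inv} for inv in invoices]

# Plan with carrier first, then acc_Num.
carrier_first = follow(start, "carrier")
plan_a = follow(carrier_first, "acc_Num")

# Plan with acc_Num first, then carrier.
acc_first = follow(start, "acc_Num")
plan_b = follow(acc_first, "carrier")

same_result = plan_a == plan_b
intermediate_sizes = (len(carrier_first), len(acc_first))
```

Both plans produce the same bags, but following the optional carrier path first passes fewer bags to the second follow.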

36 Should we/can we apply the rule below?
Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)]

37 "acc_Num:invoice" and "acc_Num:customer" are two totally different paths
Case: AnB = ┴
So yes, the rule is valid.

