
Introduction to XML Algebra




1 Introduction to XML Algebra
CS561

2 Data Model
Data model: the core data structures and data types supported by a DBMS.
The relational data model is a table (set-oriented) model.
The XML data model is a tree-structured, hierarchical model.

3 Why Query Algebra (for XML) ?
It is common to translate a query language into an algebra. First, the algebra is used to give a semantics for the query language. Second, the algebra is used to support query optimization.

4 XML Algebra History
Lore Algebra (August 1999) -- Stanford University
IBM Algebra (September 1999) -- Oracle; IBM; Microsoft Corp.
YAT Algebra (May 2000)
AT&T Algebra (June 2000) -- AT&T; Bell Labs
Niagara Algebra (2001) -- University of Wisconsin-Madison

5 NIAGARA
Title: Following the paths of XML Data: An algebraic framework for XML query evaluation
By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier. Univ. of Wisconsin

6 Outline
Concepts of Niagara Algebra
Operations
Optimization

7 Goals of Niagara Algebra
Be independent of schema information
Query on both structure and content
Generate simple, flexible, yet powerful algebraic expressions
Allow re-use of traditional optimization techniques

8 Example: XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice No="1">
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
    <total>$0.75</total>
  </invoice>
</Invoice_Document>

Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>

9 XML Data Model and Tree Graph
Example: Invoice_Document
<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
</Invoice_Document>
[Figure: the same document drawn as an ordered tree. Invoice_Document has two invoice children, each with number, carrier, and total children holding 2, AT&T, $0.25 and 1, Sprint, $1.20.]
Ordered tree graph, semi-structured data

10 XML Data Model (for Querying)
SQL: relations in, relation out.
Relational algebra: relations in, relation out.
XQuery: XML doc in, XML docs out.
XML algebra: ??

11 XML Data Model [GVDNM01]
A collection of bags of vertices. Vertices in a bag have no order.
Example bag: [Root "invoice.xml", invoice, invoice.account_number], where the invoice vertex holds <invoice> invoice-element-content </invoice> and the invoice.account_number vertex holds <account_number> element-content </account_number>.
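A minimal Python sketch of this data model, under a hypothetical encoding that is not taken from the Niagara paper: each vertex is an `xml.etree.ElementTree` element, a bag is a dict from the vertex's annotation to the element, and a collection is a list of bags. Lookups go by annotation, never by position, which is how the sketch honours the "vertices in a bag have no order" rule; the annotation strings are illustrative.

```python
import xml.etree.ElementTree as ET

INVOICE_XML = """<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
  </invoice>
</Invoice_Document>"""

root = ET.fromstring(INVOICE_XML)
invoice = root.find("invoice")

# Hypothetical encoding: annotation -> vertex. Vertices are always looked up
# by annotation, never by position, so bag order carries no meaning.
bag = {
    'Root "invoice.xml"': root,
    "invoice": invoice,
    "invoice.account_number": invoice.find("account_number"),
}
collection = [bag]  # a collection of bags

for annotation, vertex in bag.items():
    print(f"{annotation} -> <{vertex.tag}>")
```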

12 Data Model
Bag elements are reachable by path expressions. A path expression consists of two parts:
an entry point, and
a relative forward part.
Example: account_number:invoice (entry point invoice, relative part account_number)
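The entry point notation can be sketched as a small parser. This is an illustration under assumptions: the relative part comes before the colon and the entry point after it (as in `account_number:invoice`), steps inside the relative part are separated by `.`, and the names `PathExpr`, `parse_path`, and `annotation` are hypothetical.

```python
from typing import NamedTuple, Tuple


class PathExpr(NamedTuple):
    entry_point: str            # annotation of a vertex already in the bag
    relative: Tuple[str, ...]   # forward steps taken from that vertex


def parse_path(text: str) -> PathExpr:
    """Split 'relative:entry' notation, e.g. 'account_number:invoice'.
    Assumption: steps inside the relative part are separated by '.'."""
    relative, sep, entry = text.partition(":")
    if not sep or not relative or not entry:
        raise ValueError(f"expected 'relative:entry', got {text!r}")
    return PathExpr(entry, tuple(relative.split(".")))


def annotation(path: PathExpr) -> str:
    """Annotation of the vertex the path reaches, e.g. 'invoice.account_number'."""
    return ".".join((path.entry_point,) + path.relative)


p = parse_path("account_number:invoice")
print(p.entry_point, p.relative, annotation(p))
```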

13 Outline
Concepts of Niagara Algebra
Operations
Optimization

14 Operators
Source S, Follow Φ, Expose, Vertex,
Select σ, Join ⋈, Rename, Group, Union ∪, Intersection ∩, Difference −, Cartesian Product ×

15 Source Operator S
Input: a list of documents
Output: a collection of singleton bags
Examples:
S(*): all known XML documents
S(invoice*.xml): all XML documents whose filename matches "invoice*.xml"
S(*, schema.dtd): all known XML documents that conform to schema.dtd
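A Python sketch of the Source operator, assuming a hypothetical in-memory catalog stands in for "all known XML documents". Filename patterns use the standard `fnmatch` module; conformance to a DTD is only approximated by the DTD name each catalog entry declares, since no validation is performed here. `CATALOG` and `source` are illustrative names.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical catalog standing in for "all known XML documents":
# filename -> (document text, DTD the document declares, or None).
CATALOG = {
    "invoice1.xml": ("<Invoice_Document/>", "schema.dtd"),
    "invoice2.xml": ("<Invoice_Document/>", None),
    "customer.xml": ("<Customer_Document/>", "schema.dtd"),
}


def source(pattern="*", schema=None):
    """Sketch of S(pattern[, schema]): one singleton bag per matching document.
    Schema conformance is approximated by the declared DTD name; no
    validation is performed."""
    bags = []
    for name, (text, dtd) in sorted(CATALOG.items()):
        if fnmatch.fnmatchcase(name, pattern) and (schema is None or dtd == schema):
            bags.append({f"Root {name}": ET.fromstring(text)})
    return bags


print([list(bag) for bag in source("invoice*.xml")])
```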

16 Follow operator Φ
Input: a path expression in entry point notation
Functionality: extracts the vertices reachable by the path expression
Output: a new bag that consists of the extracted vertex plus all contents of the original bag (in the case of the unnesting Follow)
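A minimal Python sketch of the unnesting Follow, under the dict-of-vertices bag encoding assumed earlier (hypothetical, not the Niagara implementation). Each reachable vertex yields its own output bag that also carries every vertex of the input bag; an input bag with no reachable vertex produces no output, which is what "unnesting" suggests. `follow_unnest` is an illustrative name.

```python
import xml.etree.ElementTree as ET


def follow_unnest(collection, path):
    """Sketch of the unnesting Follow for 'relative:entry' paths such as
    'carrier:invoice'. Each reachable vertex yields a new bag holding that
    vertex plus every vertex of the original bag; a bag with no reachable
    vertex yields nothing."""
    relative, _, entry = path.partition(":")
    new_annotation = f"{entry}.{relative}"
    result = []
    for bag in collection:
        start = bag.get(entry)
        if start is None:
            continue
        # ElementTree paths use '/', the slides' annotations use '.'
        for vertex in start.findall(relative.replace(".", "/")):
            result.append({**bag, new_annotation: vertex})
    return result


root = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>"
    "<invoice><carrier>Sprint</carrier><total>$1.20</total><total>$0.75</total></invoice>"
    "</Invoice_Document>")
# Input shaped like the slide's example: {[Root invoice.xml, invoice], ...}
bags = [{"Root invoice.xml": root, "invoice": inv} for inv in root.findall("invoice")]

carriers = follow_unnest(bags, "carrier:invoice")
print(list(carriers[0]))
print(len(follow_unnest(bags, "total:invoice")))  # the second invoice has two totals
```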

17 Follow operator (Example*)
Input: {[Root invoice.xml, invoice]}, where the invoice vertex holds <invoice> invoice-element-content </invoice>
*Unnesting Follow (carrier:invoice)
Output: {[Root invoice.xml, invoice, invoice.carrier]}, where the new invoice.carrier vertex holds <carrier> carrier-element-content </carrier>

18 Select operator σ
Input: a set of bags
Functionality: filters the bags of a collection using a predicate
Output: a set of bags that satisfy the predicate
Predicate: logical operators (∧, ∨, ¬) or simple qualifications (=, ≠, <, ≤, >, ≥)
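A Python sketch of Select, assuming bags map annotations to ElementTree vertices and predicates are plain Python callables. The helpers `qual`, `p_and`, `p_or`, and `p_not` are hypothetical stand-ins for simple qualifications and the logical operators. Two further assumptions: a qualification holds if any reached vertex satisfies it, and values are compared as strings, so `<` and `>` are lexicographic here.

```python
import operator
import xml.etree.ElementTree as ET

# The six comparison operators allowed in a simple qualification.
COMPARE = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
           "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def qual(path, op, constant):
    """Simple qualification 'path op constant'. `path` is an annotation such
    as 'invoice.carrier': the bag's 'invoice' vertex, then child 'carrier'.
    Assumption: true if ANY reached vertex satisfies it (string comparison)."""
    entry, *steps = path.split(".")

    def test(bag):
        start = bag.get(entry)
        if start is None:
            return False
        nodes = start.findall("/".join(steps)) if steps else [start]
        return any(COMPARE[op]((n.text or "").strip(), constant) for n in nodes)
    return test


def p_and(p, q):
    return lambda bag: p(bag) and q(bag)


def p_or(p, q):
    return lambda bag: p(bag) or q(bag)


def p_not(p):
    return lambda bag: not p(bag)


def select(collection, predicate):
    """Sketch of Select: keep the bags that satisfy the predicate."""
    return [bag for bag in collection if predicate(bag)]


root = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number><carrier>AT&amp;T</carrier></invoice>"
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>"
    "</Invoice_Document>")
bags = [{"invoice": inv} for inv in root.findall("invoice")]

sprint = select(bags, qual("invoice.carrier", "=", "Sprint"))
print([b["invoice"].findtext("account_number") for b in sprint])
```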

19 Select operator (Example)
Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}, where each invoice vertex holds <invoice> invoice-element-content </invoice>
Select: invoice.carrier = Sprint
Output: {[Root invoice.xml, invoice], …}, the bags whose invoice satisfies the predicate

20 Join operator ⋈
Input: two collections of bags
Functionality: joins the two collections based on a predicate
Output: the concatenation of the pairs of bags that satisfy the predicate
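A nested-loop Python sketch of Join over the hypothetical dict-of-vertices bags: every pair of bags that satisfies the predicate is concatenated into one bag. The slide's example predicate names `number:customer`, but Customer.xml stores the value in `<account>`, so this sketch joins on `account` (an assumption). `join` and `text` are illustrative names.

```python
import xml.etree.ElementTree as ET


def join(left, right, predicate):
    """Sketch of Join as a nested loop: every pair (l, r) that satisfies
    predicate(l, r) is concatenated into a single bag."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


def text(bag, annotation, child):
    """Text of the first `child` of the vertex annotated `annotation`."""
    return (bag[annotation].findtext(child) or "").strip()


inv_root = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number></invoice>"
    "<invoice><account_number>1</account_number></invoice>"
    "</Invoice_Document>")
cust_root = ET.fromstring(
    "<Customer_Document>"
    "<customer><account>1</account><name>Tom</name></customer>"
    "<customer><account>2</account><name>George</name></customer>"
    "</Customer_Document>")
invoices = [{"Root invoice.xml": inv_root, "invoice": i} for i in inv_root.findall("invoice")]
customers = [{"Root customer.xml": cust_root, "customer": c} for c in cust_root.findall("customer")]

# account_number:invoice = account:customer
pairs = join(invoices, customers,
             lambda l, r: text(l, "invoice", "account_number") == text(r, "customer", "account"))
for bag in pairs:
    print(list(bag), text(bag, "customer", "name"))
```

A nested loop is the simplest correct strategy; a real optimizer would be free to pick a hash or sort-merge join for the same predicate.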

21 Join operator (Example)
Inputs: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}, whose vertices hold <invoice> invoice-element-content </invoice> and <customer> customer-element-content </customer>
Join predicate: account_number:invoice = number:customer
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}

22 Expose operator
Input: a list of path expressions of the vertices to be exposed
Output: a set of bags that contain the vertices in the parameter list, in the same order
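A Python sketch of Expose over the hypothetical dict-of-vertices bags: only the listed vertices are kept, in parameter order. Two assumptions: Root vertices are retained, mirroring the slide's example output, and listed annotations missing from a bag are skipped. The `bill_period` value is a placeholder.

```python
import xml.etree.ElementTree as ET


def expose(collection, *annotations):
    """Sketch of Expose: keep only the listed vertices, in parameter order.
    Assumption: Root vertices are retained too, as in the slide's example
    output; listed annotations missing from a bag are skipped."""
    result = []
    for bag in collection:
        kept = {a: v for a, v in bag.items() if a.startswith("Root ")}
        kept.update((a, bag[a]) for a in annotations if a in bag)
        result.append(kept)
    return result


def element(tag, text=None):
    el = ET.Element(tag)
    el.text = text
    return el


# Input bag shaped like the slide's: [Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]
bag = {
    "Root invoice.xml": element("Invoice_Document"),
    "invoice": element("invoice"),
    "invoice.carrier": element("carrier", "Sprint"),
    "invoice.bill_period": element("bill_period", "placeholder"),
}
print(list(expose([bag], "invoice.bill_period", "invoice.carrier")[0]))
```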

23 Expose operator (Example)
Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
Expose (bill_period, carrier)
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}; the exposed vertices appear in the order of the parameter list

24 Vertex operator
Creates the actual XML vertex that will encompass everything created by an Expose operator.
Example: Vertex(Customer_invoice)[(Vertex(account)[invoice.account_number], Vertex(inv_total)[invoice.total])]
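A Python sketch of the Vertex example using `xml.etree.ElementTree`. The helpers `wrap` and `vertex` are hypothetical: `wrap` builds a new element that carries a copy of an exposed vertex's content under a new tag, and `vertex` encloses the results in the outer `Customer_invoice` element. The input bag is assumed to be the output of an Expose on `invoice.account_number` and `invoice.total`.

```python
import copy
import xml.etree.ElementTree as ET


def wrap(tag, source_vertex):
    """Hypothetical helper for Vertex(tag)[source]: a new element named `tag`
    holding a copy of the source vertex's text and children."""
    new = ET.Element(tag)
    new.text = source_vertex.text
    new.extend(copy.deepcopy(list(source_vertex)))
    return new


def vertex(tag, *children):
    """Hypothetical helper for Vertex(tag)[(child, ...)]: enclose the children."""
    node = ET.Element(tag)
    node.extend(children)
    return node


# A bag as an Expose(invoice.account_number, invoice.total) might produce.
bag = {
    "invoice.account_number": ET.fromstring("<account_number>1</account_number>"),
    "invoice.total": ET.fromstring("<total>$1.20</total>"),
}
result = vertex("Customer_invoice",
                wrap("account", bag["invoice.account_number"]),
                wrap("inv_total", bag["invoice.total"]))
print(ET.tostring(result, encoding="unicode"))
# <Customer_invoice><account>1</account><inv_total>$1.20</inv_total></Customer_invoice>
```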

25 Other operators
Group: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g., average) can be used with the Group operator.
Rename: changes the entry point annotation of the elements of a bag. Example: Rename(invoice.bill_period, date)
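A Python sketch of Group with an average aggregate, plus Rename, over the hypothetical dict-of-vertices bags. Assumptions: totals are written like `$1.20` and are averaged with `Decimal` to avoid binary rounding, and the input has one bag per (invoice, total) pair, as an unnesting Follow on `total` would produce. `group_avg` and `rename` are illustrative names; the `bill_period` value is a placeholder.

```python
from collections import defaultdict
from decimal import Decimal
import xml.etree.ElementTree as ET


def group_avg(collection, key_annotation, value_annotation):
    """Sketch of Group with an average aggregate: bags are grouped by the text
    of one vertex and the money values of another are averaged.
    Assumption: values are written like '$1.20'."""
    groups = defaultdict(list)
    for bag in collection:
        key = bag[key_annotation].text.strip()
        groups[key].append(Decimal(bag[value_annotation].text.strip().lstrip("$")))
    return {key: sum(values) / len(values) for key, values in groups.items()}


def rename(collection, old, new):
    """Sketch of Rename(old, new): change one vertex's annotation in every bag."""
    return [{(new if a == old else a): v for a, v in bag.items()} for bag in collection]


def element(tag, text):
    el = ET.Element(tag)
    el.text = text
    return el


# One bag per (invoice, total) pair.
bags = [
    {"invoice.carrier": element("carrier", "AT&T"), "invoice.total": element("total", "$0.25")},
    {"invoice.carrier": element("carrier", "Sprint"), "invoice.total": element("total", "$1.20")},
    {"invoice.carrier": element("carrier", "Sprint"), "invoice.total": element("total", "$0.75")},
]
print(group_avg(bags, "invoice.carrier", "invoice.total"))

dated = rename([{"invoice.bill_period": element("bill_period", "placeholder")}],
               "invoice.bill_period", "date")
print(list(dated[0]))
```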

26 Example: XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
    <total>$0.75</total>
    <auditor>maria</auditor>
  </invoice>
</Invoice_Document>

Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>

27 XQuery Example
List the account number, customer name, and invoice total for all invoices whose carrier is "Sprint".
for $i in doc("invoices.xml")//invoice,
    $c in doc("customers.xml")//customer
where $i/carrier = "Sprint" and $i/account_number = $c/account
return <Sprint_Invoice>{ $i/account_number, $c/name, $i/total }</Sprint_Invoice>
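As a cross-check of what this query returns, the Python sketch below evaluates it by hand with nested loops over the slide 26 data (trailing spaces removed). It is not an XQuery engine. It mimics two XQuery behaviours: `=` on node sequences is existential, and `$i/total` returns every `total` child, so both totals of the Sprint invoice appear. `sprint_invoices` and `texts` are illustrative names.

```python
import copy
import xml.etree.ElementTree as ET

INVOICES = ET.fromstring("""<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier><total>$1.20</total><total>$0.75</total><auditor>maria</auditor></invoice>
</Invoice_Document>""")
CUSTOMERS = ET.fromstring("""<Customer_Document>
  <customer><account>1</account><name>Tom</name></customer>
  <customer><account>2</account><name>George</name></customer>
</Customer_Document>""")


def texts(node, child):
    return [(e.text or "").strip() for e in node.findall(child)]


def sprint_invoices(invoices, customers):
    """Nested-loop evaluation of the for/where/return query."""
    results = []
    for i in invoices.iter("invoice"):          # for $i in ...//invoice
        for c in customers.iter("customer"):    #     $c in ...//customer
            # XQuery '=' on sequences is existential: any pair of values may match
            carrier_ok = "Sprint" in texts(i, "carrier")
            account_ok = set(texts(i, "account_number")) & set(texts(c, "account"))
            if carrier_ok and account_ok:
                out = ET.Element("Sprint_Invoice")
                for node in i.findall("account_number") + c.findall("name") + i.findall("total"):
                    out.append(copy.deepcopy(node))  # constructed content is a copy
                results.append(out)
    return results


for element in sprint_invoices(INVOICES, CUSTOMERS):
    print(ET.tostring(element, encoding="unicode"))
```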

28 Example: XQuery output
<Sprint_Invoice>
  <account_number>1</account_number>
  <name>Tom</name>
  <total>$1.20</total>
  <total>$0.75</total>
</Sprint_Invoice>

29 Algebra Tree Execution
Read bottom-up; each operator consumes the output of the operators indented beneath it:
Expose (*.account_number, *.name, *.total) → account_number, name, total
  Join (*.invoice.account_number = *.customer.account) → invoice(2), customer(1)
    Select (carrier = "Sprint") → invoice(2)
      Follow (*.invoice) → invoice(1), invoice(2), invoice(3)
        Source (invoices.xml)
    Follow (*.customer) → customer(1), customer(2)
      Source (customers.xml)
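The algebra tree above can be sketched in Python by composing minimal Source, Follow, Select, Join, and Expose functions in the same bottom-up order. Everything here is a hypothetical illustration: bags are dicts from annotations to ElementTree vertices, `*.tag` is read as "every element named tag under the Root", Expose returns each bag's listed vertices as an ordered list, and the two small documents stand in for invoices.xml and customers.xml.

```python
import xml.etree.ElementTree as ET

DOCS = {
    "invoices.xml": """<Invoice_Document>
      <invoice><account_number>2</account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>
      <invoice><account_number>1</account_number><carrier>Sprint</carrier><total>$1.20</total><total>$0.75</total></invoice>
    </Invoice_Document>""",
    "customers.xml": """<Customer_Document>
      <customer><account>1</account><name>Tom</name></customer>
      <customer><account>2</account><name>George</name></customer>
    </Customer_Document>""",
}


def source(name):
    """S(name): a collection holding one singleton bag."""
    return [{f"Root {name}": ET.fromstring(DOCS[name])}]


def follow(collection, tag):
    """Unnesting Follow(*.tag): one bag per element named `tag` under the Root."""
    out = []
    for bag in collection:
        root = next(v for a, v in bag.items() if a.startswith("Root "))
        out.extend({**bag, tag: v} for v in root.iter(tag))
    return out


def select(collection, predicate):
    return [bag for bag in collection if predicate(bag)]


def join(left, right, predicate):
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


def expose(collection, *paths):
    """Expose(entry.child, ...): the listed vertices of each bag, in parameter order."""
    return [[v for entry, child in paths for v in bag[entry].findall(child)]
            for bag in collection]


def text(vertex, child):
    return (vertex.findtext(child) or "").strip()


# Bottom-up, mirroring the algebra tree.
sprint = select(follow(source("invoices.xml"), "invoice"),
                lambda b: text(b["invoice"], "carrier") == "Sprint")
customers = follow(source("customers.xml"), "customer")
joined = join(sprint, customers,
              lambda l, r: text(l["invoice"], "account_number") == text(r["customer"], "account"))
result = expose(joined, ("invoice", "account_number"), ("customer", "name"), ("invoice", "total"))

for vertices in result:
    print([(v.tag, v.text) for v in vertices])
```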

30 Outline
Concepts of Niagara Algebra
Operations
Optimization

31 Optimization with Niagara
Optimizer based on the Niagara algebra:
Use the operations more efficiently
Produce simpler expressions by combining operations

32 Language Convention
A and B are path expressions.
A < B: path expression A is a prefix of B
A ∩ B: common prefix of paths A and B
A ⊓ B: greatest common prefix of paths A and B
⊥: null path expression

33 Heuristics using Rewrite Rules
Allow optimization based on path selectivity when applying the unnesting Follow operation Φμ

34 Interchangeability of Follow operation
Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]: TRUE or FALSE?
TRUE when there exists a C such that C < A, C < B, and C = A ⊓ B, or when A ∩ B = ⊥.
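The interchangeability rule can be sketched as a Python check, under assumptions that are not spelled out on the slide: `<` is read as a proper-prefix relation, and the entry point is treated as the first step of the full path (so `acc_Num:invoice` becomes `('invoice', 'acc_Num')`). `parse_path`, `greatest_common_prefix`, and `follows_commute` are hypothetical names.

```python
def parse_path(text):
    """'acc_Num:invoice' -> ('invoice', 'acc_Num'); 'invoice' -> ('invoice',)."""
    relative, sep, entry = text.partition(":")
    return (entry, *relative.split(".")) if sep else (text,)


def greatest_common_prefix(a, b):
    """A ⊓ B: the longest run of leading steps shared by both paths."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def follows_commute(a_text, b_text):
    """Does Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)] hold under the rule?"""
    a, b = parse_path(a_text), parse_path(b_text)
    c = greatest_common_prefix(a, b)
    if not c:                                    # A ∩ B = ⊥: no shared prefix
        return True
    return len(c) < len(a) and len(c) < len(b)   # C < A and C < B (proper prefixes)


print(follows_commute("acc_Num:invoice", "carrier:invoice"))   # True
print(follows_commute("acc_Num:invoice", "acc_Num:Customer"))  # True
print(follows_commute("invoice", "carrier:invoice"))           # False
```

The last case fails because `invoice` is itself the greatest common prefix, so it is not a proper prefix of A.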

35 Application of Rule on Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] == Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] ? TRUE or FALSE?

36 Application of Rule on Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] = Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]
TRUE, because both paths share the common prefix "invoice": this is the case A ⊓ B = invoice.

37 Benefit of Rule Application
NOTE: Assume acc_Num is required for each invoice element, while carrier is not.
THEN: Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] == Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]
Which algebra tree do we prefer?

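38 Discussion
Reduction of input size on the first sub-operation: Φμ(carrier:invoice) vs. Φμ(acc_Num:invoice).
Because carrier is optional, following carrier first discards every invoice without a carrier, so the second Follow receives fewer bags. Every invoice has an acc_Num, so following acc_Num first removes no invoice. Hence Φμ(acc_Num:invoice)[Φμ(carrier:invoice)], which applies the carrier Follow first, is preferred.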
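A small Python experiment illustrating the input-size argument, on hypothetical data in which every invoice has an `acc_Num` but only some have a `carrier`. Both orders return the same number of results, but applying the carrier Follow first passes fewer bags to the second Follow. `follow` is an illustrative name, using the dict-of-vertices bag encoding assumed earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: every invoice has an acc_Num, only some have a carrier.
DOC = ET.fromstring("""<Invoice_Document>
  <invoice><acc_Num>1</acc_Num><carrier>Sprint</carrier></invoice>
  <invoice><acc_Num>2</acc_Num></invoice>
  <invoice><acc_Num>3</acc_Num></invoice>
  <invoice><acc_Num>4</acc_Num><carrier>AT&amp;T</carrier></invoice>
</Invoice_Document>""")


def follow(collection, child, entry="invoice"):
    """Unnesting Follow(child:entry): bags with no matching child disappear."""
    return [{**bag, f"{entry}.{child}": v}
            for bag in collection for v in bag[entry].findall(child)]


bags = [{"invoice": inv} for inv in DOC.findall("invoice")]
for first, second in (("carrier", "acc_Num"), ("acc_Num", "carrier")):
    middle = follow(bags, first)
    final = follow(middle, second)
    print(f"{first} first: {len(middle)} bags reach the second Follow, {len(final)} results")
```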

39 Can we apply the rule below?
Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)]

40 Example
"acc_Num:invoice" and "acc_Num:customer" are two completely different paths. The case is A ∩ B = ⊥, so yes, the rule is valid.

41 Summary
XML algebra
Operations
Optimization

