Introduction to XML Algebra


Introduction to XML Algebra CS561

Data Model: A data model is the set of core data structures and data types supported by a DBMS. A relational database uses a table (set-oriented) data model; XML uses a tree-structured, hierarchical model.

Why a Query Algebra (for XML)? It is common to translate a query language into an algebra. First, the algebra gives the query language its semantics. Second, the algebra supports query optimization.

XML Algebra History:
- Lore Algebra (August 1999): Stanford University
- IBM Algebra (September 1999): Oracle; IBM; Microsoft Corp.
- YAT Algebra (May 2000)
- AT&T Algebra (June 2000): AT&T; Bell Labs
- Niagara Algebra (2001): University of Wisconsin-Madison

NIAGARA: Title: "Following the paths of XML Data: An algebraic framework for XML query evaluation". By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier. Univ. of Wisconsin.

Outline: Concepts of Niagara Algebra, Operations, Optimization

Goals of Niagara Algebra:
- Be independent of schema information
- Query on both structure and content
- Generate simple, flexible, yet powerful algebraic expressions
- Allow re-use of traditional optimization techniques

Example: XML Source Documents

Invoice.xml:
<Invoice_Document>
  <invoice No="1">
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
    <total>$0.75</total>
  </invoice>
</Invoice_Document>

Customer.xml:
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>

XML Data Model and Tree Graph Example: Invoice_Document
<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
</Invoice_Document>
[Figure: the same document drawn as an ordered tree. Invoice_Document has two invoice children; the first has number 2, carrier AT&T, and total $0.25; the second has number 1, carrier Sprint, and total $1.20. Caption: Ordered Tree Graph, Semi-structured Data.]

XML Data Model (for Querying):
- SQL: relations in, relation out.
- Relational algebra: relations in, relation out.
- XQuery: XML documents in, XML documents out.
- XML algebra: ??

XML Data Model [GVDNM01]: The data is a collection of bags of vertices, and the vertices in a bag have no order. Example: the bag [Root "invoice.xml", invoice, invoice.account_number] contains the root of invoice.xml, an invoice vertex (<invoice> invoice-element-content </invoice>), and that invoice's account_number vertex (<account_number> element-content </account_number>).
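To make the bag model concrete, here is a minimal Python sketch. The representation is an assumption, not the paper's: each bag is a dict mapping a vertex's entry-point annotation (such as "invoice.account_number") to an xml.etree.ElementTree element, and a collection is a list of such bags. Dict equality ignores insertion order, which mirrors the rule that the vertices in a bag have no order.

```python
import xml.etree.ElementTree as ET

# Hypothetical representation: bag = {annotation: vertex}, collection = [bag, ...].
doc = ET.fromstring(
    "<Invoice_Document><invoice>"
    "<account_number>2</account_number><carrier>AT&amp;T</carrier>"
    "</invoice></Invoice_Document>"
)
invoice = doc.find("invoice")
account = invoice.find("account_number")

# The bag [Root "invoice.xml", invoice, invoice.account_number].
bag_a = {'Root "invoice.xml"': doc, "invoice": invoice, "invoice.account_number": account}

# The same vertices listed in another order form the same bag,
# because dict equality does not depend on insertion order.
bag_b = {"invoice.account_number": account, "invoice": invoice, 'Root "invoice.xml"': doc}

collection = [bag_a]  # a collection is a list of bags

print(bag_a == bag_b)  # True
print(bag_a["invoice.account_number"].text)  # 2
```

A dict cannot hold two vertices under the same annotation, so this is only a simplification of a true bag.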

Data Model: Bag elements are reachable by path expressions. A path expression consists of two parts: an entry point and a relative forward part. Example: account_number:invoice, where invoice is the entry point and account_number is the relative forward part.
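A short Python sketch of path resolution follows. It assumes, from the slides' notation, that in "account_number:invoice" the part after the colon is the entry point and the part before it is a dot-separated forward path. The helpers parse_path and reachable are hypothetical names, and a bag is modeled as a dict from annotations to ElementTree elements.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

def parse_path(expr: str) -> Tuple[str, List[str]]:
    """Split 'relative:entry' into (entry point, list of forward steps)."""
    relative, entry = expr.split(":")
    return entry, relative.split(".")

def reachable(bag: Dict[str, ET.Element], expr: str) -> List[ET.Element]:
    """Return every vertex reachable from the bag via the path expression."""
    entry, steps = parse_path(expr)
    frontier = [bag[entry]]  # start at the vertex annotated with the entry point
    for step in steps:
        # Follow child edges labelled `step` from every vertex reached so far.
        frontier = [child for v in frontier for child in v.findall(step)]
    return frontier

invoice = ET.fromstring(
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier>"
    "<total>$1.20</total><total>$0.75</total></invoice>"
)
bag = {"invoice": invoice}

print(parse_path("account_number:invoice"))  # ('invoice', ['account_number'])
print([v.text for v in reachable(bag, "total:invoice")])  # ['$1.20', '$0.75']
```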

Outline: Concepts of Niagara Algebra, Operations, Optimization

Operators: Source (S), Follow (Φ), Expose, Vertex, Select, Join, Rename, Group, Union, Intersection, Difference (-), Cartesian Product.

Source Operator S: Input: a list of documents. Output: a collection of singleton bags. Examples: S(*) returns all known XML documents; S(invoice*.xml) returns all XML documents whose filenames match "invoice*.xml"; S(*, schema.dtd) returns all known XML documents that conform to schema.dtd.
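The filename-pattern form of the Source operator can be sketched in Python with shell-style matching. The in-memory CATALOG of "known" documents and its file names are hypothetical, and the schema-conformance form S(*, schema.dtd) is not modeled, since xml.etree.ElementTree does not validate against DTDs.

```python
import fnmatch
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical catalog of known XML documents: file name -> parsed root.
CATALOG: Dict[str, ET.Element] = {
    "invoice1.xml": ET.fromstring("<Invoice_Document/>"),
    "invoice2.xml": ET.fromstring("<Invoice_Document/>"),
    "customer.xml": ET.fromstring("<Customer_Document/>"),
}

def source(pattern: str = "*") -> List[Dict[str, ET.Element]]:
    """Source sketch: one singleton bag (holding the root) per matching document."""
    return [
        {f'Root "{name}"': root}
        for name, root in sorted(CATALOG.items())
        if fnmatch.fnmatchcase(name, pattern)  # case-sensitive on every OS
    ]

print(len(source("*")))  # 3
print([list(b) for b in source("invoice*.xml")])
```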

Follow operator: Input: a path expression in entry-point notation. Functionality: extracts the vertices reachable by the path expression. Output: for the unnesting follow, a new bag for each extracted vertex, containing that vertex plus all contents of the original bag.

Follow operator (example): The unnesting follow Φμ(carrier:invoice) takes the collection {[Root invoice.xml, invoice]}, where invoice is <invoice> invoice-element-content </invoice>, and produces {[Root invoice.xml, invoice, invoice.carrier]}, where invoice.carrier is the extracted vertex <carrier> carrier-element-content </carrier>.
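The unnesting follow in this example can be sketched in Python as below. Two simplifications are assumptions: only single-step paths ("child:entry") are handled, and an input bag with no reachable vertex produces no output bag. Bags are dicts from annotations to ElementTree elements.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

Bag = Dict[str, ET.Element]

def follow_unnest(collection: List[Bag], expr: str) -> List[Bag]:
    """Unnesting follow sketch for a single-step path 'child:entry'."""
    child, entry = expr.split(":")
    out = []
    for bag in collection:
        # One new bag per reachable vertex: the original contents plus that vertex.
        for vertex in bag[entry].findall(child):
            new_bag = dict(bag)
            new_bag[f"{entry}.{child}"] = vertex
            out.append(new_bag)
    return out

doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><carrier>AT&amp;T</carrier></invoice>"
    "<invoice><carrier>Sprint</carrier></invoice>"
    "</Invoice_Document>"
)
# Input {[Root invoice.xml, invoice], ...}: one bag per invoice element.
collection = [{'Root "invoice.xml"': doc, "invoice": inv} for inv in doc.findall("invoice")]
result = follow_unnest(collection, "carrier:invoice")

print(sorted(result[0]))  # ['Root "invoice.xml"', 'invoice', 'invoice.carrier']
print([b["invoice.carrier"].text for b in result])  # ['AT&T', 'Sprint']
```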

Select operator: Input: a set of bags. Functionality: filters the bags of a collection using a predicate. Output: the set of bags that satisfy the predicate. Predicate: a logical combination (∧, ∨, ¬) or a simple qualification (=, ≠, <, ≤, >, ≥).

Select operator (example): Applying σ(invoice.carrier = "Sprint") to the collection {[Root invoice.xml, invoice], …}, where each invoice is <invoice> invoice-element-content </invoice>, keeps only the bags whose invoice has carrier Sprint. The result is again a collection of bags of the same shape: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}.
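A minimal Python sketch of Select, under the same dict-of-ElementTree-elements bag model used as an assumption here. The helper carrier_is is hypothetical; it stands for the simple qualification invoice.carrier = name, and Python's `and` plays the role of a logical operator.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Bag = Dict[str, ET.Element]

def select(collection: List[Bag], predicate: Callable[[Bag], bool]) -> List[Bag]:
    """Select sketch: keep only the bags for which the predicate holds."""
    return [bag for bag in collection if predicate(bag)]

def carrier_is(name: str) -> Callable[[Bag], bool]:
    """Hypothetical helper: the qualification invoice.carrier = name."""
    return lambda bag: bag["invoice"].findtext("carrier") == name

doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number><carrier>AT&amp;T</carrier></invoice>"
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>"
    "</Invoice_Document>"
)
collection = [{'Root "invoice.xml"': doc, "invoice": inv} for inv in doc.findall("invoice")]

sprint = select(collection, carrier_is("Sprint"))
print([b["invoice"].findtext("account_number") for b in sprint])  # ['1']

# A conjunction of two qualifications; no invoice satisfies both.
sprint_acct_2 = select(
    collection,
    lambda b: carrier_is("Sprint")(b) and b["invoice"].findtext("account_number") == "2",
)
print(sprint_acct_2)  # []
```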

Join operator: Input: two collections of bags. Functionality: joins the two collections based on a predicate. Output: the concatenation of each pair of bags that satisfies the predicate.

Join operator (example): Joining {[Root invoice.xml, invoice]} with {[Root customer.xml, customer]} on the predicate account_number:invoice = account:customer produces {[Root invoice.xml, invoice, Root customer.xml, customer]}: each output bag concatenates a matching invoice bag (<invoice> invoice-element-content </invoice>) and customer bag (<customer> customer-element-content </customer>).
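The join can be sketched in Python as a nested-loop join over the dict-based bags assumed here. Concatenating two bags becomes a dict merge, which assumes the two sides use distinct annotations (true here, since the roots are annotated with different file names). The predicate compares account_number:invoice with account:customer.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Bag = Dict[str, ET.Element]

def join(left: List[Bag], right: List[Bag], pred: Callable[[Bag, Bag], bool]) -> List[Bag]:
    """Nested-loop join sketch: concatenate every pair of bags satisfying pred."""
    return [{**l, **r} for l in left for r in right if pred(l, r)]

inv_doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number></invoice>"
    "<invoice><account_number>1</account_number></invoice>"
    "</Invoice_Document>"
)
cust_doc = ET.fromstring(
    "<Customer_Document>"
    "<customer><account>1</account><name>Tom</name></customer>"
    "<customer><account>2</account><name>George</name></customer>"
    "</Customer_Document>"
)
invoices = [{'Root "invoice.xml"': inv_doc, "invoice": i} for i in inv_doc.findall("invoice")]
customers = [{'Root "customer.xml"': cust_doc, "customer": c} for c in cust_doc.findall("customer")]

def same_account(l: Bag, r: Bag) -> bool:
    # account_number:invoice = account:customer, compared as trimmed text.
    return l["invoice"].findtext("account_number").strip() == r["customer"].findtext("account").strip()

pairs = join(invoices, customers, same_account)
print([(p["invoice"].findtext("account_number"), p["customer"].findtext("name")) for p in pairs])
# [('2', 'George'), ('1', 'Tom')]
```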

Expose operator: Input: a list of path expressions of the vertices to be exposed. Output: a set of bags containing the vertices in the parameter list, in the same order as the list.

Expose operator (example): Applying Expose(bill_period, carrier) to {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]} produces {[Root invoice.xml, invoice.bill_period, invoice.carrier]}: the invoice vertex is no longer exposed, and the exposed vertices appear in the order of the parameter list.
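Here is a Python sketch of Expose over the dict-based bags assumed in this transcript. It takes full annotations ("invoice.bill_period") rather than the slide's shorthand, and, as an assumption read from the slide's output, it carries the document-root vertex along. The dict's insertion order records the parameter-list order.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Sequence

Bag = Dict[str, ET.Element]

def expose(collection: List[Bag], paths: Sequence[str]) -> List[Bag]:
    """Expose sketch: keep root vertices plus the listed vertices, in list order."""
    out = []
    for bag in collection:
        # Assumption: document-root vertices are kept, as in the slide's output.
        new_bag = {k: v for k, v in bag.items() if k.startswith("Root ")}
        for path in paths:  # insertion order follows the parameter list
            new_bag[path] = bag[path]
        out.append(new_bag)
    return out

doc = ET.fromstring(
    "<Invoice_Document><invoice>"
    "<carrier>carrier-element-content</carrier>"
    "<bill_period>bill_period-element-content</bill_period>"
    "</invoice></Invoice_Document>"
)
inv = doc.find("invoice")
collection = [{
    'Root "invoice.xml"': doc,
    "invoice": inv,
    "invoice.carrier": inv.find("carrier"),
    "invoice.bill_period": inv.find("bill_period"),
}]
result = expose(collection, ["invoice.bill_period", "invoice.carrier"])
print(list(result[0]))  # ['Root "invoice.xml"', 'invoice.bill_period', 'invoice.carrier']
```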

Vertex operator: Creates the actual XML vertex that encompasses everything created by an Expose operator. Example: Vertex(Customer_invoice)[(Vertex(account)[invoice.account_number], Vertex(inv_total)[invoice.total])]
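The Vertex example can be mimicked with xml.etree.ElementTree. The function name vertex and the choice that each inner vertex wraps the text content of an exposed vertex are assumptions made for illustration; the bag below is a hypothetical output of an Expose step.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List, Union

def vertex(tag: str, content: List[Union[str, ET.Element]]) -> ET.Element:
    """Vertex sketch: build a new element <tag> that wraps the given content."""
    node = ET.Element(tag)
    for item in content:
        if isinstance(item, str):
            # Text goes into .text, or into the previous child's .tail.
            if len(node):
                node[-1].tail = (node[-1].tail or "") + item
            else:
                node.text = (node.text or "") + item
        else:
            node.append(copy.deepcopy(item))  # copy so the source tree is untouched
    return node

# Hypothetical exposed bag: [invoice.account_number, invoice.total].
bag = {
    "invoice.account_number": ET.fromstring("<account_number>1</account_number>"),
    "invoice.total": ET.fromstring("<total>$1.20</total>"),
}
result = vertex("Customer_invoice", [
    vertex("account", [bag["invoice.account_number"].text]),
    vertex("inv_total", [bag["invoice.total"].text]),
])
print(ET.tostring(result, encoding="unicode"))
# <Customer_invoice><account>1</account><inv_total>$1.20</inv_total></Customer_invoice>
```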

Other operators: Group is used for arbitrary grouping of elements based on their values; aggregate functions (e.g., average) can be used with the Group operator. Rename changes the entry-point annotation of the elements of a bag. Example: Rename(invoice.bill_period, date).
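A Python sketch of Group with an average aggregate, plus Rename, on the dict-based bag model assumed here. Grouping invoices by carrier and averaging their totals is an illustrative choice, not taken from the slides. Amounts are parsed as Decimal so the averages are exact, and group returns a plain dict of results rather than bags, as a simplification.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from decimal import Decimal
from statistics import mean
from typing import Callable, Dict, Iterable, List

Bag = Dict[str, ET.Element]

def group(collection: List[Bag], key: Callable[[Bag], str],
          values: Callable[[Bag], Iterable[Decimal]],
          agg: Callable[[List[Decimal]], Decimal]) -> Dict[str, Decimal]:
    """Group sketch: partition bags by key(bag), then aggregate their values."""
    groups: Dict[str, List[Decimal]] = defaultdict(list)
    for bag in collection:
        groups[key(bag)].extend(values(bag))
    return {k: agg(v) for k, v in groups.items()}

def rename(collection: List[Bag], old: str, new: str) -> List[Bag]:
    """Rename sketch: change the annotation `old` to `new` in every bag."""
    return [{(new if k == old else k): v for k, v in bag.items()} for bag in collection]

doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>"
    "<invoice><carrier>Sprint</carrier><total>$1.20</total><total>$0.75</total></invoice>"
    "</Invoice_Document>"
)
invoices = [{"invoice": inv} for inv in doc.findall("invoice")]

def totals(bag: Bag) -> List[Decimal]:
    # Strip the leading "$" and parse each <total> as an exact decimal amount.
    return [Decimal(t.text.strip().lstrip("$")) for t in bag["invoice"].findall("total")]

avg_by_carrier = group(invoices, lambda b: b["invoice"].findtext("carrier"), totals, mean)
print(avg_by_carrier)  # {'AT&T': Decimal('0.25'), 'Sprint': Decimal('0.975')}

renamed = rename([{"invoice.bill_period": ET.Element("bill_period")}], "invoice.bill_period", "date")
print(list(renamed[0]))  # ['date']
```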

Example: XML Source Documents

Invoice.xml:
<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
    <total>$0.75</total>
    <auditor>maria</auditor>
  </invoice>
</Invoice_Document>

Customer.xml:
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>

XQuery Example: List the account number, customer name, and invoice total for all invoices whose carrier is "Sprint".
for $i in doc("invoices.xml")//invoice,
    $c in doc("customers.xml")//customer
where $i/carrier = "Sprint" and $i/account_number = $c/account
return <Sprint_Invoice>{ $i/account_number, $c/name, $i/total }</Sprint_Invoice>

Example: XQuery output
<Sprint_Invoice>
  <account_number>1</account_number>
  <name>Tom</name>
  <total>$1.20</total>
</Sprint_Invoice>

Algebra Tree Execution (read bottom-up):
- Source(invoices.xml), then Follow(*.invoice) yields invoice(1), invoice(2), invoice(3); Select(carrier = "Sprint") keeps invoice(2).
- Source(customers.xml), then Follow(*.customer) yields customer(1), customer(2).
- Join(*.invoice.account_number = *.customer.account) pairs invoice(2) with customer(1).
- Expose(*.account_number, *.name, *.total) returns account_number, name, and total.
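The algebra tree can be evaluated bottom-up with compact Python versions of the operators. Everything here is a simplified sketch under stated assumptions: the documents are trimmed copies of the example (two invoices, one total each), paths are single child steps rather than the "*." wildcard paths, and expose returns text tuples instead of bags.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Sequence, Tuple

Bag = Dict[str, ET.Element]

# Trimmed copies of the example documents (one <total> per invoice).
INVOICES = (
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number><carrier>AT&amp;T</carrier>"
    "<total>$0.25</total></invoice>"
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier>"
    "<total>$1.20</total></invoice>"
    "</Invoice_Document>"
)
CUSTOMERS = (
    "<Customer_Document>"
    "<customer><account>1</account><name>Tom</name></customer>"
    "<customer><account>2</account><name>George</name></customer>"
    "</Customer_Document>"
)

def source(name: str, text: str) -> List[Bag]:
    # One singleton bag holding the document root.
    return [{"Root " + name: ET.fromstring(text)}]

def follow(coll: List[Bag], root: str, tag: str) -> List[Bag]:
    # Unnesting follow: one new bag per <tag> child of the root vertex.
    return [{**b, tag: v} for b in coll for v in b[root].findall(tag)]

def select(coll: List[Bag], pred: Callable[[Bag], bool]) -> List[Bag]:
    return [b for b in coll if pred(b)]

def join(left: List[Bag], right: List[Bag], pred: Callable[[Bag, Bag], bool]) -> List[Bag]:
    return [{**a, **c} for a in left for c in right if pred(a, c)]

def expose(coll: List[Bag], paths: Sequence[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    # Text of each (vertex, child) pair, in parameter-list order.
    return [tuple(b[v].findtext(child) for v, child in paths) for b in coll]

# Bottom-up evaluation of the algebra tree.
inv = follow(source("invoices.xml", INVOICES), "Root invoices.xml", "invoice")
inv = select(inv, lambda b: b["invoice"].findtext("carrier") == "Sprint")
cust = follow(source("customers.xml", CUSTOMERS), "Root customers.xml", "customer")
joined = join(inv, cust,
              lambda a, c: a["invoice"].findtext("account_number") == c["customer"].findtext("account"))
rows = expose(joined, [("invoice", "account_number"), ("customer", "name"), ("invoice", "total")])
print(rows)  # [('1', 'Tom', '$1.20')]
```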

Outline: Concepts of Niagara Algebra, Operations, Optimization

Optimization with Niagara: An optimizer based on the Niagara algebra can use operations more efficiently and produce simpler expressions by combining operations.

Language Convention: A and B are path expressions.
- A < B: path expression A is a prefix of B
- A ∩ B: a common prefix of paths A and B
- A ⊓ B: the greatest common prefix of paths A and B
- ⊥: the null path expression

Heuristics using Rewrite Rules: Rewrite rules allow optimization based on path selectivity when applying the unnesting follow operation Φμ.

Interchangeability of Follow operations: Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]. TRUE or FALSE? TRUE when there exists a C such that C < A, C < B, and C = A ⊓ B (the greatest common prefix of A and B), or when A ∩ B = ⊥ (A and B have no common prefix).
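The interchange test can be sketched in Python on paths represented as tuples of steps, with the entry point first. One interpretation is an assumption, not stated on the slide: "C < A" is read as "C is a proper prefix of A", so two follows commute when their greatest common prefix is a proper prefix of both, or when they share no prefix at all (the empty tuple stands in for ⊥). All function names are hypothetical.

```python
from typing import Tuple

Path = Tuple[str, ...]

def parse(expr: str) -> Path:
    """Turn 'relative:entry' (e.g. 'acc_Num:invoice') into ('invoice', 'acc_Num')."""
    relative, entry = expr.split(":")
    return (entry,) + tuple(relative.split("."))

def is_prefix(a: Path, b: Path) -> bool:
    """A < B, read here (assumption) as: A is a proper prefix of B."""
    return len(a) < len(b) and b[: len(a)] == a

def greatest_common_prefix(a: Path, b: Path) -> Path:
    """A ⊓ B: the longest shared leading run of steps; () plays the role of ⊥."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

def follows_commute(a: Path, b: Path) -> bool:
    """Sketch of the interchange test for unnesting follows on paths A and B."""
    c = greatest_common_prefix(a, b)
    if c == ():  # A ∩ B = ⊥: no common prefix at all
        return True
    return is_prefix(c, a) and is_prefix(c, b)

print(follows_commute(parse("acc_Num:invoice"), parse("carrier:invoice")))       # True
print(follows_commute(parse("acc_Num:invoice"), parse("acc_Num:customer")))      # True
print(follows_commute(parse("carrier:invoice"), parse("carrier.name:invoice")))  # False
```

The last call is a hypothetical contrast case: when one path extends the other, the greatest common prefix equals the shorter path itself, so the proper-prefix condition fails.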

Application of Rule on Invoice: Does Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] = Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]? TRUE or FALSE?

Application of Rule on Invoice: Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] = Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] is TRUE, because both paths share the common prefix "invoice": this is the case where the greatest common prefix A ⊓ B = invoice.

Benefit of Rule Application: Assume acc_Num is required in every invoice element, while carrier is optional. Since Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] = Φμ(carrier:invoice)[Φμ(acc_Num:invoice)], which algebra tree do we prefer?

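Discussion: Prefer the tree that applies Φμ(carrier:invoice) first, i.e., Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]. Because carrier is optional, the unnesting follow on carrier emits no bag for an invoice without a carrier, which reduces the input size of the second sub-operation; Φμ(acc_Num:invoice) would pass every invoice on, since acc_Num is required.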
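The input-size argument can be checked on a toy collection in Python. The three invoices below are hypothetical data in which every invoice has acc_Num but only one has a carrier. The unnesting follow assumes, as above, that an invoice with no matching child yields no bag.

```python
import xml.etree.ElementTree as ET

def follow_unnest(collection, child, entry="invoice"):
    """Unnesting follow sketch: one new bag per <child> of the entry vertex."""
    return [{**bag, f"{entry}.{child}": v}
            for bag in collection for v in bag[entry].findall(child)]

# Hypothetical data: acc_Num is always present, carrier only once.
doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><acc_Num>1</acc_Num><carrier>Sprint</carrier></invoice>"
    "<invoice><acc_Num>2</acc_Num></invoice>"
    "<invoice><acc_Num>3</acc_Num></invoice>"
    "</Invoice_Document>"
)
start = [{"invoice": inv} for inv in doc.findall("invoice")]

# Plan 1, Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]: carrier is followed first.
after_carrier = follow_unnest(start, "carrier")
plan1 = follow_unnest(after_carrier, "acc_Num")

# Plan 2, Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]: acc_Num is followed first.
after_acc = follow_unnest(start, "acc_Num")
plan2 = follow_unnest(after_acc, "carrier")

print(len(after_carrier), len(after_acc))  # 1 3  (input sizes of the second step)
print(plan1 == plan2)  # True: both plans produce the same bags
```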

Can we apply the rule to Φμ(acc_Num:invoice)[Φμ(acc_Num:customer)]?

Example: "acc_Num:invoice" and "acc_Num:customer" are two entirely different paths. This is the case A ∩ B = ⊥ (the paths have no common prefix), so yes, the rule applies.

Summary: XML Algebra, Operations, Optimization