1 Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II) Xin Zhang.

2 Outline
- XAT Decorrelation.
- Optimization:
  - XAT Computation Pushdown.
  - XAT Data Model Cleanup.
  - XAT Cutting.
- Conclusion & Future Work.

3 XAT Decorrelation
XQuery queries are typically correlated: inner FLWR blocks refer to variables bound by outer ones. Decorrelation is therefore required before the optimizations:
- XAT Computation Pushdown.
- XAT Data Model Cleanup.
- XAT Cutting.

4 Three kinds of Decorrelation
- Simple decorrelation: no additional sources and no aggregate functions.
- Complex decorrelation with additional sources.
- Complex decorrelation with aggregate functions.

5 Example* of XML Use Cases
[Figure: sample document prices.xml with two <book> entries titled "TCP/IP Illustrated" and two titled "Data on the Web".]

6 Simple Query Example
{ for $t in distinct(document("prices.xml")/book/title)
  return $t }
In the document "prices.xml", find each distinct book title.
XAT tree (bottom-up): S("prices.xml"):R1 → φ(R1, /book/title):col2 → distinct(col2):$t → FOR($t) with branch T([$t]):col1 → Agg() → T([col1]):col0

7 Simple Decorrelation
Linearize the tree: T[FOR(CB, T2[])[T1[S1]]] ⇒ T[T2[T1[S1]]]
Before: S("prices.xml"):R1 → φ(R1, /book/title):col2 → distinct(col2):$t → FOR($t) with branch T([$t]):col1 → Agg() → T([col1]):col0
After: S("prices.xml"):R1 → φ(R1, /book/title):col2 → distinct(col2):$t → T([$t]):col1 → Agg() → T([col1]):col0

8 Is Simple Decorrelation Right?
Every operator except Groupby has "for each tuple" semantics over its input table. Hence the FOR operator can be omitted in the simple decorrelation scenario.
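The argument above can be checked with a small Python sketch. It models a table as a list of dicts and FOR as "run the branch once per input tuple and concatenate the results". The Tagger stand-in `tag` and the whole-table operator `count_all` are hypothetical simplifications, not Rainbow's actual operators.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]
Operator = Callable[[Table], Table]


def tag(in_col: str, out_col: str) -> Operator:
    """Hypothetical tuple-wise stand-in for a Tagger: one new value per input tuple."""
    def run(table: Table) -> Table:
        return [{**row, out_col: f"<title>{row[in_col]}</title>"} for row in table]
    return run


def count_all(out_col: str) -> Operator:
    """Hypothetical whole-table operator (Groupby-like): sees all input tuples at once."""
    def run(table: Table) -> Table:
        return [{out_col: len(table)}]
    return run


def for_each(branch: Operator) -> Operator:
    """FOR: run the branch on a one-tuple table per input tuple and concatenate results."""
    def run(table: Table) -> Table:
        result: Table = []
        for row in table:
            result.extend(branch([row]))
        return result
    return run


titles: Table = [{"$t": "TCP/IP Illustrated"}, {"$t": "Data on the Web"}]
```

With the two sample titles, `for_each(tag("$t", "col1"))` and `tag("$t", "col1")` return identical tables, so FOR can be dropped. `for_each(count_all("n"))` instead returns two one-row counts rather than a single count of 2, which is why operators like Groupby are excluded.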

9 Two types of Navigate
- Navigate Unnesting (φU): unnests the parent-child relationship and duplicates the parent values for each child.
- Navigate Collection (φC): nests the parent-child relationship, creating a collection of the children while keeping a single parent.

10 Where to use the two types
- Navigate Unnesting (φU): FOR bindings.
- Navigate Collection (φC): LET bindings.
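A minimal sketch of the two Navigate flavors using Python's `xml.etree.ElementTree`. The function names, the dict-per-tuple table, and the sample document (book titles only, taken from the example document) are assumptions for illustration; ElementTree's path syntax stands in for the XPath used in XAT.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical stand-in for prices.xml: only the book titles from the example are kept.
PRICES_XML = """<prices>
  <book><title>TCP/IP Illustrated</title></book>
  <book><title>TCP/IP Illustrated</title></book>
  <book><title>Data on the Web</title></book>
  <book><title>Data on the Web</title></book>
</prices>"""


def navigate_unnest(table: List[Row], in_col: str, path: str, out_col: str) -> List[Row]:
    """Navigate Unnesting: one output tuple per child, copying the parent's values."""
    out: List[Row] = []
    for row in table:
        for child in row[in_col].findall(path):
            out.append({**row, out_col: child})
    return out


def navigate_collection(table: List[Row], in_col: str, path: str, out_col: str) -> List[Row]:
    """Navigate Collection: keep one tuple per parent; the children become a list."""
    return [{**row, out_col: row[in_col].findall(path)} for row in table]


source = [{"R1": ET.fromstring(PRICES_XML)}]
for_style = navigate_unnest(source, "R1", "book", "$b")      # like: for $b in .../book
let_style = navigate_collection(source, "R1", "book", "$b")  # like: let $b := .../book
```

`for_style` has four tuples that all share the same R1 value, matching FOR semantics; `let_style` has a single tuple whose $b is a list of the four book elements, matching LET semantics.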

11 Complex Query Example
{ for $t in distinct(document("prices.xml")/book/title)
  let $b := document("prices.xml")/book[title = $t]
  return ($t, $b/price) }
In the document "prices.xml", find each book title and its prices.
Diagram operators: φC($b, price):col4, T([col1]):col0, distinct(col2):$t, S("prices.xml"):R1, φ(R1, /book/title):col2, FOR($t), Agg(), T([$t], [col4]):col1, S("prices.xml"):R2, φC(R2, /book):$b, σ(col3=$t), φC($b, title):col3.

12 Complex Decorrelation with Additional Source
T[FOR(CB, T2[S2])[T1[S1]]] ⇒ T[T2[×[T1[S1], S2]]]
Before: φC($b, price):col4, T([col1]):col0, distinct(col2):$t, S("prices.xml"):R1, φ(R1, /book/title):col2, FOR($t), Agg(), T([$t], [col4]):col1, S("prices.xml"):R2, φC(R2, /book):$b, σ(col3=$t), φC($b, title):col3.
After: φC($b, price):col4, T([$t], [col4]):col1, S("prices.xml"):R2, φC(R2, /book):$b, σ(col3=$t), φC($b, title):col3, T([col1]):col0, distinct(col2):$t, S("prices.xml"):R1, φ(R1, /book/title):col2, ×, Agg().
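A sketch of why decorrelation with an additional source pays off, under stated assumptions: the books are hypothetical (title, price) pairs with invented prices, and the join is implemented as a hash join, which is this sketch's choice rather than something the slides specify. The correlated version rescans the second source once per $t; the decorrelated one scans it once and joins on title = $t.

```python
from typing import Dict, List, Tuple

Book = Tuple[str, float]

# Hypothetical stand-in for the /book elements: titles from the example, prices invented.
BOOKS: List[Book] = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 55.00),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]


def distinct_titles(books: List[Book]) -> List[str]:
    """T1: distinct(.../book/title), keeping first-occurrence order."""
    return list(dict.fromkeys(title for title, _ in books))


def correlated(books: List[Book]) -> List[Tuple[str, List[float]]]:
    """Nested evaluation: the second source is rescanned once per $t binding."""
    return [(t, [price for title, price in books if title == t])
            for t in distinct_titles(books)]


def decorrelated(books: List[Book]) -> List[Tuple[str, float]]:
    """Set-at-a-time evaluation: one pass over the second source, then a join on title = $t."""
    by_title: Dict[str, List[float]] = {}
    for title, price in books:  # single scan of the second source (hash-join build side)
        by_title.setdefault(title, []).append(price)
    return [(t, price) for t in distinct_titles(books) for price in by_title.get(t, [])]
```

Flattening the nested result of `correlated` yields exactly the (title, price) pairs returned by `decorrelated`.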

13 Full Query Example
{ for $t in distinct(document("prices.xml")/book/title)
  let $b := document("prices.xml")/book[title = $t]
  return ($t, min($b/price/text())) }
In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element.
Diagram operators: φC($b, price/text()):col4, T([col1]):col0, distinct(col2):$t, S("prices.xml"):R1, φ(R1, /book/title):col2, FOR($t), Agg(), T([$t], [col5]):col1, S("prices.xml"):R2, φC(R2, /book):$b, σ(col3=$t), φC($b, title):col3, min(col4):col5.

14 Complex Query Decorrelation with one Aggregation Function
T[FOR(CB, T2[Agg(T3[])])[T1[S1]]] ⇒ T[⋈(DM(T1))[T1, T2[Groupby(DM(T1), Agg(T3[×[Distinct(T1[S1]), S2]]))]]]
where DM(T1) is the data model computed from T1.
Diagram labels before: S2, Agg(), T1, S1, T3, FOR($rate), T2, T.
Diagram labels after: S1, Groupby(DM(T1), Agg()), S2, T3, T, T2, T1, Distinct, and the joins ⋈(DM(T1)) and ×.

15 The Query after Decorrelation
Before: φC($b, price/text()):col4, T([col1]):col0, distinct(col2):$t, S("prices.xml"):R1, φ(R1, /book/title):col2, FOR($t), Agg(), T([$t], [col5]):col1, S("prices.xml"):R2, φC(R2, /book):$b, σ(col3=$t), φC($b, title):col3, min(col4):col5.
After: φC($b, price/text()):col4, T([$t], [col4]):col1, S("prices.xml"):R2, φC(R2, /book):$b, σ(col3=$t), φC($b, title):col3, T([col1]):col0, distinct(col2):$t, S("prices.xml"):R1, φ(R1, /book/title):col2, Agg(), GB(DM, min(col4):col5), and the two join operators × and ⋈(DM).
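The aggregation rule can be traced the same way. This sketch, again over hypothetical (title, price) pairs with invented prices, follows the shape of the rewritten tree: combine the distinct titles with the second source, keep the matching pairs, group on $t (standing in for DM(T1)) computing min, and join the groups back to the titles. The function names are illustrative only.

```python
from typing import Dict, List, Tuple

Book = Tuple[str, float]

# Hypothetical stand-in for the /book elements: titles from the example, prices invented.
BOOKS: List[Book] = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 55.00),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]


def distinct_titles(books: List[Book]) -> List[str]:
    """T1: distinct(.../book/title), keeping first-occurrence order."""
    return list(dict.fromkeys(title for title, _ in books))


def correlated_min(books: List[Book]) -> List[Tuple[str, float]]:
    """Correlated form: min($b/price) is recomputed by rescanning the books for each $t."""
    return [(t, min(p for title, p in books if title == t)) for t in distinct_titles(books)]


def decorrelated_min(books: List[Book]) -> List[Tuple[str, float]]:
    """Decorrelated form: combine, select, Groupby on DM(T1) with min, join back to T1."""
    t1 = distinct_titles(books)
    # Combine Distinct(T1) with the second source and keep pairs where title = $t.
    joined = [(t, p) for t in t1 for title, p in books if title == t]
    # Groupby on DM(T1), here just $t, computing min over the joined prices.
    groups: Dict[str, float] = {}
    for t, p in joined:
        groups[t] = min(groups.get(t, p), p)
    # Join the groups back to T1 on $t.
    return [(t, groups[t]) for t in t1 if t in groups]
```

Both functions return one (title, minimum price) pair per distinct title; the decorrelated one does it with a single grouping pass instead of one rescan per title.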

16 Where are we?
- XAT Decorrelation.
- Optimization:
  - XAT Computation Pushdown.
  - XAT Data Model Cleanup.
  - XAT Cutting.
- Conclusion & Future Work.

17 XAT Computation Pushdown
Goal: push the execution into the relational database.
Steps:
1. Push navigation down.
2. Cancel out Navigation and Tagger pairs.
3. Generate SQL statements.

18 Navigation Pushdown
A navigation can be pushed down through any operator until it depends on that operator's output. Example rewriting rules:
- φ(x1, path):x2[φ(y1, path):y2[T]] ⇒ φ(y1, path):y2[φ(x1, path):x2[T]]  (if x1 ≠ y2)
- φ(x1, path):x2[σ(c)[T]] ⇒ σ(c)[φ(x1, path):x2[T]]
- φ(x1, path):x2[×[T1, T2]] ⇒ ×[T1, φ(x1, path):x2[T2]]  (if x1 in DM(T2))
- φ(x1, path):x2[×[T1, T2]] ⇒ ×[φ(x1, path):x2[T1], T2]  (if x1 in DM(T1))
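A sketch of the first rewriting rule as a tree transformation. The `Source` and `Nav` classes are hypothetical plan nodes, and the two navigations may use different paths. The rule swaps them only when the outer navigation's input column x1 is not the inner navigation's output y2, i.e. when the outer one has no dependency on its child.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Source:
    name: str
    out_col: str


@dataclass(frozen=True)
class Nav:
    in_col: str
    path: str
    out_col: str
    child: "Plan"


Plan = Union[Source, Nav]


def swap_navigations(plan: Plan) -> Plan:
    """Nav(x1, p):x2[Nav(y1, q):y2[T]] => Nav(y1, q):y2[Nav(x1, p):x2[T]] if x1 != y2."""
    if isinstance(plan, Nav) and isinstance(plan.child, Nav):
        inner = plan.child
        if plan.in_col != inner.out_col:  # outer navigation does not read inner's output
            return Nav(inner.in_col, inner.path, inner.out_col,
                       Nav(plan.in_col, plan.path, plan.out_col, inner.child))
    return plan  # dependent (or non-matching) plans are left unchanged
```

Two navigations that both start from R1 can be reordered, while φ($b, price) over φ(R1, /book):$b stays in place because it reads $b.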

19 Navigation Pushdown Example
Operators in both the before and after diagrams (only their positions in the tree differ): φC($b, price/text()):col4, T([$t], [col4]):col1, S("prices.xml"):R2, φC(R2, /book):$b, σ(col3=$t), φC($b, title):col3, T([col1]):col0, distinct(col2):$t, S("prices.xml"):R1, φ(R1, /book/title):col2, Agg(), GB(DM, min(col4):col5), and the two join operators × and ⋈(DM).

20 Navigation/Tagger Cancel Out
Used to simplify a composite XAT tree.
Transformation rule: φ(x, /):y[T([z]):x[s]] ⇒ s
Note: type analysis is also used for the cancel out.
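The cancel-out rule is a tree pattern match. The sketch below applies it literally and bottom-up to hypothetical `Source`, `Tagger`, and `Nav` node classes; it performs none of the type analysis that the note says the rule also relies on.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


@dataclass(frozen=True)
class Source:
    name: str
    out_col: str


@dataclass(frozen=True)
class Tagger:
    pattern: Tuple[str, ...]  # columns referenced by the template, e.g. ("z",)
    out_col: str
    child: "Plan"


@dataclass(frozen=True)
class Nav:
    in_col: str
    path: str
    out_col: str
    child: "Plan"


Plan = Union[Source, Tagger, Nav]


def cancel_out(plan: Plan) -> Plan:
    """Rewrite phi(x, /):y[T([z]):x[s]] => s wherever it matches, children first."""
    if isinstance(plan, (Tagger, Nav)):
        plan = replace(plan, child=cancel_out(plan.child))
    if (isinstance(plan, Nav) and plan.path == "/"
            and isinstance(plan.child, Tagger) and plan.child.out_col == plan.in_col):
        return plan.child.child  # drop both the navigation and the tagger it unwraps
    return plan
```

A navigation with any path other than "/", or one that reads a column the Tagger did not produce, is left unchanged.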

21 View Query Example
[Figure: the view document DXV, with book rows titled "TCP/IP Illustrated" (twice) and "Data on the Web" (twice).]
{ for $row in distinct(DXV/book/row)
  return ($row/title, $row/price) }
Diagram operators: T([col6]):col5, T([col7], [col8]):col6, S(DXV):R3, φ(R3, /book/row):$row, Agg(), φ($row, title):col7, φ($row, price):col8.

22 Cancel Out Example (1)
Before: the query φC($b, price/text()):col4, S("prices.xml"):R2, φC(R2, /book):$b, φC($b, title):col3, ... and the view query T([col6]):col5, T([col7], [col8]):col6, S(DXV):R3, φ(R3, /book/row):$row, Agg(), φ($row, title):col7, φ($row, price):col8.
After: φC($b, price/text()):col4, φC(R2, /book):$b, φC($b, title):col3, ..., T([col6]):R2, T([col7], [col8]):col6, S(DXV):R3, φ(R3, /book/row):$row, Agg(), φ($row, title):col7, φ($row, price):col8.
The source S("prices.xml"):R2 is replaced by the view query, whose top Tagger now outputs R2.
Rule: (x, y)[op():x[s]] ⇒ op():y[s]

23 Cancel Out Example (2)
Before: φC($b, price/text()):col4, φC(R2, /book):$b, φC($b, title):col3, ..., T([col6]):R2, T([col7], [col8]):col6, S(DXV):R3, φ(R3, /book/row):$row, Agg(), φ($row, title):col7, φ($row, price):col8.
After: φC($b, price/text()):col4, φC($b, title):col3, ..., T([col7], [col8]):$b, S(DXV):R3, φ(R3, /book/row):$row, φ($row, title):col7, φ($row, price):col8.
φC(R2, /book):$b, T([col6]):R2, and Agg() are removed, and the remaining Tagger outputs $b directly.

24 Cancel Out Example (3)
Before: φC($b, price/text()):col4, φC($b, title):col3, ..., T([col7], [col8]):$b, S(DXV):R3, φ(R3, /book/row):$row, φ($row, title):col7, φ($row, price):col8.
After: φC($b, price/text()):col4, ..., T([col7], [col8]):$b, S(DXV):R3, φ(R3, /book/row):$row, φ($row, title):col3, φ($row, price):col8.
φC($b, title):col3 is removed, and φ($row, title) now outputs col3 directly.

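25 Cancel Out Example (4)
Before: φC($b, price):temp1, ..., T([col7], [col8]):$b, S(DXV):R3, φ(R3, /book/row):$row, φ($row, title):col3, φ($row, price):col8, φC(temp1, text()):col4.
After: ..., S(DXV):R3, φ(R3, /book/row):$row, φ($row, title):col3, φ($row, price):temp1, φC(temp1, text()):col4.
Here φC($b, price/text()):col4 has been split into φC($b, price):temp1 and φC(temp1, text()):col4; the Tagger and φC($b, price):temp1 are then removed, and φ($row, price) outputs temp1 directly.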

26 SQL Generation
Find a pattern in the XAT and translate that pattern into an SQL operator that accesses the relational database.

27 SQL Generation Example
Before: ..., S(DXV):R3, φ(R3, /book/row):$row, φ($row, title):col3, φ($row, price):temp1, φC(temp1, text()):col4
After: ..., SQL(select title as col3, price as temp1 from book):{col3, temp1}, φC(temp1, text()):col4
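A sketch of the pattern-to-SQL translation in this example, assuming the view exposes each table as /<table>/row with one child element per column, as the DXV paths suggest. The helper `to_sql` and its inputs are hypothetical; it recognizes only this one pattern and does not quote or escape identifiers.

```python
from typing import List, Tuple


def to_sql(row_path: str, fields: List[Tuple[str, str]]) -> str:
    """Translate S(view) + Nav(/<table>/row) + per-row field Navs into one SELECT.

    fields holds (column, output alias) pairs, e.g. ("title", "col3").
    """
    steps = row_path.strip("/").split("/")
    if len(steps) != 2 or steps[1] != "row":
        raise ValueError("expected a view path of the form /<table>/row")
    select_list = ", ".join(f"{column} as {alias}" for column, alias in fields)
    return f"select {select_list} from {steps[0]}"
```

For the example, `to_sql("/book/row", [("title", "col3"), ("price", "temp1")])` returns `select title as col3, price as temp1 from book`, and the remaining φC(temp1, text()):col4 stays in the XAT above the SQL operator.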

28 Where are we?
- XAT Decorrelation.
- Optimization:
  - XAT Computation Pushdown.
  - XAT Data Model Cleanup.
  - XAT Cutting.
- Conclusion & Future Work.

29 XAT Data Model Cleanup
By default, each operator appends one additional column to the data model. Cleanup helps with:
- Execution: optimizing data storage during execution.
- Cutting: getting rid of the unused operators in the XQuery.
Equation for data model cleanup (only keep the columns required by ancestors):
\( DM := (DM_p - P_p) \cup C_p \cup (P - C) \)
where P and C are the columns an operator produces and consumes, and the subscript p refers to its parent.

30 Data Model Example
for $b in document("prices.xml")/book
let $prices := $b/price
return $b
Nodes, numbered from the root down: 1 Agg(), 2 φ($b,):col1, 3 φC($b, price):$prices, 4 φ(R1, /book):$b, 5 S("prices.xml"):R1.
Node  Produce    Consume  DM before                DM after
1     {}                  {$prices, R1, $b, col1}  {}
2     {col1}     {$b}     {$prices, R1, $b, col1}  {col1}
3     {$prices}  {$b}     {$prices, R1, $b}        {$b, $prices}
4     {$b}       {R1}     {R1, $b}                 {$b}
5     {R1}       {}       {R1}
\( DM := (DM_p - P_p) \cup C_p \cup (P - C) \)
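The cleanup equation can be evaluated top-down, since each node's cleaned data model depends only on its parent's. This Python sketch encodes the example's five nodes by their Produce and Consume sets; the root's Consume cell is blank in the table, so the sketch assumes an empty set, and the function name is hypothetical. It reproduces the DM after column for nodes 1 through 4.

```python
from typing import Dict, Optional, Set


def clean_data_model(parent: Dict[int, Optional[int]],
                     produce: Dict[int, Set[str]],
                     consume: Dict[int, Set[str]],
                     root_dm: Set[str]) -> Dict[int, Set[str]]:
    """Compute each node's cleaned data model from its parent's, starting at the root."""
    dm: Dict[int, Set[str]] = {}

    def compute(n: int) -> Set[str]:
        if n not in dm:
            p = parent[n]
            if p is None:
                dm[n] = set(root_dm)  # the root's data model is given
            else:
                # DM := (DM_p - P_p) ∪ C_p ∪ (P - C)
                dm[n] = (compute(p) - produce[p]) | consume[p] | (produce[n] - consume[n])
        return dm[n]

    for node in parent:
        compute(node)
    return dm


# The example plan: 1 Agg() <- 2 (col1) <- 3 ($prices) <- 4 ($b) <- 5 S (R1).
PARENT = {1: None, 2: 1, 3: 2, 4: 3, 5: 4}
PRODUCE = {1: set(), 2: {"col1"}, 3: {"$prices"}, 4: {"$b"}, 5: {"R1"}}
CONSUME = {1: set(), 2: {"$b"}, 3: {"$b"}, 4: {"R1"}, 5: set()}  # node 1 assumed empty
DM = clean_data_model(PARENT, PRODUCE, CONSUME, set())
```

Note how $prices survives in node 3's data model only because node 3 itself produces it; no ancestor asks for it, which is what the cutting step exploits.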

31 Where are we?
- XAT Decorrelation.
- Optimization:
  - XAT Computation Pushdown.
  - XAT Data Model Cleanup.
  - XAT Cutting.
- Conclusion & Future Work.

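32 XAT Cutting
General idea: get rid of the operators that produce useless data.
Equations:
\( R := (R_p - P) \cup C \)
Cut the operator if \( (P \cup M) \cap (R_p \cup M_p) = \emptyset \)
where R is the set of required columns and M the set of modified columns; P, C, and the subscript p are as in the data model cleanup.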

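33 XAT Cutting Example
\( R := (R_p - P) \cup C \); cut if \( (P \cup M) \cap (R_p \cup M_p) = \emptyset \)
for $b in document("prices.xml")/book
let $prices := $b/price
return $b
Nodes, numbered from the root down: 1 Agg(), 2 φ($b,):col1, 3 φC($b, price):$prices, 4 φ(R1, /book):$b, 5 S("prices.xml"):R1.
Node  Produce    Consume  Modified  Required  Cut?
1     {}                  {*}       {}        N/A
2     {col1}     {$b}     {}        {$b}      {col1}
3     {$prices}  {$b}     {}        {$b}      {}
4     {$b}       {R1}     {}        {R1}      {$b}
5     {R1}       {}                           {R1}
Only node 3 has an empty Cut? entry: $prices is never used, so the LET navigation that produces it is cut.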
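The same top-down pass computes Required and the cut test. The sketch encodes the example's table, treating the root's Modified entry {*} as "every column" and blank cells as empty sets (both assumptions); the helper names are hypothetical. It reproduces the Required entries for nodes 2 through 4 and the Cut? entries for nodes 2 through 5.

```python
from typing import Dict, Optional, Set, Tuple

ALL = "*"  # the table's {*}: taken here to stand for every column


def intersect(a: Set[str], b: Set[str]) -> Set[str]:
    """Set intersection that treats ALL as the set of every column."""
    if ALL in a:
        return set(b)
    if ALL in b:
        return set(a)
    return a & b


def cutting(parent: Dict[int, Optional[int]], produce: Dict[int, Set[str]],
            consume: Dict[int, Set[str]], modified: Dict[int, Set[str]],
            root_required: Set[str]) -> Tuple[Dict[int, Set[str]], Dict[int, Set[str]], Dict[int, bool]]:
    """Return Required sets, the Cut? overlap per non-root node, and the cut decisions."""
    required: Dict[int, Set[str]] = {}

    def req(n: int) -> Set[str]:
        if n not in required:
            p = parent[n]
            # R := (R_p - P) ∪ C, with the root's R given directly
            required[n] = set(root_required) if p is None else (req(p) - produce[n]) | consume[n]
        return required[n]

    overlap: Dict[int, Set[str]] = {}
    for n, p in parent.items():
        req(n)
        if p is not None:
            # (P ∪ M) ∩ (R_p ∪ M_p); an empty result means the node can be cut
            overlap[n] = intersect(produce[n] | modified[n], req(p) | modified[p])
    cut = {n: not s for n, s in overlap.items()}
    return required, overlap, cut


PARENT = {1: None, 2: 1, 3: 2, 4: 3, 5: 4}
PRODUCE = {1: set(), 2: {"col1"}, 3: {"$prices"}, 4: {"$b"}, 5: {"R1"}}
CONSUME = {1: set(), 2: {"$b"}, 3: {"$b"}, 4: {"R1"}, 5: set()}
MODIFIED = {1: {ALL}, 2: set(), 3: set(), 4: set(), 5: set()}
REQUIRED, OVERLAP, CUT = cutting(PARENT, PRODUCE, CONSUME, MODIFIED, set())
```

Only node 3 ends up with `CUT[3] == True`, matching the table.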

34 Conclusions
- XQueries are heavily correlated and hence need to be decorrelated for better optimization.
- After decorrelation, more optimization techniques can be applied: computation pushdown, data model cleanup, and cutting.

35 Future Work
- Write a TR to formalize the XAT.
- Compare with ORDB and ODB operators, and with XQA operators.
- Wrap up:
  - Finalize the uncertain operators that deal with collections: Union, Navigate.
  - Formalize the pushdown rewriting rules by type (regular-expression type) analysis.
  - Finalize the XAT rewriting rules for order handling and update propagation.
  - Translation from XAT back to the query.
- Next step: generate the search space and an optimization algorithm for XAT, ready for schema generation.