Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.

Slides:



Advertisements
Similar presentations
XML Data Management 8. XQuery Werner Nutt. Requirements for an XML Query Language David Maier, W3C XML Query Requirements: Closedness: output must be.
Advertisements

XML: Extensible Markup Language
XML May 3 rd, XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !
Twig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Songting Chen, Hua-Gang Li *, Junichi Tatemura Wang-Pin Hsiung,
Using the Optimizer to Generate an Effective Regression Suite: A First Step Murali M. Krishna Presented by Harumi Kuno HP.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Min LuTIMBER: A Native XML DB1 TIMBER: A Native XML Database Author: H.V. Jagadish, etc. Presenter: Min Lu Date: Apr 5, 2005.
TIMBER A Native XML Database Xiali He The Overview of the TIMBER System in University of Michigan.
Relational Databases for Querying XML Documents: Limitations & Opportunities VLDB`99 Shanmugasundaram, J., Tufte, K., He, G., Zhang, C., DeWitt, D., Naughton,
Manish Bhide, Manoj K Agarwal IBM India Research Lab India {abmanish, Amir Bar-Or, Sriram Padmanabhan IBM Software Group, USA
1 Primitives for Workload Summarization and Implications for SQL Prasanna Ganesan* Stanford University Surajit Chaudhuri Vivek Narasayya Microsoft Research.
1 COS 425: Database and Information Management Systems XML and information exchange.
XML and The Relational Data Model
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
Inbal Yahav A Framework for Using Materialized XPath Views in XML Query Processing VLDB ‘04 DB Seminar, Spring 2005 By: Andrey Balmin Fatma Ozcan Kevin.
XML –Query Languages, Extracting from Relational Databases ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor Department of Computer Sciences Bahria.
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
Access Path Selection in a Relation Database Management System (summarized in section 2)
Indexing XML Data Stored in a Relational Database VLDB`2004 Shankar Pal, Istvan Cseri, Gideon Schaller, Oliver Seeliger, Leo Giakoumakis, Vasili Vasili.
Query Processing Presented by Aung S. Win.
Module 17 Storing XML Data in SQL Server® 2008 R2.
2.2 SQL Server 2005 的 XML 支援功能. Overview XML Enhancements in SQL Server 2005 The xml Data Type Using XQuery.
XML files (with LINQ). Introduction to LINQ ( Language Integrated Query ) C#’s new LINQ capabilities allow you to write query expressions that retrieve.
Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.
1 Distributed Monitoring of Peer-to-Peer Systems By Serge Abiteboul, Bogdan Marinoiu Docflow meeting, Bordeaux.
TDDD43 XML and RDF Slides based on slides by Lena Strömbäck and Fang Wei-Kleiner 1.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
LegoDB 1 Data Binding Workshop, Avaya Labs, June 2003 LegoDB: Cost-based XML to Relational “Shredding” Jerome Simeon Bell Labs – Lucent Technologies joint.
Access Path Selection in a Relational Database Management System Selinger et al.
Module 7 Reading SQL Server® 2008 R2 Execution Plans.
Pattern tree algebras: sets or sequences? Stelios Paparizos, H. V. Jagadish University of Michigan Ann Arbor, MI USA.
Database Management 9. course. Execution of queries.
DANIEL J. ABADI, ADAM MARCUS, SAMUEL R. MADDEN, AND KATE HOLLENBACH THE VLDB JOURNAL. SW-Store: a vertically partitioned DBMS for Semantic Web data.
Querying Structured Text in an XML Database By Xuemei Luo.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
1 Chapter 7 Optimizing the Optimizer. 2 The Oracle Optimizer is… About query optimization Is a sophisticated set of algorithms Choosing the fastest approach.
Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.
1 CS 430 Database Theory Winter 2005 Lecture 17: Objects, XML, and DBMSs.
Optimization in XSLT and XQuery Michael Kay. 2 Challenges XSLT/XQuery are high-level declarative languages: performance depends on good optimization Performance.
Database Systems Part VII: XML Querying Software School of Hunan University
Declaratively Producing Data Mash-ups Sudarshan Murthy 1, David Maier 2 1 Applied Research, Wipro Technologies 2 Department of Computer Science, Portland.
Keyword Searching Weighted Federated Search with Key Word in Context Date: 10/2/2008 Dan McCreary President Dan McCreary & Associates
DATABASE MANAGEMENT SYSTEM ARCHITECTURE
XML and Database.
Joseph M. Hellerstein Peter J. Haas Helen J. Wang Presented by: Calvin R Noronha ( ) Deepak Anand ( ) By:
Buffer-pool aware Query Optimization Ravishankar Ramamurthy David DeWitt University of Wisconsin, Madison.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
Session 1 Module 1: Introduction to Data Integrity
Query Optimization CMPE 226 Database Systems By, Arjun Gangisetty
Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Chapter 13: Query Processing
SQL Server Statistics DEMO SQL Server Statistics SREENI JULAKANTI,MCTS.MCITP,MCP. SQL SERVER Database Administration.
SQL Server Statistics DEMO SQL Server Statistics SREENI JULAKANTI,MCTS.MCITP SQL SERVER Database Administration.
Diving into Query Execution Plans ED POLLACK AUTOTASK CORPORATION DATABASE OPTIMIZATION ENGINEER.
Data Integrity & Indexes / Session 1/ 1 of 37 Session 1 Module 1: Introduction to Data Integrity Module 2: Introduction to Indexes.
CHAPTER 19 Query Optimization. CHAPTER 19 Query Optimization.
XML: Extensible Markup Language
Chapter 15 QUERY EXECUTION.
Query Execution Presented by Khadke, Suvarna CS 257
Chapter 2 Database Environment Pearson Education © 2009.
Lecture 12: Data Wrangling
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Outline Introduction Background Distributed DBMS Architecture
Data Model.
Query Execution Presented by Jiten Oswal CS 257 Chapter 15
Overview of Query Evaluation
Chapter 2 Database Environment Pearson Education © 2009.
Presentation transcript:

Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver

 XML accepted as the language for data interchange  Relational database investment  IBM developed DB2 XML Supports XML as native data format XQuery supported as second query language  Paper focuses on extensions made to DB2’s cost-based optimizer

 Largely declarative, like SQL  FLWOR statements Zero or more FOR and LET clauses Optional WHERE clause Optional ORDER BY clause RETURN clause  Example…

 XML data model is inherently heterogeneous and hierarchical  XML schemas are likely to change, or be unavailable or unknown  Developing a hybrid optimizer for XML and relational access paths that works with both SQL and XQuery

 Existing native XML data management systems: Lore, Niagra, TIMBER, Natix, Tox  DB2 XML goes further by representing XQuery queries as query graph models  No prior work describes a cost-based optimizer for an XQuery compiler  Two path expression evaluation techniques: structural joins and holistic algorithms DB2 XML uses holistic approach using TurboXPath algorithm Reduces number of plans & does not sacrifice plan quality

 Mapping of SQL or XQuery to query graph model (QGM)  Rewrite to more optimization-friendly representation  Cost-based optimizer chooses best query execution plan  QEP mapped to sequence of execution engine calls, called a section  Runtime engine executes the section when query is executed

 Number of QEPs is typically very large  Three key aspects of the architecture… Operators – query processing primitives that consume and produce tables Rules – define how operators may be combined into QEPs Configurable Enumeration Engine – invokes the plan generation rules and determines which sequences of joins to evaluate

 QEP operators maintain a running total of projected resources required  Most critical cost model input is number of records to be processed The model estimates filtering effect of predicates based on statistics about the database Based on the probabilistic model proposed in System R Each filtering operation is assigned a selectivity  Building an accurate cost model for operators that manipulate XML data is much more difficult

 Encapsulation of runtime functions as QEP operators allowed new operators to be added more easily  XSCAN operator – scans and navigates through XML to evaluate an XPath expression  XISCAN operator – Same as a relational index scan; returns RIDs of documents that satisfy an index expression  XANDOR operator – n-way merge of individual index scans by ANDing and ORing

 Changes were made to access rules to support construction of plans using the new XML operators  Access rules extended for XSCANs… Same as a relational system’s table scan Returns references to qualifying nodes

 Extensions for generating XML index plans More complicated than XSCAN plans XSCAN is necessary to eliminate false positives SORT removes duplicate documents

 Extensions for generating XANDOR plans… Combines index ANDing and ORing Can dynamically skip processing of some inputs

 Cardinality greatly affects cost and is very hard to estimate  Current DB2 infrastructure for cardinality and cost estimation was extended to support XML  Changes made to three general areas… 1) generalized predicate selectivity estimation to support XPath predicates and navigation; compute fanout 2) cardinality estimation extended to support XSCAN, XISCAN, and XANDOR 3) modified existing and designed new cost algorithms

 The average number of result XML items produced per input XML item  To estimate fanout, two assumptions are made… 1) Fanout uniformity – there will be the same number of B elements within each A result if A is an ancestor of B 2) Predicate uniformity – XML data items that bind to X and satisfy condition Y are uniformly distributed among all items that bind to X

 Cardinality of XSCAN operator estimated to be product of fanout of XPath expression, selectivity of predicates, and sequence size of input column  Sequence size is the average number of XML items per XML sequence flowing through a column  Estimated cardinality is shown at each step in this example

 Added cost estimation for the three new operators and modified existing related operators such as SORT and FILTER  Cost modeling is much more difficult due to semantic differences and complexity of the operators, and versatility and complexity of XML data  Structures for data distribution statistics would be as large as the data, and estimation would take longer than the actual query  The relational and XML models must be consistent in their level of detail in a mixed environment to avoid bias

 Performed by a utility called runstats, which was extended to support XML stat collection during a table scan  Two types of linear paths… 1) Simple Paths – end in element nodes only 2) Path-Value Pairs – could end in attribute values or text values  Two types of occurrence counts… 1) Node counts 2) Document counts  For each XML column, collects both types of counts for both types of paths  Bloom filter used to remember distinct paths and cap memory utilization  Reservoir sampling used to cap memory for frequent value statistics

 Exploit individual nodes returned from XML index scans  Consider additional plans that would defer XSCANs after index scans reduce the number of documents scanned increase the number of alternative plans  Investigate extending index ANDing heuristics  Extend statistics and cardinality estimation model to consider structural relationships between predicates  Collect data type specific statistics  Use more automated techniques to develop operator cost models  Extend the optimizer’s order optimization architecture to support bind order and document order

 Reusing DB2 infrastructure to support XQuery and SQL/XML was far faster than starting from scratch  Extending plan generation, cardinality and costing, and statistics components was challenging  Made possible by… introducing an XML column type modeling SQL, XQuery, SQL/XML uniformly representing XPath expressions as table functions

 Comprehension questions?...  Presentation questions… Paper is from 2006; what is the current state of DB2 XML? Does it make sense to store entire XML documents in a single cell in a relational database? In the future, does it make sense to maintain the relational model, with XML as an extension?