Storing and Querying Ordered XML Using a Relational Database System
Swapna Dhayagude


Agenda
- Ordered XML Data Model
- Order Encoding Methods
- Shredding Ordered XML into Relations
- Translating XML Queries to SQL
- Performance Evaluation

Ordered XML Data Model
- An XML document is a tree structure
  - Relation as the 'root'
  - Nodes represent elements
  - Leaf nodes hold data values
- Document Type Definition (DTD): schema information about the XML document
- Order: a salient feature of an XML document

Significance of Order in XML
- Order is important for reconstructing XML documents
  - It ensures a lossless mapping from XML to the relational database
- Performance issues
  - The choice of order encoding dramatically affects performance
  - A suitable encoding enables efficient translation of XML queries into SQL
- Order-based functionality of XPath and XQuery
  - XPath: a simple path-based query language
  - XQuery: a more complex query language based on XPath

Three Dimensions of XML Order
- Evaluation of order-based axes: XPath expressions that require document order
  1. preceding
  2. following
- Inter-element order: the elements of a result set must appear in document order
- Intra-element order: document order within each result element is needed for reconstruction


How Is Order Encoded?
- Order is preserved using a simple numbering scheme
- Each node is identified by a node_id
- The node_id is stored as a data value within the relation
- The numbering scheme captures enough information to reconstruct the XML document

Order-Based Functionality of XPath
- XPath is evaluated step by step, in sequence
- Each step is applied to a single node (the context node)
- The result of each step is a set of nodes {node 1, node 2, ..., node n}
- XPath syntax: Path ::= /Step1/Step2/.../StepN
- Each XPath step is defined as: Step ::= Axis::Node-test Predicate*
- The axis selects a direction of navigation, e.g. child::title selects all children that are 'title' elements

Order-Based Functionality of XPath
- Axes specify the direction of navigation in an XML document
  - Up: parent, ancestor
  - Down: child, descendant
  - Left: preceding, preceding-sibling
  - Right: following, following-sibling

Order-Based Functionality of XQuery
- BEFORE operator: returns nodes from the first sequence that are before some node in the second sequence
- AFTER operator: returns nodes from the first sequence that are after some node in the second sequence
- Range predicates: select a range of elements from a sequence
  - e.g. /play/act[2 TO 4] returns act #2, act #3 and act #4 in document order
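As a rough illustration of these operators, the sketch below models a sequence as a list of document positions. The function names `before`, `after` and `select_range` are invented for this example and do not belong to any XQuery implementation.

```python
def before(first, second):
    """Nodes of `first` that precede at least one node of `second`."""
    if not second:
        return []
    last = max(second)  # preceding the last node is enough to precede "some" node
    return [n for n in first if n < last]


def after(first, second):
    """Nodes of `first` that follow at least one node of `second`."""
    if not second:
        return []
    earliest = min(second)
    return [n for n in first if n > earliest]


def select_range(seq, lo, hi):
    """1-based, inclusive range, mirroring the [lo TO hi] predicate."""
    return seq[lo - 1:hi]
```

Both operators reduce to one comparison against the extreme position of the second sequence, which is why an encoding with cheap position comparisons helps here.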

Global Order Encoding
- Absolute positioning of nodes
- Best performance on queries: query evaluation requires only simple comparisons between node positions
- Worst performance on updates, especially inserts
Figure (tree labelled with global positions):
play(1)
  title(2)
    text#(3)
  act(4)
    title(5)
      text#(6)
    scene(7)
  act(8)
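A minimal sketch of how global positions could be assigned, assuming the document is already available as nested `(name, children)` tuples. The tuple layout and the name `global_order` are assumptions made for this example; the tree mirrors the figure.

```python
# (name, children) tuples mirroring the example tree
PLAY = ("play", [
    ("title", [("text#", [])]),
    ("act", [
        ("title", [("text#", [])]),
        ("scene", []),
    ]),
    ("act", []),
])


def global_order(tree):
    """Number nodes in pre-order, which is document order."""
    numbered = []

    def visit(node):
        name, children = node
        numbered.append((len(numbered) + 1, name))
        for child in children:
            visit(child)

    visit(tree)
    return numbered
```

With these positions, testing whether one node comes before another is a single integer comparison.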

Global Order Encoding (continued)
- Sparse numbering is used initially for Global Order Encoding
- Sparse numbering reduces the cost of renumbering on inserts and updates, which improves update performance
- Global document order is directly available, which makes intra-element and inter-element ordering easy
- Drawback: still performs poorly on inserts (Local Order offers better performance for inserts/updates)

Global Order Renumbering Scenario
- Inserting a new element into an existing document causes many nodes to be renumbered
- In the figure, the highlighted nodes need to be renumbered; the global ordering scheme renumbers the most nodes
[Figure: tree with nodes play(1), title(2), text#(3), act(4), title(5), scene(7), act(8) and a new element being inserted]
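The effect of sparse numbering can be sketched as follows. The gap of 100 and all function names are assumptions chosen for illustration; the slides do not say which spacing is used.

```python
GAP = 100  # assumed spacing between consecutive positions


def sparse_positions(n, gap=GAP):
    """Initial positions for n nodes in document order."""
    return [gap * (i + 1) for i in range(n)]


def position_between(left, right):
    """A free position strictly between two neighbours, or None if the gap is used up."""
    if right - left > 1:
        return (left + right) // 2
    return None


def dense_renumber_count(positions, left):
    """With dense numbering, every node after the insertion point shifts by one."""
    return sum(1 for p in positions if p > left)
```

While a gap remains, an insert touches no existing node. Once `position_between` returns `None`, the scheme falls back to renumbering every node that follows the insertion point.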

Local Order Encoding
1. Relative positioning of nodes: each node stores its position among its siblings
2. Best performance on updates
3. Worst performance on queries
Figure (tree labelled with sibling positions):
play(1)
  title(1)
    text(1)
  act(2)
    title(1)
      text(1)
    scene(2)
  act(3)

Local Order Encoding (continued)
- How does Local Order Encoding reconstruct the absolute position of a node?
- The relative position of a node is combined with the relative positions of its ancestors
- The combination yields a vector that uniquely identifies the absolute position within the document
- (relative position of node) + (relative positions of ancestors) = (absolute position of node in the document)
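The combination described above can be sketched by walking up the parent links and collecting sibling indexes. The node labels in `LOCAL` and the function name `position_vector` are hypothetical; the sibling indexes follow the example tree.

```python
# node label -> (parent label, sibling index); labels are arbitrary, not positions
LOCAL = {
    "play": (None, 1),
    "title1": ("play", 1),
    "text1": ("title1", 1),
    "act1": ("play", 2),
    "title2": ("act1", 1),
    "text2": ("title2", 1),
    "scene": ("act1", 2),
    "act2": ("play", 3),
}


def position_vector(node, rows=LOCAL):
    """Sibling indexes from the root down to `node`."""
    vector = []
    while node is not None:
        parent, s_index = rows[node]
        vector.append(s_index)
        node = parent
    return tuple(reversed(vector))


# Comparing vectors component by component recovers document order.
document_order = sorted(LOCAL, key=position_vector)
```

Each call climbs to the root, which hints at why order-sensitive queries cost more under local order than under global order.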

Local Order Renumbering Scenario
- In contrast to Global Order Encoding, Local Order requires the fewest nodes to be renumbered
- This is a major advantage, since it dramatically reduces the cost of inserts
[Figure: tree with nodes play(1), title(1), text#(1), act(2), act(2), title(1), scene(2), scene(1) and a new element being inserted]

Local Order Encoding (continued)
- Incurs low overhead on updates
- Only following siblings may require renumbering
- Drawback: the lack of global order information makes evaluating the following and preceding axes complex

Dewey Order Encoding
1. Strikes a balance between Global and Local
2. Reasonable performance on both updates and queries
Figure (tree labelled with Dewey paths):
play(1)
  title(1.1)
    text(1.1.1)
  act(1.2)
    title(1.2.1)
      text( )
    scene(1.2.2)
  act(1.3)

Dewey Order Encoding
- Each node's path from the root uniquely identifies its absolute position in the document
- Query processing is similar to that of Global Order
- Only following siblings and their descendants may require renumbering
- Drawback: extra space is required to store the path from the root to each node
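A small sketch of why Dewey paths behave much like global positions in queries: document order is a component-wise comparison, and an ancestor test is a prefix test. The dotted-string representation and the function names are assumptions made for this example.

```python
def parse(label):
    """'1.2.1' -> (1, 2, 1)"""
    return tuple(int(part) for part in label.split("."))


def is_ancestor(ancestor, descendant):
    """An ancestor's path is a proper prefix of the descendant's path."""
    a, d = parse(ancestor), parse(descendant)
    return len(a) < len(d) and d[:len(a)] == a


def document_order(labels):
    """Sort by numeric components, not by the raw strings."""
    return sorted(labels, key=parse)
```

Comparing the raw strings would place "1.10" before "1.2", so the components have to be compared as numbers.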

Dewey Order Renumbering Scenario
- Requires more renumbering than Local Encoding, but much less than Global Encoding
[Figure: tree with nodes play, title, text#, act, act, title, scene and a new element being inserted]
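To make the comparison concrete, the sketch below renumbers Dewey paths for one hypothetical insertion: a new first child under the node labelled (1, 2). The scenario and the name `renumber_for_insert` are assumptions, not taken from the figure.

```python
LABELS = [(1,), (1, 1), (1, 1, 1), (1, 2), (1, 2, 1), (1, 2, 1, 1), (1, 2, 2), (1, 3)]


def renumber_for_insert(labels, parent, index):
    """Map old -> new labels when a child is inserted under `parent` at `index`.

    Only siblings at or after `index`, and their descendants, are affected.
    """
    depth = len(parent)
    changed = {}
    for label in labels:
        if len(label) > depth and label[:depth] == parent and label[depth] >= index:
            changed[label] = label[:depth] + (label[depth] + 1,) + label[depth + 1:]
    return changed


changed = renumber_for_insert(LABELS, parent=(1, 2), index=1)
```

For this insertion Dewey order relabels three nodes. Local order would renumber only the two following siblings, and dense global order would shift four positions (5 through 8).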


Shredding XML into Relations
- Schema-less case: the schema of the input XML documents is unknown
  - Edge approach: each document is stored in a single table
- Schema-aware case: the schema of the input XML documents is available
  - Inlining: a child that occurs once is stored within the parent relation
  - A child that occurs multiple times is stored in a new relation

Inlining
- Inlining is an effective way of storing and querying XML, provided the document schema is available
- Inlining adapts to Global, Local and Dewey Orders
- Every relation requires an additional column to encode document order
- Storing order information for inlined elements is unnecessary: an element's position is determined from the position of its parent and from the document schema

Storing Order Information: Schema-less Case
- The Edge approach
  - The document is stored in a single table
  - Each tuple in the table represents a node
- Edge (id, parent_id, name, value)
  - id: the primary key
  - parent_id: a foreign key that links to the node's parent
  - name: the tag name of the element
  - value: the text value
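A minimal sketch of shredding a small tree into an Edge table, using SQLite purely for convenience. The nested-tuple input, the text values and the function name `shred` are invented for the example; only the four column names come from the slide.

```python
import sqlite3

# (name, value, children); the text values are made up for illustration
PLAY = ("play", None, [
    ("title", None, [("text#", "Hamlet", [])]),
    ("act", None, [
        ("title", None, [("text#", "Act I", [])]),
        ("scene", None, []),
    ]),
    ("act", None, []),
])


def shred(tree):
    """Return Edge rows (id, parent_id, name, value) in document order."""
    rows = []

    def visit(node, parent_id):
        name, value, children = node
        node_id = len(rows) + 1
        rows.append((node_id, parent_id, name, value))
        for child in children:
            visit(child, node_id)

    visit(tree, None)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Edge ("
    "id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES Edge(id), "
    "name TEXT, value TEXT)"
)
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?)", shred(PLAY))

# The child axis becomes a lookup on parent_id.
child_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM Edge WHERE parent_id = ? ORDER BY id", (1,)
    )
]
```

Here the id doubles as a global position because rows are numbered in document order; the plain Edge schema does not require that.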

Storing Order Information: Schema-less Case
- The Edge approach adapts differently to Global, Local and Dewey
- Global Order: Edge (id, parent_id, end_desc_id, path_id, value)
  - end_desc_id: id of the last descendant of a node
- Local Order: Edge (id, parent_id, sIndex, path_id, value)
  - sIndex: sibling index of a node
- Dewey Order: Edge (dewey, path_id, value)
  - dewey: represents both order and ancestor information


Query Translation for Global Order
- Edge (id, parent_id, end_desc_id, path_id, value)
- Translation of following / preceding
  - following: select nodes from the Edge table whose id > end_desc_id of the context node
  - preceding: select nodes from the Edge table whose end_desc_id < id of the context node (this excludes the context node's ancestors)
- Translation of following-sibling / preceding-sibling
  - following-sibling: select nodes whose id > id of the context node AND whose parent_id = parent_id of the context node
  - preceding-sibling: select nodes whose id < id of the context node AND whose parent_id = parent_id of the context node
- Note: the expressions above are NOT actual SQL statements
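The four axis conditions can be tried out against SQLite as a sketch. The table below omits path_id and value and adds a name column for readability, and the self-join formulation and the helper `axis` are assumptions of this example, not the SQL generated in the original work. The rows mirror the global-order example tree.

```python
import sqlite3

ROWS = [  # (id, parent_id, end_desc_id, name)
    (1, None, 8, "play"),
    (2, 1, 3, "title"),
    (3, 2, 3, "text#"),
    (4, 1, 7, "act"),
    (5, 4, 6, "title"),
    (6, 5, 6, "text#"),
    (7, 4, 7, "scene"),
    (8, 1, 8, "act"),
]

# n is a candidate node, c is the context node
AXES = {
    "following": "n.id > c.end_desc_id",
    "preceding": "n.end_desc_id < c.id",
    "following-sibling": "n.id > c.id AND n.parent_id = c.parent_id",
    "preceding-sibling": "n.id < c.id AND n.parent_id = c.parent_id",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Edge (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "end_desc_id INTEGER, name TEXT)"
)
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?)", ROWS)


def axis(name, context_id):
    """Ids on the given axis of the context node, in document order."""
    sql = (
        "SELECT n.id FROM Edge n, Edge c "
        f"WHERE c.id = ? AND {AXES[name]} ORDER BY n.id"
    )
    return [row[0] for row in conn.execute(sql, (context_id,))]
```

Every axis is a plain comparison on indexed integer columns, with no recursion, which is consistent with global order performing best on queries.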

Query Translation for Local Order
- Edge (id, parent_id, sIndex, path_id, value)
- Translation of following-sibling / preceding-sibling: similar to Global and Dewey Order
- Translation of following / preceding is a complex task:
  1. Compute all ancestors of the context node: {anc}
  2. Compute the following siblings of the context node and of each node in {anc}: {anc_sib}
  3. Compute the descendants of {anc_sib}
- Challenges: without knowledge of the XML schema, retrieving ancestors and descendants is complex and involves recursion
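The three steps can be sketched in plain Python over (parent, sibling index) rows. This is an in-memory illustration of the logic rather than the SQL translation; the node labels and the function name `following` are hypothetical.

```python
# node label -> (parent label, sibling index)
LOCAL = {
    "play": (None, 1),
    "title1": ("play", 1),
    "text1": ("title1", 1),
    "act1": ("play", 2),
    "title2": ("act1", 1),
    "text2": ("title2", 1),
    "scene": ("act1", 2),
    "act2": ("play", 3),
}


def following(context, rows=LOCAL):
    """Nodes on the following axis of `context` under local order."""
    children = {}
    for node, (parent, _) in rows.items():
        children.setdefault(parent, []).append(node)

    def descendants_or_self(node):
        found = {node}
        for child in children.get(node, []):
            found |= descendants_or_self(child)  # recursion over unknown depth
        return found

    result = set()
    node = context
    while node is not None:  # step 1: walk up through the ancestors
        parent, s_index = rows[node]
        for sibling in children.get(parent, []):
            if rows[sibling][1] > s_index:  # step 2: following siblings
                result |= descendants_or_self(sibling)  # step 3: their subtrees
        node = parent
    return result
```

The upward walk and the recursive descent are the parts that a single comparison handles under global order.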

Query Translation for Dewey Order
- Edge (dewey, path_id, value)
- The dewey column
  - is stored as a variable-length byte string
  - replaces parent_id and end_desc_id of the Global Edge table
  - encodes parent and descendant information within the Dewey path
  - eliminates the need to store parent_id and end_desc_id
- Drawback: storage overhead due to the number of bytes allocated to each component
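One possible byte-string layout is sketched below, assuming a fixed two bytes per component. This width is an assumption chosen for simplicity and is not the encoding used in the original work; it does make the storage drawback easy to see.

```python
def encode(label, width=2):
    """Concatenate each component as a fixed-width big-endian integer."""
    return b"".join(component.to_bytes(width, "big") for component in label)


def is_descendant(key, ancestor_key):
    """With fixed-width components, a descendant's key extends the ancestor's key."""
    return len(key) > len(ancestor_key) and key.startswith(ancestor_key)


LABELS = [(1,), (1, 1), (1, 1, 1), (1, 2), (1, 2, 1), (1, 2, 2), (1, 10)]

# Plain byte-wise comparison of the keys yields document order.
ordered = sorted(LABELS, key=encode)
```

A node four levels deep already needs eight bytes under this layout, and most of them are zero for small sibling counts.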

Query Translation in Inlining
- Essentially uses the same algorithm as the Edge approach, with two extensions
  - XML data can be spread across several tables, so evaluating an axis requires access to multiple tables rather than a single Edge table
  - The translation algorithm does not use recursion, since the schema contains sufficient information about the depth and position of nodes
- Drawback: data is partitioned across many tables, which makes the tables hard to manage


Storage Requirements
Table 1: Storage requirements of the Global, Local and Dewey encoding methods

Order Scheme   Edge                      Inlining
               Table Size   Index Size   Table Size   Index Size
Global         52.1 MB      57.9 MB      44.1 MB      28.9 MB
Local          52.1 MB      87.9 MB      47.7 MB      36.8 MB
Dewey          48.9 MB      38.7 MB      44.5 MB      15.8 MB

Performance
All experiments are based on the Shakespeare's Plays dataset.
Table 2: Test queries

Query   Query Definition
Q1      /play
Q2      /play/act//speech
Q3      /play/act/scene/speech
Q4      /play/act/scene/speech[2]
Q5      /play/act/scene/*[2]
Q6      /play/act/scene/speech[1 TO 3]
Q7      /play/act[2]/following::speech
Q8      /play/act/scene/speech/speaker/following-sibling::line[2]
Q9      //act/scene/speech BEFORE /play/act[2]

Select and Reconstruct Modes
- XPath queries run in two different modes
- Select mode: the result set contains only the IDs of the nodes that satisfy the XPath expression
- Reconstruct mode: entire XML fragments are extracted from the database in document order
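Reconstruct mode can be sketched as rebuilding a fragment from Edge rows read in document order. The row layout follows the basic Edge (id, parent_id, name, value) table with ids assumed to be global positions. The `text#` node name, the sample value and the function name `reconstruct` are assumptions, and XML escaping is left out for brevity.

```python
def reconstruct(rows):
    """Serialize Edge rows (id, parent_id, name, value) back into XML text."""
    info = {}
    children = {}
    for node_id, parent_id, name, value in sorted(rows):  # ids give document order
        info[node_id] = (name, value)
        children.setdefault(parent_id, []).append(node_id)

    def emit(node_id):
        name, value = info[node_id]
        if name == "text#":
            return value
        inner = "".join(emit(child) for child in children.get(node_id, []))
        return f"<{name}>{inner}</{name}>"

    return "".join(emit(root) for root in children.get(None, []))


# Rows may arrive in any order; sorting by id restores document order.
ROWS = [
    (4, 1, "act", None),
    (2, 1, "title", None),
    (1, None, "play", None),
    (3, 2, "text#", "Hamlet"),
]
xml = reconstruct(ROWS)
```

Select mode would stop at the list of ids; the extra sorting and assembly is the work that reconstruct mode adds.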

Ordered Selection: Edge Results
[Figure: time in seconds (y-axis) for each query (x-axis)]

Inlining Results

Reconstruction
- In reconstruct mode, XML documents must be extracted from the database in document order
- The optimizer's inability to pick the best plan led to poor results
- Using 'tuned' SQL queries, on the other hand, yielded better results
- Note: queries Q3, Q4, Q5 and Q9 performed very poorly (far beyond the range of the indicated scale)

Performance: Results Based on Experiments
- Global Order is the most efficient order encoding method
- Dewey Order has the second-best performance
- Local Order relies on sorting very often, which degrades overall performance
- Inlining typically performs better than Edge
- In general, the overhead of parsing XML documents exceeded the cost of XPath processing

Performance: Conclusions Based on Results
- An RDBMS can efficiently support ordered XML
- Global Order is the best for query workloads
- Dewey Order is slightly less efficient than Global Order, and is the best for a mix of queries and updates
- Schema information makes Local Order a viable alternative
- Relational optimizers do not understand the hierarchical structure of XML

Acknowledgements
- Prof. Elke Rundensteiner
Thank you