XML Native Query Processing Chun-shek Chan Mahesh Marathe Wednesday, February 12, 2003.



Topics XML Indexing –“Accelerating XPath Location Steps” Torsten Grust, ACM SIGMOD 2002 XML Query Optimization –“Multi-level Operator Combination in XML Query Processing” Shurug Al-Khalifa and H.V. Jagadish, ACM CIKM 2002

XML Query Languages XPath –Developed by the World Wide Web Consortium –Version 1.0 became a W3C Recommendation on November 16, 1999 –Version 2.0 is a working draft.

XML Query Languages XQuery –Developed by the World Wide Web Consortium as well –Currently a working draft

Axes on XPath Tree There are 13 axes according to the XPath 2.0 Technical Report –Forward Axes child, descendant, attribute, self, descendant-or-self, following-sibling, following, namespace (deprecated) –Reverse Axes parent, ancestor, preceding-sibling, preceding, ancestor-or-self

XML Traversal and Storage Tree-based traversal Efficient storage is challenging –Especially for relational databases, which deal with tuples and are not designed to handle recursion or nested elements

Proposed Solutions “Indexing and Querying XML Data for Regular Path Expressions” Li and Moon, VLDB 2001 “A Fast Index for Semistructured Data” Cooper, Sample, Franklin, Hjaltason and Shadmon, VLDB 2001 “DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases” Goldman and Widom, VLDB 1997

Problems with Proposed Solutions Solutions focus on support of / and // location steps. Inadequate support for XPath. Proposals rely on technologies outside the relational domain.

Author’s Proposal XPath Accelerator Works entirely within relational database. Uses traditional relational syntax for queries. Benefits from advanced index technologies, such as R-tree.

XPath Tree Traversal Context Node: starting point of any traversal Location Steps: syntactically separated by /, evaluated from left to right –A step’s axis establishes a subset of document nodes (a document region)

XPath Forward Axes Child Descendant Attribute Self Descendant-or-self Following-sibling Following Namespace

XPath Reverse Axes Parent Ancestor Preceding-sibling Preceding Ancestor-or-self

Sample XML Tree [Figure: tree with nodes a through j]

Encoding XML Document Regions Formula: v/descendant ∪ v/ancestor ∪ v/following ∪ v/preceding ∪ v/self Each document node appears exactly once in this union. What are the ways to uniquely identify different nodes?

Numbering Nodes Grust: Preorder and postorder ranks Tatarinov: Global, Local, Dewey Li & Moon: Order-size pairs
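As a rough illustration of the preorder/postorder numbering attributed to Grust above, the sketch below assigns both ranks in one depth-first walk. The tree shape and the names TREE and rank are assumptions made for this example; they are not taken from the slides.

```python
# Hypothetical 10-node tree (an assumption for illustration),
# given as parent -> ordered list of children.
TREE = {
    "a": ["b", "d", "e"],
    "b": ["c"],
    "e": ["f", "i"],
    "f": ["g", "h"],
    "i": ["j"],
}


def rank(tree, root):
    """Return (pre, post): dicts mapping node -> rank, both starting at 0."""
    pre, post = {}, {}

    def visit(node):
        pre[node] = len(pre)        # preorder rank: assigned when the node is entered
        for child in tree.get(node, []):
            visit(child)
        post[node] = len(post)      # postorder rank: assigned when the node is left

    visit(root)
    return pre, post


pre, post = rank(TREE, "a")

# Descendants of e are exactly the nodes entered after e and left before e.
desc_of_e = sorted(n for n in pre if pre[n] > pre["e"] and post[n] < post["e"])
```

With both ranks available, the descendant test reduces to two integer comparisons, which is what lets the later slides treat axes as rectangular windows in the pre/post plane.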

XML Document Regions Descendants? Ancestors? Preceding? Following? [Figure: the sample tree with nodes a through j]

XPath Tree Node Descriptor desc(v) = {pre(v),post(v),par(v),att(v),tag(v)} window(α,v) = {condition for each field in desc()} Example: window(child,v) = {(pre(v),∞),[0,post(v)),pre(v),false,*}

XPath Query Windows

Axis α        pre          post          par     att    tag
Child         (pre(v),∞)   [0,post(v))   pre(v)  false  *
Descendant    (pre(v),∞)   [0,post(v))   *       false  *
Desc-or-self  [pre(v),∞)   [0,post(v)]   *       false  *
Parent        par(v)       (post(v),∞)   *       false  *
Ancestor      [0,pre(v))   (post(v),∞)   *       false  *
Anc-or-self   [0,pre(v)]   [post(v),∞)   *       false  *
Following     (pre(v),∞)   (post(v),∞)   *       false  *
Preceding     (0,pre(v))   (0,post(v))   *       false  *
Fol-sibling   (pre(v),∞)   (post(v),∞)   par(v)  false  *
Prec-sibling  (0,pre(v))   (0,post(v))   par(v)  false  *
Attribute     (pre(v),∞)   [0,post(v))   pre(v)  true   *
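The query windows can be read as simple predicates over node descriptors. The following is a minimal sketch under stated assumptions: the descriptor table ACCEL describes a hypothetical 10-node tree (not necessarily the one on the slides), the att field is dropped because every node here is an element, and the fixed bounds at rank 0 are left implicit because ranks are non-negative. The names Desc, WINDOWS and step are illustrative only.

```python
from collections import namedtuple

# Simplified descriptor: (pre, post, par, tag); par holds the parent's pre rank.
Desc = namedtuple("Desc", "pre post par tag")

# Hypothetical document: a(b(c), d, e(f(g, h), i(j))).
ACCEL = [
    Desc(0, 9, None, "a"), Desc(1, 1, 0, "b"), Desc(2, 0, 1, "c"),
    Desc(3, 2, 0, "d"), Desc(4, 8, 0, "e"), Desc(5, 5, 4, "f"),
    Desc(6, 3, 5, "g"), Desc(7, 4, 5, "h"), Desc(8, 7, 4, "i"),
    Desc(9, 6, 8, "j"),
]

# One predicate per axis: does node w fall inside window(axis, v)?
WINDOWS = {
    "child": lambda v, w: w.pre > v.pre and w.post < v.post and w.par == v.pre,
    "descendant": lambda v, w: w.pre > v.pre and w.post < v.post,
    "parent": lambda v, w: w.pre == v.par,
    "ancestor": lambda v, w: w.pre < v.pre and w.post > v.post,
    "following": lambda v, w: w.pre > v.pre and w.post > v.post,
    "preceding": lambda v, w: w.pre < v.pre and w.post < v.post,
    "following-sibling": lambda v, w: w.pre > v.pre and w.post > v.post and w.par == v.par,
    "preceding-sibling": lambda v, w: w.pre < v.pre and w.post < v.post and w.par == v.par,
}


def step(axis, v):
    """Return the tags of all nodes inside window(axis, v), in document order."""
    inside = WINDOWS[axis]
    return [w.tag for w in ACCEL if inside(v, w)]


BY_TAG = {d.tag: d for d in ACCEL}
e, f = BY_TAG["e"], BY_TAG["f"]
```

For example, step("child", e) and step("preceding", e) each scan the table once; in a real system the same window would be answered by an index range or region query instead of a scan.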

XPath Evaluation Given an XPath expression e, an axis α, and a context node v from the result of e, the step e/α is evaluated as: –query(e/α) = SELECT v’.* FROM query(e) v, accel v’ WHERE v’ INSIDE window(α,v) This pseudo-SQL code can be flattened into a plain relational query with a flat n-ary self-join.
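To make the flattening concrete, here is a small sketch that uses Python's sqlite3 module with an in-memory table. The table layout (pre, post, par, tag), the sample rows, and the example path (root, then descendant::f, then child::*) are assumptions for illustration; the slides do not prescribe a particular schema or database.

```python
import sqlite3

# Hypothetical descriptor rows for the tree a(b(c), d, e(f(g, h), i(j))).
ROWS = [
    (0, 9, None, "a"), (1, 1, 0, "b"), (2, 0, 1, "c"), (3, 2, 0, "d"),
    (4, 8, 0, "e"), (5, 5, 4, "f"), (6, 3, 5, "g"), (7, 4, 5, "h"),
    (8, 7, 4, "i"), (9, 6, 8, "j"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accel (pre INTEGER PRIMARY KEY, post INTEGER, par INTEGER, tag TEXT)"
)
conn.executemany("INSERT INTO accel VALUES (?, ?, ?, ?)", ROWS)

# Two location steps become one three-way self-join:
#   v0 = context node (the root), v1 = descendant::f of v0, v2 = child::* of v1.
SQL = """
SELECT DISTINCT v2.pre, v2.tag
FROM accel v0, accel v1, accel v2
WHERE v0.pre = 0
  AND v1.pre > v0.pre AND v1.post < v0.post AND v1.tag = 'f'
  AND v2.pre > v1.pre AND v2.post < v1.post AND v2.par = v1.pre
ORDER BY v2.pre
"""

tags = [tag for _pre, tag in conn.execute(SQL)]
conn.close()
```

Each INSIDE window(α,v) condition turns into a pair of range comparisons (plus an equality on par for the child axis), so a path with n steps yields an (n+1)-way self-join over the same table.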

XML Instance Loading Loading XML Instance into the database means mapping its nodes into the descriptor table. Can use callback procedures described in text to load element nodes into relational table. Make separate table for element contents.

Potential Issues Insertion of node –Need to renumber all nodes to reflect changes Deletion of node –Only need to remove its entry in accelerator table

Node Descriptor Indexing Efficiently supported by R-trees. Can also be supported by B-trees.

Example of pre/post rank distribution

Shrink-wrapping the //-axis Optimizing window for descendant axis For each node, we need to determine the ranges of pre and post ranks for its leftmost and rightmost leaf nodes. For any node v in a tree t, we have pre(v) − post(v) + size(v) = level(v) For a leaf node v’, size(v’) = 0, therefore pre(v’) − post(v’) = level(v’) ≤ height(t)

Shrink-wrapping the //-axis For the rightmost leaf v’ of node v: post(v) = post(v’) + (level(v’) − level(v)) Using the previous equations, we have: pre(v’) ≤ post(v) + height(t) For the leftmost leaf v’’ of node v, we have a similar result: post(v’’) ≥ pre(v) − height(t) These formulas can be used to shrink windows
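The intermediate steps are not spelled out on the slide. One way to reconstruct them, assuming the leaf identity pre − post = level from the previous slide and the observation (an assumption added here) that the leftmost leaf is reached by repeatedly taking the first child, so that each step down adds one to both pre and level:

```latex
\begin{align*}
\text{Rightmost leaf } v':\quad
  pre(v') &= post(v') + level(v') \\
          &= post(v) - \bigl(level(v') - level(v)\bigr) + level(v') \\
          &= post(v) + level(v) \;\le\; post(v) + height(t) \\[6pt]
\text{Leftmost leaf } v'':\quad
  pre(v'') &= pre(v) + \bigl(level(v'') - level(v)\bigr) \\
  post(v'') &= pre(v'') - level(v'') \\
            &= pre(v) - level(v) \;\ge\; pre(v) - height(t)
\end{align*}
```

Because every descendant of v has a pre rank no larger than that of the rightmost leaf and a post rank no smaller than that of the leftmost leaf, these two bounds cap the descendant window on both open sides.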

Shrink-wrapping the //-axis Original window { (pre(v),∞), [0,post(v)), *, false, * } New window { (pre(v),post(v)+height(t)], [pre(v)−height(t),post(v)), *, false, * } Similar techniques can be used to optimize the query windows of other axes.
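The sketch below checks the identity pre(v) − post(v) + size(v) = level(v) and compares the original and shrunk descendant windows on a small example. The tree and the names describe, INFO, HEIGHT and descendants are assumptions for illustration, with the root taken to be at level 0.

```python
# Hypothetical tree: parent -> ordered children.
TREE = {
    "a": ["b", "d", "e"],
    "b": ["c"],
    "e": ["f", "i"],
    "f": ["g", "h"],
    "i": ["j"],
}


def describe(tree, root):
    """Return {node: (pre, post, size, level)}; size counts proper descendants."""
    info = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, level):
        pre = counters["pre"]
        counters["pre"] += 1
        size = 0
        for child in tree.get(node, []):
            size += 1 + visit(child, level + 1)
        info[node] = (pre, counters["post"], size, level)
        counters["post"] += 1
        return size

    visit(root, 0)
    return info


INFO = describe(TREE, "a")
HEIGHT = max(level for _, _, _, level in INFO.values())


def descendants(v, shrunk):
    """Evaluate the descendant window of v, optionally with the shrunk bounds."""
    pre_v, post_v, _, _ = INFO[v]
    pre_hi = post_v + HEIGHT if shrunk else float("inf")
    post_lo = pre_v - HEIGHT if shrunk else 0
    return sorted(
        n for n, (pre, post, _, _) in INFO.items()
        if pre_v < pre <= pre_hi and post_lo <= post < post_v
    )
```

Both variants return the same nodes; the benefit of the shrunk window is that it is bounded on all sides, so an index scan has a smaller region to inspect.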

Shrink-wrapping the //-axis

Finding Leaves in an XML Tree

XPath Traversals with and without shrunk windows [Table: columns Query, Shrunk, Not Shrunk, # Nodes; measured values not preserved in this transcript] Queries: //open_auction//description; //open_auction//description//listitem; //open_auction//description//listitem//keyword

XPath Accelerator v. Edge Map

R-Tree v. B-Tree

Performance for the ancestor axis

Performance: XPath Accelerator v. EE/EA-Join

Capabilities of XPath Accelerator Runs on top of a relational backend to leverage its stability, scalability, and performance. Supports the whole family of XPath axes in an adequate manner. Allows XPath traversals to originate in arbitrary context nodes. Provides the groundwork for an effective cost-estimation for XPath queries.

XML Query Optimization Macro-level algebra: manipulates sets of trees directly –heavyweight, but more directly expressive Micro-level algebra: manipulates sets of elements In both algebras, basic operators are “intuitive” unit operations such as selections, projections, joins and set operations.

XQuery Expression and Pattern Tree

Macro-algebra A macro-algebra would implement this entire expression as a single pattern-tree based selection operator (to select matching books), followed by a projection operator (to return titles).

Micro-algebra A micro-algebra would break up the selection pattern into one selection operator per node (e.g. (tag=“book”), (tag=“year” && content > 1995)) and one containment join operator per edge. Result of sequence of joins would then be projected on the book element, after which its title can be obtained as before.

Query Processing Implementation 1.Identify lists of candidate elements in the database to match each node in the specified structural pattern. 2.Find combinations of candidate elements, one from each list, that satisfy the required structural relationships. 3.Apply any conditions that involve multiple nodes in the structural pattern to eliminate some combinations.
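The three steps can be sketched for the running book/year/title pattern in micro-algebra style. Everything concrete here is an assumption for illustration: the (start, end) region encoding, the sample document, and the names Elem, DOC and contains. This particular pattern has no condition spanning several nodes, so step 3 is a no-op.

```python
from collections import namedtuple

# Hypothetical region encoding: u contains v iff u.start < v.start and v.end < u.end.
Elem = namedtuple("Elem", "tag start end content")

DOC = [
    Elem("bib", 1, 14, None),
    Elem("book", 2, 7, None),
    Elem("title", 3, 4, "XML Basics"),
    Elem("year", 5, 6, "1994"),
    Elem("book", 8, 13, None),
    Elem("title", 9, 10, "Native XML Stores"),
    Elem("year", 11, 12, "2001"),
]


def contains(u, v):
    """True if element u is an ancestor of element v."""
    return u.start < v.start and v.end < u.end


# Step 1: one selection per pattern node yields the candidate lists.
books = [e for e in DOC if e.tag == "book"]
years = [e for e in DOC if e.tag == "year" and int(e.content) > 1995]
titles = [e for e in DOC if e.tag == "title"]

# Step 2: one containment join per pattern edge (book-year, book-title).
matches = [
    (b, y, t)
    for b in books for y in years for t in titles
    if contains(b, y) and contains(b, t)
]

# Step 3 would filter on conditions spanning several nodes; none apply here.
# Finally, project each surviving combination onto the title.
result = [t.content for _, _, t in matches]
```

The nested comprehension stands in for a sequence of join operators; the contains test checks ancestorship only, which coincides with parent-child here because the sample document is shallow.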

Containment Join Given two sets of elements U and V, a containment join returns pairs of elements (u,v) such that –u ∈ U and v ∈ V –u “contains” v i.e. node u is an ancestor of node v in the tree representation
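A naive nested-loop version of this definition is sketched below. It is meant only to pin down the semantics, not to represent the join algorithms evaluated in the talk; the (start, end) region encoding, the sample elements and the name containment_join are assumptions.

```python
from collections import namedtuple

# Hypothetical region encoding: an element spans the positions start..end.
Elem = namedtuple("Elem", "tag start end")

DOC = [
    Elem("bib", 1, 18),
    Elem("book", 2, 7),
    Elem("title", 3, 4),
    Elem("year", 5, 6),
    Elem("book", 8, 17),
    Elem("title", 9, 10),
    Elem("section", 11, 16),
    Elem("title", 12, 13),
    Elem("para", 14, 15),
]


def containment_join(U, V):
    """Return all pairs (u, v) with u in U, v in V, and u an ancestor of v."""
    return [
        (u, v)
        for u in U
        for v in V
        if u.start < v.start and v.end < u.end
    ]


books = [e for e in DOC if e.tag == "book"]
titles = [e for e in DOC if e.tag == "title"]

# Report each pair by the start positions of its two elements.
pairs = [(u.start, v.start) for u, v in containment_join(books, titles)]
```

Note that the second book pairs with two titles, its own and the one nested inside its section, since the join tests ancestorship rather than direct parenthood.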

Containment Join Implementation Three main options: –Scan the entire database –Use an index to find candidate nodes for one end of the join, and navigate from there –Use indices to find candidate nodes for both ends of the join, and compute a containment join between these candidate sets

Projection Merging

Set Operations Union compatibility is not an issue. –In the relational world, union compatibility is an important consideration with respect to set operations. –In XML, since heterogeneous collections are allowed, this is not an issue.

Union in XML Given two pattern trees PT_1 and PT_2, let PT_C be a common component of the two pattern trees such that: –PT_1 − PT_C = PT’_1 and PT_2 − PT_C = PT’_2, where PT’_1 and PT’_2 are both trees –Node i in PT_C has some node j in PT’_1 such that edge (i,j) is in PT_1, if and only if node i also has some node k in PT’_2 such that edge (i,k) is in PT_2.

Different Pattern Trees and Plans

Micro-operator Merging: New Access Methods At macro-level, we considered a pattern tree selection as a single heavyweight operator. At micro-level, the approach is to break up a pattern tree selection into multiple containment join operators.

Performance: Union

Performance: Intersection

Performance by Query Structure

Parent-Child Join Performance

Ancestor-Descendant Join Performance

Performance Comparison for Different Pushes

Conclusions It is not enough to consider XML query optimization purely at the micro-algebra or purely at the macro-algebra level, with simple operators. One has to consider access methods for combination of operators, switching between the micro and macro levels as needed.