2015/5/5 A Succinct Physical Storage Scheme for Efficient Evaluation of Path Queries in XML Ning Zhang(University of Waterloo) Varun Kacholia(Indian Institute.

Slides:



Advertisements
Similar presentations
Jiaheng Lu, Ting Chen and Tok Wang Ling National University of Singapore Finding all the occurrences of a twig.
Advertisements

APWeb 2004 Hangzhou, China 1 Labeling and Querying Dynamic XML Trees Jiaheng Lu and Tok Wang Ling School of Computing National University of Singapore.
Computing Structural Similarity of Source XML Schemas against Domain XML Schema Jianxin Li 1 Chengfei Liu 1 Jeffrey Xu Yu 2 Jixue Liu 3 Guoren Wang 4 Chi.
XML to Relational Database Mapping
Efficient Keyword Search for Smallest LCAs in XML Database Yu Xu Department of Computer Science & Engineering University of California, San Diego Yannis.
Internet Technologies1 1 Lecture 4: Programming with XSLT.
DIMACS Streaming Data Working Group II On the Optimality of the Holistic Twig Join Algorithm Speaker: Byron Choi (Upenn) Joint Work with Susan Davidson.
CSE 6331 © Leonidas Fegaras XML and Relational Databases 1 XML and Relational Databases Leonidas Fegaras.
1 Abdeslame ALILAOUAR, Florence SEDES Fuzzy Querying of XML Documents The minimum spanning tree IRIT - CNRS IRIT : IRIT : Research Institute for Computer.
TIMBER A Native XML Database Xiali He The Overview of the TIMBER System in University of Michigan.
1 abstract containers hierarchical (1 to many) graph (many to many) first ith last sequence/linear (1 to 1) set.
Binary Trees, Binary Search Trees CMPS 2133 Spring 2008.
Advanced Databases: Lecture 2 Query Optimization (I) 1 Query Optimization (introduction to query processing) Advanced Databases By Dr. Akhtar Ali.
An Algorithm for Streaming XPath Processing with Forward and Backward Axes Charles Barton, Philippe Charles, Deepak Goyal, Mukund Raghavchari IBM T. J.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
COMP 451/651 Indexes Chapter 1.
Xyleme A Dynamic Warehouse for XML Data of the Web.
Web-site Management System Strudel Presented by: LAKHLIFI Houda Instructor: Dr. Haddouti.
IS432: Semi-Structured Data Dr. Azeddine Chikh. 7. XQuery.
Liang, Introduction to Java Programming, Eighth Edition, (c) 2011 Pearson Education, Inc. All rights reserved Chapter Trees and B-Trees.
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
2010/3/81 Lecture 8 on Physical Database DBMS has a view of the database as a collection of stored records, and that view is supported by the file manager.
Storing and Querying Ordered XML Using Relational Database System Swapna Dhayagude.
1 Indexing and Querying XML Data for Regular Path Expressions A Paper by Quanzhong Li and Bongki Moon Presented by Amnon Shochot.
Storing and Querying Ordered XML Using a Relational Database System By Khang Nguyen Based on the paper of Igor Tatarinov and Statis Viglas.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
BTREE Indices A little context information What’s the purpose of an index? Example of web search engines Queries do not directly search the WWW for data;
Overview of XPath Author: Dan McCreary Date: October, 2008 Version: 0.2 with TEI Examples M D.
XML-to-Relational Schema Mapping Algorithm ODTDMap Speaker: Artem Chebotko* Wayne State University Joint work with Mustafa Atay,
Xpath Query Evaluation. Goal Evaluating an Xpath query against a given document – To find all matches We will also consider the use of types Complexity.
Index Structures for Files Indexes speed up the retrieval of records under certain search conditions Indexes called secondary access paths do not affect.
Selective and Authentic Third-Party distribution of XML Documents - Yashaswini Harsha Kumar - Netaji Mandava (Oct 16 th 2006)
COSC2007 Data Structures II
Storage CMSC 461 Michael Wilson. Database storage  At some point, database information must be stored in some format  It’d be impossible to store hundreds.
Lecture 6 of Advanced Databases XML Schema, Querying & Transformation Instructor: Mr.Ahmed Al Astal.
A Succinct Physical Storage Scheme for Efficient Evaluation of Path Queries in XML Represented by: Ai Mu Based on the paper written by Ning Zhang, Varun.
1/17 ITApplications XML Module Session 7: Introduction to XPath.
XML as a Boxwood Data Structure Feng Zhou, John MacCormick, Lidong Zhou, Nick Murphy, Chandu Thekkath 8/20/04.
Lecture 22 XML querying. 2 Example 31.5 – XQuery FLWOR Expressions ‘=’ operator is a general comparison operator. XQuery also defines value comparison.
A Summary of XISS and Index Fabric Ho Wai Shing. Contents Definition of Terms XISS (Li and Moon, VLDB2001) Numbering Scheme Indices Stored Join Algorithms.
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
File Structures Foundations of Computer Science  Cengage Learning.
Approximate XML Joins Huang-Chun Yu Li Xu. Introduction XML is widely used to integrate data from different sources. Perform join operation for XML documents:
5/2/20051 XML Data Management Yaw-Huei Chen Department of Computer Science and Information Engineering National Chiayi University.
Lecture 5 Cost Estimation and Data Access Methods.
1 CS 430: Information Discovery Lecture 4 Files Structures for Inverted Files.
QED: A Novel Quaternary Encoding to Completely Avoid Re-labeling in XML Updates Changqing Li,Tok Wang Ling.
XML Access Control Koukis Dimitris Padeleris Pashalis.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
IS432 Semi-Structured Data Lecture 6: XQuery Dr. Gamal Al-Shorbagy.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.
From Region Encoding To Extended Dewey: On Efficient Processing of XML Twig Pattern Matching Jiaheng Lu, Tok Wang Ling, Chee-Yong Chan, Ting Chen National.
Holistic Twig Joins Optimal XML Pattern Matching Nicolas Bruno Columbia University Nick Koudas Divesh Srivastava AT&T Labs-Research SIGMOD 2002.
1 Holistic Twig Joins: Optimal XML Pattern Matching Nicolas Bruno, Nick Koudas, Divesh Srivastava ACM SIGMOD 2002 Presented by Jun-Ki Min.
BINARY TREES Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary.
Scheduling of Transactions on XML Documents Author: Stijin Dekeyser Jan Hidders Reviewed by Jason Chen, Glenn, Steven, Christian.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
Indexing and Querying XML Data for Regular Path Expressions Quanzhong Li and Bongki Moon Dept. of Computer Science University of Arizona VLDB 2001.
XML Query languages--XPath. Objectives Understand XPath, and be able to use XPath expressions to find fragments of an XML document Understand tree patterns,
B+ Tree.
Chapter Trees and B-Trees
Chapter Trees and B-Trees
Semi-Structured data (XML Data MODEL)
Early Profile Pruning on XML-aware Publish-Subscribe Systems
XML Query Processing Yaw-Huei Chen
Database Design and Programming
Presentation transcript:

2015/5/5 A Succinct Physical Storage Scheme for Efficient Evaluation of Path Queries in XML Ning Zhang(University of Waterloo) Varun Kacholia(Indian Institute of Technology, Bombay) M. Tamer Ozsu (University of Waterloo) (ICDE04)

2015/5/5 Outline Abstract Preliminary Nok pattern matching Physical storage Xpath Query at the physical level Experimental evaluation Conclusion

2015/5/5 Abstract next-of-kin (NoK) pattern matching speed up the node selection reduce the join size significantly propose a succinct XML physical storage scheme Highest performance

2015/5/5 Preliminary-Xpath & pattern tree Example : “ find all books written by Stevens whose price is less than 100” Xpath expression as following //book[author/last="Stevens"][price<100]

2015/5/5 Preliminary-Xml document TCP/IP Illustrated Stevens W Advanced Programming in the Unix Environment Stevens W. Addison−Wesley Data on the Web Abiteboul Serge Buneman Peter Suciu Dan Morgan Kaufmann Publishers The Economics of Technology and Content for Digital TV Gerbarg Darcy CITI Kluwer Academic Publishers

2015/5/5 Preliminary-- subject tree Define a transform mapping Σ = {a, b, c, e, f, g, i, j, z}

2015/5/5 Preliminary-- (NoK) pattern tree three types of constraints in path expression : tag-name constraints, value constraints, and structural relationship constraints. { b | tag(b) = “book” ∧ ∃ a, l, p tag(a) = “author” ∧ tag(l) = “last” ∧ tag(p) = “price” ∧ value(l) = “Stevens” ∧ value(p) < 100 ∧ descendant(root, b) ∧ child(b, a) ∧ child(a, l) ∧ child(b, p)} next-of-kin (NoK) pattern tree: is a special pattern tree that consists of edges whose labels are in {/, ◁ }, ◁ where represents following-sibling axis. (v.s. Special tags {*,//} in pattern tree)

2015/5/5 Nok--Three strategies up proformance two steps in the process of matching NoK pattern trees to the subject tree: Native approach:Traverse and try to match with the root of the NoK pattern tree. When a node matching, then start the NoK pattern tree matching process 1 st : since a NoK pattern tree b/c could be obtained from the path expression /a//b/c in which case any descendant of /a could be a starting point for NoK pattern matching. 2 nd : In string pattern matching, while matching the string itself is straightforward.

2015/5/5 Nok-up proformance( index strategies) Index on tag names: B+ tree on tag names, and index lookup for the root of the NoK Index on data values: B+ tree for all values in the XML doc., use index to locate value.

2015/5/5 Nok pattern matching- Algorithm following/preceeding-sibling Algorithm 1 NoK Pattern Matching NPM(proot, snode,R) 1: if proot is the returning node 2: then LIST-APPEND(R, snode); 3: S ← all frontier children of proot; 4: u ← FIRST-CHILD(snode); 5: repeat 6: for each s ∈ S that matches u with both tag name and value constraints 7: do 8: b ← NPM(s, u,R); 9: if b = TRUE 10: then S ← S \ {s}; 11: delete s and its incident arcs from the pattern tree; 12: insert new frontiers caused by deleting s; 13: u ← FOLLOWING-SIBLING(u); 14: until u = NIL or S = ∅ 15: if S = ∅ 16: then R ← ∅ ; 17: return FALSE; 18: return TRUE;

2015/5/5 Nok pattern matching- Example subject tree in Figure 2 and the NoK pattern tree in Figure 1(b) Transform to (b[c/g="Stevens"][j<100]) Complexity to O(mn) where m and n are the number of nodes in the pattern tree and subject tree

2015/5/5 Physical storage — 4 Goals XML structural information should be stored separately from the value information. subject tree should be “materialized” to fit into the paged I/O model. --tree structure should be represented by a one- dimensional “string”. storage scheme should have enough auxiliary information to speed up NoK pattern matching storage scheme should be adaptable to support updates

2015/5/5 Physical storage-- 1 st Goal Value information storage –XML document is a mixture of schema information (tag names and their structural relationships) and value information (element contents). –any query can be divided into two subqueries: pattern matching on the tree structure and selection based on values. –allows us to separate the different concerns and address each appropriately –can build a B+ tree on the value information and a path index

2015/5/5 Physical storage-- 1 st Goal (How maintain) Dewey ID:maintain the connection between structural information and value information –root a and its second child b are 0, and 0.2 value for all subject tree nodes is stored sequentially in a data file (by a binary tuple (len, value) ) B+ tree on the data file whose keys are the hashed data values and returns a set of Dewey ID’s whose nodes contain that value.

2015/5/5 Physical storage-- 1 st Goal(structure) Structural information storage –store the nodes in pre-order and keep the tree structure by properly inserting pairs of ‘()’ –only retain closing parentheses as in (a(b)(c))  a b)c)).

2015/5/5 Physical storage-- 1 st Goal(structure) To speed up the query process, we need to store extra information in each page. most useful information for locating children, siblings and parent is the node level infor.. each page, an extra tuple (st, lo, hi) is stored, where st is the level of the last node in the previous page, lo and hi are the minimum and maximum levels of all nodes in that page

2015/5/5 Physical storage-- 1 st Goal(Page structure) single-pass NoK pattern matching algorithm based on this physical string representation can be adapted to the streaming XML context except that page headers (which help to skip page I/O’s) are not necessary in the streaming context since each page needs to be read into main memory anyway.

2015/5/5 Physical storage-- 1 st Goal(Page structure) the first child of a node at level l is the next character if its level is l + 1. To avoid unnecessary page I/O’s, we should exploit the maximum and minimum level information in each page as described in the page header.

2015/5/5 Xpath query

2015/5/5 Experimental evaluation

2015/5/5 Experimental evaluation

2015/5/5 Experimental evaluation

2015/5/5 Problem—How encoding we do not know how many bits are needed for encoding the number, unless we parse the XML document beforehand. However, parsing beforehand is impossible in the context of streaming XML where we have no knowledge of the upcoming events

2015/5/5 Conclusion NoK pattern matching reduces the number of structural joins in the later step. NoK pattern matching can be evaluated highly efficiently (only need a single scan of input data) using the physical storage scheme we proposed. any storage scheme in the database systems should consider how to employ concurrency control and how it affect the update process. Although these are extremely important issues, we leave them as future work