XML Data Management
Yaw-Huei Chen
Department of Computer Science and Information Engineering, National Chiayi University
5/2/2005



Outline
  Introduction to XML
  Storage
  Query Languages
  Indexing
  Query Processing
  Conclusions

From Documents to Data
  HTML describes presentation.
  Example: References. S. Abiteboul, P. Buneman, D. Suciu, Data On The Web, 2000.

From Documents to Data (cont.)
  XML (eXtensible Markup Language) describes content.
  Example content: S. Abiteboul; P. Buneman; D. Suciu; Data On The Web; 2000.
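The markup of the XML example did not survive in this transcript. As a minimal sketch, assuming the element names used by the tree on the later slides (references, book, author, title, year), the same content could be tagged and read back by name as follows. The exact layout of the original example is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed markup: element names come from the tree slides, not from this slide.
REFERENCES_XML = """
<references>
  <book>
    <author>S. Abiteboul</author>
    <author>P. Buneman</author>
    <author>D. Suciu</author>
    <title>Data On The Web</title>
    <year>2000</year>
  </book>
</references>
"""

root = ET.fromstring(REFERENCES_XML)
book = root.find("book")

# Because the tags describe content, values are addressed by meaning.
authors = [a.text for a in book.findall("author")]
title = book.findtext("title")
year = book.findtext("year")
```

Unlike the HTML version, nothing here says how the entry should be displayed; the tags only say what each piece of text is.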

XML Syntax
  Element: a piece of text bounded by matching tags (example text: D. Suciu). Elements can be nested.
  Attribute: a (name, value) pair: …
  Attributes are an alternative way to represent data.
  An XML document has a single root element.
  Well-formed XML documents: tags must nest properly, and attributes must be unique.
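The well-formedness rules above can be checked mechanically. The sketch below relies on the parser in Python's standard library, which rejects documents that break them; the helper name `is_well_formed` and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

properly_nested = "<book><author>D. Suciu</author></book>"
badly_nested = "<book><author>D. Suciu</book></author>"    # tags cross
duplicate_attribute = '<book year="2000" year="2001"/>'     # attribute repeated
two_roots = "<book/><book/>"                                # more than one root
```

Well-formedness says nothing about which element names are allowed; that is the job of a DTD or an XML Schema.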

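XML Hierarchical Data Model
  XML is ordered.
  [Figure: tree rooted at references. Its first book child has three author children (S. Abiteboul, P. Buneman, D. Suciu), a title child (Data on the Web), and a year child (2000). Further subtrees are elided (…).]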

Specifying the Structure
  DTD (Document Type Definition): a context-free grammar.
  <!DOCTYPE references [ ]>

Specifying the Structure (cont.)
  XML Schema
    is written in XML format
    associates element names and types locally
    includes primitive data types
    is a superset of DTDs
  Valid XML documents
    the document must be well-formed
    the element names must follow the structure specified in a DTD file or an XML Schema file

Storing XML Documents
  Designing a specialized system for storing native XML data.
  Using a DBMS to store whole XML documents as text fields.
  Using a DBMS to store the document contents as data elements.
  Whichever approach is chosen, it must support XML's ordered data model.

Using a DBMS: Relational DTD
  Schema-aware.
  An element that can occur at most once in its parent is stored as a column of the table representing its parent.

  The book table
    ParentID  ID  title              year
    2         3   "Data on The Web"  "2000"
    2         4   …                  …

  The author table
    ParentID  ID  TEXT
    3         5   "S. Abiteboul"
    3         6   "P. Buneman"
    3         7   "D. Suciu"

  [Figure: the references tree, with book, author, title, and year nodes.]
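A minimal sketch of this schema-aware mapping, using SQLite from Python's standard library. The column names and rows are taken from the two tables on the slide; the column types, the in-memory database, and the sample query are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- title and year occur at most once per book, so they become columns.
    CREATE TABLE book (ParentID INTEGER, ID INTEGER PRIMARY KEY,
                       title TEXT, year TEXT);
    -- author may repeat, so it gets its own table keyed by the parent book.
    CREATE TABLE author (ParentID INTEGER, ID INTEGER PRIMARY KEY,
                         "TEXT" TEXT);
""")
conn.execute("INSERT INTO book VALUES (2, 3, 'Data on The Web', '2000')")
conn.executemany(
    "INSERT INTO author VALUES (?, ?, ?)",
    [(3, 5, "S. Abiteboul"), (3, 6, "P. Buneman"), (3, 7, "D. Suciu")],
)

# Authors of a given book: join on the parent link, and order by ID so that
# the document order of the author elements is preserved.
rows = conn.execute(
    'SELECT a."TEXT" FROM book AS b JOIN author AS a ON a.ParentID = b.ID '
    "WHERE b.title = ? ORDER BY a.ID",
    ("Data on The Web",),
).fetchall()
authors = [row[0] for row in rows]
```

Ordering by ID works here only because IDs were assigned in document order, which is one way of meeting the ordered-data-model requirement.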

Using a DBMS: Edge
  Schema-less.
  A single table is used to store the entire document.
  Each node is assigned an ID in depth-first order.
  [Figure: the references tree attached below a root node, with book, author, title, and year nodes.]

Using a DBMS: Edge (cont.)

  The edge table
    SourceID  tag         ordinal  TargetID  Data
    1         references  1        2         NULL
    2         book        1        3         NULL
    2         book        2        4         NULL
    3         author      1        0         "S. Abiteboul"
    3         author      2        0         "P. Buneman"
    3         author      3        0         "D. Suciu"
    3         title       4        0         "Data on The Web"
    3         year        5        0         "2000"
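The edge table can be produced by one depth-first pass over the document. The sketch below follows the conventions visible in the table (ID 1 for a virtual root, TargetID 0 and a Data value for leaves); the function name `shred_to_edges` is hypothetical, and mixed content is not handled.

```python
import xml.etree.ElementTree as ET

def shred_to_edges(xml_text):
    """Return (SourceID, tag, ordinal, TargetID, Data) rows for a document."""
    rows = []
    last_id = 1  # ID 1 is reserved for the virtual root above the document

    def visit(node, source_id, ordinal):
        nonlocal last_id
        if len(node) == 0:
            # Leaf element: no ID of its own, its text goes into Data.
            rows.append((source_id, node.tag, ordinal, 0, node.text))
            return
        last_id += 1
        node_id = last_id
        rows.append((source_id, node.tag, ordinal, node_id, None))
        for position, child in enumerate(node, start=1):
            visit(child, node_id, position)

    visit(ET.fromstring(xml_text), 1, 1)
    return rows

doc = (
    "<references><book>"
    "<author>S. Abiteboul</author><author>P. Buneman</author>"
    "<author>D. Suciu</author><title>Data on The Web</title><year>2000</year>"
    "</book></references>"
)
edge_rows = shred_to_edges(doc)
```

The ordinal column is what keeps the document order recoverable once the tree has been flattened into rows.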

XPath
  XPath is a language for addressing parts of an XML document.
  XPath uses path expressions to select nodes or node-sets that satisfy certain patterns specified in the expression.
  The names in an XPath expression are element or attribute names in the XML document.
  A single slash (/) before an element specifies that the element must appear as a direct child of the previous (parent) element.
  A double slash (//) specifies that the element can appear as a descendant of the previous element at any level.

XPath Examples
  references
    selects the references elements that are children of the current node
  /references
    selects the root element references
  //book
    selects all book elements no matter where they are in the document
  references//book
    selects all book elements that are descendants of the references element
  /references/*
    selects all the child elements of the references element
  //book/title | //book/author
    selects all the title and author elements of all book elements
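Some of these expressions can be tried with the limited XPath subset in Python's standard library. This is only a sketch: `xml.etree.ElementTree` evaluates paths relative to an element and has no `|` operator, so the union is emulated by concatenating two result lists (which does not preserve document order). The second book's content is a placeholder.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<references>"
    "<book><author>S. Abiteboul</author><author>P. Buneman</author>"
    "<author>D. Suciu</author><title>Data on The Web</title><year>2000</year></book>"
    "<book><author>…</author><title>…</title></book>"  # placeholder content
    "</references>"
)

# references//book, evaluated with the references element as context
books = root.findall(".//book")

# /references/* : all child elements of references
children = root.findall("./*")

# //book/title | //book/author, emulated as two queries
titles_and_authors = root.findall(".//book/title") + root.findall(".//book/author")
```

A full XPath engine would return the union in document order and would accept the absolute forms (/references, //book) directly.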

XQuery
  XQuery is a language for finding and extracting elements and attributes from XML documents.
  XQuery uses XPath expressions, but has additional constructs.
  FLWR stands for the four main clauses of XQuery: FOR, LET, WHERE, RETURN.
  For example:
    for $b in doc("references.xml")//book
    where count($b/author) > 0
    return { $b/title } { for $a in $b/author return $a }

Indexing
  To find all occurrences of a query pattern, efficient mechanisms are needed for
    determining the ancestor-descendant relationship between XML elements
    accessing XML values
  Two types of indexes can help determine ancestor-descendant relationships:
    Structural index: it can reduce the time for traversing the XML hierarchy.
    Numbering scheme: it encodes each element by its positional information within the XML hierarchy. With such a numbering scheme, the ancestor-descendant relationship between a pair of elements can be determined quickly.

Structural Index
  DataGuides [Goldman97]: every label path of the source graph has exactly one data path instance in its DataGuide.
  [Figure: a source graph with nodes labeled A, B, C, and D, and its DataGuide.]
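For tree-shaped data the DataGuide property is easy to illustrate: merge all nodes that are reached by the same label path. The sketch below does this with nested dictionaries. It is not the construction from [Goldman97], which also handles graph-shaped sources; the function name `build_dataguide` and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

def build_dataguide(root):
    """Summarize a tree so that every label path appears exactly once."""
    guide = {}

    def visit(node, guide_level):
        # setdefault reuses the entry if this label path was seen before.
        child_level = guide_level.setdefault(node.tag, {})
        for child in node:
            visit(child, child_level)

    visit(root, guide)
    return guide

root = ET.fromstring(
    "<references>"
    "<book><author/><author/><author/><title/><year/></book>"
    "<book><author/><title/></book>"
    "</references>"
)
guide = build_dataguide(root)
```

The ten elements of the document collapse into five guide nodes, so a path query can be checked against the guide before the data itself is touched.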

Structural Index (cont.)
  1-Index [Milo99]: nodes are grouped together if they have the same set of incoming paths.
  [Figure: a data graph with nodes labeled A, B, C, and D, shown next to its 1-index and its DataGuide.]

Structural Index (cont.)
  Covering indexes [Kaushik02]
  Forward and Backward Index (F&B-Index)
    Add inverse edges to the graph.
    Compute the 1-index (or DataGuide) for the modified graph.
  The F&B-Index is too large. To reduce its size:
    index only useful tags
    do not index all idref edges (XPath gives a higher priority to tree edges, and // matches only tree edges)
    exploit local similarity (short paths only)
    restrict the tree depth

Numbering Scheme
  Dewey Decimal Coding [Tatarinov02]
  [Figure: the references tree (two book subtrees with author, title, and year nodes) with a Dewey label on each node, for example 1.2.2.]
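A short sketch of Dewey labeling: each node's label is its parent's label extended by the node's position among its siblings, so an ancestor's label is always a proper prefix of its descendants' labels. The shape of the sample tree (a second book with one author and one title) is an assumption based on the node names that survive from the figure; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

def dewey_labels(root):
    """Return (label, tag) pairs in document order; labels are integer tuples."""
    labels = []

    def visit(node, label):
        labels.append((label, node.tag))
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels

def is_ancestor(a, b):
    """a is an ancestor of b iff a is a proper prefix of b."""
    return len(a) < len(b) and b[: len(a)] == a

def fmt(label):
    return ".".join(str(part) for part in label)

root = ET.fromstring(
    "<references>"
    "<book><author/><author/><author/><title/><year/></book>"
    "<book><author/><title/></book>"
    "</references>"
)
by_label = {fmt(label): tag for label, tag in dewey_labels(root)}
```

Under this assumed tree shape, the label 1.2.2 belongs to the title of the second book.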

Numbering Scheme (cont.)
  Inserting new elements
  [Figure: the references tree with a new element inserted; the nodes that require renumbering are marked.]

Numbering Scheme (cont.)
  Preorder and postorder [Dietz82]
  Each node is labeled (preorder, postorder).
  x is an ancestor of y iff x occurs before y in the preorder traversal and after y in the postorder traversal.
  Labels in the example tree:
    references (1,10)
      book (2,6)
        author (3,1)
        author (4,2)
        author (5,3)
        title (6,4)
        year (7,5)
      book (8,9)
        author (9,7)
        title (10,8)
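The labels on this slide can be reproduced with a single traversal that keeps two counters. The sketch below assumes the tree shape implied by the labels (two books, the second with one author and one title); the function names are illustrative.

```python
import xml.etree.ElementTree as ET

def pre_post_labels(root):
    """Return (tag, preorder, postorder) triples in document order."""
    labels = []
    pre = 0
    post = 0

    def visit(node):
        nonlocal pre, post
        pre += 1
        my_pre = pre
        slot = len(labels)
        labels.append(None)          # reserve the position in document order
        for child in node:
            visit(child)
        post += 1                    # assigned only after all descendants
        labels[slot] = (node.tag, my_pre, post)

    visit(root)
    return labels

def is_ancestor(x, y):
    """x, y are (tag, pre, post): x before y in preorder, after y in postorder."""
    return x[1] < y[1] and x[2] > y[2]

root = ET.fromstring(
    "<references>"
    "<book><author/><author/><author/><title/><year/></book>"
    "<book><author/><title/></book>"
    "</references>"
)
labels = pre_post_labels(root)
```

The ancestor test needs only two integer comparisons and no tree traversal, which is the point of the numbering scheme.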

Numbering Scheme (cont.)
  Various interval schemes
    (docno, begin:end, level) [Zhang01]
      The begin and end positions can be generated by a depth-first traversal of the tree, sequentially assigning a number at each visit.
    (preorder, size) [Li01]
      Size is an arbitrary integer larger than the total number of current descendants.
    (lowest_post, postorder) [Agrawal89]
      Lowest_post is the lowest postorder number among the node's descendants.
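A sketch of the (begin:end, level) variant for a single document, so docno is omitted. One counter is advanced when a node is entered and again when it is left; containment of intervals then gives the ancestor test, and a level difference of one narrows it to parent-child. Function names and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

def interval_labels(root):
    """Return (tag, begin, end, level) tuples in document order."""
    labels = []
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1                 # number assigned on entering the node
        begin = counter
        slot = len(labels)
        labels.append(None)
        for child in node:
            visit(child, level + 1)
        counter += 1                 # number assigned on leaving the node
        labels[slot] = (node.tag, begin, counter, level)

    visit(root, 1)
    return labels

def is_ancestor(a, b):
    return a[1] < b[1] and b[2] < a[2]

def is_parent(a, b):
    return is_ancestor(a, b) and b[3] == a[3] + 1

root = ET.fromstring("<references><book><author/><title/></book></references>")
labels = interval_labels(root)
```

Leaving gaps between the assigned numbers, as the (preorder, size) scheme does, is what allows later insertions without renumbering.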

Query Processing
  Goal: to find all occurrences of a query pattern in XML documents.
  Navigation-based approach
    It computes results by analyzing an input document one tag at a time.
    The query is represented as a non-deterministic finite automaton (NFA) [Diao03].
  Index-based approach
    It uses precomputed indexes to answer the query.

Holistic Twig Join [Bruno02]
  Indexes
    string: (doc, left, level)
    element: (doc, left:right, level)
  Query: A//B//C
  Data: A1, B1, A2, B2, C1
  Stack encoding: SA holds A1, A2; SB holds B1, B2; SC holds C1
  Query results: (A1, B1, C1), (A1, B2, C1), (A2, B2, C1)
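The stack encoding can be illustrated with a simplified, PathStack-style sketch for linear // paths only; it is not the full twig algorithm of [Bruno02]. It assumes one sorted stream per query tag, distinct tags in the query, and a nesting A1 > B1 > A2 > B2 > C1 for the data, chosen because it reproduces the three results on the slide. The left/right positions are made up for that nesting.

```python
from typing import NamedTuple

class Elem(NamedTuple):
    name: str
    left: int
    right: int

def path_stack(path, streams):
    """Match path[0]//path[1]//... using one stack per query node."""
    pos = {q: 0 for q in path}
    stacks = {q: [] for q in path}   # entries: (Elem, index of parent-stack top)
    results = []

    def expand(level, limit):
        # All matches of path[:level+1] using stack entries with index <= limit.
        for idx in range(limit + 1):
            elem, ptr = stacks[path[level]][idx]
            if level == 0:
                yield (elem.name,)
            else:
                for prefix in expand(level - 1, ptr):
                    yield prefix + (elem.name,)

    leaf = path[-1]
    while pos[leaf] < len(streams[leaf]):
        live = [q for q in path if pos[q] < len(streams[q])]
        qmin = min(live, key=lambda q: streams[q][pos[q]].left)
        nxt = streams[qmin][pos[qmin]]
        for q in path:               # pop elements that ended before nxt starts
            while stacks[q] and stacks[q][-1][0].right < nxt.left:
                stacks[q].pop()
        level = path.index(qmin)
        parent_top = len(stacks[path[level - 1]]) - 1 if level > 0 else -1
        stacks[qmin].append((nxt, parent_top))
        pos[qmin] += 1
        if qmin == leaf:
            if level == 0:
                results.append((nxt.name,))
            else:
                results.extend(p + (nxt.name,) for p in expand(level - 1, parent_top))
            stacks[qmin].pop()
    return results

streams = {
    "A": [Elem("A1", 1, 10), Elem("A2", 3, 8)],
    "B": [Elem("B1", 2, 9), Elem("B2", 4, 7)],
    "C": [Elem("C1", 5, 6)],
}
matches = path_stack(["A", "B", "C"], streams)
```

Each stack entry stores a pointer to the top of its parent stack at push time, which is how the stacks encode all three answers compactly without materializing intermediate A//B pairs.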

Count Operation [Chen04]
  XPath query: Count(A//B//C), evaluated over sequential data with the stacks SA, SB, and SC.
  Step 1. Stacks are empty. Read: A-node's count = 1, B-node's count = 0, C-node's count = 0.
  Step 2. Stacks are empty. Read: A-node's count = 2, B-node's count = 0, C-node's count = 0.
  Step 3. SA holds A(2), pointing to null. Read: A-node's count = 0, B-node's count = 1, C-node's count = 0.
  Step 4. SA holds A(2), pointing to null. Read: A-node's count = 0, B-node's count = 2, C-node's count = 0.
  Step 5. SA holds A(2), pointing to null; SB holds B(2). Read: A-node's count = 0, B-node's count = 0, C-node's count = 1. Query result is 2 * 2 = 4.
  Step 6. SA holds A(2), pointing to null; SB holds B(2). Read: A-node's count = 0, B-node's count = 0, C-node's count = 0.
  Step 7. SA holds A(2), pointing to null; SB: B(2) - 1 = 1. Read: A-node's count = 0, B-node's count = 1, C-node's count = 0.
  Step 8. SA holds A(2), pointing to null. Read: A-node's count = 0, B-node's count = 0, C-node's count = 0.
  Step 9. SA: A(2) - 1 = 1. Read: A-node's count = 1, B-node's count = 0, C-node's count = 0.
  Step 10. Stacks are empty. Read: A-node's count = 0, B-node's count = 0, C-node's count = 0.
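The idea of counting matches without enumerating them can be sketched as follows. This is not the algorithm of [Chen04], whose details are not given on the slides; it is a simple one-pass scheme that keeps, for every open element, the number of partial matches of each query prefix. The event format, the function names, and the nesting of the sample streams are assumptions.

```python
def count_path_matches(events, path):
    """Count matches of path[0]//path[1]//... in a stream of start/end events."""
    k = len(path)
    # stack[-1][i] = number of matches of path[:i+1] whose last step is bound
    # to some currently open element.
    stack = [[0] * k]
    total = 0
    for kind, tag in events:
        if kind == "start":
            prev = stack[-1]
            cur = list(prev)
            for i, step in enumerate(path):
                if tag == step:
                    # Earlier steps must bind to proper ancestors, hence prev.
                    cur[i] += 1 if i == 0 else prev[i - 1]
            total += cur[k - 1] - prev[k - 1]
            stack.append(cur)
        else:
            stack.pop()
    return total

def nested(tags):
    """Events for a chain in which each tag is the only child of the previous one."""
    return [("start", t) for t in tags] + [("end", t) for t in reversed(tags)]

walkthrough = count_path_matches(nested(["A", "A", "B", "B", "C"]), ["A", "B", "C"])
twig_example = count_path_matches(nested(["A", "B", "A", "B", "C"]), ["A", "B", "C"])
```

With two nested A elements above two nested B elements above one C, the count is 2 * 2 = 4, the result shown in the walkthrough; for the chain assumed in the holistic twig join example, the count is 3.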

Future Work
  Version management
  Materialized views
  Cache management
  Aggregate query processing
  Streaming data processing

References
  [Agrawal89] R. Agrawal et al., “Efficient management of transitive relationships in large data and knowledge bases,” SIGMOD, 1989.
  [Bruno02] N. Bruno et al., “Holistic twig joins: Optimal XML pattern matching,” SIGMOD, 2002.
  [Chen04] Yaw-Huei Chen and Ming-Chi Ho, “Aggregate query processing of streaming XML data,” ICS, 2004.
  [Christophides03] V. Christophides et al., “On labeling schemes for the semantic web,” WWW, 2003.
  [Diao03] Y. Diao et al., “Path sharing and predicate evaluation for high-performance XML filtering,” ACM TODS, 2003.
  [Dietz82] P.F. Dietz, “Maintaining order in a linked list,” ACM Symposium on Theory of Computing, May 1982.
  [Goldman97] R. Goldman and J. Widom, “DataGuides: Enabling query formulation and optimization in semistructured databases,” VLDB, 1997.

References (cont.)
  [Kaushik02] R. Kaushik et al., “Covering indexes for branching path queries,” SIGMOD, 2002.
  [Li01] Q. Li and B. Moon, “Indexing and querying XML data for regular path expressions,” VLDB, 2001.
  [Milo99] T. Milo and D. Suciu, “Index structures for path expressions,” Proc. of the Int’l Conf. on Database Theory, 1999.
  [Tatarinov02] I. Tatarinov et al., “Storing and querying ordered XML using a relational database system,” SIGMOD, 2002.
  [Tian02] F. Tian et al., “The design and performance evaluation of alternative XML storage strategies,” SIGMOD Record, March 2002.
  [Zhang01] C. Zhang et al., “On supporting containment queries in relational database management systems,” SIGMOD, 2001.