XML-to-Relational Schema Mapping Algorithm ODTDMap Speaker: Artem Chebotko* Wayne State University Joint work with Mustafa Atay,

Presentation transcript:

XML-to-Relational Schema Mapping Algorithm ODTDMap Speaker: Artem Chebotko* Wayne State University Joint work with Mustafa Atay, Shiyong Lu and Farshad Fotouhi

2 Introduction XML has emerged as the standard for representing and exchanging data on the World Wide Web. The growing number of XML documents creates a need to store and query them efficiently.

3 Current approaches to storing and querying XML documents
– Native XML repositories, e.g., Software AG’s Tamino, eXcelon’s XIS.
– XML-enabled commercial database systems such as SQL Server, Oracle, and DB2.
– Using an RDBMS/ODBMS to store and query XML documents.

4 Issues of the relational approach
Schema Mapping –XML data model needs to be mapped into the relational model
Data Mapping –XML documents need to be shredded and composed into tuples to be inserted into the relational database
Query Mapping –XML queries need to be translated into SQL queries
Reverse Data Mapping –Query results need to be tagged to XML format.

5 Our contributions We propose a schema mapping algorithm, ODTDMap, which generates a relational schema from an XML DTD for storing and querying ordered XML documents.
Improvements over the existing algorithms
–Losslessness
–Efficient support for XML queries
–Completeness (recursion, set-valued attributes, DTD operators)

6 Outline of the talk Introduction of XML DTDs Mapping DTDs to relational schemas –Simplifying DTDs –Creating and inlining DTD graphs –Generating relational schemas An example Conclusions and future work

7 An overview of DTDs A DTD example <!DOCTYPE memo [ ]

8 DTD: Document Type Definition <!DOCTYPE root-element [ doctype-declaration..., content model: “|”, “,”, “*”, “+”, “?”

9 DTD: Document Type Definition (con’t) An attribute-list declaration declares which attributes are allowed or required in which elements.
attribute types:
–CDATA: any value is allowed (the default)
–(value|...): enumeration of allowed values
–ID, IDREF, IDREFS: ID attribute values must be unique (contain "element identity"), IDREF attribute values must match some ID (reference to an element)
–ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: just forget these... (consider them deprecated)
attribute defaults:
–#REQUIRED: the attribute must be explicitly provided
–#IMPLIED: the attribute is optional, no default provided
–"value": if not explicitly provided, this value is inserted by default
–#FIXED "value": as above, but only this value is allowed

10 Mapping DTDs to relational schemas Simplifying DTDs Creating and inlining DTD graphs Generating relational schemas

11 Simplifying DTDs A DTD can be very complex due to nesting. An XML query language is concerned with:
–The parent-child relationships between XML elements
–The relative order relationships between siblings (add an ordinal attribute to each relation)
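The role of the ordinal attribute can be illustrated with a small sketch. It is not taken from the talk: the table name para, its columns, and the sample rows are hypothetical, and Python's built-in sqlite3 module stands in for the relational database. The point is only that sibling order can be recovered by sorting on the ordinal column, whatever the insertion order was.

```python
import sqlite3

# Hypothetical relation for a repeated child element; the columns
# (id, parent_id, ordinal, text) are illustrative, not the talk's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE para ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, ordinal INTEGER, text TEXT)"
)

# Siblings are inserted out of document order on purpose.
rows = [
    (1, 100, 2, "second"),
    (2, 100, 1, "first"),
    (3, 100, 3, "third"),
]
conn.executemany("INSERT INTO para VALUES (?, ?, ?, ?)", rows)

# The ordinal column restores the relative order among siblings.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT text FROM para WHERE parent_id = ? ORDER BY ordinal", (100,)
    )
]
print(ordered)
```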

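12 DTD simplification rules
1. $e^{+} \to e^{*}$
2. $e? \to e$
3. $(e_1 \mid \dots \mid e_n) \to (e_1, \dots, e_n)$
4. (a) $(e_1, \dots, e_n)^{*} \to (e_1^{*}, \dots, e_n^{*})$
   (b) $e^{**} \to e^{*}$
5. (a) $\dots, e, \dots, e, \dots \to \dots, e^{*}, \dots, \dots$
   (b) $\dots, e, \dots, e^{*}, \dots \to \dots, e^{*}, \dots, \dots$
   (c) $\dots, e^{*}, \dots, e, \dots \to \dots, e^{*}, \dots, \dots$
   (d) $\dots, e^{*}, \dots, e^{*}, \dots \to \dots, e^{*}, \dots, \dots$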
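The five rules can be sketched in a few lines of Python. This is one possible reading of the rules, not the authors' implementation: the tuple-based representation of a content model and the function names flatten and simplify are assumptions made for the example. Rules 1 to 4 are applied while walking the content model, and rule 5 is applied when duplicates are merged.

```python
# Hypothetical AST for a DTD content model (illustrative only):
#   ("elem", name) | ("seq", [nodes]) | ("choice", [nodes])
#   ("star", node) | ("plus", node)   | ("opt", node)

def flatten(node, starred=False):
    """Apply rules 1-4 and return an ordered list of (name, starred) pairs."""
    kind = node[0]
    if kind == "elem":
        return [(node[1], starred)]
    if kind in ("seq", "choice"):
        # Rule 3 treats a choice like a sequence; rule 4(a) pushes a
        # surrounding '*' down to every member.
        pairs = []
        for child in node[1]:
            pairs.extend(flatten(child, starred))
        return pairs
    if kind in ("star", "plus"):
        # Rule 1 (e+ -> e*) and rule 4(b) (e** -> e*).
        return flatten(node[1], True)
    if kind == "opt":
        # Rule 2 (e? -> e).
        return flatten(node[1], starred)
    raise ValueError(f"unknown node kind: {kind!r}")

def simplify(node):
    """Apply rule 5: repeated names collapse into one starred entry."""
    merged = {}
    for name, starred in flatten(node):
        # A second occurrence always yields e*; the first position is kept.
        merged[name] = True if name in merged else starred
    return list(merged.items())

# (title, (author | editor)+, note?, title)
model = ("seq", [
    ("elem", "title"),
    ("plus", ("choice", [("elem", "author"), ("elem", "editor")])),
    ("opt", ("elem", "note")),
    ("elem", "title"),
])
simplified = simplify(model)
print(", ".join(name + ("*" if star else "") for name, star in simplified))
```

For the sample model the sketch yields title*, author*, editor*, note: the two occurrences of title merge under rule 5(a), and the choice under '+' becomes two starred children.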

13 Example of simplifying a DTD simplified to

14 Creating and inlining DTD graphs We create a DTD graph based on the simplified DTD.
Definition 3.2 (DTD graph) The structure of a DTD can be represented by a labeled graph in which nodes represent elements and attributes, and edges represent their parent-child relationships. Each edge is labeled either '*' (star edge) or ',' (normal edge); the ',' label is not shown for simplicity.
Idea: inline a child c into its parent p if p can contain at most one occurrence of c.
Rationale: inlined elements will produce a relation.
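A DTD graph in the sense of Definition 3.2 can be sketched as a list of labeled edges. The input format (a dictionary from each element to its simplified children), the function name build_dtd_graph, and the sample DTD are assumptions made for illustration; they do not come from the talk.

```python
from collections import defaultdict

def build_dtd_graph(simplified):
    """Build labeled edges from a simplified DTD.

    simplified maps each element name to an ordered list of
    (child, starred) pairs; this input shape is hypothetical.
    """
    edges = []                      # (parent, child, label) triples
    incoming = defaultdict(list)    # child -> [(parent, label), ...]
    for parent, children in simplified.items():
        for child, starred in children:
            label = "*" if starred else ","   # star edge or normal edge
            edges.append((parent, child, label))
            incoming[child].append((parent, label))
    return edges, dict(incoming)

# Hypothetical simplified DTD: book(title, author*), author(name), article(title)
simplified_dtd = {
    "book": [("title", False), ("author", True)],
    "author": [("name", False)],
    "article": [("title", False)],
}
edges, incoming = build_dtd_graph(simplified_dtd)
print(edges)
```

Keeping the incoming-edge index next to the edge list makes the later inlining decisions cheap, since they depend only on how many edges enter a node and on their labels.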

15 Inlinable node and subtree, shared node
Definition 3.3 (Inlinable node) Given a DTD graph, a node is inlinable if and only if it has exactly one incoming edge and that edge is a normal edge.
Definition 3.4 (Inlinable subtree) Given a DTD graph and a node e in the graph, e and all other inlinable nodes that are reachable from e by normal edges constitute a subtree. This subtree is called the inlinable subtree for the node e (it is rooted at e).
Definition 3.5 (Shared node) Given a DTD graph, a node is called a shared node if it has more than one incoming edge.
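Definitions 3.3 to 3.5 translate almost directly into predicates over an edge list. The sketch below is illustrative: the edge triples, function names, and sample graph are hypothetical. It also assumes that the traversal for an inlinable subtree continues only through inlinable nodes, which is one reading of Definition 3.4 that keeps the result a tree.

```python
def incoming(edges, node):
    """Return (parent, label) for every edge entering node."""
    return [(p, label) for p, c, label in edges if c == node]

def is_inlinable(edges, node):
    # Definition 3.3: exactly one incoming edge, and it is a normal edge.
    inc = incoming(edges, node)
    return len(inc) == 1 and inc[0][1] == ","

def is_shared(edges, node):
    # Definition 3.5: more than one incoming edge.
    return len(incoming(edges, node)) > 1

def inlinable_subtree(edges, root):
    # Definition 3.4, under the assumption stated above: follow normal
    # edges from root and stop at any node that is not inlinable.
    subtree = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for parent, child, label in edges:
            if (parent == node and label == ","
                    and child not in subtree
                    and is_inlinable(edges, child)):
                subtree.add(child)
                stack.append(child)
    return subtree

# Hypothetical DTD graph; ',' is a normal edge and '*' a star edge.
EDGES = [
    ("book", "title", ","),
    ("book", "author", "*"),
    ("author", "name", ","),
    ("author", "address", ","),
    ("address", "city", ","),
    ("article", "title", ","),
]
print(sorted(inlinable_subtree(EDGES, "author")))
```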

16 Inlining
Case 1: Node a is connected to b by a normal edge and b has no other incoming edges: b is inlined into a.
Case 2: Node a is connected to b by a normal edge but b has other incoming edges: b is a shared node, so no inlining.
Case 3: Node a is connected to b by a star edge: no inlining.
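The three cases can be expressed as a classification of each edge. This is a minimal sketch, not the inlining procedure from the talk: the function name classify_edges, the integer case codes, and the sample graph are assumptions for illustration.

```python
from collections import Counter

def classify_edges(edges):
    """Map each (parent, child) edge to the case number 1, 2, or 3."""
    indegree = Counter(child for _, child, _ in edges)
    cases = {}
    for parent, child, label in edges:
        if label == "*":
            cases[(parent, child)] = 3   # star edge: no inlining
        elif indegree[child] > 1:
            cases[(parent, child)] = 2   # normal edge to a shared node: no inlining
        else:
            cases[(parent, child)] = 1   # sole normal edge: inline child into parent
    return cases

# Hypothetical DTD graph; ',' is a normal edge and '*' a star edge.
edges = [
    ("book", "title", ","),
    ("book", "author", "*"),
    ("author", "name", ","),
    ("article", "title", ","),
]
cases = classify_edges(edges)
for (parent, child), case in cases.items():
    print(f"{parent} -> {child}: case {case}")
```

Only case 1 edges lead to inlining, so in this sample graph name is folded into author while title and author remain separate nodes.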

17 Inlining (con’t)

18 Inlining DTD graphs

19 Complexity of inlining Theorem 3.7 (Time Complexity) The time complexity of our inlining algorithm is O(n) where n is the number of elements in the input DTD.

20 The inlining procedure

21 The inlining procedure (con’t) INCORRECT

22 The inlining procedure (con’t) CORRECT

23 Generating relational schema

24 Generating schema mapping info. Definition 3.8 (Mapping) The schema mapping is a mapping from X to R, where X is the set of XML element and attribute types in the input XML DTD, and R is the set of relations in the relational database. Given an XML element type e, the mapping applied to e returns the relation that is used to store e. Similarly, given an XML attribute type a of element type e, the mapping applied to e.a returns the relation that is used to store a of e.
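The mapping of Definition 3.8 can be pictured as a lookup table from element and attribute types to relation names. The sketch below is only an illustration under assumptions: the grouping of types by relation, the function name build_mapping, and the sample names are hypothetical, and attribute types are keyed as "element.attribute" to mirror the e.a notation.

```python
def build_mapping(groups):
    """Build a lookup from element/attribute types to relation names.

    groups maps a relation name to the types stored in it; this input
    shape is hypothetical and stands for the result of inlining.
    """
    mapping = {}
    for relation, members in groups.items():
        for member in members:
            if member in mapping:
                # Each type is stored in exactly one relation.
                raise ValueError(f"{member!r} is mapped twice")
            mapping[member] = relation
    return mapping

# Hypothetical outcome of inlining: name was inlined into author.
groups = {
    "book": ["book", "book.isbn"],
    "author": ["author", "author.id", "name"],
    "title": ["title"],
}
mapping = build_mapping(groups)
print(mapping["name"], mapping["book.isbn"])
```

A query translator could consult such a table to find which relation holds a given element or attribute before generating SQL.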

25 A complete example

26 DTD graph Inlined DTD graph

27 Generated relational schema

28 Conclusions We defined the schema mapping algorithm ODTDMap, which has several improvements over existing algorithms.
–It is lossless in the sense that one can reconstruct the original XML document in the given document order, based on the target relational schema generated by ODTDMap.
–It has efficient support for recursive queries and schemas.
–It defines how to map set-valued XML attributes.
–Experimental results showed good performance and scalability of the algorithm.

29 Future work
–Extending our work to XML Schema to support data types other than the string type.
–Maintaining ID/IDREF/IDREFS in terms of key and foreign key constraints.