
1 XML-to-Relational Schema Mapping Algorithm ODTDMap Speaker: Artem Chebotko* Email: artem@wayne.edu Wayne State University Joint work with Mustafa Atay, Shiyong Lu and Farshad Fotouhi

2 Introduction
XML has emerged as the standard for representing and exchanging data on the World Wide Web.
The growing number of XML documents creates a need to store and query them efficiently.

3 Current approaches to storing and querying XML documents
Native XML repositories, e.g., Software AG’s Tamino, eXcelon’s XIS.
XML-enabled commercial database systems such as SQL Server, Oracle, and DB2.
Using an RDBMS/ODBMS to store and query XML documents.

4 Issues of the relational approach
Schema Mapping
– The XML data model needs to be mapped into the relational model
Data Mapping
– XML documents need to be shredded and composed into tuples to be inserted into the relational database
Query Mapping
– XML queries need to be translated into SQL queries
Reverse Data Mapping
– Query results need to be tagged to XML format

5 Our contributions
We propose a schema mapping algorithm, ODTDMap, which generates a relational schema from an XML DTD for storing and querying ordered XML documents.
Improvements over the existing algorithms
– Losslessness
– Efficient support for XML queries
– Completeness (recursion, set-valued attributes, DTD operators)

6 Outline of the talk
Introduction to XML DTDs
Mapping DTDs to relational schemas
– Simplifying DTDs
– Creating and inlining DTD graphs
– Generating relational schemas
An example
Conclusions and future work

7 An overview of DTDs
A DTD example
<!DOCTYPE memo [ ]

8 DTD: Document Type Definition
<!DOCTYPE root-element [ doctype-declaration...,
content model: “|”, “,”, “*”, “+”, “?”

9 DTD: Document Type Definition (cont’d)
An attribute-list declaration (<!ATTLIST ...>) declares which attributes are allowed or required in which elements.
Attribute types:
– CDATA: any value is allowed (the default)
– (value|...): enumeration of allowed values
– ID, IDREF, IDREFS: ID attribute values must be unique (contain "element identity"), IDREF attribute values must match some ID (reference to an element)
– ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: just forget these... (consider them deprecated)
Attribute defaults:
– #REQUIRED: the attribute must be explicitly provided
– #IMPLIED: the attribute is optional, no default provided
– "value": if not explicitly provided, this value is inserted by default
– #FIXED "value": as above, but only this value is allowed

10 Mapping DTDs to relational schemas
Simplifying DTDs
Creating and inlining DTD graphs
Generating relational schemas

11 Simplifying DTDs
A DTD might be very complex due to nesting, e.g.,
An XML query language is concerned with:
– The parent-child relationships between XML elements
– The relative order of siblings (add an ordinal attribute to each relation)
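
The role of the ordinal attribute can be illustrated with a small Python sketch. It shreds a document into generic tuples that record each element's parent and its position among its siblings. This is only an illustration of the idea, not the relational schema that ODTDMap generates; the function name `shred`, the tuple layout, and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Return rows (id, parent_id, ordinal, tag, text) in document order."""
    rows = []
    next_id = 1

    def visit(elem, parent_id, ordinal):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        rows.append((node_id, parent_id, ordinal, elem.tag, (elem.text or "").strip()))
        # The ordinal is the 1-based position of a child among its siblings.
        for position, child in enumerate(elem, start=1):
            visit(child, node_id, position)

    visit(ET.fromstring(xml_text), None, 1)
    return rows

# Hypothetical sample document.
SAMPLE = "<memo><to>Ann</to><to>Bob</to><body>Hi</body></memo>"
for row in shred(SAMPLE):
    print(row)
```

Sorting the rows of one parent by the ordinal column restores the sibling order, which is what makes an order-preserving reconstruction of the document possible.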

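12 DTD simplification rules
1. $e^{+} \rightarrow e^{*}$
2. $e? \rightarrow e$
3. $(e_1 \mid \dots \mid e_n) \rightarrow (e_1, \dots, e_n)$
4. (a) $(e_1, \dots, e_n)^{*} \rightarrow (e_1^{*}, \dots, e_n^{*})$
   (b) $e^{**} \rightarrow e^{*}$
5. (a) $\dots, e, \dots, e, \dots \rightarrow \dots, e^{*}, \dots, \dots$
   (b) $\dots, e, \dots, e^{*}, \dots \rightarrow \dots, e^{*}, \dots, \dots$
   (c) $\dots, e^{*}, \dots, e, \dots \rightarrow \dots, e^{*}, \dots, \dots$
   (d) $\dots, e^{*}, \dots, e^{*}, \dots \rightarrow \dots, e^{*}, \dots, \dots$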
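
The rules above can be sketched in Python as a single flattening pass. This is a minimal illustration, not the authors' implementation; the tuple encoding of content models and the names `flatten`, `simplify`, and `render` are assumptions made for this example.

```python
def flatten(model, starred=False):
    """Yield (element_name, starred) pairs in left-to-right order.

    A content model is either an element name (str) or a pair
    (operator, body) with operator in {"seq", "alt", "star", "plus", "opt"}.
    """
    if isinstance(model, str):
        yield model, starred
        return
    kind, body = model
    if kind in ("star", "plus"):
        # Rule 1 (e+ -> e*) and rules 4(a), 4(b): the star distributes inward.
        yield from flatten(body, True)
    elif kind == "opt":
        # Rule 2 (e? -> e): the optional marker is dropped.
        yield from flatten(body, starred)
    elif kind in ("seq", "alt"):
        # Rule 3: a choice is treated like a sequence.
        for item in body:
            yield from flatten(item, starred)
    else:
        raise ValueError(f"unknown operator: {kind}")

def simplify(model):
    """Return {element_name: starred}, keeping first-occurrence order."""
    result = {}
    for name, starred in flatten(model):
        # Rule 5: a repeated element collapses to a single e*.
        result[name] = True if name in result else starred
    return result

def render(simplified):
    return ", ".join(name + ("*" if star else "") for name, star in simplified.items())

# Hypothetical content model: (title, author+, (abstract | summary)?, title)
MODEL = ("seq", ["title", ("plus", "author"),
                 ("opt", ("alt", ["abstract", "summary"])), "title"])
print(render(simplify(MODEL)))  # title*, author*, abstract, summary
```

The result is a flat list of child names, each either plain or starred, which is the form a DTD graph with normal and star edges can be built from.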

13 Example of simplifying a DTD
simplified to

14 Creating and inlining DTD graphs
We create a DTD graph based on the simplified DTD.
Definition 3.2 (DTD graph) The structure of a DTD can be represented by a labeled graph, in which nodes represent elements and attributes, and edges represent their parent-child relationships. The edges are labeled by either "*" (star edge) or "," (normal edge), where the label "," is not shown for simplicity.
Idea: inline a child c into its parent p if p can contain at most one occurrence of c.
Rationale: a parent and the children inlined into it will produce a single relation.
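
Definition 3.2 can be sketched in Python as a list of labeled edges. This is a minimal illustration under stated assumptions: the dictionary encoding of a simplified DTD, the "*" suffix for starred children, the "@" prefix for attributes, and the sample DTD are all hypothetical and do not come from the talk.

```python
# Hypothetical simplified DTD: parent -> ordered children.
# A trailing "*" marks a starred child; a leading "@" marks an attribute.
SIMPLIFIED_DTD = {
    "book": ["@isbn", "title", "author*"],
    "author": ["name"],
}

def build_dtd_graph(dtd):
    """Return edges as (parent, child, label), label being "*" or ","."""
    edges = []
    for parent, children in dtd.items():
        for child in children:
            if child.endswith("*"):
                edges.append((parent, child[:-1], "*"))  # star edge
            else:
                edges.append((parent, child, ","))       # normal edge
    return edges

for edge in build_dtd_graph(SIMPLIFIED_DTD):
    print(edge)
```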

15 Inlinable node and subtree, shared node
Definition 3.3 (Inlinable node) Given a DTD graph, a node is inlinable if and only if it has exactly one incoming edge and that edge is a normal edge.
Definition 3.4 (Inlinable subtree) Given a DTD graph and a node e in the graph, e and all other inlinable nodes that are reachable from e by normal edges constitute a subtree. This subtree is called the inlinable subtree for the node e (it is rooted at e).
Definition 3.5 (Shared node) Given a DTD graph, a node is called a shared node if it has more than one incoming edge.
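
The three definitions can be checked on a small graph with the following Python sketch. The edge-list encoding, the function names, and the sample graph are assumptions made for this example. The sketch also assumes that the traversal for Definition 3.4 continues only through inlinable nodes, which is one reading of "reachable by normal edges".

```python
# Hypothetical DTD graph as (parent, child, label) edges.
EDGES = [
    ("book", "title", ","),
    ("book", "author", "*"),
    ("book", "editor", ","),
    ("author", "name", ","),
    ("editor", "name", ","),
]

def incoming_labels(edges):
    """Map each node to the labels of its incoming edges."""
    labels = {}
    for _parent, child, label in edges:
        labels.setdefault(child, []).append(label)
    return labels

def is_inlinable(node, edges):
    # Definition 3.3: exactly one incoming edge, and it is a normal edge.
    return incoming_labels(edges).get(node, []) == [","]

def is_shared(node, edges):
    # Definition 3.5: more than one incoming edge.
    return len(incoming_labels(edges).get(node, [])) > 1

def inlinable_subtree(root, edges):
    # Definition 3.4 (assumed reading): follow normal edges through inlinable nodes.
    subtree, stack = {root}, [root]
    while stack:
        current = stack.pop()
        for parent, child, label in edges:
            if (parent == current and label == "," and child not in subtree
                    and is_inlinable(child, edges)):
                subtree.add(child)
                stack.append(child)
    return subtree

print(sorted(inlinable_subtree("book", EDGES)))  # ['book', 'editor', 'title']
```

In this sample, `name` is a shared node because both `author` and `editor` point to it, so it stays outside every other node's inlinable subtree.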

16 Inlining
Case 1: Node a is connected to b by a normal edge and b has no other incoming edges: b is inlined into a.
Case 2: Node a is connected to b by a normal edge but b has other incoming edges: b is a shared node, so no inlining.
Case 3: Node a is connected to b by a star edge: no inlining.
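
The three cases can be sketched in Python by mapping every node to the node it ends up inlined into. This is an illustrative sketch rather than the inlining procedure presented in the talk: the edge-list encoding, the name `inline`, and the sample graph are assumptions, and recursion is handled only by a simple guard that stops the upward walk when a node repeats.

```python
def inline(edges):
    """Map each node to the node whose relation would hold it."""
    incoming = {}
    for parent, child, label in edges:
        incoming.setdefault(child, []).append((parent, label))
    nodes = {n for parent, child, _label in edges for n in (parent, child)}

    def root_of(node):
        seen = set()
        while node not in seen:
            seen.add(node)
            inc = incoming.get(node, [])
            if len(inc) == 1 and inc[0][1] == ",":
                node = inc[0][0]   # Case 1: inline into the only parent
            else:
                break              # Case 2 (shared), Case 3 (star), or a root
        return node

    return {node: root_of(node) for node in sorted(nodes)}

# Hypothetical DTD graph.
EDGES = [
    ("a", "b", ","),   # Case 1: b is inlined into a
    ("a", "c", "*"),   # Case 3: star edge, c is not inlined
    ("a", "d", ","),   # Case 2: d also has c as a parent
    ("c", "d", ","),
    ("b", "e", ","),   # e is inlined into b, and therefore into a
]
print(inline(EDGES))
```

Nodes that map to themselves (`a`, `c`, and `d` in the sample) are the ones that remain as separate nodes after inlining.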

17 Inlining (cont’d)

18 Inlining DTD graphs

19 Complexity of inlining
Theorem 3.7 (Time Complexity) The time complexity of our inlining algorithm is O(n), where n is the number of elements in the input DTD.

20 The inlining procedure

21 The inlining procedure (cont’d) INCORRECT

22 The inlining procedure (cont’d) CORRECT

23 Generating relational schema

24 Generating schema mapping info.
Definition 3.8 (σ-Mapping) σ is a mapping from X to R, where X is the set of XML element and attribute types in the input XML DTD, and R is the set of relations in the relational database. Given an XML element type e, σ(e) will return the corresponding relation that is used to store e. Similarly, given an XML attribute type a of element type e, σ(e.a) will return the corresponding relation that is used to store a of e.
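
A mapping of this kind can be held in a plain dictionary, as in the Python sketch below. It is only an illustration: the inputs, the relation names, and the choice to store every attribute in the relation of its owning element are assumptions made for this example, not details taken from the talk.

```python
# Hypothetical result of inlining: element -> relation that stores it.
INLINED_INTO = {"book": "book", "title": "book", "author": "author", "name": "author"}
# Hypothetical attribute declarations: element -> attribute names.
ATTRIBUTES = {"book": ["isbn"], "author": ["id"]}

def build_sigma(inlined_into, attributes):
    """Return a dict with keys "e" and "e.a", each mapped to a relation name."""
    sigma = dict(inlined_into)
    for element, names in attributes.items():
        for name in names:
            # Assumption: an attribute lives in its owning element's relation.
            sigma[f"{element}.{name}"] = inlined_into[element]
    return sigma

sigma = build_sigma(INLINED_INTO, ATTRIBUTES)
print(sigma["title"])      # book
print(sigma["author.id"])  # author
```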

25 A complete example

26 DTD graph
Inlined DTD graph

27 Generated relational schema

28 Conclusions
We defined the schema mapping algorithm ODTDMap, which has several improvements over the existing ones.
It is lossless in the sense that one can reconstruct the original XML document in the given document order, based on the target relational schema generated by ODTDMap.
It has efficient support for recursive queries and schemas.
It defines how to map set-valued XML attributes.
Experimental results showed good performance and scalability of the algorithm.

29 Future work
Extending our work to XML Schema to support data types other than the string type.
Maintaining ID/IDREF/IDREFS in terms of key and foreign key constraints.

