The Storage and Benchmarking of XML. Presenter: Kevin See, IBM Toronto Lab, DB2 SQL/Catalog Development. Date: Nov 2, 2001.


1 The Storage and Benchmarking of XML
Presenter: Kevin See, IBM Toronto Lab, DB2 SQL/Catalog Development
Date: Nov 2, 2001

2 Outline
- Introduction
- Text file / OODBMS / native DB approaches
- Relational DB approaches
  - Categories
  - 2 latest proposals
- XML benchmarks
- Conclusion

3 Introduction
- XML is emerging as the standard for data exchange.
- Demand for storage and management of XML documents is growing.
- There are several ways to manage XML documents.

4 Text Approach
- File system
- A separate query engine needs to be implemented
- Parsing → not possible
- Index strategies: (parent_offset, tag), (child_offset, parent_offset), (tagname, value), (attribute_name, attribute_value) → not good for update

5 Object-oriented Database Management System
- Michael R. Olson and Byung S. Lee (1997)
- The OO model fits well
- Immature technology: hard to scale
- The experiment concluded without any great success.

6 Native Database Approach
- Prototypes: Lore (Stanford University), Xyleme (INRIA, France)
- Immature technology
- No optimization capabilities

7 Relational Database Technology
- Very mature technology
- Query optimization techniques and processing mechanisms in relational databases have been studied for a quarter of a century
- A very large percentage of data is currently stored in RDBMSs

8 Storage and Retrieval of XML Using Relational Database
[Diagram labels: XML; Table1; XML to Relational Mapping; Relational to XML Conversion; XML Query to SQL; SQL; XML Query Language such as XPath, XQuery, Quilt, XQL]

9 Classifications of Various Mapping Methods
- Structure-mapping approach: the XML document's logical structure (or the DTD, if available) is represented by the database schema. 1 DTD : 1 set of generated schemas.
- Model-mapping approach: the constructs of the XML model are represented by the database schema. 1 set of generated schemas for all/any DTD.

10 Relational Schema Prototype Tree Mapping Method
- M. Yan and A. The Chinese University of Hong Kong (2001)
- Structure-mapping
- Global Schema Extraction Algorithm
- DTD Splitting Schema Extraction Algorithm

11 Relational Schema Prototype Tree Mapping Method (Cont'd)
Relational Databases for Querying XML Documents: Limitations and Opportunities (J. Shanmugasundaram)
Basic steps:
1. Simplify DTD
2. Construct schema prototype tree
3. Generate relational schema prototypes
4. Detect functional dependencies and candidate keys
5. Normalize the relational schema prototypes

12 DTD Splitting Schema Extraction Algorithm
Step 1: Simplify DTD
- p|p' → p, p'
- p+ → p*
- (p, p') → p, p'
- ..., p, ..., p*, ... → p*
- p? → p
- (p, p')* → p*, p'*
- ..., p, ..., p, ... → p*
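
A minimal Python sketch of how these rewrite rules might be applied to one content model. The nested-tuple representation ("seq", "alt", "opt", "plus", "star") and the function names `simplify` and `render` are assumptions made for illustration; they do not come from the slides, and the sample content model is hypothetical.

```python
def simplify(model, starred=False, out=None):
    """Flatten a DTD content model into {element_name: is_starred}.

    `model` is either an element name (str) or a pair (op, arg), where op is
    "seq" or "alt" (arg is a list of models) or "opt", "plus", "star"
    (arg is a single model). This encoding is a hypothetical one.
    """
    if out is None:
        out = {}
    if isinstance(model, str):
        # A repeated name collapses to p* (rules "..., p, ..., p, ... -> p*").
        out[model] = starred or model in out
        return out
    op, arg = model
    if op in ("seq", "alt"):
        # p|p' -> p, p'   and   (p, p') -> p, p'
        for child in arg:
            simplify(child, starred, out)
    elif op in ("star", "plus"):
        # p+ -> p*   and   (p, p')* -> p*, p'*
        simplify(arg, True, out)
    elif op == "opt":
        # p? -> p
        simplify(arg, starred, out)
    else:
        raise ValueError(f"unknown operator: {op!r}")
    return out


def render(flat):
    """Print the flattened model in DTD-like notation."""
    return ", ".join(name + ("*" if star else "") for name, star in flat.items())


# Hypothetical content model:
# (booktitle, price?, author+, (country, authname)*, (editor|publisher), editor)
model = ("seq", [
    "booktitle",
    ("opt", "price"),
    ("plus", "author"),
    ("star", ("seq", ["country", "authname"])),
    ("alt", ["editor", "publisher"]),
    "editor",
])
print(render(simplify(model)))
```

The sketch only flattens a single content model; it does not parse DTD syntax.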

13 A Book DTD

14 Transformed / Simplified DTD

15 Step 2: Construct Schema Prototype Trees
1. Only an element can become a root.
2. An element that is not nested inside other elements can become a root.
3. A non-#PCDATA element that is nested in more than 1 other element becomes a root.
4. If a non-#PCDATA element B is not the only subelement of A and B only appears in A with a "*", it becomes a root.
5. If recursion occurs in the DTD, one of the elements in the recursion is selected as a root.
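
A rough Python sketch of rules 2 to 4 above, under stated assumptions: the simplified DTD is given as a dictionary mapping each non-#PCDATA element to its (child, starred) pairs, any child that is not a key of that dictionary is treated as a #PCDATA leaf, and the DTD is not recursive (rule 5 is not covered). The function name `select_roots` and the sample DTD are hypothetical, and the reading of rule 4 is one possible interpretation.

```python
def select_roots(dtd):
    """Return {element: rule_number} for elements chosen as roots.

    `dtd` maps each non-#PCDATA element to a list of (child, starred) pairs.
    Rule 5 (recursion) is deliberately not handled in this sketch.
    """
    parents = {}
    for parent, children in dtd.items():
        for child, _ in children:
            parents.setdefault(child, set()).add(parent)

    roots = {}
    for elem in dtd:  # rule 1 and "non-#PCDATA": only keys of dtd are candidates
        elem_parents = parents.get(elem, set())
        if not elem_parents:
            roots[elem] = 2          # not nested inside any other element
        elif len(elem_parents) > 1:
            roots[elem] = 3          # nested in more than one element
        else:
            (only_parent,) = elem_parents
            siblings = dtd[only_parent]
            always_starred = all(star for child, star in siblings if child == elem)
            if len(siblings) > 1 and always_starred:
                roots[elem] = 4      # not the only subelement, appears with "*"
    return roots


# Hypothetical simplified DTD (not the one from the slides).
dtd = {
    "book": [("booktitle", False), ("price", False),
             ("author", False), ("authority", True)],
    "article": [("title", False), ("author", True)],
    "author": [("firstname", False), ("lastname", False)],
    "authority": [("country", False), ("authname", False)],
}
print(select_roots(dtd))
```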

16 Roots for the Example DTD
- Element book is selected as root (rule 2)
- Element author is selected as root (rule 3)
- Element authority is selected as root (rule 4)
- Element monograph is selected as root (rule 5)

17 Step 2: Construct Schema Prototype Trees (Cont'd)
Tree construction:
- Depth-first scan of the DTD for each selected root, starting from the subelements of the root
- A new node is created for each visited element and attribute
- A mixed element (an element containing both #PCDATA and other subelements) is marked with a "#" in the tree
- Recursion: a new leaf node with label.A

18 Schema Prototype Trees

19 Step 3: Generate Relational Schema Prototype
Starting from the root, all necessary descendants are inlined, except key nodes or foreign key nodes.

20 Relational Schema Prototype
- Book (booktitle, price)
- Authority (country, authname)
- Author (address, id, firstname, lastname)
- Monograph (title, name)

21 Step 4: Discover FDs and Candidate Keys
- Functional dependencies (FDs) and candidate keys are discovered by analyzing the XML data
- TANE algorithm (http://www.cs.helsinki.fi/research/fdk/datamining/tane/)
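
To make the idea of discovering FDs and candidate keys from data concrete, here is a brute-force Python sketch. It is not the TANE algorithm, which is far more efficient; it only checks every attribute combination against a small set of rows. The function names and the sample Author rows are hypothetical.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in `rows` (list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def candidate_keys(rows, attrs):
    """Minimal attribute sets that determine all attributes (exhaustive search)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # a subset is already a key, so combo is not minimal
            if fd_holds(rows, combo, attrs):
                keys.append(combo)
    return keys


# Hypothetical Author data, chosen only for illustration.
authors = [
    {"address": "Toronto", "id": 1, "firstname": "Ann", "lastname": "Lee"},
    {"address": "Waterloo", "id": 2, "firstname": "Ann", "lastname": "Lee"},
    {"address": "Toronto", "id": 3, "firstname": "Bob", "lastname": "Wu"},
    {"address": "Toronto", "id": 4, "firstname": "Ann", "lastname": "Kim"},
]
print(candidate_keys(authors, ["address", "id", "firstname", "lastname"]))
```

Keys found this way only reflect the sample data; a different set of documents could invalidate them.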

22 Candidate Keys
- Book {booktitle}
- Authority {country, authname}
- Monograph {title}
- Author {id}, {lastname, address}

23 Relational Schema Prototype With Candidate Keys
With candidate keys:
- Book (booktitle, price, author.id)
- Authority (country, authname, assigned, book.booktitle)
- Author (address, id, firstname, lastname)
- Monograph (title, name, author.id, monograph.title)
Original prototypes:
- Book (booktitle, price)
- Authority (country, authname)
- Author (address, id, firstname, lastname)
- Monograph (title, name)

24 Step 5: Normalize the Relational Schema Prototypes
- The last step: normalize the schema to 3NF (third normal form) if possible.
- Structure-mapping methods do not handle document order; they leave it to metadata or to the user.

25 X-Rel
- Masatoshi Yoshikawa, Toshiyuki Amagasa, Takeyuki Shimura and Shunsuke Nara Institute of Science and Technology, Japan (2001)
- Model-mapping
- Data model: XPath (root node, element nodes, attribute nodes, and text nodes)
- The concept of region

26 Definition of Region
The region of:
- An element node or a text node is a pair of numbers representing the start and end positions of the node in the XML document
- An attribute node is a pair of identical numbers equal to the start position of the parent element node plus one
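
A small Python sketch of how regions can be used, assuming the usual containment reading: one node lies inside another when its region is strictly nested in the other's. The `Region` type, the function names, and the end position 300 for the root element are assumptions for illustration; the regions (1, 1) and (99, 145) are the ones shown on a later slide.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int


def attribute_region(parent: Region) -> Region:
    """Region of an attribute node: (parent start + 1, parent start + 1)."""
    pos = parent.start + 1
    return Region(pos, pos)


def contains(outer: Region, inner: Region) -> bool:
    """True if `inner` is strictly nested inside `outer`."""
    return outer.start < inner.start and inner.end < outer.end


paper = Region(0, 300)          # hypothetical region of the root element
affiliation = Region(99, 145)   # region of node 9 on the later slide
print(attribute_region(paper))              # region of an attribute of the root
print(contains(paper, affiliation))         # is affiliation inside the root?
print(contains(affiliation, paper))
```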

27 Simple Path Expressions
- Path: a unit of decomposition of XML trees
- Store the simple path expression (denoted by SimplePathExpr) from the root node
- Why? Paths appear frequently in XML queries

28 Why Is "#" Added?
Look for family descendants of issue:
1. WHERE p1.pathexp LIKE '/issue%/family'
2. WHERE p1.pathexp LIKE '#/issue#%/family'
The path /issuelist/family (a WRONG answer) matches the first pattern but not the second.
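
The effect of the "#" separator can be checked with a short Python sketch that evaluates the two LIKE patterns in an in-memory SQLite database. The helper name `like` is hypothetical, and SQLite is used here only as a convenient SQL engine; the slides do not name one for this example.

```python
import sqlite3


def like(value: str, pattern: str) -> bool:
    """Evaluate `value LIKE pattern` with SQLite's LIKE operator."""
    con = sqlite3.connect(":memory:")
    try:
        row = con.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
        return bool(row[0])
    finally:
        con.close()


# Without "#": the unrelated element issuelist is matched by mistake.
print(like("/issuelist/family", "/issue%/family"))

# With "#" before every "/", the element name issue is delimited on both sides.
print(like("#/issuelist#/family", "#/issue#%/family"))
print(like("#/issue#/editor#/family", "#/issue#%/family"))
```

In the second pattern the literal "#" after issue can only match the separator that ends the element name, so issuelist no longer qualifies.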

29 Example XML Document
[XML listing; surviving text values: Mei Zhou, Open Text Corporation, Frank Tompa, University of Waterloo]

30 XML Tree

31 Simple Path Expressions / Regions
- Node 3: (1, 1)
- Node 9: #/Paper#/Authors#/affiliation (99, 145)

32 Mapping Idea
- A relational table per node type
- Simple path expressions are normalized
- docID is introduced
Basic XRel schema:
- Element (docID, pathID, start, end, index, reindex)
- Attribute (docID, pathID, start, end, value)
- Text (docID, pathID, start, end, value)
- Path (pathID, pathexp)

33 Table - Element
docID  pathID  start  end  index  reindex

34 Table - Attribute
docID  pathID  start  end  value
1      2       1      1    The Suffix-Signature Method for Searching Phrases in Text

35 Table - Text
docID  pathID  start  end  value
[values: Mei Zhou, Open Text Corporation, Frank Tompa, University of Waterloo]

36 Table - Path
pathID  pathexpr
1       #/Paper
3       #/Paper#/Authors
4       #/Paper#/Authors#/FN
5       #/Paper#/Authors#/LN
6       #/Paper#/Authors#/Affiliation
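
The following Python sketch shows how such tables might be queried, using SQLite as a stand-in relational engine. The Path rows and the Affiliation element region (99, 145) come from the slides; the column types, the Text row regions, the pathID assigned to the "Mei" text node, and the function names are assumptions for illustration. The query combines a LIKE match on the path expression with a region containment join.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    # "start", "end" and "index" are quoted because some are SQL keywords.
    con.executescript('''
        CREATE TABLE Path (pathID INTEGER PRIMARY KEY, pathexp TEXT);
        CREATE TABLE Element (docID INTEGER, pathID INTEGER,
                              "start" INTEGER, "end" INTEGER,
                              "index" INTEGER, reindex INTEGER);
        CREATE TABLE Text (docID INTEGER, pathID INTEGER,
                           "start" INTEGER, "end" INTEGER, value TEXT);
    ''')
    con.executemany("INSERT INTO Path VALUES (?, ?)", [
        (1, "#/Paper"),
        (3, "#/Paper#/Authors"),
        (4, "#/Paper#/Authors#/FN"),
        (5, "#/Paper#/Authors#/LN"),
        (6, "#/Paper#/Authors#/Affiliation"),
    ])
    # Element region (99, 145) is from the slides; index/reindex are made up.
    con.execute("INSERT INTO Element VALUES (1, 6, 99, 145, 1, 1)")
    # Text regions and the pathID of "Mei" are hypothetical.
    con.executemany("INSERT INTO Text VALUES (?, ?, ?, ?, ?)", [
        (1, 4, 40, 42, "Mei"),
        (1, 6, 112, 131, "Open Text Corporation"),
    ])
    return con


def text_under(con: sqlite3.Connection, pattern: str) -> list:
    """Text values contained in elements whose path matches `pattern`."""
    rows = con.execute('''
        SELECT t.value
        FROM Path p
        JOIN Element e ON e.pathID = p.pathID
        JOIN Text t ON t.docID = e.docID
                   AND t."start" > e."start" AND t."end" < e."end"
        WHERE p.pathexp LIKE ?
        ORDER BY t."start"
    ''', (pattern,)).fetchall()
    return [row[0] for row in rows]


con = build_demo_db()
print(text_under(con, "#/Paper#%/Affiliation"))
```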

37 XML Benchmarking Desiderata
- Bulk loading
- Reconstruction
- Path traversals
- Casting
- Missing elements

38 XML Benchmarking Desiderata (continued)
- Ordered access
- References
- Joins
- Construction of large results
- Containment, full-text search

39 XML Benchmarks
3 XML benchmark proposals:
- XMach-1: University of Leipzig, Germany
- XML benchmark project: CWI, the Netherlands
- Kanda et al. proposal (unpublished): University of Michigan, IBM Toronto Lab Center for Advanced Studies

40 Conclusion
The different mapping approaches and experiments point to a few places where relational database enhancements can help in coping with XML model differences:
- Support for sets
- Flexible comparison operators
- Multi-predicate merge join

41 Questions & Answers

42 Appendix A: Enhancing Structural Mappings Based on Statistics

43 Optimal Hybrid Database Algorithm
- M. Klettke and H. Meyer, "XML and object-relational database systems - enhancing structural mappings based on statistics" (2000)
- An algorithm that finds a type of optimal mapping based on the statistics and the DTD

44 Optimal Hybrid Database Algorithm
1. Build a graph representing the hierarchy of the elements and attributes of the DTD.
2. For every element/attribute of the graph, a measure of significance, w, is determined.
3. Derive the resulting database design from the graph.

46 Graph for an Example DTD

47 Calculate the Weight (Step 2)
$$W = \frac{1}{6}\,(S_Q + S_A + S_H) + \frac{1}{4}\,\frac{D_A}{D_G} + \frac{1}{4}\,\frac{Q_A}{Q_G}$$
where
- $S_Q$: exploitation of quantifiers
- $S_A$: exploitation of alternatives
- $S_H$: position in the hierarchy
- $D_A$: number of documents containing the element/attribute
- $D_G$: absolute number of XML documents
- $Q_A$: number of queries containing the element/attribute
- $Q_G$: absolute number of queries
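
As a worked illustration, the weight formula can be written as a small Python function. The function and parameter names are hypothetical, and the sample inputs are made up; the slides do not state the value ranges of the three structural terms, so the inputs below simply assume each lies between 0 and 1 (in which case W also lies between 0 and 1).

```python
def weight(s_q: float, s_a: float, s_h: float,
           d_a: int, d_g: int, q_a: int, q_g: int) -> float:
    """W = 1/6 (S_Q + S_A + S_H) + 1/4 (D_A / D_G) + 1/4 (Q_A / Q_G)."""
    if d_g <= 0 or q_g <= 0:
        raise ValueError("d_g and q_g must be positive")
    structure = (s_q + s_a + s_h) / 6   # DTD-derived terms
    data = (d_a / d_g) / 4              # share of documents using the node
    queries = (q_a / q_g) / 4           # share of queries using the node
    return structure + data + queries


# Made-up example: a node present in 50 of 100 documents and 10 of 40 queries.
print(weight(1, 1, 1, d_a=50, d_g=100, q_a=10, q_g=40))
```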

48 The Graph With the Colored Weight

49 Step 3 - Deriving Hybrid Databases From the Graph
First, specify a limit that determines which attributes and/or elements are represented as attributes of the database and which attributes and/or elements are represented as XML attributes.

50 Step 3 - Deriving Hybrid Databases From the Graph
Then, search for all nodes of the graph that satisfy the following conditions:
- The node is not a leaf of the graph
- The node and all its descendants are below the limit given
- No predecessor that satisfies the first two conditions exists

51 Step 3 - Deriving Hybrid Databases From the Graph
- Each selected node and its descendants (the whole subgraph) are replaced by an XML attribute (a BLOB-like attribute).
- All other elements and attributes are mapped to the relational database using the mapping.
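
The node selection in Step 3 can be sketched in Python as a top-down traversal. This assumes the graph is a tree given as a parent-to-children dictionary and that "below the limit" means a weight strictly less than the limit; the function names, the sample graph, and the weights are all hypothetical.

```python
def subtree(graph, node):
    """Yield `node` and all of its descendants."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        stack.extend(graph.get(current, []))


def xml_attribute_roots(graph, weights, limit, root):
    """Nodes whose whole subtree would be stored as one XML attribute.

    A node qualifies if it is not a leaf and it and all its descendants have a
    weight below `limit`. Because the traversal stops at the first qualifying
    node on each path, no selected node has a qualifying predecessor.
    """
    selected = []

    def visit(node):
        children = graph.get(node, [])
        if children and all(weights[n] < limit for n in subtree(graph, node)):
            selected.append(node)
            return
        for child in children:
            visit(child)

    visit(root)
    return selected


# Hypothetical element graph and weights.
graph = {
    "book": ["title", "author", "review"],
    "author": ["name", "address"],
    "address": ["street", "city"],
    "review": ["text"],
}
weights = {"book": 0.9, "title": 0.8, "author": 0.6, "name": 0.7,
           "address": 0.2, "street": 0.1, "city": 0.15,
           "review": 0.3, "text": 0.2}
print(xml_attribute_roots(graph, weights, limit=0.4, root="book"))
```

In this made-up example the address and review subtrees fall below the limit, so each would become a single XML attribute, while the remaining nodes would be mapped relationally.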

52 Resulting XML Attributes for the Example DTD

53 References
nara.ac.jp/members/Yoshikawa/paper/TOIT20 01.pdf df =papers

