
1 Storing and Querying XML Documents Using Relational Databases Mustafa Atay matay@wayne.edu Wayne State University Detroit, MI February 28, 2006

2 Outline of Talk
- What is XML?
- HTML vs. XML
- Problem Statement
- Schema-based Relational Approach
  - Schema Mapping
  - Data Mapping
  - Query Mapping
  - Reconstruction
- Conclusions

3 What is XML?
- eXtensible Markup Language
- primarily created by Jon Bosak of Sun Microsystems
- officially recommended by W3C (World Wide Web Consortium) since 1998
- a simplified form of SGML (Standard Generalized Markup Language)

4 What is XML? (cont.)
- a meta language
- allows you to create and format your own document markups
- separates content from format
- a method for putting structured data into a text file; these files are
  - easy to read
  - unambiguous
  - extensible
  - platform-independent

5 HTML vs. XML
First Value   Second Value
87            99
45            67
86            84

6 HTML vs. XML (cont.)
front door back door double hung1 double hung2 kitchen hallway double hung1 living_room

7 HTML vs. XML (cont.)
HTML
- uses tags and attributes
- content and formatting can be placed together
- tags and attributes are predetermined and rigid
- describes what a document looks like
- doesn’t allow user to define content rules
XML
- uses tags and attributes
- content and format are separate; formatting is contained in a stylesheet
- allows user to create his/her own set of tags and attributes
- describes the information in a document
- allows user to define content rules (DTD)

8 Why Store and Query XML?
- XML has emerged as the standard for representing and exchanging data on the World Wide Web.
- The growing number of XML documents creates a need to store and query XML data efficiently.

9 A Sample XML Dataset
- European Bioinformatics Institute Databases
- ftp://ftp.ebi.ac.uk/pub/databases/interpro/ match.xml ~ 700MB

10 Approaches to Storing and Querying XML Documents
- using native XML repositories
  - Software AG’s Tamino
  - eXcelon’s XIS
- using XML-enabled commercial database systems
  - Oracle XML DB
  - DB2 XML Extender
  - Microsoft SQLXML
- using RDBMS/ORDBMS to store and query XML documents (Relational Approach)

11 Why Store XML in an RDBMS?
- to take advantage of mature RDBMS technology: efficient storage, indexing and optimization techniques
- to enable companies or researchers to store and query XML data using their existing RDBMS
- to enable processing of transformed XML data using both XML and relational queries from a middleware environment

12 Relational Approach
- XML-Publishing
  - XPERANTO – Carey et al., WebDB’00
  - SilkRoute – M. Fernandez et al., WWW’00
- Schema-less approach
  - Edge – D. Florescu et al., IEEE DEB’99
  - STORED – A. Deutsch et al., SIGMOD’99
- Schema-based approach
  - Basic, Shared and Hybrid inlining – J. Shanmugasundaram et al., VLDB’99
  - ODTDMap – M. Atay et al., IS’06

13 Schema-based Relational Approach
- Schema Mapping: the XML data model is mapped into the relational model
- Data Mapping: XML documents are shredded and composed into tuples to be inserted into the relational database
- Query Mapping: XML queries are translated into SQL queries
- Reverse Data Mapping (Reconstruction): the original XML document is recovered from the RDBMS

14 Schema Mapping
The schema mapping algorithm ODTDMap consists of the following steps:
- Simplifying DTDs
- Creating and inlining DTD graphs
- Generating the relational schema and the schema mapping file

15 Sample DTD – univ.dtd
<!DOCTYPE univ [ ]>

16 Creating DTD Graph
<!DOCTYPE univ [ ]>

17 Inlining DTD Graph

18 Generating Relational Schema

19 Data Mapping
Challenging issues of data mapping:
- Must respect the schema mapping
- Varying document structure
- Scalability
We introduced two efficient linear-time algorithms:
- OXInsert: main-memory data mapping algorithm, DOM-based
- SDM: streaming data mapping algorithm, SAX-based
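
The streaming idea can be illustrated with a minimal sketch. This is not the SDM algorithm itself: it assumes a generic (ID, parentID, name, text) tuple layout instead of the tables produced by the schema mapping, and the handler name `ShredHandler`, the helper `shred`, and the element names in the sample document are hypothetical. Only the two leaf values come from the sample slide.

```python
import xml.sax


class ShredHandler(xml.sax.ContentHandler):
    """Hypothetical handler: turns SAX events into (ID, parentID, name, text) tuples."""

    def __init__(self):
        super().__init__()
        self.next_id = 1
        self.stack = []    # open elements, each as [ID, parentID, name, text parts]
        self.tuples = []   # completed tuples

    def startElement(self, name, attrs):
        # The root element gets parentID 0, as in the slides' univ.parentID=0.
        parent_id = self.stack[-1][0] if self.stack else 0
        self.stack.append([self.next_id, parent_id, name, []])
        self.next_id += 1

    def characters(self, content):
        if self.stack:
            self.stack[-1][3].append(content)

    def endElement(self, name):
        elem_id, parent_id, elem_name, parts = self.stack.pop()
        self.tuples.append((elem_id, parent_id, elem_name, "".join(parts).strip()))


def shred(xml_text):
    """Parse the document in one pass and return its tuples sorted by ID."""
    handler = ShredHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return sorted(handler.tuples)


# Illustrative document: element names are assumptions, leaf values are from the slides.
SAMPLE = "<univ><dep><url>www.cs.wayne.edu</url><phone>313-5773920</phone></dep></univ>"
rows = shred(SAMPLE)
```

Only the stack of currently open elements is held in memory, which is why a streaming parser suits large inputs such as the 700MB dataset mentioned earlier; a DOM-based variant would build the whole tree first and then walk it.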

20 Sample XML document - univ.xml
www.cs.wayne.edu 313-5773920

21 XMLTree for univ.xml
[Figure: XMLTree for univ.xml; surviving node labels: 1 3 4 6 8 www.cs.wayne.edu 9 11 13 313-5773920 14 16]

22 XMLTree for univ.xml
[Figure: XMLTree for univ.xml; surviving node labels: 1 3 4 6 8 www.cs.wayne.edu 9 11 13 313-5773920 14 15]

23 Database state after univ.xml is mapped

24 Performance of OXInsert and SDM

25 Data Mapping Across Different Schema Mappings

26 Query Mapping
- We translate simple XPath expressions to SQL
  - XPath is the core of XML query languages.
- We identified 3 algorithms for query mapping
  - Naïve
  - Cluster
  - Containment Join

27 Naïve
- Takes an XPath expression and creates a nested SQL query composed of one SQL query per XPath step
e.g.
XPath: /univ/colleges/college/dep[@dName='CS']
SQL:
  Select dep.ID from dep
  where dep.dName='CS' and dep.parentID in
    (Select college.ID from college where college.parentID in
      (Select colleges.ID from univ where colleges.parentID in
        (Select univ.ID from univ where univ.parentID=0)))
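
The per-step nesting can be sketched as a small query generator. This is an illustration under simplifying assumptions, not the authors' translator: it assumes one table per element name with columns ID and parentID (whereas the slide's schema inlines colleges into the univ table), and the function name `naive_sql`, the table layout, and the sample rows are all hypothetical.

```python
import sqlite3


def naive_sql(steps, predicate=None):
    """Build one nested subquery per XPath step (innermost = first step).

    Assumes, for illustration only, one table per element name with
    columns ID and parentID; the root element has parentID 0.
    """
    root = steps[0]
    sql = f"SELECT {root}.ID FROM {root} WHERE {root}.parentID = 0"
    for name in steps[1:]:
        sql = f"SELECT {name}.ID FROM {name} WHERE {name}.parentID IN ({sql})"
    if predicate:
        # The predicate applies to the last step, i.e. the outermost query.
        sql += f" AND {predicate}"
    return sql


# Hypothetical data: one univ, one colleges, one college, two departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE univ     (ID INTEGER, parentID INTEGER);
    CREATE TABLE colleges (ID INTEGER, parentID INTEGER);
    CREATE TABLE college  (ID INTEGER, parentID INTEGER);
    CREATE TABLE dep      (ID INTEGER, parentID INTEGER, dName TEXT);
    INSERT INTO univ     VALUES (1, 0);
    INSERT INTO colleges VALUES (2, 1);
    INSERT INTO college  VALUES (3, 2);
    INSERT INTO dep      VALUES (4, 3, 'CS'), (5, 3, 'EE');
""")

# /univ/colleges/college/dep[@dName='CS']
query = naive_sql(["univ", "colleges", "college", "dep"], "dep.dName = 'CS'")
result = conn.execute(query).fetchall()
```

The generated statement contains one SELECT per step, so the nesting depth grows with the length of the path.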

28 Cluster
- A cluster is a sequence of consecutive elements stored in the same table
- Takes an XPath expression and creates a nested SQL query composed of one SQL query per XPath cluster
e.g.
XPath: /univ/colleges/college/dep[@dName='CS']
SQL:
  Select dep.ID from dep
  where dep.dName='CS' and dep.parentID in
    (Select college.ID from college where college.parentID in
      (Select colleges.ID from univ))
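
The definition of a cluster can be made concrete with a short sketch. The element-to-table assignment `TABLE_OF` below is an assumption inferred from the example SQL, where colleges is read from the univ table; in practice it would come from the schema mapping file. The function name `clusters` is hypothetical.

```python
from itertools import groupby

# Assumed element-to-table assignment: colleges is inlined into the univ table.
TABLE_OF = {"univ": "univ", "colleges": "univ", "college": "college", "dep": "dep"}


def clusters(steps, table_of=TABLE_OF):
    """Group consecutive XPath steps whose elements are stored in the same table."""
    return [(table, list(group)) for table, group in groupby(steps, key=table_of.get)]


# /univ/colleges/college/dep
path_clusters = clusters(["univ", "colleges", "college", "dep"])
```

Under this assignment the four steps collapse into three clusters, which matches the three SELECT statements in the example query.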

29 Containment Join
- Relies on the well-formedness of XML documents
- Requires the pre-computation of the max. ID of the descendants of each element instance (endID)
- Facilitates efficient evaluation of recursive XML queries
e.g.
XPath: /univ/colleges/college/dep[@dName='CS']
SQL:
  Select dep.ID from dep, college, univ
  where dep.dName='CS'
    and dep.ID>=college.ID and dep.ID<=college.endID
    and college.parentID=univ.colleges.ID
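
The endID pre-computation can be sketched as a single preorder traversal. The sketch makes several assumptions: IDs are assigned to elements only, in document order starting at 1, and a leaf's endID is taken to be its own ID. The function names and the sample document (which borrows the colleges and schools paths from the recursive query example) are hypothetical.

```python
import xml.etree.ElementTree as ET


def number_elements(root):
    """Return {element: (ID, endID)} using preorder IDs starting at 1."""
    intervals = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1
        elem_id = counter
        for child in elem:
            visit(child)
        # After the subtree is visited, counter holds the largest descendant ID.
        intervals[elem] = (elem_id, counter)

    visit(root)
    return intervals


def contains(ancestor, descendant):
    """Containment test on (ID, endID) pairs, mirroring ID >= a.ID and ID <= a.endID."""
    return ancestor[0] <= descendant[0] <= ancestor[1]


doc = ET.fromstring(
    "<univ>"
    "<colleges><college><dep/><dep/></college></colleges>"
    "<schools><school><dep/></school></schools>"
    "</univ>"
)
iv = number_elements(doc)
univ_iv = iv[doc]
college_iv = iv[doc.find("colleges/college")]
school_dep_iv = iv[doc.find("schools/school/dep")]
```

Because the document is well-formed, every subtree occupies one contiguous range of IDs, so an ancestor-descendant test reduces to two comparisons regardless of how many levels separate the two elements.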

30 A Recursive Query Example
XPath: /univ//dep
Sub-queries of the recursive query:
- /univ/colleges/college/dep
- /univ/schools/school/dep
- Naïve: 8 SQL queries + 6 joins + 1 union
- Cluster: 6 SQL queries + 4 joins + 1 union
- Containment Join: 1 SQL query + 1 join
    Select dep.ID from dep, univ
    where dep.ID>=univ.ID and dep.ID<=univ.endID
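
The single-join form of the recursive query can be tried out against an in-memory SQLite database. The table layout, the ID and endID values, and the department names below are illustrative assumptions, not the database state from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE univ (ID INTEGER, endID INTEGER);
    CREATE TABLE dep  (ID INTEGER, endID INTEGER, dName TEXT);
    INSERT INTO univ VALUES (1, 8);
    -- two departments under /univ/colleges/college, one under /univ/schools/school
    INSERT INTO dep  VALUES (4, 4, 'CS'), (5, 5, 'EE'), (8, 8, 'Law');
""")

# /univ//dep as one containment join: no per-path subqueries and no union.
rows = conn.execute("""
    SELECT dep.ID
    FROM dep, univ
    WHERE dep.ID >= univ.ID AND dep.ID <= univ.endID
    ORDER BY dep.ID
""").fetchall()
dep_ids = [r[0] for r in rows]
```

The query returns departments from both the colleges branch and the schools branch without enumerating either path, which is the saving the comparison on the slide describes.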

31 Reconstruction
- In the query mapping stage, the elements selected by an XML query can be returned in one of two modes:
  - Select mode: returns IDs
  - Reconstruct mode: returns XML subtrees
- Algorithm Reconstruct rebuilds the XML subtree rooted at a given element
- Reconstruction is important in two respects:
  - XML subtree reconstruction has a great impact on the query response time in reconstruct mode.
  - It demonstrates that our mapping scheme is lossless.
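
Subtree reconstruction can be sketched over generic tuples. This is not the Reconstruct algorithm from the talk: it assumes hypothetical (ID, parentID, name, text) rows rather than the schema-based tables, and it relies on IDs reflecting document order so that sorting by ID restores sibling order. The element names are assumptions; the leaf values come from the sample slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical tuples (ID, parentID, name, text); element names are assumed.
ROWS = [
    (1, 0, "univ", ""),
    (2, 1, "dep", ""),
    (3, 2, "url", "www.cs.wayne.edu"),
    (4, 2, "phone", "313-5773920"),
]


def reconstruct(rows, root_id):
    """Rebuild the XML subtree rooted at root_id; children are emitted in ID order."""
    by_id = {r[0]: r for r in rows}
    children = {}
    for elem_id, parent_id, _, _ in sorted(rows):
        children.setdefault(parent_id, []).append(elem_id)

    def build(elem_id):
        _, _, name, text = by_id[elem_id]
        elem = ET.Element(name)
        elem.text = text or None
        for child_id in children.get(elem_id, []):
            elem.append(build(child_id))
        return elem

    return ET.tostring(build(root_id), encoding="unicode")


subtree = reconstruct(ROWS, 2)   # reconstruct mode for the element with ID 2
```

Calling `reconstruct(ROWS, 1)` instead rebuilds the whole document, which is the round trip that a lossless mapping has to support.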

32 Conclusions
- Schema mapping [1,5]
  - lossless and order preserving
  - processing set-valued XML attributes
  - simple processing of recursion
- Data mapping [1,3]
  - We described the first linear-time schema-based data mapping algorithms
  - We justified their effectiveness on different schema mapping algorithms

33 Conclusions (cont.)
- Query mapping
  - We identified 3 algorithms
  - Our Containment Join (CJ) algorithm outperforms the only published recursive query mapping algorithm, by Krishnamurthy et al., IEEE ICDE’04
- Reconstruction [2]
  - We introduced an efficient reconstruction algorithm
  - It can be used in relational schema-based mapping, unlike its rivals, which are used in XML-publishing

34 Future Work
- Extending the schema mapping to XML Schema
- Extending the query mapping to XQuery
- Introducing DTD/Schema constraints into the proposed mapping scheme
- Incorporating access control methods into the proposed mapping scheme

35 Acknowledgements
- Dr. Shiyong Lu
- Dr. Farshad Fotouhi
- Artem Chebotko
- Dapeng Liu
- Yezhou Sun

36 Publications
1. Mustafa Atay, Artem Chebotko, Dapeng Liu, Shiyong Lu, Farshad Fotouhi, "Efficient Schema-based XML-to-Relational Data Mapping", International Journal of Information Systems, 2006. (to appear)
2. Artem Chebotko, Dapeng Liu, Mustafa Atay, Shiyong Lu, Farshad Fotouhi, "Reconstructing XML Subtrees from Relational Storage of XML Documents", in Proc. of the 2nd International Workshop on XML Schema and Data Management (XSDM'05), in conjunction with ICDE'2005, Tokyo, Japan, April 2005.
3. Mustafa Atay, Yezhou Sun, Dapeng Liu, Shiyong Lu, Farshad Fotouhi, "Mapping XML Data to Relational Data: DOM-based Approach", in Proc. of the 8th IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA'2004), Kauai, Hawaii, USA, August 2004.
4. Shiyong Lu, Yezhou Sun, Mustafa Atay, Farshad Fotouhi, "On the consistency of XML DTDs", International Journal of Data and Knowledge Engineering, 2004.
5. Shiyong Lu, Yezhou Sun, Mustafa Atay, Farshad Fotouhi, "A new inlining algorithm for mapping XML DTDs to relational schemas", in Proc. of the First International Workshop on XML Schema and Data Management, in conjunction with the 22nd ACM International Conference on Conceptual Modeling (ER2003), Chicago, Illinois, USA, October 2003.
6. Shiyong Lu, Yezhou Sun, Mustafa Atay, Farshad Fotouhi, "A Sufficient and Necessary Condition for the Consistency of XML DTDs", in Proc. of the First International Workshop on XML Schema and Data Management, in conjunction with the 22nd ACM International Conference on Conceptual Modeling (ER'2003), Chicago, Illinois, USA, October 2003.

