VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation XML Storage Techniques.

Lecturer: Pavle Mogin

SWEN 432 Advanced Database Design and Implementation 2015 XML Storage

Plan for the XML Storage Topic
Data centric and document centric XML documents
Different ways to store XML documents:
– Text files,
– BLOBs,
– Object-Relational databases, and
– Native XML databases
Round-tripping
Reading:
– Ramakrishnan, Gehrke: Database Management Systems, Chapter 27, Section 27.8
– Ronald Bourret: XML and Databases, http://www.rpbourret.com/xml/

Data Centric and Document Centric XML
Data with only partial structure is called semistructured, and XML documents are considered to be semistructured.
XML documents are often classified as either:
– data centric, or
– document centric.
This classification plays an important role in deciding what kind of database system to use.
The border between data centric and document centric XML documents is sometimes blurred.

Data Centric XML Documents
Data centric documents:
– use XML as a data transport medium,
– are designed for machine consumption,
– have a fairly regular structure,
– have little or no mixed element and character content, and
– do not depend much on the order of elements.
Examples: sales orders, flight schedules, stock quotes, student class data.

A Data Centric XML Document
[Example document; its markup was lost in extraction. Surviving content: Pavle, Ahmed ElDabagh, Craig Anslow]
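Because data centric documents have a fairly regular structure, each repeated element maps naturally onto one row of a table. The Python sketch below illustrates this with a small, hypothetical class document; the element names (Class, ClassId, Lecturer, LectId, Name) and the values are assumptions, not the markup of the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical data centric document: element names and values are assumptions.
CLASS_DOC = (
    "<Class>"
    "<ClassId>432</ClassId>"
    "<Lecturer><LectId>1</LectId><Name>Alice</Name></Lecturer>"
    "<Lecturer><LectId>2</LectId><Name>Bob</Name></Lecturer>"
    "</Class>"
)


def lecturer_rows(xml_text):
    """Flatten a regular, data centric document into (LectId, Name, ClassId) tuples."""
    root = ET.fromstring(xml_text)
    class_id = int(root.findtext("ClassId"))
    rows = []
    for lecturer in root.findall("Lecturer"):
        # Every Lecturer has the same children, so each one maps to exactly one row.
        rows.append((int(lecturer.findtext("LectId")),
                     lecturer.findtext("Name"),
                     class_id))
    return rows


if __name__ == "__main__":
    for row in lecturer_rows(CLASS_DOC):
        print(row)
```

The flattening works only because every Lecturer element has the same children; a relational table holding these rows would not need to record the order of the Lecturer elements at all.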

Document Centric XML Documents
Document centric XML documents:
– are usually built for human consumption,
– have a less regular structure,
– usually intersperse subordinate elements and character data within the content of a complex element, and
– as a rule, depend on the order in which subordinate elements and character data occur.
Examples: advertisements, product descriptions, emails, manuals.
Generally, these are documents with mixed content.

A Mixed Content XML Document
[Example document; its markup was lost in extraction. Surviving text:]
Pavle
The student with the James Bond is the best student in the class. He scored 40.0 points out of 40.0. His presentation of the XML Functional Dependencies was brilliant. …

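Why Is the Order So Important?
The same document with the element contents placed in a different order:
Pavle
The student with the 40.0 is the best student in the class. He scored James Bond points out of XML Functional Dependencies. His presentation of the 40.0 was brilliant. …
The character data are unchanged and every element still holds a plausible value, yet the text no longer makes sense: in mixed content, the meaning depends on the order of the elements relative to the surrounding text.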
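The effect can be reproduced with a minimal Python sketch. The markup below is a hypothetical reconstruction of the example (the element names Student, Name, Score, Max, and Topic are assumptions); the sketch keeps all character data in place and only permutes the element values, as in the reordered example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the mixed content example; element names are assumptions.
DOC = ("<Student>The student with the name <Name>James Bond</Name> is the best "
       "student in the class. He scored <Score>40.0</Score> points out of "
       "<Max>40.0</Max>. His presentation of the <Topic>XML Functional "
       "Dependencies</Topic> was brilliant.</Student>")


def full_text(elem):
    """Character data and element content concatenated in document order."""
    return "".join(elem.itertext())


def element_values(elem):
    """What a purely data centric mapping keeps: child element name -> value."""
    return {child.tag: child.text for child in elem}


def permute_values(elem, order):
    """Give child i the value of child order[i]; the interspersed text stays put."""
    values = [elem[i].text for i in order]
    for child, value in zip(elem, values):
        child.text = value
    return elem


if __name__ == "__main__":
    print(full_text(ET.fromstring(DOC)))
    print(element_values(ET.fromstring(DOC)))
    # Swap Name<->Score and Max<->Topic, as in the reordered example.
    print(full_text(permute_values(ET.fromstring(DOC), [1, 0, 3, 2])))
```

element_values shows what a mapping that keeps only element values would retain: all four values survive, but the text around them, and with it the meaning of the document, does not.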

Different Ways to Store XML Documents
Different ways to store collections of XML documents are:
– text files,
– object-relational databases using BLOBs or CLOBs,
– native XML databases (text-based or model-based), and
– XML enabled object-relational DBMSs that use an XML to object-relational mapping.
Each of these storage methods uses some kind of mapping.

Storing XML in Text Files
Text files are the easiest way to store a small set of simple XML documents.
One way to implement a text-file XML database is on top of the Unix file system.
Usually, each XML document is stored as a separate file, so that each document is accessible from the directory as a whole.
The Unix file system provides:
– text editors,
– access to documents from XML parsers, and
– full-text searches (for example, with the grep and sed commands),
but it does not provide efficient querying or modification of a document.
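A minimal Python sketch of such a store, assuming one file per document named after a hypothetical document key: documents are read and written only as whole files, and the search is a plain substring match in the spirit of grep.

```python
import tempfile
from pathlib import Path


class TextFileXMLStore:
    """One XML document per file in a directory: a sketch, not a real product."""

    def __init__(self, directory):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, name):
        return self.dir / f"{name}.xml"

    def put(self, name, xml_text):
        # Storing and modifying are the same operation: rewrite the whole file.
        self._path(name).write_text(xml_text, encoding="utf-8")

    def get(self, name):
        # Retrieval is always of the whole document.
        return self._path(name).read_text(encoding="utf-8")

    def grep(self, needle):
        """Names of documents whose text contains needle (like `grep -l`)."""
        return sorted(p.stem for p in self.dir.glob("*.xml")
                      if needle in p.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = TextFileXMLStore(d)
        store.put("class1", "<Class><Lecturer>Alice</Lecturer></Class>")
        store.put("class2", "<Class><Lecturer>Bob</Lecturer></Class>")
        print(store.grep("Bob"))       # full-text search
        print(store.grep("Lecturer"))  # tag names match too: the search is markup-unaware
```

Because the search does not understand markup, searching for a tag name such as Lecturer matches every document that uses that tag; that is the price of treating XML as plain text.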

Storing Documents in BLOBs
Storing XML documents as BLOBs (or CLOBs) in relational databases offers:
– transaction control,
– security,
– multi-user access, and
– various text searches, such as full-text, proximity, and synonym searches.
But:
– retrieval is mainly restricted to whole documents, and
– a document is modified by deleting the existing document and inserting a new one.

Using BLOBs
To store Class documents as BLOBs:
  CREATE TABLE Class (
    ClassId   int PRIMARY KEY,
    Class_Doc longvarchar);
To provide fast access to individual documents, create an auxiliary table that acts as an index on lecturers' names:
  CREATE TABLE Lecturer (
    LectId  int NOT NULL,
    Name    char(20) NOT NULL,
    ClassId int NOT NULL REFERENCES Class,
    PRIMARY KEY (LectId, ClassId));

Using BLOBs (continued)
Suppose the documents are stored in the database by a (JDBC) application.
When a new Class document is stored, the application scans the document for the elements holding lecturers' names, and stores the lecturers' names and the ClassId value in the Lecturer table.
Another application can then retrieve the documents with a simple statement:
  SELECT Class_Doc FROM Class
  WHERE ClassId IN (SELECT ClassId FROM Lecturer WHERE Name = 'Pavle');
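The scheme can be sketched in Python, with SQLite standing in for the relational DBMS, a TEXT column standing in for longvarchar, and the sqlite3 module replacing the JDBC application. This is an illustration under assumptions: the documents are assumed to contain Lecturer elements with LectId and Name children. Modification follows the delete-and-insert pattern of BLOB storage.

```python
import sqlite3
import xml.etree.ElementTree as ET

# TEXT plays the role of the longvarchar/CLOB column.
SCHEMA = """
CREATE TABLE Class (ClassId INTEGER PRIMARY KEY, Class_Doc TEXT);
CREATE TABLE Lecturer (
    LectId  INTEGER NOT NULL,
    Name    CHAR(20) NOT NULL,
    ClassId INTEGER NOT NULL REFERENCES Class,
    PRIMARY KEY (LectId, ClassId));
"""


def _insert(conn, class_id, xml_text):
    # Assumed document shape: <Lecturer><LectId>..</LectId><Name>..</Name></Lecturer>
    conn.execute("INSERT INTO Class VALUES (?, ?)", (class_id, xml_text))
    for lect in ET.fromstring(xml_text).iter("Lecturer"):
        conn.execute("INSERT INTO Lecturer VALUES (?, ?, ?)",
                     (int(lect.findtext("LectId")), lect.findtext("Name"), class_id))


def store_class(conn, class_id, xml_text):
    """Store the whole document and its index rows in one transaction."""
    with conn:
        _insert(conn, class_id, xml_text)


def replace_class(conn, class_id, xml_text):
    """Modification = delete the old document and its index rows, insert the new one."""
    with conn:
        conn.execute("DELETE FROM Lecturer WHERE ClassId = ?", (class_id,))
        conn.execute("DELETE FROM Class WHERE ClassId = ?", (class_id,))
        _insert(conn, class_id, xml_text)


def classes_taught_by(conn, name):
    """The retrieval statement from the slide, with a bound parameter."""
    cur = conn.execute("SELECT Class_Doc FROM Class WHERE ClassId IN "
                       "(SELECT ClassId FROM Lecturer WHERE Name = ?)", (name,))
    return [doc for (doc,) in cur.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    store_class(conn, 432, "<Class><Lecturer><LectId>1</LectId>"
                           "<Name>Pavle</Name></Lecturer></Class>")
    print(classes_taught_by(conn, "Pavle"))
```

Keeping the document insert and its index rows in one transaction means a document can never be stored without its Lecturer rows, or the other way round.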

XML Enabled Databases
All major RDBMS vendors, like:
– IBM,
– Oracle,
– Microsoft, and
– Sybase,
offer XML extensions for their general purpose database engines.
These extensions perform:
– XML to relational, and
– relational to XML mappings.

XML to Object-Relational Mapping
The most popular ways of storing data centric XML documents in relational databases as relational tables, and of publishing relational data as XML documents, are based on:
– XML to object-relational mapping (the Shared-Inlining method, the Structure-Based method), and
– object-relational to XML mapping (Inclusion Dependency Based Mapping).
Often these mappings ignore:
– document order,
– comments,
– processing instructions, and
– CDATA sections,
since data centric documents are mainly intended for machine consumption.
We shall devote separate lectures to these methods.
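To see why such mappings lose information, consider the deliberately simple mapping in the Python sketch below. It is not the Shared-Inlining or Structure-Based method; it just turns every attribute and child element of a (hypothetical) Lecturer element into a column, and publishes the rows back as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with attributes, a comment and a processing instruction.
DOC = ('<Class id="432"><!-- draft -->'
       '<Lecturer title="Dr"><Name>Alice</Name><Room>101</Room></Lecturer>'
       '<?print-style compact?>'
       '<Lecturer title="Prof"><Name>Bob</Name><Room>102</Room></Lecturer>'
       '</Class>')


def shred(xml_text):
    """XML -> rows: one dict per Lecturer; attributes and child elements both become columns."""
    root = ET.fromstring(xml_text)  # the default parser already drops comments and PIs
    rows = []
    for lect in root.findall("Lecturer"):
        row = dict(lect.attrib)
        row.update({child.tag: child.text for child in lect})
        row["ClassId"] = root.get("id")
        rows.append(row)
    return rows


def publish(rows):
    """Rows -> XML: every column comes back as an element."""
    root = ET.Element("Class", id=rows[0]["ClassId"])
    for row in rows:
        lect = ET.SubElement(root, "Lecturer")
        for column, value in row.items():
            if column != "ClassId":
                ET.SubElement(lect, column).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rows = shred(DOC)
    print(rows)
    print(publish(rows))
```

Publishing the rows back yields valid XML with the same data, but the comment and the processing instruction are gone, and the title attributes have come back as elements.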

Native XML Databases
Native XML databases are designed specifically to store XML documents:
– they are based on an XML data model, and
– they support (or are supposed to support) the majority of the features that other databases do.
Native XML databases are mostly useful for storing document centric XML documents, since they:
– preserve document order,
– preserve the information that XML enabled databases drop (such as comments, processing instructions, and CDATA sections),
– allow using XML query languages,
– speed up retrieval of whole documents, and
– allow storing XML documents without a DTD or XML Schema.

Definition of a Native XML Database
A native XML database:
1. defines a (logical) model for an XML document, whose concepts include (at least):
– elements,
– attributes,
– PCDATA, and
– document order;
2. has an XML document as its fundamental unit of (logical) storage;
3. is not required to have any particular underlying physical storage model (although it certainly needs to have one).

Features of Native XML Databases
– Document collections (sets of documents),
– Query languages (XPath, XQuery),
– Updates and deletes (XQuery Update),
– Transactions, locking, and concurrency (locking granularity: whole documents),
– Application programming interfaces (such as JDBC),
– Round-tripping, and
– Indexing.
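As an illustration of querying a document collection with a path language, the Python sketch below runs ElementTree's limited XPath subset (not full XPath, and not XQuery) over a dictionary of hypothetical documents. A native XML database would use its indexes rather than re-parsing every document for each query.

```python
import xml.etree.ElementTree as ET

# A collection of hypothetical documents, keyed by document name.
COLLECTION = {
    "class1": "<Class><Lecturer><Name>Alice</Name></Lecturer></Class>",
    "class2": ("<Class><Lecturer><Name>Bob</Name></Lecturer>"
               "<Lecturer><Name>Alice</Name></Lecturer></Class>"),
}


def query(collection, path):
    """Evaluate an ElementTree path expression against every document in the collection."""
    results = []
    for doc_name, xml_text in collection.items():
        root = ET.fromstring(xml_text)
        results.extend((doc_name, elem) for elem in root.iterfind(path))
    return results


def documents_with_lecturer(collection, name):
    # [Name='...'] keeps Lecturer elements with a Name child whose text equals name.
    # The name is pasted into the path, so it must not contain a quote character.
    hits = query(collection, f".//Lecturer[Name='{name}']")
    return sorted({doc_name for doc_name, _ in hits})


if __name__ == "__main__":
    print(documents_with_lecturer(COLLECTION, "Alice"))
    print(len(query(COLLECTION, ".//Lecturer/Name")))
```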

Round-Tripping
XML round-tripping is the ability to store a document and get the same document back again.
Some storage techniques, like:
– text files, and
– object-relational BLOBs,
provide very high round-tripping fidelity (close to 100%).
Storage techniques based on a non-trivial mapping support round-tripping only to some extent; the fidelity depends on the mapping model.

Levels of Round-Tripping
Native XML databases round-trip documents at least at the level of elements, attributes, PCDATA, and document order, but often do more (CDATA sections, processing instructions, comments, entity references).
XML enabled databases, as a rule, do not even distinguish between elements and attributes, and they neglect:
– CDATA sections,
– processing instructions,
– entity references, and
– comments.
So, round-tripping fidelity spans a wide scale.
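The different levels can be observed with Python's ElementTree, used here only as a stand-in for a model-based store. Storing the text unchanged returns the document exactly; parsing it into a tree and serializing it back drops comments and processing instructions by default, and keeps them only when the parser is asked to (TreeBuilder's insert_comments and insert_pis options, Python 3.8+). In both cases the CDATA section comes back as ordinary escaped text. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a comment, a processing instruction and a CDATA section.
DOC = ('<note><!-- draft --><?render bold?>'
       '<body><![CDATA[a < b]]></body></note>')


def round_trip_text(xml_text):
    """Text or BLOB storage: the stored characters are returned unchanged."""
    return xml_text


def round_trip_model(xml_text, keep_comments_and_pis):
    """Parse into an element tree, then serialize the tree back to text."""
    builder = ET.TreeBuilder(insert_comments=keep_comments_and_pis,
                             insert_pis=keep_comments_and_pis)  # Python 3.8+
    root = ET.fromstring(xml_text, parser=ET.XMLParser(target=builder))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(round_trip_text(DOC) == DOC)
    print(round_trip_model(DOC, keep_comments_and_pis=False))
    print(round_trip_model(DOC, keep_comments_and_pis=True))
```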

Round-Tripping Conclusion
Round-tripping is important for document centric XML applications, because they need:
– CDATA sections,
– comments,
– entity references,
– the exact order of interspersed text and elements, and
– processing instructions.
It is less important for data centric applications, since they usually care only about data, and data are contained only in elements, attributes, and #PCDATA.

Architectures of Native XML Databases
The architectures of native XML databases fall into two broad categories:
– text-based, and
– model-based.
Text-based native XML databases store a document as a unit.
Model-based native XML databases use an XML model, like DOM, to represent a document's tree structure and then map the objects of this representation to a database (usually an object-relational one).
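A model-based mapping can be sketched in Python with SQLite: each element of the parsed tree becomes a row of a Node table, its attributes go to an Attribute table, and a position column records document order, so mixed content can be rebuilt exactly. The table and column names are assumptions made for this illustration, and comments and processing instructions are not stored.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE Node (
    NodeId   INTEGER PRIMARY KEY,
    ParentId INTEGER REFERENCES Node,
    Pos      INTEGER NOT NULL,   -- position among siblings: preserves document order
    Tag      TEXT NOT NULL,
    LeadText TEXT,               -- character data before the first child
    TailText TEXT);              -- character data after the end tag
CREATE TABLE Attribute (
    NodeId    INTEGER NOT NULL REFERENCES Node,
    AttrName  TEXT NOT NULL,
    AttrValue TEXT);
"""


def store(conn, root):
    """Insert one Node row per element (depth-first); return the root's NodeId."""
    def visit(elem, parent_id, pos):
        cur = conn.execute(
            "INSERT INTO Node (ParentId, Pos, Tag, LeadText, TailText) VALUES (?, ?, ?, ?, ?)",
            (parent_id, pos, elem.tag, elem.text, elem.tail))
        node_id = cur.lastrowid
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO Attribute VALUES (?, ?, ?)", (node_id, name, value))
        for i, child in enumerate(elem):
            visit(child, node_id, i)
        return node_id

    with conn:
        return visit(root, None, 0)


def load(conn, node_id):
    """Rebuild an element and its subtree, ordering children by Pos."""
    tag, lead, tail = conn.execute(
        "SELECT Tag, LeadText, TailText FROM Node WHERE NodeId = ?", (node_id,)).fetchone()
    attrs = conn.execute("SELECT AttrName, AttrValue FROM Attribute "
                         "WHERE NodeId = ? ORDER BY rowid", (node_id,)).fetchall()
    elem = ET.Element(tag, dict(attrs))
    elem.text, elem.tail = lead, tail
    children = conn.execute("SELECT NodeId FROM Node WHERE ParentId = ? ORDER BY Pos",
                            (node_id,)).fetchall()
    for (child_id,) in children:
        elem.append(load(conn, child_id))
    return elem


if __name__ == "__main__":
    doc = '<p lang="en">Order <b>matters</b> in <i>mixed</i> content.</p>'
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    root_id = store(conn, ET.fromstring(doc))
    print(ET.tostring(load(conn, root_id), encoding="unicode") == doc)
```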

Text-Based Native XML Databases
A text-based native XML database stores XML as text in:
– text files,
– relational BLOBs with XML processing ability, or
– a proprietary storage format (like eXist).
All text-based XML databases pay special attention to indexing.
This makes retrieval of whole documents, or of their fragments, in their hierarchical order very efficient.
Retrieving data in an inverted form (with a hierarchy different from the stored one) may encounter performance problems, unless really versatile indexing is provided.
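The role of indexing can be sketched in Python: documents are kept verbatim for fast whole-document retrieval, while a separate value index maps (element name, text value) pairs to document identifiers, so that an inverted question, such as which classes a given lecturer teaches, does not require scanning every document. The class and its names are hypothetical, and only each element's leading character data is indexed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class IndexedTextStore:
    """Documents kept verbatim as text, plus an in-memory value index (a sketch)."""

    def __init__(self):
        self.docs = {}                 # document id -> original text
        self.index = defaultdict(set)  # (element tag, text value) -> document ids

    def remove(self, doc_id):
        self.docs.pop(doc_id, None)
        for doc_ids in self.index.values():
            doc_ids.discard(doc_id)

    def put(self, doc_id, xml_text):
        root = ET.fromstring(xml_text)  # parse first, so a malformed document changes nothing
        self.remove(doc_id)             # replacing a document also replaces its index entries
        self.docs[doc_id] = xml_text
        for elem in root.iter():
            if elem.text and elem.text.strip():
                self.index[(elem.tag, elem.text.strip())].add(doc_id)

    def get(self, doc_id):
        """Whole-document retrieval: return the stored text as is."""
        return self.docs[doc_id]

    def find(self, tag, value):
        """Inverted access: ids of documents with a `tag` element whose leading text is value."""
        return sorted(self.index.get((tag, value), set()))


if __name__ == "__main__":
    store = IndexedTextStore()
    store.put("class1", "<Class><Lecturer>Alice</Lecturer></Class>")
    store.put("class2", "<Class><Lecturer>Alice</Lecturer><Lecturer>Bob</Lecturer></Class>")
    print(store.find("Lecturer", "Alice"))
```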

SQL/XML
The SQL standards committee has issued an extension to the SQL standard called SQL/XML (first published as Part 14 of SQL:2003).
It covers:
– publishing SQL data as XML documents,
– storing XML documents as values of table columns of the XML type (each XML document is a value of the XML type),
– querying XML type data within an SQL database using XQuery, and
– converting XML type data into SQL data type data.

Summary
Collections of XML documents may be stored using:
– text files,
– relational BLOBs,
– relational tables (after a DTD or XML Schema to relational mapping), and
– native XML databases.
Text files, relational BLOBs, and text-based native XML databases are appropriate for document centric XML documents.
Physically, a native XML database stores a faithful XML model in a database that may be relational.

