XML and Databases 198:541


1 XML and Databases 198:541

2 XML Motivation

3  Huge amounts of unstructured data on the web: HTML documents  No structure information  Only format instructions (presentation)  Integration of data from different sources  Structural differences  Closely related to semistructured data

4 Semistructured Data  Integration of heterogeneous sources  Data sources with non-rigid structures  Biological data  Web data  Need for more structural information than plain text, but fewer constraints on structure than in relational data

5 Characteristics of Semistructured Data  Missing or additional tuples  Multiple attributes  Different types in different objects  Heterogeneous collection  Self-describing, irregular data with no a priori structure

6 HTML Document Example  Bibliography  Foundations of Databases; Abiteboul, Hull, Vianu; Addison Wesley, 1995  Data on the Web; Abiteboul, Buneman, Suciu; Morgan Kaufmann, 1999  Type of information: title, authors, year, book

7 The Idea Behind XML  Easily support information exchange between applications / computers  Reuse what worked in HTML  Human readable  Standard  Easy to generate and read  But allow arbitrary markup  Uniform language for semistructured data  Data Management

8 XML

9  eXtensible Markup Language  Universal standard for documents and data  Defined by W3C  Set of emerging technologies  XLink, XPointer, XSchema, DOM, SAX, XPath, XQuery,…

10 XML  XML gives a syntax, not a semantics  XML defines the structure of a document, not how it is processed  Separate structural information from format instructions

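11 XML Example  <bibliography> <book> <title>Foundations…</title> <author>Abiteboul</author> <author>Hull</author> <author>Vianu</author> <publisher>Addison Wesley</publisher> <year>1995</year> </book> … </bibliography>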

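12 XML Terminology  Tags: book, title, author,…  Start tag: <book>  End tag: </book>  Elements are nested  Empty element: a start tag immediately followed by its end tag can be abbreviated, e.g. <book></book> => <book/>  XML document: single root element  An XML document is well formed if its tags match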

13 XML Attributes  Attributes are name-value pairs that characterize an element.  Foundations of Databases Abiteboul … 1995  Can define object identifiers (oid), but they are just syntax
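A minimal sketch of reading attributes with Python's standard library. The attribute names `price` and `currency` are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute names "price" and "currency"
# are assumptions made for this example.
doc = """
<book price="55" currency="USD">
  <title>Foundations of Databases</title>
  <author>Abiteboul</author>
  <year>1995</year>
</book>
"""

book = ET.fromstring(doc)

# Attributes are exposed as a mapping from name to string value.
attrs = dict(book.attrib)
title = book.findtext("title")
print(attrs, title)
```

Attribute values always arrive as strings; any typing (numbers, dates, identifiers) is left to the application or to a schema.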

14 More XML  Text can be CDATA or PCDATA  Entity references: &amp; for &, &gt; for >, …  Processing instructions: <?target data?>  Comments: <!-- comment text -->
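As a small illustration of entity references, the sketch below uses `xml.sax.saxutils` from Python's standard library to escape and restore character data. The sample string is made up for the example.

```python
from xml.sax.saxutils import escape, unescape

# Made-up sample text containing characters that are special in XML.
raw = "AT&T > IBM"

# escape() replaces &, < and > with the corresponding entity references.
escaped = escape(raw)

# unescape() reverses the substitution.
restored = unescape(escaped)
print(escaped, restored)
```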

15 Well Formed XML Documents  Elements must be properly nested:  <book><title>Foundations of Databases</title></book>  But not:  <book><title>Foundations of Databases</book></title>  There must be a unique root element  Elements can have ‘element content’ (child elements only) or ‘mixed content’ (text interleaved with child elements):  This is Mixed Content
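Well-formedness can be checked by simply attempting to parse the text. The following is a minimal sketch with Python's `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Foundations of Databases</title></book>"
bad_nesting = "<book><title>Foundations of Databases</book></title>"
two_roots = "<book/><book/>"

for sample in (good, bad_nesting, two_roots):
    print(is_well_formed(sample), sample)
```

This only tests well-formedness (nesting, matching tags, single root); it does not validate the document against a DTD or schema.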

16 XML: Potential  Flexible enough to represent anything  Stock market, DNA, Music, Chemicals  Weather information  Wireless network configuration  Enables easy information exchange  Between companies  Within companies  Standard: everybody uses the same technology

17 XML: Limitations  XML is only a syntax for documents  We need tools!  Editors and parsers  Programming APIs (for Java, C++, etc.)  Languages to manipulate XML (how many books?)  Schemas (What is a book like?)  Storage (What if you have a lot of XML?)  Transfer protocols (How do you exchange it?)  What about XML in Chinese…?  How can XML fit into my phone…?  Query processing?  …

18 XML Schema Language

19 DTDs: Document Type Definitions  Similar to a schema  Grammar describing constraints on document structure and content  XML Documents can be validated against a DTD

20 Shortcomings of DTDs  Useful for documents, but not so good for data:  No support for structural re-use  Object-oriented-like structures aren’t supported  No support for data types  Can’t do data validation  Can have a single key item (ID), but:  No support for multi-attribute keys  No support for foreign keys (references to other keys)  No constraints on IDREFs (e.g., cannot require that a reference point only to a Section)

21 XSchema  In XML format  Includes primitive data types (integers, strings, dates,…)  Supports value-based constraints (integers > 100)  Inheritance  Foreign keys  …

22 Example of XSchema …

23 XML Storage

24 Storing XML Data  Different approaches:  Storing as text  Using an RDBMS  Using a native system tailored for XML (NATIX, Tamino, Ipedo, etc.)  Performance of the various approaches depends on your application

25 Storing XML as Text  Simple  Easy to compress  No updates  Need to parse the document every time it is needed

26 Storing XML in RDBMS  Uses existing RDBMS techniques  Costly in space, takes time to reconstruct original document  Example techniques:  Schema with 2 relations: tag and value  Schema with n relations: 1 per element name
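The two-relation technique could look roughly like the sketch below, which shreds a document into SQLite. The exact table layout is an assumption: here `tag` holds one row per element (node id, parent id, element name) and `value` holds the text of elements that have any. The slides do not specify the columns.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

doc = ("<bibliography><book><title>Foundations of Databases</title>"
       "<author>Abiteboul</author><year>1995</year></book></bibliography>")

conn = sqlite3.connect(":memory:")
# Assumed layout: one relation for structure, one for text values.
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)")
conn.execute("CREATE TABLE value (id INTEGER, content TEXT)")

ids = count(1)

def shred(elem, parent_id=None):
    """Insert one row per element, recursing over children in document order."""
    node_id = next(ids)
    conn.execute("INSERT INTO tag VALUES (?, ?, ?)", (node_id, parent_id, elem.tag))
    if elem.text and elem.text.strip():
        conn.execute("INSERT INTO value VALUES (?, ?)", (node_id, elem.text.strip()))
    for child in elem:
        shred(child, node_id)

shred(ET.fromstring(doc))

# Querying requires a join between the two relations.
titles = [row[0] for row in conn.execute(
    "SELECT v.content FROM tag t JOIN value v ON v.id = t.id WHERE t.name = 'title'")]
node_count = conn.execute("SELECT COUNT(*) FROM tag").fetchone()[0]
print(titles, node_count)
```

Even this tiny query needs a join, and rebuilding the original document means walking the `parent` links, which illustrates the space and reconstruction costs mentioned on the slide.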

27 Accessing and Querying XML Data

28 XML as a Tree: DOM  DOM = Document Object Model  Class hierarchy serving as an API to XML trees  Methods of those classes can be used to manipulate XML (e.g., Node::child, Node::name)  Can be used from Java, C++ to develop XML applications.  Each node has an identity (i.e., a unique identifier) in the whole document
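A minimal sketch of DOM-style navigation using Python's `xml.dom.minidom`, one of several DOM implementations. The sample document is made up to match the bibliography example used in these slides.

```python
from xml.dom.minidom import parseString

# Sample document written without whitespace so that every child is an element.
dom = parseString(
    "<bibliography><book><title>Foundations of Databases</title>"
    "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
    "</book></bibliography>"
)

root = dom.documentElement          # the <bibliography> element
book = root.firstChild              # its first child, the <book> element

# Navigate the tree through node properties.
child_names = [node.nodeName for node in book.childNodes]

# Look up elements by name anywhere in the document.
authors = [a.firstChild.data for a in dom.getElementsByTagName("author")]
print(child_names, authors)
```

The whole document is held in memory as a tree, which makes arbitrary navigation easy but can be costly for large documents.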

29 XML as a DOM Tree  Class hierarchy (node, element, attribute)  Figure: tree rooted at bibliography with two book children; the first book has title (Foundations of Databases), author (Abiteboul, Hull, Vianu), publisher (Addison Wesley) and year (1995)

30 XML as a Stream: SAX  XML document = event stream. E.g.,  Opening tag ‘book’  Opening tag ‘title’  Text “Foundations of databases”  Closing tag ‘title’  Opening tag ‘author’  Etc.  SAX allows you to associate actions with those events to build applications  Very efficient, since the events correspond to what happens during parsing, but not always sufficient.
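The event stream can be observed with Python's `xml.sax` module. This is a minimal sketch: the handler class `EventLogger` and the sample document are assumptions for illustration.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record each parsing event instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("open", name))

    def endElement(self, name):
        self.events.append(("close", name))

    def characters(self, content):
        # The parser may deliver text in several chunks; skip pure whitespace.
        if content.strip():
            self.events.append(("text", content))

handler = EventLogger()
xml.sax.parseString(
    b"<book><title>Foundations of Databases</title><author>Abiteboul</author></book>",
    handler,
)
events = handler.events
for event in events:
    print(event)
```

Nothing is kept in memory unless the handler stores it, which is why this style is efficient but awkward when an action needs context from elsewhere in the document.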

31 XPath  Language for navigating in an XML document (seen as a tree)  One root node  Types of nodes: root, element, text, attribute, comment,…  An XPath expression defines navigation in the tree along axes: child, descendant, parent, ancestor,…

32 XPath: Examples  Find all the titles of all the books:  //book/title  Find the title of all books written by Charles Dickens:  //book[author="Charles Dickens"]/title  Find the title of the first section in the second chapter in "Great Expectations":  //book[title="Great Expectations"]/chapter[2]/section[1]/title  Find the title of all sections that come after the second chapter in "Great Expectations":  //book[title="Great Expectations"]/chapter[2]/following::section/title
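The first three expressions can be tried with Python's `xml.etree.ElementTree`, which supports a limited XPath subset (paths start with `.//` relative to an element, and the `following::` axis is not available, so the last example is omitted). The sample document below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical library document; section titles are placeholders.
doc = """<library>
  <book>
    <title>Great Expectations</title>
    <author>Charles Dickens</author>
    <chapter><section><title>c1s1</title></section></chapter>
    <chapter>
      <section><title>c2s1</title></section>
      <section><title>c2s2</title></section>
    </chapter>
  </book>
  <book>
    <title>Data on the Web</title>
    <author>Abiteboul</author>
  </book>
</library>"""
root = ET.fromstring(doc)

# //book/title
all_titles = [t.text for t in root.findall(".//book/title")]

# //book[author="Charles Dickens"]/title
dickens = [t.text for t in root.findall(".//book[author='Charles Dickens']/title")]

# //book[title="Great Expectations"]/chapter[2]/section[1]/title
first_section = [t.text for t in root.findall(
    ".//book[title='Great Expectations']/chapter[2]/section[1]/title")]

print(all_titles, dickens, first_section)
```

A full XPath engine would be needed for the remaining axes such as `following`, `ancestor` or `parent` steps in predicates.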

33 Querying XML Data  Need for a language to query XML data  Should yield XML output  Should support standard query operations  No schema required  Several proposals for an XML query language: XML-QL, XQuery, …

34 XQuery  XPath included in XQuery  FLWR expressions: for, let, where, return  FOR $x IN document("bib.xml")/bib/book WHERE $x/year > 1995 RETURN $x/title  Result: <title>abc</title> <title>def</title> <title>ghi</title>
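To make the semantics of the FOR / WHERE / RETURN clauses concrete, here is a rough Python equivalent of the query above. It is not XQuery itself; the contents of the sample `bib` document are made up, since the slides do not show `bib.xml`.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for bib.xml.
bib = ET.fromstring(
    "<bib>"
    "<book><title>abc</title><year>1999</year></book>"
    "<book><title>old</title><year>1995</year></book>"
    "<book><title>def</title><year>2000</year></book>"
    "</bib>"
)

result = [
    ET.tostring(title, encoding="unicode")
    for book in bib.findall("book")             # FOR $x IN /bib/book
    if int(book.findtext("year")) > 1995        # WHERE $x/year > 1995
    for title in book.findall("title")          # RETURN $x/title
]
print(result)
```

The result is a sequence of `title` elements, which matches the requirement that an XML query language should yield XML output.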

35 How to process XML Queries?  Use indexes  Need to identify nodes  Need to know relations between nodes  Labeling schemes: Dewey encoding, Prefix-Postfix encoding  TwigStack
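A minimal sketch of two labeling schemes, showing how labels alone answer ancestor/descendant questions. It assumes that "Prefix-Postfix encoding" refers to numbering nodes by their pre-order and post-order positions; the function names and the sample tree are illustrative only.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bibliography><book><title/><author/></book><book><author/></book></bibliography>"
)

def dewey_labels(elem, label=(1,), out=None):
    """Dewey encoding: a node's label is its parent's label plus its sibling position."""
    out = {} if out is None else out
    out[elem] = label
    for position, child in enumerate(elem, start=1):
        dewey_labels(child, label + (position,), out)
    return out

def is_ancestor_dewey(a, d):
    """a is an ancestor of d if a's label is a proper prefix of d's label."""
    return len(a) < len(d) and d[:len(a)] == a

def pre_post_labels(tree_root):
    """Assign (pre-order rank, post-order rank) to every element."""
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(elem):
        counters["pre"] += 1
        pre = counters["pre"]
        for child in elem:
            visit(child)
        counters["post"] += 1
        labels[elem] = (pre, counters["post"])

    visit(tree_root)
    return labels

def is_ancestor_pre_post(a, d):
    """a is an ancestor of d if it starts before d and finishes after d."""
    return a[0] < d[0] and a[1] > d[1]

book1, book2 = root[0], root[1]
title = book1[0]
dewey = dewey_labels(root)
prepost = pre_post_labels(root)
print(dewey[title], prepost[title])
```

With either scheme, structural relationships are decided by comparing labels stored in an index, without walking the document.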

36 Web Services

37 What are Web Services  Programming interfaces for application-to-application communication on the Web: platform-independent, language-independent, object-model-independent  Possibility to activate methods on remote web servers (RPC)  2 main applications: e-commerce, access to remote data

38 XML and Web Services  Exchange of information between applications is in XML  Input and result  Use of SOAP to generate messages  Descriptions of the web service functionality given in XML, according to the WSDL schema  Web Services standards use XML heavily

39 Conclusions  XML: a very active area  Many research directions  Many applications  Standards not finalized yet:  XQuery  XML Schema  Web Services…

40 Some Important XML Standards  XSL/XSLT: presentation and transformation standards  RDF: resource description framework (meta-info such as ratings, categorizations, etc.)  XPath/XPointer/XLink: standard for linking to documents and elements within  Namespaces: for resolving name clashes  DOM: Document Object Model for manipulating XML documents  SAX: Simple API for XML parsing  …

41 References  XML  http://www.w3.org/XML/  Sudarshan S. Chawathe: Describing and Manipulating XML Data. IEEE Data Engineering Bulletin 22(3)(1999)  XML Standards  http://www.w3.org/ (XSL, XPath, XSchema, DOM…)  Storing XML Data  Daniela Florescu, Donald Kossmann: Storing and Querying XML Data using an RDMBS. IEEE Data Engineering Bulletin 22(3)(1999)  Hartmut Liefke, Dan Suciu: XMILL: An Efficient Compressor for XML Data. SIGMOD Conference 2000  XQuery  http://www.w3.org/TR/xquery/  Peter Fankhauser: XQuery Formal Semantics: State and Challenges. SIGMOD Record 30(3)(2001)  Web Services  http://www.w3.org/2002/ws/

