XML Facultad de Informática. Universidad Politécnica de Madrid


1 XML Facultad de Informática. Universidad Politécnica de Madrid
Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid

2 Sources The slides are taken from:
Database System Concepts©Silberschatz, Korth and Sudarshan. See for conditions on re-use XML tutorial at (the ones with no footnote)

3 AGENDA Introduction Structure of XML Data XML Document Schema
Querying and Transformation Application Program Interfaces to XML Storage of XML Data XML Applications

4 Introduction

5 What is XML? XML stands for EXtensible Markup Language
XML is a markup language much like HTML XML was designed to describe data XML tags are not predefined. You must define your own tags Defined by the WWW Consortium (W3C) Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML Documents have tags giving extra information about sections of the document E.g. <title> XML </title> <slide> Introduction …</slide> XML uses a Document Type Definition (DTD) or an XML Schema to describe the data XML with a DTD or XML Schema is designed to be self-descriptive

6 The Main Difference Between XML and HTML
XML was designed to carry data. XML is not a replacement for HTML. XML and HTML were designed with different goals: XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks. HTML is about displaying information, while XML is about describing information. XML is a Complement to HTML It is important to understand that XML is not a replacement for HTML. In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data. XML is a cross-platform, software and hardware independent tool for transmitting information.

7 How can XML be used It is important to understand that XML was designed to store, carry, and exchange data. XML was not designed to display data. XML can Separate Data from HTML With XML, your data is stored outside your HTML. When HTML is used to display data, the data is stored inside your HTML. With XML, data can be stored in separate XML files. This way you can concentrate on using HTML for data layout and display, and be sure that changes in the underlying data will not require any changes to your HTML. XML data can also be stored inside HTML pages as "Data Islands". You can still concentrate on using HTML only for formatting and displaying the data.

8 Why use XML
XML is Used to Exchange Data: With XML, data can be exchanged between incompatible systems. In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet. Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications.
XML and B2B: With XML, financial information can be exchanged over the Internet. Expect to see a lot about XML and B2B (Business To Business) in the near future. XML is going to be the main language for exchanging financial information between businesses over the Internet. A lot of interesting B2B applications are under development.
XML Can be Used to Share Data: With XML, plain text files can be used to share data. Since XML data is stored in plain text format, XML provides a software- and hardware-independent way of sharing data. This makes it much easier to create data that different applications can work with. It also makes it easier to expand or upgrade a system to new operating systems, servers, applications, and new browsers.
XML Can be Used to Store Data: With XML, plain text files can be used to store data. XML can also be used to store data in files or in databases. Applications can be written to store and retrieve information from the store, and generic applications can be used to display the data.
XML Can Make your Data More Useful: With XML, your data is available to more users. Since XML is independent of hardware, software and application, you can make your data available to clients other than standard HTML browsers. Other clients and applications can access your XML files as data sources, just as they access databases. Your data can be made available to all kinds of "reading machines" (agents), and it is easier to make your data available to blind people, or people with other disabilities.
XML Can be Used to Create New Languages: XML is the basis of WML. The Wireless Markup Language (WML), used with WAP to mark up Internet applications for handheld devices like mobile phones, is written in XML.
If Developers Have Sense: If they DO have sense, all future applications will exchange their data in XML. The future might give us word processors, spreadsheet applications and databases that can read each other's data in a pure text format, without any conversion utilities in between. We can only pray that Microsoft and all the other software vendors will agree.

9 XML: applications
Data interchange is critical in today's networked world. Examples: Banking: funds transfer. Order processing (especially inter-company orders). Scientific data: Chemistry: ChemML, … Genetics: BSML (Bio-Sequence Markup Language), … Paper flow of information between organizations is being replaced by electronic flow of information. Each application area has its own set of standards for representing information. XML has become the basis for all new generation data interchange formats. Database System Concepts - 5th Edition, Aug 22, 2005. 10.9 ©Silberschatz, Korth and Sudarshan

10 XML applications (Cont.)
Earlier generation formats were based on plain text with line headers indicating the meaning of fields, similar in concept to email headers. Such formats do not allow for nested structures, have no standard "type" language, and are tied too closely to low-level document structure (lines, spaces, etc.). Each XML-based standard defines what are valid elements, using XML type specification languages to specify the syntax: DTD (Document Type Definition) or XML Schema, plus textual descriptions of the semantics. XML allows new tags to be defined as required; however, this may be constrained by DTDs. A wide variety of tools is available for parsing, browsing and querying XML documents/data.

11 XML Does not DO Anything
XML was not designed to DO anything. Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store and to send information. The following example is a note to Tove from Jani, stored as XML: <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> The note has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive or display it.

12 Comparison with Relational Data
Inefficient: tags, which in effect represent schema information, are repeated. Better than relational tuples as a data-exchange format: unlike relational tuples, XML data is self-documenting due to the presence of tags; the format is non-rigid (tags can be added); it allows nested structures; and it has wide acceptance, not only in database systems, but also in browsers, tools, and applications.

13 Syntax

14 XML syntax rules The syntax rules of XML are very simple and very strict. They are easy to learn and easy to use. XML documents use a self-describing and simple syntax: <?xml version="1.0" encoding="ISO-8859-1"?> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set. The next line describes the root element of the document (as if it were saying: "this document is a note"): <note> The next 4 lines describe 4 child elements of the root (to, from, heading, and body). Finally, the last line defines the end of the root element: </note>

15 XML syntax rules All XML Elements Must Have a Closing Tag
With XML, it is illegal to omit the closing tag. In HTML some elements do not have to have a closing tag: <p>This is a paragraph <p>This is another paragraph In XML every element must have a closing tag: <p>This is a paragraph</p> <p>This is another paragraph</p> Note that the XML declaration does not have a closing tag. This is not an error. The declaration is not a part of the XML document itself. It is not an XML element, and it should not have a closing tag. XML Tags are Case Sensitive: unlike HTML tags, XML tags are case sensitive. With XML, the tag <Letter> is different from the tag <letter>, so opening and closing tags must be written with the same case. XML Elements Must be Properly Nested: <b><i>This text is bold and italic</i></b> Comments in XML: the syntax for comments is similar to that of HTML: <!-- This is a comment -->
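The three rules above (closing tags, case sensitivity, proper nesting) can be tried out with any XML parser. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree; the helper name is_well_formed and the sample strings are assumptions made for this illustration, not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser rejects any violation of the syntax rules.
        return False


samples = {
    "closed tag": "<p>This is a paragraph</p>",
    "missing closing tag": "<p>This is a paragraph",
    "case mismatch": "<Letter>Dear Tove</letter>",
    "properly nested": "<b><i>This text is bold and italic</i></b>",
    "improperly nested": "<b><i>This text is bold and italic</b></i>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

A non-validating parser such as this one only checks well-formedness; it says nothing about whether the document follows a particular DTD.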

16 XML syntax rules All XML documents must contain a single tag pair to define a root element. All other elements must be within this root element. All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element: <root> <child> <subchild>.....</subchild> </child> </root> XML Attribute Values Must be Quoted: XML elements can have attributes in name/value pairs just like in HTML, but in XML it is illegal to omit the quotation marks around the attribute value. The document below is correct because the value of the date attribute is quoted; writing date=12/11/2002 without quotes would be an error: <?xml version="1.0" encoding="ISO-8859-1"?> <note date="12/11/2002"> <to>Tove</to> <from>Jani</from> </note> With XML, the white space in your document is not truncated. This is unlike HTML, which collapses runs of white space into a single space.
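As a rough illustration of the single-root, quoted-attribute and white-space rules, the sketch below feeds a few small documents to Python's standard-library parser. The sample strings and variable names are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


quoted = '<note date="12/11/2002"><to>Tove</to><from>Jani</from></note>'
unquoted = '<note date=12/11/2002><to>Tove</to><from>Jani</from></note>'
two_roots = "<to>Tove</to><from>Jani</from>"

print(parses(quoted))     # attribute value is quoted
print(parses(unquoted))   # attribute value is not quoted
print(parses(two_roots))  # no single root element

# White space inside an element is handed to the application unchanged.
spaced = ET.fromstring("<body>Hello     Tove</body>")
print(repr(spaced.text))
```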

17 XML is Free and Extensible
XML tags are not predefined. You must "invent" your own tags. The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.). XML allows the author to define his own tags and his own document structure. The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.

18 XML tags
The ability to specify new tags, and to create nested tag structures, makes XML a great way to exchange data, not just documents. Much of the use of XML has been in data exchange applications, not as a replacement for HTML. Tags make data (relatively) self-documenting. E.g. <bank> <account> <account_number> A </account_number> <branch_name> Downtown </branch_name> <balance> </balance> </account> <depositor> <account_number> A </account_number> <customer_name> Johnson </customer_name> </depositor> </bank>

19 Structure of XML Data

20 Structure of XML Data
Tag: label for a section of data. Element: section of data beginning with <tagname> and ending with matching </tagname>. Elements must be properly nested. Proper nesting: <account> … <balance> …. </balance> </account> Improper nesting: <account> … <balance> …. </account> </balance> Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element. Every document must have a single top-level element.

21 Elements are extensible
XML documents can be extended to carry more information. Look at the following XML NOTE example: <note> <to>Tove</to> <from>Jani</from> <body>Don't forget me this weekend!</body> </note> Let's imagine that we created an application to produce this output: MESSAGE To: Tove From: Jani Don't forget me this weekend!

22 Elements are extensible
Imagine that the author of the XML document added some extra information to it: <note> <date> </date> <to>Tove</to> <from>Jani</from> <body>Don't forget me this weekend!</body> </note> Should the application break or crash? No. The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output.
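The application described on these two slides could look roughly like the sketch below, written with Python's standard-library xml.etree.ElementTree. The function name render_note is hypothetical, and the date value in the extended document is a placeholder chosen for this example (the slide leaves it empty).

```python
import xml.etree.ElementTree as ET


def render_note(xml_text: str) -> str:
    """Build the MESSAGE output from a <note>, ignoring elements it does not know."""
    note = ET.fromstring(xml_text)
    # findtext() looks children up by name, so extra siblings are never consulted.
    return "\n".join([
        "MESSAGE",
        f"To: {note.findtext('to')}",
        f"From: {note.findtext('from')}",
        note.findtext("body"),
    ])


ORIGINAL = (
    "<note><to>Tove</to><from>Jani</from>"
    "<body>Don't forget me this weekend!</body></note>"
)

# Placeholder date value, assumed for this sketch only.
EXTENDED = (
    "<note><date>2002-11-12</date><to>Tove</to><from>Jani</from>"
    "<body>Don't forget me this weekend!</body></note>"
)

print(render_note(ORIGINAL))
print(render_note(EXTENDED) == render_note(ORIGINAL))
```

Because the lookup is by element name rather than by position, adding the date element does not change the output.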

23 Elements have Content An XML element is everything from (including) the element's start tag to (including) the element's end tag. An element can have: element content, mixed content, simple content, or empty content. An element can also have attributes.

24 Elements have Content <book>
book has element content, because it contains other elements. <book> <title>My First XML</title> <prod id="33-657" media="paper"></prod> <chapter>Introduction to XML <para>What is HTML</para> <para>What is XML</para> </chapter> <chapter>XML Syntax <para>Elements must have a closing tag</para> <para>Elements must be properly nested</para> </chapter> </book> prod has empty content, because it carries no information. chapter has mixed content because it contains both text and other elements. para has simple content (or text content). In the example above only the prod element has attributes. The attribute named id has the value "33-657". The attribute named media has the value "paper".

25 Element Naming XML elements must follow these naming rules:
Names can contain letters, numbers, and other characters. Names must not start with a number or punctuation character. Names must not start with the letters xml (or XML, or Xml, etc). Names cannot contain spaces. Beyond these rules, any name can be used and no words are reserved, but the idea is to make names descriptive. Avoid "-" and "." in names: a name like first-name may be read by software as a subtraction of name from first, and a name like first.name may be read as the property name of an object first. Element names can be as long as you like. XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document; a good practice is to use the naming rules of your database for the elements in the XML documents. Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if your software vendor doesn't support them. The ":" should not be used in element names because it is reserved for namespaces.

26 XML elements attributes
XML elements can have attributes in the start tag, just like HTML. Attributes are used to provide additional information about elements. From HTML <IMG SRC="computer.gif">. The SRC attribute provides additional information about the IMG element. In HTML (and in XML) attributes provide additional information about elements: Attributes often provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element: <file type="gif">computer.gif</file> Attribute values must always be enclosed in quotes, but either single or double quotes can be used. <person sex="female"> <person sex='female'>

27 Attributes
Elements can have attributes: <account acct-type = "checking" > <account_number> A-102 </account_number> <branch_name> Perryridge </branch_name> <balance> 400 </balance> </account> Attributes are specified by name=value pairs inside the starting tag of an element. An element may have several attributes, but each attribute name can only occur once: <account acct-type = "checking" monthly-fee="5">

28 Use of Elements vs. Attributes
Data can be stored in child elements or in attributes. Take a look at these examples: <person sex="female"> <firstname>Anna</firstname> <lastname>Smith</lastname> </person> <person> <sex>female</sex> <firstname>Anna</firstname> <lastname>Smith</lastname> </person> In the first example sex is an attribute. In the second, sex is a child element. Both examples provide the same information. There are no rules about when to use attributes and when to use child elements. As a guideline, in XML use child elements if the information feels like data.
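From a program's point of view the two styles differ only in how the value is read. A minimal sketch with Python's standard-library xml.etree.ElementTree, where the variable names are assumptions for this example:

```python
import xml.etree.ElementTree as ET

# sex stored as an attribute of <person>
as_attribute = ET.fromstring(
    '<person sex="female">'
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# sex stored as a child element of <person>
as_element = ET.fromstring(
    "<person><sex>female</sex>"
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# Attributes are read with get(), child element text with findtext().
print(as_attribute.get("sex"))
print(as_element.findtext("sex"))
```

Both calls yield the same string, which is why the choice between the two forms is a matter of design rather than capability.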

29 Attributes vs. Subelements
Distinction between subelement and attribute: in the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents. In the context of data representation, the difference is unclear and may be confusing. The same information can be represented in two ways: <account account_number = "A-101"> …. </account> or <account> <account_number>A-101</account_number> … </account> Suggestion: use attributes for identifiers of elements, and use subelements for contents.

30 Avoid using attributes?
Some of the problems with using attributes are: attributes cannot contain multiple values (child elements can) attributes are not easily expandable (for future changes) attributes cannot describe structures (child elements can) attributes are more difficult to manipulate by program code attribute values are not easy to test against a Document Type Definition (DTD) - which is used to define the legal elements of an XML document

31 Avoid using attributes?
If you use attributes as containers for data, you end up with documents that are difficult to read and maintain. Try to use elements to describe data. Use attributes only to provide information that is not relevant to the data. Don't end up like this (this is not how XML should be used): <note day="12" month="11" year="2002" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!"> </note> A useful rule of thumb: metadata (data about data) should be stored as attributes, and the data itself should be stored as elements.

32 Elements can be nested
<bank-1> <customer> <customer_name> Hayes </customer_name> <customer_street> Main </customer_street> <customer_city> Harrison </customer_city> <account> <account_number> A-102 </account_number> <branch_name> Perryridge </branch_name> <balance> </balance> </account> </customer> </bank-1>

33 Motivation for Nesting
Nesting of data is useful in data transfer. Example: elements representing customer_id, customer_name, and address nested within an order element. Nesting is not supported, or discouraged, in relational databases: with multiple orders, customer name and address are stored redundantly, so normalization replaces the nested structures in each order by a foreign key into a table storing customer name and address information. Nesting is supported in object-relational databases. But nesting is appropriate when transferring data, because the external application does not have direct access to data referenced by a foreign key.

34 Elements and text
Mixture of text with sub-elements is legal in XML. Example: <account> This account is seldom used any more. <account_number> A-102</account_number> <branch_name> Perryridge</branch_name> <balance>400 </balance> </account> Useful for document markup, but discouraged for data representation.

35 Namespaces
XML data has to be exchanged between organizations. The same tag name may have different meanings in different organizations, causing confusion on exchanged documents. Specifying a unique string as an element name avoids confusion. Better solution: use unique-name:element-name. Avoid using long unique names all over the document by using XML Namespaces: <bank xmlns:FB='…'> … <FB:branch> <FB:branchname>Downtown</FB:branchname> <FB:branchcity> Brooklyn </FB:branchcity> </FB:branch> … </bank>
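To show how a parser resolves the FB prefix to its unique name, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The namespace URI http://example.com/first-bank is a hypothetical stand-in, since the URI on the slide is elided.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, assumed for this example only.
FB_URI = "http://example.com/first-bank"

document = (
    f'<bank xmlns:FB="{FB_URI}">'
    "<FB:branch>"
    "<FB:branchname>Downtown</FB:branchname>"
    "<FB:branchcity>Brooklyn</FB:branchcity>"
    "</FB:branch>"
    "</bank>"
)

root = ET.fromstring(document)

# The prefix map tells find()/findtext() what the FB prefix stands for.
prefixes = {"FB": FB_URI}
branch = root.find("FB:branch", prefixes)

# Internally the parser stores the expanded name: {namespace-uri}local-name.
print(branch.tag)
print(branch.findtext("FB:branchname", namespaces=prefixes))
```

The prefix itself carries no meaning; two documents that bind different prefixes to the same URI describe the same elements.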

36 More on XML Syntax
Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag: <account number="A-101" branch="Perryridge" balance="200" /> To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below: <![CDATA[<account> … </account>]]> Here, <account> and </account> are treated as just strings. CDATA stands for "character data".
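Both points can be checked with a short sketch in Python's standard-library xml.etree.ElementTree. The wrapping element name record is an assumption made for this example, because a CDATA section has to sit inside some element.

```python
import xml.etree.ElementTree as ET

# The abbreviated form and the explicit start/end pair describe the same element.
short_form = ET.fromstring(
    '<account number="A-101" branch="Perryridge" balance="200" />'
)
long_form = ET.fromstring(
    '<account number="A-101" branch="Perryridge" balance="200"></account>'
)
print(short_form.attrib == long_form.attrib)

# Inside CDATA the tags are plain characters, not subelements.
wrapped = ET.fromstring("<record><![CDATA[<account> A-101 </account>]]></record>")
print(repr(wrapped.text))
print(len(wrapped))  # number of child elements
```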

37 DTD

38 Well Formed XML Documents
A "Well Formed" XML document has correct XML syntax. A "Well Formed" XML document is a document that conforms to the XML syntax rules that were described in the previous slides: XML documents must have a root element; XML elements must have a closing tag; XML tags are case sensitive; XML elements must be properly nested; XML attribute values must always be quoted. Example: <?xml version="1.0" encoding="ISO-8859-1"?> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

39 Valid XML Documents A "Valid" XML document also conforms to a DTD.
A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD): <?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE note SYSTEM "InternalNote.dtd"> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

40 DTD vs SCHEMA A DTD defines the legal elements of an XML document.
The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. Why Use a DTD? With a DTD, each of your XML files can carry a description of its own format. With a DTD, independent groups of people can agree to use a standard DTD for interchanging data. Your application can use a standard DTD to verify that the data you receive from the outside world is valid. You can also use a DTD to verify your own data. W3C supports an XML based alternative to DTD called XML Schema

41 Document Type Definition (DTD)
The type of an XML document can be specified using a DTD. A DTD constrains the structure of XML data: what elements can occur; what attributes an element can/must have; what subelements can/must occur inside each element, and how many times. A DTD does not constrain data types: all values are represented as strings in XML. DTD syntax: <!ELEMENT element (subelements-specification) > <!ATTLIST element (attributes) >

42 DTD introduction A DTD can be declared inline inside an XML document, or as an external reference. If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with the following syntax: <!DOCTYPE root-element [element-declarations]> Example XML document with an internal DTD: <?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend</body> </note> The DTD above is interpreted like this: !DOCTYPE note defines that the root element of this document is note. !ELEMENT note defines that the note element contains four elements: "to,from,heading,body". !ELEMENT to defines the to element to be of the type "#PCDATA". !ELEMENT from defines the from element to be of the type "#PCDATA". !ELEMENT heading defines the heading element to be of the type "#PCDATA". !ELEMENT body defines the body element to be of the type "#PCDATA".

43 DTD introduction (cont)
External DTD Declaration If the DTD is declared in an external file, it should be wrapped in a DOCTYPE definition with the following syntax: <!DOCTYPE root-element SYSTEM "filename"> This is the same XML document as above, but with an external DTD <?xml version="1.0"?> <!DOCTYPE note SYSTEM "note.dtd"> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> And this is the file "note.dtd" which contains the DTD: <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>

44 DTD - XML Building Blocks
Seen from a DTD point of view, all XML documents (and HTML documents) are made up by the following building blocks: Elements Attributes Entities PCDATA CDATA

45 Elements and attributes
Elements are the main building blocks of both XML and HTML documents. Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img". Examples: <body>some text</body> <message>some text</message> Attributes provide extra information about elements. Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file: <img src="computer.gif" /> The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty it is closed by a " /".

46 Entities Some characters have a special meaning in XML, like the less than sign (<) that defines the start of an XML tag. Most of you know the HTML entity "&nbsp;". This "no-breaking-space" entity is used in HTML to insert an extra space in a document (it is defined by HTML, not by XML). Entities are expanded when a document is parsed by an XML parser. The following entities are predefined in XML:
Entity Reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
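A minimal sketch of how the predefined entities are produced and expanded, using Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree). The sample text and the element name message are assumptions for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 & bonus > 0"

# escape() replaces &, < and > with their predefined entity references.
escaped = escape(raw_text)
print(escaped)

# When the document is parsed, the references are expanded back to characters.
element = ET.fromstring(f"<message>{escaped}</message>")
print(element.text)

# &quot; and &apos; are expanded in the same way.
quotes = ET.fromstring("<message>&quot;quoted&quot; and &apos;single&apos;</message>")
print(quotes.text)
```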

47 Entities Entities are variables used to define shortcuts to standard text or special characters. Entity references are references to entities Entities can be declared internal or external

48 An Internal Entity Declaration
Syntax: <!ENTITY entity-name "entity-value"> DTD example: <!ENTITY writer "Donald Duck."> <!ENTITY copyright "Copyright W3Schools."> XML example: <author>&writer;&copyright;</author> Note: An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).
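The expansion of internal entities can be observed with a non-validating parser, provided the declarations sit in the document's internal DTD subset. The sketch below uses Python's standard-library xml.etree.ElementTree; placing the two declarations from the slide inside a DOCTYPE for an author root element is an assumption made to obtain a complete document.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY writer "Donald Duck.">
  <!ENTITY copyright "Copyright W3Schools.">
]>
<author>&writer;&copyright;</author>"""

# The parser replaces each entity reference with its declared value.
author = ET.fromstring(document)
print(author.text)

# A reference to an entity that was never declared is a parse error.
try:
    ET.fromstring("<author>&writer;</author>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

Because entity expansion can be abused to inflate small inputs, documents from untrusted sources are usually parsed with entity handling restricted.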

49 An External Entity Declaration
Syntax: <!ENTITY entity-name SYSTEM "URI/URL"> DTD example: <!ENTITY writer SYSTEM "…"> <!ENTITY copyright SYSTEM "…"> XML example: <author>&writer;&copyright;</author>

50 PCDATA and CDATA PCDATA means parsed character data.
Think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup. Tags inside the text will be treated as markup and entities will be expanded. However, parsed character data should not contain any &, <, or > characters; these need to be represented by the &amp;, &lt; and &gt; entities, respectively. CDATA means character data. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.

51 Element Specification in DTD
Subelements can be specified as: names of elements, or #PCDATA (parsed character data), i.e., character strings; EMPTY (no subelements) or ANY (anything can be a subelement). Example: <!ELEMENT depositor (customer_name, account_number)> <!ELEMENT customer_name (#PCDATA)> <!ELEMENT account_number (#PCDATA)> Subelement specification may have regular expressions: <!ELEMENT bank ( ( account | customer | depositor)+)> Notation: "|" - alternatives; "+" - 1 or more occurrences; "*" - 0 or more occurrences.

52 Empty Elements and PCDATA
Empty elements are declared with the category keyword EMPTY: <!ELEMENT element-name EMPTY> Example: <!ELEMENT br EMPTY> XML example: <br /> Elements with Parsed Character Data Elements with only parsed character data are declared with #PCDATA inside parentheses: <!ELEMENT element-name (#PCDATA)> <!ELEMENT from (#PCDATA)>

53 More on elements Elements declared with the category keyword ANY can contain any combination of parsable data: <!ELEMENT element-name ANY> Example: <!ELEMENT note ANY> Elements with one or more children (sequences) are declared with the names of the child elements inside parentheses: <!ELEMENT element-name (child1)> or <!ELEMENT element-name (child1,child2,...)> Example: <!ELEMENT note (to,from,heading,body)> When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a full declaration, the children must also be declared, and the children can also have children. The full declaration of the "note" element is: <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>

54 Elements (cont) Declaring Only One Occurrence of an Element
<!ELEMENT element-name (child-name)> <!ELEMENT note (message)> The example above declares that the child element "message" must occur once, and only once inside the "note" element. Declaring Minimum One Occurrence of an Element <!ELEMENT element-name (child-name+)> <!ELEMENT note (message+)> The + sign in the example above declares that the child element "message" must occur one or more times inside the "note" element. Declaring Zero or More Occurrences of an Element <!ELEMENT element-name (child-name*)> <!ELEMENT note (message*)> The * sign in the example above declares that the child element "message" can occur zero or more times inside the "note" element.

55 More on elements Declaring Zero or One Occurrences of an Element
<!ELEMENT element-name (child-name?)> <!ELEMENT note (message?)> The ? sign in the example above declares that the child element "message" can occur zero or one time inside the "note" element. Declaring either/or Content <!ELEMENT note (to,from,header,(message|body))> The example above declares that the "note" element must contain a "to" element, a "from" element, a "header" element, and either a "message" or a "body" element. Declaring Mixed Content <!ELEMENT note (#PCDATA|to|from|header|message)*> The example above declares that the "note" element can contain zero or more occurrences of parsed character data, "to", "from", "header", or "message" elements.
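The occurrence indicators behave like regular-expression quantifiers applied to the sequence of child element names. The sketch below makes that analogy concrete in Python: it is not a DTD validator, the function name children_match is hypothetical, and the regular expressions are hand-translated from the content models on these slides.

```python
import re
import xml.etree.ElementTree as ET


def children_match(element: ET.Element, model: str) -> bool:
    """Match the child names, each followed by one space, against a regex."""
    sequence = "".join(f"{child.tag} " for child in element)
    return re.fullmatch(model, sequence) is not None


# Hand-translated content models (an assumption of this sketch).
EXACTLY_ONE = r"message "                        # (message)
ONE_OR_MORE = r"(message )+"                     # (message+)
ZERO_OR_MORE = r"(message )*"                    # (message*)
ZERO_OR_ONE = r"(message )?"                     # (message?)
EITHER_OR = r"to from header (message |body )"   # (to,from,header,(message|body))

empty_note = ET.fromstring("<note></note>")
two_messages = ET.fromstring("<note><message/><message/></note>")
with_body = ET.fromstring("<note><to/><from/><header/><body/></note>")
with_both = ET.fromstring("<note><to/><from/><header/><message/><body/></note>")

print(children_match(two_messages, ONE_OR_MORE))
print(children_match(two_messages, ZERO_OR_ONE))
print(children_match(empty_note, ZERO_OR_MORE))
print(children_match(with_body, EITHER_OR))
print(children_match(with_both, EITHER_OR))
```

A real validating parser performs this kind of check for every element against the content model declared in the DTD.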

56 Bank DTD
<!DOCTYPE bank [ <!ELEMENT bank ( ( account | customer | depositor)+)> <!ELEMENT account (account_number, branch_name, balance)> <!ELEMENT customer (customer_name, customer_street, customer_city)> <!ELEMENT depositor (customer_name, account_number)> <!ELEMENT account_number (#PCDATA)> <!ELEMENT branch_name (#PCDATA)> <!ELEMENT balance (#PCDATA)> <!ELEMENT customer_name (#PCDATA)> <!ELEMENT customer_street (#PCDATA)> <!ELEMENT customer_city (#PCDATA)> ]>

57 Declaring Attributes An attribute declaration has the following syntax: <!ATTLIST element-name attribute-name attribute-type default-value> DTD example: <!ATTLIST payment type CDATA "check"> XML example: <payment type="check" />

58 Attribute Specification in DTD
Attribute specification: for each attribute, give its name; its type (CDATA, ID (identifier), IDREF (ID reference) or IDREFS (multiple IDREFs), more on this later); and whether it is mandatory (#REQUIRED), has a default value (value), or neither (#IMPLIED). Examples: <!ATTLIST account acct-type CDATA "checking"> <!ATTLIST customer customer_id ID #REQUIRED accounts IDREFS #REQUIRED>

59 The attribute-type can be one of the following
Type            Description
CDATA           The value is character data
(en1|en2|..)    The value must be one from an enumerated list
ID              The value is a unique id
IDREF           The value is the id of another element
IDREFS          The value is a list of other ids
NMTOKEN         The value is a valid XML name
NMTOKENS        The value is a list of valid XML names
ENTITY          The value is an entity
ENTITIES        The value is a list of entities
NOTATION        The value is a name of a notation
xml:            The value is a predefined xml value

60 Attributes The default-value can be one of the following:
Value           Explanation
value           The default value of the attribute
#REQUIRED       The attribute is required
#IMPLIED        The attribute is not required
#FIXED value    The attribute value is fixed
DTD: <!ELEMENT square EMPTY> <!ATTLIST square width CDATA "0"> Valid XML: <square width="100" /> In the example above, the "square" element is defined to be an empty element with a "width" attribute of type CDATA. If no width is specified, it has a default value of 0.

61 attributes #REQUIRED Syntax
<!ATTLIST element-name attribute-name attribute-type #REQUIRED> DTD: <!ATTLIST person number CDATA #REQUIRED> Valid XML: <person number="5677" /> Invalid XML: <person /> Use the #REQUIRED keyword if you don't have an option for a default value, but still want to force the attribute to be present.

62 attributes #IMPLIED <!ATTLIST element-name attribute-name attribute-type #IMPLIED> DTD: <!ATTLIST contact fax CDATA #IMPLIED> Valid XML: <contact fax=" " /> <contact /> Use the #IMPLIED keyword if you don't want to force the author to include an attribute, and you don't have an option for a default value. #FIXED <!ATTLIST element-name attribute-name attribute-type #FIXED "value"> <!ATTLIST sender company CDATA #FIXED "Microsoft"> <sender company="Microsoft" /> Invalid XML: <sender company="W3Schools" /> Use the #FIXED keyword when you want an attribute to have a fixed value without allowing the author to change it. If an author includes another value, the XML parser will return an error.

63 attributes Enumerated Attribute Values
<!ATTLIST element-name attribute-name (en1|en2|..) default-value> Example DTD: <!ATTLIST payment type (check|cash) "cash"> XML example: <payment type="check" /> or <payment type="cash" /> Use enumerated attribute values when you want the attribute value to be one of a fixed set of legal values.
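The four kinds of default-value and the enumerated type can be summarized as a single lookup rule. The sketch below is only an illustration in Python: the function effective_value and the "DEFAULT" mode label are hypothetical names, not part of any XML library, and real parsers also normalize attribute values.

```python
def effective_value(attrs, name, mode, value=None, allowed=None):
    """Return the value a validating parser would report for one attribute.

    attrs   -- attributes written on the element, as a dict
    mode    -- "DEFAULT", "#REQUIRED", "#IMPLIED" or "#FIXED"
    value   -- the default or fixed value from the ATTLIST, if any
    allowed -- tuple of enumerated values, if the type is (en1|en2|..)
    """
    given = attrs.get(name)
    if given is None:
        if mode == "#REQUIRED":
            raise ValueError("missing required attribute " + name)
        # Default or fixed value; None for #IMPLIED (no value supplied).
        return value
    if mode == "#FIXED" and given != value:
        raise ValueError("attribute " + name + " must be " + value)
    if allowed is not None and given not in allowed:
        raise ValueError("attribute " + name + " must be one of " + "|".join(allowed))
    return given

# <!ATTLIST square width CDATA "0"> applied to <square />
print(effective_value({}, "width", "DEFAULT", "0"))
# <!ATTLIST payment type (check|cash) "cash"> applied to <payment type="check" />
print(effective_value({"type": "check"}, "type", "DEFAULT", "cash", ("check", "cash")))
```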

64 IDs and IDREFs
An element can have at most one attribute of type ID. The ID attribute value of each element in an XML document must be distinct; thus the ID attribute value is an object identifier. An attribute of type IDREF must contain the ID value of an element in the same document. An attribute of type IDREFS contains a set of (0 or more) ID values; each of them must be the ID value of an element in the same document.

65 Bank DTD with Attributes
Bank DTD with ID and IDREF attribute types. <!DOCTYPE bank-2 [ <!ELEMENT account (branch, balance)> <!ATTLIST account account_number ID #REQUIRED owners IDREFS #REQUIRED> <!ELEMENT customer (customer_name, customer_street, customer_city)> <!ATTLIST customer customer_id ID #REQUIRED accounts IDREFS #REQUIRED> … declarations for branch, balance, customer_name, customer_street and customer_city ]>

66 XML data with ID and IDREF attributes
<bank-2> <account account_number="A-401" owners="C100 C102"> <branch_name> Downtown </branch_name> <balance> </balance> </account> <customer customer_id="C100" accounts="A-401"> <customer_name>Joe </customer_name> <customer_street> Monroe </customer_street> <customer_city> Madison</customer_city> </customer> <customer customer_id="C102" accounts="A-401 A-402"> <customer_name> Mary </customer_name> <customer_street> Erin </customer_street> <customer_city> Newark </customer_city> </customer> </bank-2>
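A validating parser enforces the ID and IDREFS rules automatically. As a rough illustration of what that check involves, here is a Python sketch using the standard xml.etree.ElementTree module, which does not read DTDs; for that reason the two dictionaries naming the ID and IDREFS attributes are assumptions copied by hand from the bank-2 DTD.

```python
import xml.etree.ElementTree as ET

BANK2 = """<bank-2>
  <account account_number="A-401" owners="C100 C102">
    <branch_name>Downtown</branch_name><balance></balance>
  </account>
  <customer customer_id="C100" accounts="A-401">
    <customer_name>Joe</customer_name>
  </customer>
  <customer customer_id="C102" accounts="A-401 A-402">
    <customer_name>Mary</customer_name>
  </customer>
</bank-2>"""

# Assumption: taken from the ATTLIST declarations, not discovered by the code.
ID_ATTRS = {"account": "account_number", "customer": "customer_id"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

def check_ids(root):
    """Return a list of ID / IDREFS violations found under root."""
    errors, seen = [], set()
    for el in root.iter():
        id_attr = ID_ATTRS.get(el.tag)
        if id_attr and id_attr in el.attrib:
            if el.attrib[id_attr] in seen:
                errors.append("duplicate ID " + el.attrib[id_attr])
            seen.add(el.attrib[id_attr])
    for el in root.iter():
        ref_attr = IDREFS_ATTRS.get(el.tag)
        if ref_attr is None:
            continue
        # An IDREFS value is a blank-separated list of IDs.
        for ref in el.get(ref_attr, "").split():
            if ref not in seen:
                errors.append(el.tag + " refers to unknown ID " + ref)
    return errors

errors = check_ids(ET.fromstring(BANK2))
print(errors)
```

On the sample data the check reports the reference to A-402, because the fragment shown on the slide contains no account with that ID.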

67 Example: TV Schedule DTD
By David Moisan. Copied from <!DOCTYPE TVSCHEDULE [ <!ELEMENT TVSCHEDULE (CHANNEL+)> <!ELEMENT CHANNEL (BANNER,DAY+)> <!ELEMENT BANNER (#PCDATA)> <!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)> <!ELEMENT HOLIDAY (#PCDATA)> <!ELEMENT DATE (#PCDATA)> <!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)> <!ELEMENT TIME (#PCDATA)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT DESCRIPTION (#PCDATA)> <!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED> <!ATTLIST CHANNEL CHAN CDATA #REQUIRED> <!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED> <!ATTLIST TITLE RATING CDATA #IMPLIED> <!ATTLIST TITLE LANGUAGE CDATA #IMPLIED> ]>

68 Newspaper Article DTD <!DOCTYPE NEWSPAPER [
Copied from <!DOCTYPE NEWSPAPER [ <!ELEMENT NEWSPAPER (ARTICLE+)> <!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)> <!ELEMENT HEADLINE (#PCDATA)> <!ELEMENT BYLINE (#PCDATA)> <!ELEMENT LEAD (#PCDATA)> <!ELEMENT BODY (#PCDATA)> <!ELEMENT NOTES (#PCDATA)> <!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED> <!ATTLIST ARTICLE EDITOR CDATA #IMPLIED> <!ATTLIST ARTICLE DATE CDATA #IMPLIED> <!ATTLIST ARTICLE EDITION CDATA #IMPLIED> <!ENTITY NEWSPAPER "Vervet Logic Times"> <!ENTITY PUBLISHER "Vervet Logic Press"> <!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press"> ]>

69 Limitations of DTDs
No typing of text elements and attributes: all values are strings, no integers, reals, etc. Difficult to specify unordered sets of subelements: order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved). (A | B)* allows specification of an unordered set, but cannot ensure that each of A and B occurs only once. IDs and IDREFs are untyped: the owners attribute of an account may contain a reference to another account, which is meaningless; the owners attribute should ideally be constrained to refer to customer elements.

70 XML Schema
XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. It supports typing of values (e.g. integer, string, etc.), constraints on min/max values, user-defined complex types, and many more features, including uniqueness and foreign key constraints and inheritance. XML Schema is itself specified in XML syntax, unlike DTDs: a more standard representation, but verbose. XML Schema is integrated with namespaces. BUT: XML Schema is significantly more complicated than DTDs.

71 Schemas XML Schema is an XML-based alternative to DTD.
An XML schema describes the structure of an XML document. The XML Schema language is also referred to as XML Schema Definition (XSD). The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD. An XML Schema: defines elements that can appear in a document defines attributes that can appear in a document defines which elements are child elements defines the order of child elements defines the number of child elements defines whether an element is empty or can include text defines data types for elements and attributes defines default and fixed values for elements and attributes

72 Why use schemas XML Schemas are much more powerful than DTDs.
XML Schemas Support Data Types One of the greatest strengths of XML Schemas is the support for data types. With support for data types: It is easier to describe allowable document content It is easier to validate the correctness of data It is easier to work with data from a database It is easier to define data facets (restrictions on data) It is easier to define data patterns (data formats) It is easier to convert data between different data types

73 An XML Schema <?xml version="1.0"?>
The following XML Schema file, called "note.xsd", defines the elements of the XML document above ("note.xml"): <?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com" elementFormDefault="qualified"> <xs:element name="note"> <xs:complexType> <xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>

74 Schema features xmlns:xs="http://www.w3.org/2001/XMLSchema"
indicates that the elements and data types used in the schema come from the "http://www.w3.org/2001/XMLSchema" namespace. It also specifies that the elements and data types that come from the "http://www.w3.org/2001/XMLSchema" namespace should be prefixed with xs:. targetNamespace="http://www.w3schools.com" indicates that the elements defined by this schema (note, to, from, heading, body) come from the "http://www.w3schools.com" namespace. xmlns="http://www.w3schools.com" indicates that the default namespace is "http://www.w3schools.com". elementFormDefault="qualified" indicates that any elements used by the XML instance document which were declared in this schema must be namespace qualified.

75 A Reference to an XML Schema
This XML document has a reference to an XML Schema: <?xml version="1.0"?> <note xmlns="http://www.w3schools.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3schools.com note.xsd"> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

76 Explanation xmlns="http://www.w3schools.com"
specifies the default namespace declaration. This declaration tells the schema-validator that all the elements used in this XML document are declared in the "http://www.w3schools.com" namespace. Once you have the XML Schema Instance namespace available: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" you can use the schemaLocation attribute. This attribute has two values. The first value is the namespace to use. The second value is the location of the XML schema to use for that namespace: xsi:schemaLocation="http://www.w3schools.com note.xsd"
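The two-valued schemaLocation attribute and the effect of the default namespace can be inspected with Python's standard xml.etree.ElementTree module. This is a minimal sketch: the note document is shortened, the default namespace is the one given in the slide heading, and ElementTree only reads the attribute (it does not validate against note.xsd).

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

NOTE = """<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
</note>"""

root = ET.fromstring(NOTE)

# Namespaced attributes are stored under "{namespace-uri}local-name".
tokens = root.get("{" + XSI + "}schemaLocation").split()

# The value is a list of (namespace, schema location) pairs.
schema_for = dict(zip(tokens[0::2], tokens[1::2]))
print(schema_for)

# The default namespace applies to note and to its child elements.
print(root.tag, root[0].tag)
```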

77 Simple elements XML Schemas define the elements of your XML files.
A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes. However, the "only text" restriction is quite misleading. The text can be of many different types. It can be one of the types included in the XML Schema definition (boolean, string, date, etc.), or it can be a custom type that you can define yourself. You can also add restrictions (facets) to a data type in order to limit its content, or you can require the data to match a specific pattern.

78 Defining a Simple Element
The syntax for defining a simple element is: <xs:element name="xxx" type="yyy"/> where xxx is the name of the element and yyy is the data type of the element. XML Schema has a lot of built-in data types. The most common types are: xs:string xs:decimal xs:integer xs:boolean xs:date xs:time

79 Example Here are some XML elements:
<lastname>Refsnes</lastname> <age>36</age> <dateborn> </dateborn> And here are the corresponding simple element definitions: <xs:element name="lastname" type="xs:string"/> <xs:element name="age" type="xs:integer"/> <xs:element name="dateborn" type="xs:date"/>

80 Default and Fixed Values for Simple Elements
Simple elements may have a default value OR a fixed value specified. A default value is automatically assigned to the element when no other value is specified. In the following example the default value is "red": <xs:element name="color" type="xs:string" default="red"/> A fixed value is also automatically assigned to the element, and you cannot specify another value. In the following example the fixed value is "red": <xs:element name="color" type="xs:string" fixed="red"/>

81 Attributes Simple elements cannot have attributes. If an element has attributes, it is considered to be of a complex type. But the attribute itself is always declared as a simple type. The syntax for defining an attribute is: <xs:attribute name="xxx" type="yyy"/> where xxx is the name of the attribute and yyy specifies the data type of the attribute. The most common types are: xs:string xs:decimal xs:integer xs:boolean xs:date xs:time

82 Attributes (cont.) Here is an XML element with an attribute:
<lastname lang="EN">Smith</lastname> And here is the corresponding attribute definition: <xs:attribute name="lang" type="xs:string"/> Default and fixed like with elements

83 Restrictions (facets)
Restrictions are used to define acceptable values for XML elements or attributes. Restrictions on XML elements are called facets. Restrictions on Values The following example defines an element called "age" with a restriction. The value of age cannot be lower than 0 or greater than 120: <xs:element name="age"> <xs:simpleType> <xs:restriction base="xs:integer"> <xs:minInclusive value="0"/> <xs:maxInclusive value="120"/> </xs:restriction> </xs:simpleType> </xs:element>

84 Restrictions on a Set of Values
To limit the content of an XML element to a set of acceptable values, we would use the enumeration constraint. The example below defines an element called "car" with a restriction. The only acceptable values are: Audi, Golf, BMW: <xs:element name="car"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Audi"/> <xs:enumeration value="Golf"/> <xs:enumeration value="BMW"/> </xs:restriction> </xs:simpleType> </xs:element> Many other restrictions can be set on data types (length, values, sets of values).
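What the two restrictions ask a validator to check can be written out by hand. The Python sketch below approximates the "age" facets (minInclusive 0, maxInclusive 120 on an integer) and the "car" enumeration; the function names are hypothetical, and a real schema processor applies the full xs:integer and xs:string rules rather than this simplified test.

```python
import re

def is_valid_age(text):
    """Approximate xs:integer restricted to the range 0..120 inclusive."""
    text = text.strip()
    # Optional sign followed by ASCII digits only.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False
    return 0 <= int(text) <= 120

ALLOWED_CARS = {"Audi", "Golf", "BMW"}

def is_valid_car(text):
    """Enumeration facet: the value must be exactly one of the listed strings."""
    return text in ALLOWED_CARS

print(is_valid_age("36"), is_valid_age("121"))
print(is_valid_car("Golf"), is_valid_car("Volvo"))
```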

85 What is a Complex Element?
A complex element is an XML element that contains other elements and/or attributes. There are four kinds of complex elements: empty elements elements that contain only other elements elements that contain only text elements that contain both other elements and text Note: Each of these elements may contain attributes as well!

86 How to Define a Complex Element
Look at this complex XML element, "employee", which contains only other elements: <employee> <firstname>John</firstname> <lastname>Smith</lastname> </employee> We can define a complex element in an XML Schema two different ways: The "employee" element can be declared directly by naming the element, like this: <xs:element name="employee"> <xs:complexType> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element>

87 Complex element definition
The "employee" element can have a type attribute that refers to the name of the complex type to use: <xs:element name="employee" type="personinfo"/> <xs:complexType name="personinfo"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType>

88 Complex elements If you use the method described above, several elements can refer to the same complex type, like this: <xs:element name="employee" type="personinfo"/> <xs:element name="student" type="personinfo"/> <xs:element name="member" type="personinfo"/> <xs:complexType name="personinfo"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType>

89 XML Schema Version of Bank DTD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="bank" type="BankType"/> <xs:element name="account"> <xs:complexType> <xs:sequence> <xs:element name="account_number" type="xs:string"/> <xs:element name="branch_name" type="xs:string"/> <xs:element name="balance" type="xs:decimal"/> </xs:sequence> </xs:complexType> </xs:element> ….. definitions of customer and depositor …. <xs:complexType name="BankType"> <xs:sequence> <xs:element ref="account" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="customer" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:schema>

90 XML Schema Version of Bank DTD
Choice of “xs:” was ours -- any other namespace prefix could be chosen. Element “bank” has type “BankType”, which is defined separately; xs:complexType is used later to create the named complex type “BankType”. Element “account” has its type defined in-line.

91 More features of XML Schema
Attributes are specified by the xs:attribute tag: <xs:attribute name="account_number"/>; adding the attribute use="required" means a value must be specified. Key constraint: “account numbers form a key for account elements under the root bank element”: <xs:key name="accountKey"> <xs:selector xpath="/bank/account"/> <xs:field xpath="account_number"/> </xs:key> Foreign key constraint from depositor to account: <xs:keyref name="depositorAccountKey" refer="accountKey"> </xs:keyref>

92 An XSD example <?xml version="1.0" encoding="ISO-8859-1"?>
<shiporder orderid="889923" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="shiporder.xsd"> <orderperson>John Smith</orderperson> <shipto> <name>Ola Nordmann</name> <address>Langgt 23</address> <city>4000 Stavanger</city> <country>Norway</country> </shipto> <item> <title>Empire Burlesque</title> <note>Special Edition</note> <quantity>1</quantity> <price>10.90</price> </item> <item> <title>Hide your heart</title> <quantity>1</quantity> <price>9.90</price> </item> </shiporder>

93 Create an XML Schema of the example
Now we want to create a schema for the XML document on the previous slide. We start by opening a new file that we will call "shiporder.xsd". To create the schema we could simply follow the structure in the XML document and define each element as we find it. We will start with the standard XML declaration followed by the xs:schema element that defines a schema: <?xml version="1.0" encoding="ISO-8859-1" ?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> ... </xs:schema> In the schema above we use the standard namespace prefix (xs), and the URI associated with this namespace is the Schema language definition, which has the standard value of http://www.w3.org/2001/XMLSchema.

94 Example cont. Next, we have to define the "shiporder" element. This element has an attribute and it contains other elements, therefore we consider it as a complex type. The child elements of the "shiporder" element are surrounded by an xs:sequence element that defines an ordered sequence of sub elements: <xs:element name="shiporder"> <xs:complexType> <xs:sequence> ... </xs:sequence> </xs:complexType> </xs:element>

95 Example (cont.) Then we have to define the "orderperson" element as a simple type (because it does not contain any attributes or other elements). The type (xs:string) is prefixed with the namespace prefix associated with XML Schema that indicates a predefined schema data type: <xs:element name="orderperson" type="xs:string"/> Next, we have to define two elements that are of the complex type: "shipto" and "item". We start by defining the "shipto" element: <xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element>

96 example Now we can define the "item" element. This element can appear multiple times inside a "shiporder" element. This is specified by setting the maxOccurs attribute of the "item" element to "unbounded" which means that there can be as many occurrences of the "item" element as the author wishes. Notice that the "note" element is optional. We have specified this by setting the minOccurs attribute to zero: <xs:element name="item" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="note" type="xs:string" minOccurs="0"/> <xs:element name="quantity" type="xs:positiveInteger"/> <xs:element name="price" type="xs:decimal"/> </xs:sequence> </xs:complexType> </xs:element>
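Because an XML Schema is itself XML, the occurrence settings can be read back with any XML parser. The sketch below uses Python's xml.etree.ElementTree to list the bounds of each child of "item", applying the schema default of 1 when minOccurs or maxOccurs is absent. The helper occurrence_bounds is a hypothetical name, and the xmlns:xs declaration is added to the fragment so that it parses on its own.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

ITEM_XSD = """<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema"
            name="item" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="note" type="xs:string" minOccurs="0"/>
      <xs:element name="quantity" type="xs:positiveInteger"/>
      <xs:element name="price" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>"""

def occurrence_bounds(element):
    """Return (min, max) occurrences; max is None when unbounded."""
    low = int(element.get("minOccurs", "1"))
    high_text = element.get("maxOccurs", "1")
    high = None if high_text == "unbounded" else int(high_text)
    return low, high

item = ET.fromstring(ITEM_XSD)
bounds = {child.get("name"): occurrence_bounds(child)
          for child in item.iter("{" + XS + "}element")
          if child is not item}

print(occurrence_bounds(item))  # bounds of item itself
print(bounds)                   # bounds of its children
```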

97 example We can now declare the attribute of the "shiporder" element. Since this is a required attribute we specify use="required". Note: The attribute declarations must always come last: <xs:attribute name="orderid" type="xs:string" use="required"/>

98 example The complete listing of the schema file called "shiporder.xsd": <?xml version="1.0" encoding="ISO-8859-1" ?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="shiporder"> <xs:complexType> <xs:sequence> <xs:element name="orderperson" type="xs:string"/> <xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element>

99 Example (cont) <xs:element name="item" maxOccurs="unbounded">
<xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="note" type="xs:string" minOccurs="0"/> <xs:element name="quantity" type="xs:positiveInteger"/> <xs:element name="price" type="xs:decimal"/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> <xs:attribute name="orderid" type="xs:string" use="required"/> </xs:complexType> </xs:element> </xs:schema>

100 Divide the Schema The previous design method is very simple, but can be difficult to read and maintain when documents are complex. The next design method is based on defining all elements and attributes first, and then referring to them using the ref attribute. <?xml version="1.0" encoding="ISO-8859-1" ?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <!-- definition of simple elements --> <xs:element name="orderperson" type="xs:string"/> <xs:element name="name" type="xs:string"/> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> <xs:element name="title" type="xs:string"/> <xs:element name="note" type="xs:string"/> <xs:element name="quantity" type="xs:positiveInteger"/> <xs:element name="price" type="xs:decimal"/> <!-- definition of attributes --> <xs:attribute name="orderid" type="xs:string"/>

101 Divide the Schema <!-- definition of complex elements -->
<xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element ref="name"/> <xs:element ref="address"/> <xs:element ref="city"/> <xs:element ref="country"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="item"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="note" minOccurs="0"/> <xs:element ref="quantity"/> <xs:element ref="price"/> </xs:sequence> </xs:complexType> </xs:element>

102 Querying and transforming

103 Querying and Transforming XML Data
Translation of information from one XML schema to another, and querying on XML data. The above two are closely related, and handled by the same tools. Standard XML querying/translation languages: XPath, a simple language consisting of path expressions; XSLT, a simple language designed for translation from XML to XML and XML to HTML; XQuery, an XML query language with a rich set of features.

104 Tree Model of XML Data
Query and transformation languages are based on a tree model of XML data. An XML document is modeled as a tree, with nodes corresponding to elements and attributes. Element nodes have child nodes, which can be attributes or subelements. Text in an element is modeled as a text node child of the element. Children of a node are ordered according to their order in the XML document. Element and attribute nodes (except for the root node) have a single parent, which is an element node. The root node has a single child, which is the root element of the document.

105 XPath
XPath is used to address (select) parts of documents using path expressions. A path expression is a sequence of steps separated by “/” (think of file names in a directory hierarchy). Result of a path expression: the set of values that, along with their containing elements/attributes, match the specified path. E.g. /bank-2/customer/customer_name evaluated on the bank-2 data we saw earlier returns <customer_name>Joe</customer_name> <customer_name>Mary</customer_name> E.g. /bank-2/customer/customer_name/text( ) returns the same names, but without the enclosing tags.

106 What is XPath? XPath is a syntax for defining parts of an XML document
XPath uses path expressions to navigate in XML documents XPath contains a library of standard functions XPath is a major element in XSLT XPath is a W3C Standard XPath Path Expressions XPath uses path expressions to select nodes or node-sets in an XML document. These path expressions look very much like the expressions you see when you work with a traditional computer file system. XPath Standard Functions XPath includes over 100 built-in functions. There are functions for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, and more.

107 XPath Terminology Nodes: In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document (root) nodes. XML documents are treated as trees of nodes. The root of the tree is called the document node (or root node). Look at the following XML document: <?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> </bookstore>

108 Example of nodes in the XML document above
<bookstore> (document node) <author>J K. Rowling</author> (element node) lang="en" (attribute node) Atomic values Atomic values are nodes with no children or parent. Example of atomic values: J K. Rowling "en"

109 Relationship of Nodes Parent
Each element and attribute has one parent. In the following example; the book element is the parent of the title, author, year, and price: <book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book>

110 Relationship of Nodes Children
Element nodes may have zero, one or more children. In the following example; the title, author, year, and price elements are all children of the book element: <book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book>

111 Relationship of Nodes Siblings Ancestors
Nodes that have the same parent. In the following example; the title, author, year, and price elements are all siblings: <book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> Ancestors A node's parent, parent's parent, etc. In the following example; the ancestors of the title element are the book element and the bookstore element: <bookstore> <book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> </bookstore>

112 Relationship of Nodes Descendants
A node's children, children's children, etc. In the following example; descendants of the bookstore element are the book, title, author, year, and price elements: <bookstore> <book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> </bookstore>
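The parent, sibling, ancestor and descendant relationships can be computed on the bookstore example with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree elements do not store a link to their parent, so the parent_of dictionary (a hypothetical helper) is built first.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

# Map every element to its parent element.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Tag names of the parent, the parent's parent, and so on."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node.tag)
    return chain

title = root.find("book/title")
siblings = [el.tag for el in parent_of[title] if el is not title]
descendants = [el.tag for el in root.iter() if el is not root]

print(ancestors(title))
print(siblings)
print(descendants)
```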

113 Xpath syntax The XML Example Document
We will use the following XML document in the examples below. <?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="eng">Harry Potter</title> <price>29.99</price> </book> <book> <title lang="eng">Learning XML</title> <price>39.95</price> </book> </bookstore>

114 Selecting Nodes XPath uses path expressions to select nodes in an XML document. The node is selected by following a path or steps. The most useful path expressions are listed below:
Expression   Description
nodename     Selects all child nodes of the named node
/            Selects from the root node
//           Selects nodes in the document from the current node that match the selection no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes

115 Selecting nodes examples
Path Expression   Result
bookstore         Selects all the child nodes of the bookstore element
/bookstore        Selects the root element bookstore
bookstore/book    Selects all book elements that are children of bookstore
//book            Selects all book elements no matter where they are in the document
bookstore//book   Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang           Selects all attributes that are named lang
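Several of these path expressions can be tried with Python's standard xml.etree.ElementTree module. Note the assumptions: ElementTree implements only a subset of XPath, paths are evaluated relative to an element (here the bookstore root) rather than the document node, and attribute nodes cannot be selected directly, so the last line collects lang values with a plain loop instead.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(DOC)  # root is the bookstore element

# bookstore/book: book elements that are children of bookstore
books = root.findall("book")

# //title: title elements anywhere below the starting element
titles = [t.text for t in root.findall(".//title")]

# Elements that carry a lang attribute (ElementTree's [@attrib] predicate)
with_lang = root.findall(".//*[@lang]")

# Stand-in for selecting the lang attributes themselves
langs = [el.get("lang") for el in root.iter() if "lang" in el.attrib]

print(len(books), titles, langs)
```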

116 XPath (Cont.)
The initial “/” denotes the root of the document (above the top-level tag). Path expressions are evaluated left to right; each step operates on the set of instances produced by the previous step. Selection predicates may follow any step in a path, in [ ]. E.g. /bank-2/account[balance > 400] returns account elements with a balance value greater than 400; /bank-2/account[balance] returns account elements containing a balance subelement. Attributes are accessed using “@”. E.g. /bank-2/account[balance > 400]/@account_number returns the account numbers of accounts with balance > 400. IDREF attributes are not dereferenced automatically (more on this later).
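The two kinds of predicate can be imitated in Python with xml.etree.ElementTree. This is a sketch under stated assumptions: ElementTree supports the existence predicate [balance] but not numeric comparisons such as [balance > 400], so the comparison is done in ordinary Python, and the balances and extra accounts in the sample data are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data in the shape of the bank-2 document.
BANK = """<bank-2>
  <account account_number="A-401"><branch_name>Downtown</branch_name><balance>500</balance></account>
  <account account_number="A-402"><branch_name>Perryridge</branch_name><balance>300</balance></account>
  <account account_number="A-403"><branch_name>Brighton</branch_name></account>
</bank-2>"""

root = ET.fromstring(BANK)

# Like /bank-2/account[balance]: accounts that contain a balance subelement.
with_balance = root.findall("account[balance]")

# Like the account numbers of accounts with balance > 400:
# the comparison is applied in Python, then the attribute is read with get().
rich = [a.get("account_number")
        for a in with_balance
        if float(a.findtext("balance")) > 400]

print(len(with_balance), rich)
```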

117 Functions in XPath
XPath provides several functions. The function count() at the end of a path counts the number of elements in the set generated by the path. E.g. /bank-2/account[count(./customer) > 2] returns accounts with > 2 customers. There is also a function for testing the position (1, 2, ..) of a node w.r.t. its siblings. Boolean connectives and and or, and the function not(), can be used in predicates. IDREFs can be referenced using the function id(). id() can also be applied to sets of references such as IDREFS, and even to strings containing multiple references separated by blanks. E.g. /bank-2/account/id(@owners) returns all customers referred to from the owners attribute of account elements.

118 More XPath Features
Operator “|” is used to implement union. E.g. | Gives customers with either accounts or loans. However, “|” cannot be nested inside other operators. “//” can be used to skip multiple levels of nodes. E.g. /bank-2//customer_name finds any customer_name element anywhere under the /bank-2 element, regardless of the element in which it is contained. A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children. “//”, described above, is a short form for specifying “all descendants”; “..” specifies the parent. doc(name) returns the root of a named document.

119 What is XQuery? XQuery is the language for querying XML data
XQuery for XML is like SQL for databases XQuery is built on XPath expressions XQuery is supported by all the major database engines (IBM, Oracle, Microsoft, etc.) XQuery is a W3C Recommendation XQuery can be used to: Extract information to use in a Web Service Generate summary reports Transform XML data to XHTML Search Web documents for relevant information

120 XQuery
XQuery is a general purpose query language for XML data. Currently being standardized by the World Wide Web Consortium (W3C); the textbook description is based on a January 2005 draft of the standard. The final version may differ, but major features are likely to stay unchanged. XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL. XQuery uses a for … let … where … order by … return … (FLWOR) syntax: for ⇔ SQL from; where ⇔ SQL where; order by ⇔ SQL order by; return ⇔ SQL select; let allows temporary variables, and has no equivalent in SQL.

121 ©Silberschatz, Korth and Sudarshan
FLWOR Syntax in XQuery
The for clause uses XPath expressions, and the variable in the for clause ranges over values in the set returned by XPath
Simple FLWOR expression in XQuery: find all accounts with balance > 400, with each result enclosed in an <account_number> .. </account_number> tag
  for $x in /bank-2/account
  let $acctno :=
  where $x/balance > 400
  return <account_number> { $acctno } </account_number>
Items in the return clause are XML text unless enclosed in {}, in which case they are evaluated

122 FLWOR expression
The let clause is not really needed in this query, and the selection can be done in XPath. The query can be written as:
  for $x in /bank-2/account[balance>400]
  return <account_number> { } </account_number>
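The for / let / where / return steps can be mirrored one-to-one in Python. This is a rough analogue, not XQuery: the sample data is invented, and it assumes the account number is stored as an account_number attribute of each account element.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; account_number as an attribute is an assumption.
BANK2 = """
<bank-2>
  <account account_number="A-101"><balance>500</balance></account>
  <account account_number="A-102"><balance>400</balance></account>
  <account account_number="A-201"><balance>900</balance></account>
</bank-2>
"""
root = ET.fromstring(BANK2)

results = []
for x in root.findall("account"):            # for: iterate over the path result
    acctno = x.get("account_number")         # let: bind a temporary variable
    if float(x.findtext("balance")) > 400:   # where: filter
        out = ET.Element("account_number")   # return: construct the result element
        out.text = acctno
        results.append(out)

serialized = [ET.tostring(r, encoding="unicode") for r in results]
print(serialized)
```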

123 ©Silberschatz, Korth and Sudarshan
Joins
Joins are specified in a manner very similar to SQL
  for $a in /bank/account,
      $c in /bank/customer,
      $d in /bank/depositor
  where $a/account_number = $d/account_number
    and $c/customer_name = $d/customer_name
  return <cust_acct> { $c $a } </cust_acct>
The same query can be expressed with the selections specified as XPath selections:
  for $a in /bank/account,
      $c in /bank/customer,
      $d in /bank/depositor[ account_number = $a/account_number and customer_name = $c/customer_name ]
  return <cust_acct> { $c $a } </cust_acct>
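The join above amounts to a three-way nested loop with two equality conditions. A minimal Python sketch follows, over an invented flat bank document; it returns (customer_name, account_number) pairs instead of constructing cust_acct elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat bank data with separate account, customer and depositor elements.
BANK = """
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <customer><customer_name>Joe</customer_name></customer>
  <customer><customer_name>Mary</customer_name></customer>
  <depositor><account_number>A-101</account_number><customer_name>Joe</customer_name></depositor>
  <depositor><account_number>A-102</account_number><customer_name>Mary</customer_name></depositor>
</bank>
"""
root = ET.fromstring(BANK)

# One loop per "for" variable; the "where" conditions become the filter.
pairs = [
    (c.findtext("customer_name"), a.findtext("account_number"))
    for a in root.findall("account")
    for c in root.findall("customer")
    for d in root.findall("depositor")
    if a.findtext("account_number") == d.findtext("account_number")
    and c.findtext("customer_name") == d.findtext("customer_name")
]
print(pairs)
```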

124 ©Silberschatz, Korth and Sudarshan
Nested Queries
The following query converts data from the flat structure for bank information into the nested structure used in bank-1
  <bank-1> {
    for $c in /bank/customer
    return
      <customer>
        { $c/* }
        { for $d in /bank/depositor[customer_name = $c/customer_name],
              $a in /bank/account[account_number=$d/account_number]
          return $a }
      </customer>
  } </bank-1>
$c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag
$c/text() gives text content of an element without any subelements / tags
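The flat-to-nested conversion can be sketched in Python as well. The data is invented; deep copies are used so the source tree is left unchanged, which is a choice of this sketch and not something the slide prescribes.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical flat bank data.
BANK = (
    "<bank>"
    "<account><account_number>A-101</account_number><balance>500</balance></account>"
    "<account><account_number>A-102</account_number><balance>400</balance></account>"
    "<customer><customer_name>Joe</customer_name></customer>"
    "<customer><customer_name>Mary</customer_name></customer>"
    "<depositor><account_number>A-101</account_number><customer_name>Joe</customer_name></depositor>"
    "<depositor><account_number>A-102</account_number><customer_name>Mary</customer_name></depositor>"
    "</bank>"
)
root = ET.fromstring(BANK)

bank1 = ET.Element("bank-1")
for c in root.findall("customer"):
    cust = ET.SubElement(bank1, "customer")
    # Analogue of { $c/* }: copy the children without the enclosing tag.
    cust.extend([copy.deepcopy(child) for child in c])
    name = c.findtext("customer_name")
    # Inner query: accounts reachable from this customer through depositor.
    for d in root.findall("depositor"):
        if d.findtext("customer_name") != name:
            continue
        for a in root.findall("account"):
            if a.findtext("account_number") == d.findtext("account_number"):
                cust.append(copy.deepcopy(a))

nested_shape = [[child.tag for child in cust] for cust in bank1]
```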

125 ©Silberschatz, Korth and Sudarshan
Sorting in XQuery
The order by clause sorts the results of a FLWOR expression. E.g. to return customers sorted by name
  for $c in /bank/customer
  order by $c/customer_name
  return <customer> { $c/* } </customer>
Use order by $c/customer_name descending to sort in descending order
Can sort at multiple levels of nesting (sort by customer_name, and by account_number within each customer)
  <bank-1> {
    for $c in /bank/customer
    order by $c/customer_name
    return
      <customer>
        { $c/* }
        { for $d in /bank/depositor[customer_name=$c/customer_name],
              $a in /bank/account[account_number=$d/account_number]
          order by $a/account_number
          return <account> { $a/* } </account> }
      </customer>
  } </bank-1>
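For comparison, the same ordering can be sketched in Python with sorted(); reverse=True stands in for the descending keyword. The customer data is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical customers, deliberately out of order.
root = ET.fromstring(
    "<bank>"
    "<customer><customer_name>Mary</customer_name></customer>"
    "<customer><customer_name>Joe</customer_name></customer>"
    "</bank>"
)

def by_name(customer):
    # Sort key: the text of the customer_name subelement.
    return customer.findtext("customer_name")

ascending = [by_name(c) for c in sorted(root.findall("customer"), key=by_name)]
descending = [by_name(c) for c in sorted(root.findall("customer"), key=by_name, reverse=True)]
```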

126 Functions and Other XQuery Features
User defined functions with the type system of XML Schema
  declare function local:balances($c as xs:string) as xs:decimal* {
    for $d in /bank/depositor[customer_name = $c],
        $a in /bank/account[account_number = $d/account_number]
    return $a/balance
  };
Types are optional for function parameters and return values
The * (as in decimal*) indicates a sequence of values of that type
Universal and existential quantification in where clause predicates
  some $e in path satisfies P
  every $e in path satisfies P
XQuery also supports if-then-else clauses
©Silberschatz, Korth and Sudarshan
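A Python sketch of the balances function and of the two quantifiers, where any() corresponds to some … satisfies and all() to every … satisfies. The data and the 450 threshold are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical data: Joe is a depositor on both accounts.
BANK = ET.fromstring(
    "<bank>"
    "<account><account_number>A-101</account_number><balance>500</balance></account>"
    "<account><account_number>A-102</account_number><balance>400</balance></account>"
    "<depositor><account_number>A-101</account_number><customer_name>Joe</customer_name></depositor>"
    "<depositor><account_number>A-102</account_number><customer_name>Joe</customer_name></depositor>"
    "</bank>"
)

def balances(customer_name: str) -> List[float]:
    """Return the balances of all accounts held by the named customer."""
    result = []
    for d in BANK.findall("depositor"):
        if d.findtext("customer_name") != customer_name:
            continue
        for a in BANK.findall("account"):
            if a.findtext("account_number") == d.findtext("account_number"):
                result.append(float(a.findtext("balance")))
    return result

joe = balances("Joe")
some_above_450 = any(b > 450 for b in joe)   # existential quantification
every_above_450 = all(b > 450 for b in joe)  # universal quantification
```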

127 XQuery Conditional Expressions
"If-Then-Else" expressions are allowed in XQuery. Look at the following example: for $x in doc("books.xml")/bookstore/book return if then <child>{data($x/title)}</child> else <adult>{data($x/title)}</adult>

128 XQuery Conditional Expressions
The result of the example above will be: <adult>Everyday Italian</adult> <child>Harry Potter</child> <adult>Learning XML</adult> <adult>XQuery Kick Start</adult>

129 Present the Result In an HTML List
Look at the following XQuery FLWOR expression: for $x in doc("books.xml")/bookstore/book/title order by $x return $x The expression above will select all the title elements under the book elements that are under the bookstore element, and return the title elements in alphabetical order.

130 Present the Result In an HTML List
Now we want to list all the book titles in our bookstore in an HTML list. We add <ul> and <li> tags to the FLWOR expression:
  <ul> {
    for $x in doc("books.xml")/bookstore/book/title
    order by $x
    return <li>{$x}</li>
  } </ul>
The result of the above will be:
  <ul>
  <li><title lang="en">Everyday Italian</title></li>
  <li><title lang="en">Harry Potter</title></li>
  <li><title lang="en">Learning XML</title></li>
  <li><title lang="en">XQuery Kick Start</title></li>
  </ul>

131 Presenting results as HTML
Now we want to eliminate the title element, and show only the data inside the title element:
  <ul> {
    for $x in doc("books.xml")/bookstore/book/title
    order by $x
    return <li>{data($x)}</li>
  } </ul>
The result will be (an HTML list):
  <ul>
  <li>Everyday Italian</li>
  <li>Harry Potter</li>
  <li>Learning XML</li>
  <li>XQuery Kick Start</li>
  </ul>
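The same list can be built in Python as a rough analogue. The inline bookstore document is a hypothetical stand-in for books.xml that contains only the four titles shown on the slide; taking the .text of each title plays the role of data($x).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml, reduced to the titles.
BOOKS = ET.fromstring(
    '<bookstore>'
    '<book><title lang="en">Harry Potter</title></book>'
    '<book><title lang="en">Everyday Italian</title></book>'
    '<book><title lang="en">XQuery Kick Start</title></book>'
    '<book><title lang="en">Learning XML</title></book>'
    '</bookstore>'
)

# Path selection plus "order by": collect the title text and sort it.
titles = sorted(t.text for t in BOOKS.findall("book/title"))

# Element construction: wrap each title's data in <li> inside one <ul>.
ul = ET.Element("ul")
for title in titles:
    ET.SubElement(ul, "li").text = title

html = ET.tostring(ul, encoding="unicode")
print(html)
```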

132 ©Silberschatz, Korth and Sudarshan
XSLT
A stylesheet stores formatting options for a document, usually separately from the document
E.g. an HTML style sheet may specify font colors and sizes for headings, etc.
The Extensible Stylesheet Language (XSL) was originally designed for generating HTML from XML
XSLT is a general-purpose transformation language
Can translate XML to XML, and XML to HTML
XSLT transformations are expressed using rules called templates
Templates combine selection using XPath with construction of results

133 ©Silberschatz, Korth and Sudarshan
XSLT Templates
Example of XSLT template with match and select part
  <xsl:template match="/bank-2/customer">
    <xsl:value-of select="customer_name"/>
  </xsl:template>
  <xsl:template match="*"/>
The match attribute of xsl:template specifies a pattern in XPath
Elements in the XML document matching the pattern are processed by the actions within the xsl:template element
xsl:value-of selects (outputs) specified values (here, customer_name)
For elements that do not match any template
  Attributes and text contents are output as is
  Templates are recursively applied on subelements
The <xsl:template match="*"/> template matches all elements that do not match any other template
  Used to ensure that their contents do not get output.
If an element matches several templates, only one is used, based on a complex priority scheme/user-defined priorities

134 ©Silberschatz, Korth and Sudarshan
Creating XML Output
Any text or tag in the XSL stylesheet that is not in the xsl namespace is output as is
E.g. to wrap results in new XML elements.
  <xsl:template match="/bank-2/customer">
    <customer>
      <xsl:value-of select="customer_name"/>
    </customer>
  </xsl:template>
  <xsl:template match="*"/>
Example output:
  <customer> Joe </customer>
  <customer> Mary </customer>

135 Creating XML Output (Cont.)
Note: Cannot directly insert an xsl:value-of tag inside another tag
E.g. cannot create an attribute for <customer> in the previous example by directly using xsl:value-of
XSLT provides a construct xsl:attribute to handle this situation
xsl:attribute adds an attribute to the enclosing output element
E.g.
  <customer>
    <xsl:attribute name="customer_id">
      <xsl:value-of select="customer_id"/>
    </xsl:attribute>
  </customer>
results in output of the form <customer customer_id="…."> ….
xsl:element is used to create output elements with computed names
©Silberschatz, Korth and Sudarshan

136 ©Silberschatz, Korth and Sudarshan
Structural Recursion
Template action can apply templates recursively to the contents of a matched element
  <xsl:template match="/bank">
    <customers>
      <xsl:apply-templates/>
    </customers>
  </xsl:template>
  <xsl:template match="customer">
    <customer>
      <xsl:value-of select="customer_name"/>
    </customer>
  </xsl:template>
  <xsl:template match="*"/>
Example output:
  <customers>
    <customer> John </customer>
    <customer> Mary </customer>
  </customers>
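Python's standard library has no XSLT processor, so the following only imitates the idea of structural recursion: a table maps element names to handler functions ("templates"), and a handler may re-apply the table to the children of its node. All function names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

def apply_templates(node, templates):
    """Run the handler registered for node.tag; unmatched elements
    produce no output, like the empty match="*" template."""
    handler = templates.get(node.tag)
    return handler(node, templates) if handler else []

def bank_template(node, templates):
    # Wrap the results of recursing into the children in <customers>.
    out = ET.Element("customers")
    for child in node:
        out.extend(apply_templates(child, templates))
    return [out]

def customer_template(node, templates):
    # Emit <customer> holding only the customer's name.
    out = ET.Element("customer")
    out.text = node.findtext("customer_name")
    return [out]

TEMPLATES = {"bank": bank_template, "customer": customer_template}

BANK = ET.fromstring(
    "<bank>"
    "<customer><customer_name>John</customer_name><customer_city>Rye</customer_city></customer>"
    "<account><account_number>A-101</account_number></account>"
    "<customer><customer_name>Mary</customer_name></customer>"
    "</bank>"
)

output = ET.tostring(apply_templates(BANK, TEMPLATES)[0], encoding="unicode")
print(output)
```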

137 ©Silberschatz, Korth and Sudarshan
Joins in XSLT
XSLT keys allow elements to be looked up (indexed) by values of subelements or attributes
Keys must be declared (with a name), and the key() function can then be used for lookup. E.g.
  <xsl:key name="acctno" match="account" use="account_number"/>
  <xsl:value-of select="key('acctno', 'A-101')"/>
Keys permit (some) joins to be expressed in XSLT
  <xsl:key name="acctno" match="account" use="account_number"/>
  <xsl:key name="custno" match="customer" use="customer_name"/>
  <xsl:template match="depositor">
    <cust_acct>
      <xsl:value-of select="key('custno', customer_name)"/>
      <xsl:value-of select="key('acctno', account_number)"/>
    </cust_acct>
  </xsl:template>
  <xsl:template match="*"/>
In the join, the second argument of key() is the value of the depositor's subelement, so it is written without quotes; a quoted name would be looked up as a literal string.
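A key behaves much like an index built once and probed many times. The sketch below imitates the two keys with Python dictionaries over invented data; unlike xsl:key, a plain dict keeps only one element per key value, which is a simplification of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat bank data.
root = ET.fromstring(
    "<bank>"
    "<account><account_number>A-101</account_number><balance>500</balance></account>"
    "<customer><customer_name>Joe</customer_name><customer_city>Rye</customer_city></customer>"
    "<depositor><account_number>A-101</account_number><customer_name>Joe</customer_name></depositor>"
    "</bank>"
)

# Analogue of the acctno and custno key declarations.
acctno = {a.findtext("account_number"): a for a in root.findall("account")}
custno = {c.findtext("customer_name"): c for c in root.findall("customer")}

# Analogue of the depositor template: probe both indexes per depositor.
joined = []
for d in root.findall("depositor"):
    customer = custno[d.findtext("customer_name")]
    account = acctno[d.findtext("account_number")]
    joined.append((customer.findtext("customer_city"), account.findtext("balance")))
```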

138 ©Silberschatz, Korth and Sudarshan
Sorting in XSLT
Using an xsl:sort directive inside xsl:apply-templates causes the selected elements to be sorted
Sorting is done before applying other templates
  <xsl:template match="/bank">
    <xsl:apply-templates select="customer">
      <xsl:sort select="customer_name"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="customer">
    <customer>
      <xsl:value-of select="customer_name"/>
      <xsl:value-of select="customer_street"/>
      <xsl:value-of select="customer_city"/>
    </customer>
  </xsl:template>
  <xsl:template match="*"/>

139 Accessing and storing XML

140 Application Program Interface
There are two standard application program interfaces to XML data:
SAX (Simple API for XML)
  Based on parser model, user provides event handlers for parsing events
  E.g. start of element, end of element
  Not suitable for database applications
DOM (Document Object Model)
  XML data is parsed into a tree representation
  Variety of functions provided for traversing the DOM tree
  E.g.: Java DOM API provides Node class with methods getParentNode( ), getFirstChild( ), getNextSibling( ), getAttribute( ), getData( ) (for text node), getElementsByTagName( ), …
  Also provides functions for updating DOM tree
©Silberschatz, Korth and Sudarshan
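Both styles are available in Python's standard library, which gives a compact way to see the difference. The sketch uses xml.sax for event handlers and xml.dom.minidom for the tree; note that minidom exposes parentNode, firstChild and nextSibling as attributes where the Java API on the slide uses getter methods. The document and the handler class name are invented.

```python
import xml.sax
from xml.dom import minidom

DOC = '<bank><account account_number="A-101"><balance>500</balance></account></bank>'

class TagRecorder(xml.sax.ContentHandler):
    """SAX: the parser calls these handlers as it meets each event."""
    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def startElement(self, name, attrs):
        self.starts.append(name)

    def endElement(self, name):
        self.ends.append(name)

handler = TagRecorder()
xml.sax.parseString(DOC.encode("utf-8"), handler)

# DOM: the whole document becomes a tree that can be traversed freely.
dom = minidom.parseString(DOC)
account = dom.getElementsByTagName("account")[0]
parent_tag = account.parentNode.tagName
acct_no = account.getAttribute("account_number")
balance_node = account.firstChild          # the <balance> element
balance_text = balance_node.firstChild.data  # data of its text node
```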

141 ©Silberschatz, Korth and Sudarshan
Storage of XML Data
XML data can be stored in
Non-relational data stores
  Flat files
    Natural for storing XML
    But has all problems of files (no concurrency, no recovery, …)
  XML database
    Database built specifically for storing XML data, supporting DOM model and declarative querying
    Currently no commercial-grade systems
Relational databases
  Data must be translated into relational form
  Advantage: mature database systems
  Disadvantages: overhead of translating data and queries

142 Storage of XML in Relational Databases
Alternatives: String Representation Tree Representation Map to relations Database System Concepts - 5th Edition, Aug 22, 2005. 10.142 ©Silberschatz, Korth and Sudarshan

143 String Representation
Store each top level element as a string field of a tuple in a relational database Use a single relation to store all elements, or Use a separate relation for each top-level element type E.g. account, customer, depositor relations Each with a string-valued attribute to store the element Indexing: Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields E.g. customer_name or account_number Some database systems support function indices, which use the result of a function as the key value. The function should return the value of the required subelement/attribute Database System Concepts - 5th Edition, Aug 22, 2005. 10.143 ©Silberschatz, Korth and Sudarshan

144 String Representation (Cont.)
Benefits: Can store any XML data even without DTD As long as there are many top-level elements in a document, strings are small compared to full document Allows fast access to individual elements. Drawback: Need to parse strings to access values inside the elements Parsing is slow. Database System Concepts - 5th Edition, Aug 22, 2005. 10.144 ©Silberschatz, Korth and Sudarshan
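A minimal sketch of the string representation using Python's built-in sqlite3 module. The table layout, the index name and the store helper are assumptions made for illustration: each account element is kept as a string, and account_number is duplicated into its own indexed column so lookups do not need to parse every string.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# One relation per top-level element type; "element" holds the XML string.
conn.execute("CREATE TABLE account (account_number TEXT, element TEXT)")
conn.execute("CREATE INDEX account_no_idx ON account (account_number)")

def store(element_xml):
    """Store the element string plus the extracted value to be indexed."""
    elem = ET.fromstring(element_xml)
    conn.execute(
        "INSERT INTO account VALUES (?, ?)",
        (elem.findtext("account_number"), element_xml),
    )

store("<account><account_number>A-101</account_number><balance>500</balance></account>")
store("<account><account_number>A-102</account_number><balance>400</balance></account>")

# Fast access to one element through the indexed column ...
row = conn.execute(
    "SELECT element FROM account WHERE account_number = ?", ("A-101",)
).fetchone()
# ... but values inside the element still require parsing the string.
balance = ET.fromstring(row[0]).findtext("balance")
```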

145 ©Silberschatz, Korth and Sudarshan
Tree Representation
Tree representation: model XML data as tree and store using relations
  nodes(id, type, label, value)
  child(child_id, parent_id)
Each element/attribute is given a unique identifier
Type indicates element/attribute
Label specifies the tag name of the element/name of attribute
Value is the text value of the element/attribute
The relation child notes the parent-child relationships in the tree
Can add an extra attribute to child to record ordering of children
[Figure: example tree with nodes bank (id: 1), customer (id: 2), customer_name (id: 3), account (id: 5), account_number (id: 7)]

146 Tree Representation (Cont.)
Benefit: Can store any XML data, even without DTD Drawbacks: Data is broken up into too many pieces, increasing space overheads Even simple queries require a large number of joins, which can be slow Database System Concepts - 5th Edition, Aug 22, 2005. 10.146 ©Silberschatz, Korth and Sudarshan
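The nodes and child relations can be tried out with sqlite3. This is a sketch under assumptions: the shred helper, the extra position column and the sample document are invented, and ids are simply assigned in document order. The final query shows the drawback noted above, since even "names of customers" already needs two joins.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)")
conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER, position INTEGER)")
ids = itertools.count(1)

def shred(elem, parent_id=None, position=0):
    """Insert elem, its attributes and its subtree as rows; return elem's id."""
    node_id = next(ids)
    text = (elem.text or "").strip() or None
    conn.execute("INSERT INTO nodes VALUES (?, 'element', ?, ?)", (node_id, elem.tag, text))
    if parent_id is not None:
        conn.execute("INSERT INTO child VALUES (?, ?, ?)", (node_id, parent_id, position))
    for name, value in elem.attrib.items():
        attr_id = next(ids)
        conn.execute("INSERT INTO nodes VALUES (?, 'attribute', ?, ?)", (attr_id, name, value))
        conn.execute("INSERT INTO child VALUES (?, ?, NULL)", (attr_id, node_id))
    for i, sub in enumerate(elem):
        shred(sub, node_id, i)
    return node_id

root_id = shred(ET.fromstring(
    "<bank>"
    "<customer><customer_name>Joe</customer_name></customer>"
    "<account><account_number>A-101</account_number></account>"
    "</bank>"
))

# Customer names: join the customer node to its customer_name child.
names = conn.execute(
    "SELECT v.value FROM nodes c "
    "JOIN child ch ON ch.parent_id = c.id "
    "JOIN nodes v ON v.id = ch.child_id "
    "WHERE c.label = 'customer' AND v.label = 'customer_name'"
).fetchall()
```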

147 Mapping XML Data to Relations
Relation created for each element type whose schema is known:
  An id attribute to store a unique id for each element
  A relation attribute corresponding to each element attribute
  A parent_id attribute to keep track of parent element
    As in the tree representation
    Position information (ith child) can be stored too
All subelements that occur only once can become relation attributes
  For text-valued subelements, store the text as attribute value
  For complex subelements, can store the id of the subelement
Subelements that can occur multiple times are represented in a separate table
  Similar to handling of multivalued attributes when converting ER diagrams to tables
©Silberschatz, Korth and Sudarshan

148 Storing XML Data in Relational Systems
Publishing: process of converting relational data to an XML format Shredding: process of converting an XML document into a set of tuples to be inserted into one or more relations XML-enabled database systems support automated publishing and shredding Some systems offer native storage of XML data using the xml data type. Special internal data structures and indices are used for efficiency Database System Concepts - 5th Edition, Aug 22, 2005. 10.148 ©Silberschatz, Korth and Sudarshan

149 ©Silberschatz, Korth and Sudarshan
SQL/XML
New standard SQL extension that allows creation of nested XML output
Each output tuple is mapped to an XML element row
  <bank>
    <account>
      <row>
        <account_number> A-101 </account_number>
        <branch_name> Downtown </branch_name>
        <balance> 500 </balance>
      </row>
      …. more rows if there are more output tuples …
    </account>
  </bank>

150 ©Silberschatz, Korth and Sudarshan
SQL Extensions
xmlelement creates XML elements
xmlattributes creates attributes
  select xmlelement (name "account",
           xmlattributes (account_number as account_number),
           xmlelement (name "branch_name", branch_name),
           xmlelement (name "balance", balance))
  from account
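SQLite has no xmlelement or xmlattributes, so the sketch below only imitates the effect of the query on the client side: each tuple of a hypothetical account table is published as an account element, with account_number as an attribute and the other columns as subelements.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical relational account table.
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A-101', 'Downtown', 500)")

published = []
for account_number, branch_name, balance in conn.execute(
    "SELECT account_number, branch_name, balance FROM account"
):
    # Role of xmlelement(name "account", xmlattributes(...), ...)
    elem = ET.Element("account", account_number=account_number)
    ET.SubElement(elem, "branch_name").text = branch_name
    ET.SubElement(elem, "balance").text = str(balance)
    published.append(ET.tostring(elem, encoding="unicode"))

print(published[0])
```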

151 ©Silberschatz, Korth and Sudarshan
Web Services The Simple Object Access Protocol (SOAP) standard: Invocation of procedures across applications with distinct databases XML used to represent procedure input and output A Web service is a site providing a collection of SOAP procedures Described using the Web Services Description Language (WSDL) Directories of Web services are described using the Universal Description, Discovery, and Integration (UDDI) standard Database System Concepts - 5th Edition, Aug 22, 2005. 10.151 ©Silberschatz, Korth and Sudarshan

152 Thanks Facultad de Informática. Universidad Politécnica de Madrid
Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid

