CS 480: Database Systems Lecture 26 March 18, 2013.


1 CS 480: Database Systems Lecture 26 March 18, 2013

2 XML: Motivation
- Data interchange is critical in today’s networked world
  - Examples:
    - Banking: funds transfer
    - Order processing (especially inter-company orders)
    - Scientific data
      - Chemistry: ChemML, …
      - Genetics: BSML (Bio-Sequence Markup Language), …
  - Paper flow of information between organizations is being replaced by electronic flow of information
- Each application area has its own set of standards for representing information
- XML has become the basis for all new generation data interchange formats

3 Comparison with Relational Data
- Inefficient: tags, which in effect represent schema information, are repeated
- Better than relational tuples as a data-exchange format
  - Unlike relational tuples, XML data is self-documenting due to presence of tags
  - Non-rigid format: tags can be added
  - Allows nested structures
  - Wide acceptance, not only in database systems, but also in browsers, tools, and applications

4 Structure of XML Data
- Tag: label for a section of data
- Element: section of data beginning with <tagname> and ending with matching </tagname>
- Elements must be properly nested
  - Proper nesting: … …. (the inner element is closed before the enclosing element)
  - Improper nesting: … …. (the enclosing element is closed while the inner element is still open)
  - Formally: every start tag must have a unique matching end tag that is in the context of the same parent element.
- Every document must have a single top-level element
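
The nesting and single-root rules can be checked with any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree module follows; the course and title tag names are assumptions chosen for illustration, not markup recovered from the slide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical tag names: the inner element closes before the outer one.
proper = "<course><title>Intro. to Computer Science</title></course>"
# The outer element is closed while the inner element is still open.
improper = "<course><title>Intro. to Computer Science</course></title>"
# Two top-level elements violate the single top-level element rule.
two_roots = "<course/><course/>"

for name, text in [("proper", proper), ("improper", improper), ("two_roots", two_roots)]:
    print(name, is_well_formed(text))
```

The parser rejects both the improperly nested document and the document with two top-level elements before any application code sees the data.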

5 Example of Nested Elements
P-101 ….
RS1 Atom powered rocket sled 2 199.95
SG2 Superb glue 1 liter 29.95

6 Motivation for Nesting
- Nesting of data is useful in data transfer
  - Example: elements representing item nested within an itemlist element (no need to do joins to get all the items).
- Nesting is not supported, or discouraged, in relational databases
  - With multiple orders, customer name and address are stored redundantly
  - Normalization replaces nested structures in each order by a foreign key into the table storing customer name and address information
  - Nesting is supported in object-relational databases
- But nesting is appropriate when transferring data
  - External application does not have direct access to data referenced by a foreign key
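
To illustrate why nesting avoids joins, here is a small sketch in Python that reads every item of one order directly from the order element. The values come from the nested-elements example; all tag names (purchase_order, identifier, itemlist, item, description, quantity, unit-of-measure, price) are assumptions, and the purchaser details are left out because they are elided in the source.

```python
import xml.etree.ElementTree as ET

# Tag names are hypothetical; only the values appear in the lecture text.
order_xml = """
<purchase_order>
  <identifier>P-101</identifier>
  <itemlist>
    <item>
      <identifier>RS1</identifier>
      <description>Atom powered rocket sled</description>
      <quantity>2</quantity>
      <price>199.95</price>
    </item>
    <item>
      <identifier>SG2</identifier>
      <description>Superb glue</description>
      <quantity>1</quantity>
      <unit-of-measure>liter</unit-of-measure>
      <price>29.95</price>
    </item>
  </itemlist>
</purchase_order>
"""

order = ET.fromstring(order_xml)

# All items travel inside the order element, so no join is needed.
items = []
for item in order.findall("./itemlist/item"):
    items.append({
        "identifier": item.findtext("identifier"),
        "quantity": int(item.findtext("quantity")),
        "price": float(item.findtext("price")),
    })

order_total = sum(i["quantity"] * i["price"] for i in items)
print(order.findtext("identifier"), [i["identifier"] for i in items])
```

In a normalized relational design the same lookup would join an orders table with an items table on the order identifier.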

7 Structure of XML Data
- Mixture of text with sub-elements is legal in XML.
  - Example: This course is being offered for the first time in 2009. BIO-399 Computational Biology Biology 3
  - Useful for document markup, but discouraged for data representation
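
A sketch of how mixed content looks to a program, using Python's xml.etree.ElementTree: the free text becomes the text of the parent element and sits alongside its child elements. The tag names (course, course_id, title, dept_name, credits) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names wrapped around the values from the example.
course = ET.fromstring(
    "<course>This course is being offered for the first time in 2009."
    "<course_id>BIO-399</course_id>"
    "<title>Computational Biology</title>"
    "<dept_name>Biology</dept_name>"
    "<credits>3</credits>"
    "</course>"
)

# Text that appears before the first child is stored on the parent itself.
free_text = course.text
# The sub-elements follow as ordinary children.
child_tags = [child.tag for child in course]
print(free_text)
print(child_tags)
```

Code that expects only sub-elements has to deal with this extra text, which is one reason mixed content is discouraged for data representation.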

8 Attributes
- Elements can have attributes
  - Intro. to Computer Science Comp. Sci. 4
- Attributes are specified by name=value pairs inside the starting tag of an element
- An element may have several attributes, but each attribute name can only occur once
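
The attribute rules can be tried out with a short Python sketch. The course_id attribute name matches the one used in the later DTD examples; the remaining tag names are assumptions.

```python
import xml.etree.ElementTree as ET

# An element with one attribute and three sub-elements (tag names assumed).
course = ET.fromstring(
    '<course course_id="CS-101">'
    "<title>Intro. to Computer Science</title>"
    "<dept_name>Comp. Sci.</dept_name>"
    "<credits>4</credits>"
    "</course>"
)
course_id = course.get("course_id")


def parses(text: str) -> bool:
    """Return True if `text` is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Repeating an attribute name inside one start tag is not well-formed.
duplicate_accepted = parses('<course course_id="CS-101" course_id="CS-102"/>')
print(course_id, duplicate_accepted)
```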

9 Attributes vs. Subelements
- Distinction between subelement and attribute
  - In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents
  - In the context of data representation, the difference is unclear and may be confusing
    - Same information can be represented in two ways
      - as an attribute inside the start tag: …
      - as a subelement: CS-101 …
  - Suggestion: use attributes for identifiers of elements, and use subelements for contents

10 Namespaces
- XML data has to be exchanged between organizations
- Same tag name may have different meaning in different organizations, causing confusion on exchanged documents
- Specifying a unique string as an element name avoids confusion
- Better solution: use unique-name:element-name
- Avoid using long unique names all over document by using XML Namespaces
  - …http://www.yale.edu CS-101 Intro. to Computer Science Comp. Sci. 4 …
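
A sketch of the namespace mechanism in Python: a short prefix is bound once to the long unique name, and the parser expands every prefixed tag to the full name. The yale prefix and the tag names are assumptions; only the URI http://www.yale.edu appears in the lecture text.

```python
import xml.etree.ElementTree as ET

# The prefix "yale" is declared once and stands for the full URI.
doc = ET.fromstring(
    '<university xmlns:yale="http://www.yale.edu">'
    "<yale:course>"
    "<yale:course_id>CS-101</yale:course_id>"
    "<yale:title>Intro. to Computer Science</yale:title>"
    "</yale:course>"
    "</university>"
)

# ElementTree stores the expanded form {uri}local-name for each tag.
ns = {"yale": "http://www.yale.edu"}
course = doc.find("yale:course", ns)
expanded_tag = course.tag
course_id = course.findtext("yale:course_id", namespaces=ns)

# A lookup without the namespace does not match the qualified element.
unqualified = doc.find("course")
print(expanded_tag, course_id, unqualified)
```

Two organizations can both use a tag called course without clashing, because the expanded names differ.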

11 More on XML Syntax
- Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag
- To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below
  - <![CDATA[ … ]]>
  - Here, any tags between <![CDATA[ and ]]> are treated as just strings
  - CDATA stands for “character data”
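
Both syntax features can be observed with a parser. In this Python sketch the element and attribute names are assumptions; it shows that the abbreviated form is equivalent to an empty start/end pair and that tags inside a CDATA section arrive as plain text.

```python
import xml.etree.ElementTree as ET

# Abbreviated empty element versus explicit start and end tags.
short_form = ET.fromstring('<course course_id="CS-101"/>')
long_form = ET.fromstring('<course course_id="CS-101"></course>')

# Markup inside CDATA is not interpreted as sub-elements.
note = ET.fromstring("<note><![CDATA[<course> ... </course>]]></note>")

print(short_form.attrib == long_form.attrib, len(short_form), len(long_form))
print(repr(note.text), len(note))
```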

12 XML Document Schema
- Database schemas constrain what information can be stored, and the data types of stored values
- XML documents are not required to have an associated schema
- However, schemas are very important for XML data exchange
  - Otherwise, a site cannot automatically interpret data received from another site
- Two mechanisms for specifying XML schema
  - Document Type Definition (DTD)
    - Widely used
  - XML Schema
    - Newer, increasing use

13 Document Type Definition (DTD)
- The type of an XML document can be specified using a DTD
- A DTD constrains the structure of XML data
  - What elements can occur
  - What attributes can/must an element have
  - What subelements can/must occur inside each element, and how many times.
- DTD does not constrain data types
  - All values represented as strings in XML
- DTD syntax
  - <!ELEMENT element (subelements-specification)>
  - <!ATTLIST element (attributes)>

14 Element Specification in DTD
- Subelements can be specified as
  - names of elements, or
  - #PCDATA (parsed character data), i.e., character strings
  - EMPTY (no subelements) or ANY (anything can be a subelement)
- Subelement specification may have regular expressions
  - Notation:
    - “|” - alternatives
    - “+” - 1 or more occurrences
    - “*” - 0 or more occurrences
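
Because a subelement specification is a regular expression over child element names, it can be mimicked with an ordinary regex. The sketch below hand-translates three hypothetical declarations (they are assumptions, not the slide's own example) and checks child sequences against them; it is an illustration of the notation, not a DTD validator.

```python
import re

# Hypothetical DTD declarations, hand-translated into regular expressions
# over a string in which every child element name is followed by one space:
#   <!ELEMENT department (dept_name, building, budget)>
#   <!ELEMENT university ((department | course | instructor)+)>
#   <!ELEMENT itemlist (item*)>
CONTENT_MODELS = {
    "department": r"dept_name building budget ",
    "university": r"(?:(?:department|course|instructor) )+",
    "itemlist": r"(?:item )*",
}


def children_allowed(parent: str, children: list[str]) -> bool:
    """Check a sequence of child element names against the parent's model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(CONTENT_MODELS[parent], sequence) is not None


print(children_allowed("department", ["dept_name", "building", "budget"]))
print(children_allowed("department", ["building", "dept_name", "budget"]))
print(children_allowed("university", []))
print(children_allowed("itemlist", []))
```

A comma-separated list fixes the order of the children, “+” rejects an empty sequence, and “*” accepts one.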

15 University DTD
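
For orientation, here is one way a university DTD could be written as an internal subset, embedded in a document and read with Python. The declarations are an assumed sketch built from element names mentioned elsewhere in the lecture (IID is an invented name for the instructor identifier), not a reconstruction of the slide. Note that xml.etree.ElementTree reads past the DTD but does not validate against it.

```python
import xml.etree.ElementTree as ET

# Assumed declarations; the DOCTYPE block ends with the "]>" delimiter.
UNIVERSITY_DOC = """<!DOCTYPE university [
  <!ELEMENT university ((department | course | instructor)+)>
  <!ELEMENT department (dept_name, building, budget)>
  <!ELEMENT course (course_id, title, dept_name, credits)>
  <!ELEMENT instructor (IID, name, dept_name, salary)>
  <!ELEMENT dept_name (#PCDATA)>
  <!ELEMENT building (#PCDATA)>
  <!ELEMENT budget (#PCDATA)>
  <!ELEMENT course_id (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT credits (#PCDATA)>
  <!ELEMENT IID (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT salary (#PCDATA)>
]>
<university>
  <department>
    <dept_name>Comp. Sci.</dept_name>
    <building>Taylor</building>
    <budget>100000</budget>
  </department>
</university>
"""

# The parser accepts the DOCTYPE block and returns the document tree.
root = ET.fromstring(UNIVERSITY_DOC)
building = root.findtext("department/building")
print(root.tag, building)
```

Checking the document against these declarations would need a validating parser, which the Python standard library does not provide.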

16 Attribute Specification in DTD
- Attribute specification: for each attribute
  - Name
  - Type of attribute
    - CDATA
    - ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
      - more on this later
  - Whether
    - mandatory (#REQUIRED)
    - has a default value (value),
    - or neither (#IMPLIED)
- Examples
  - <!ATTLIST course course_id ID #REQUIRED dept_name IDREF #REQUIRED instructors IDREFS #IMPLIED >

17 IDs and IDREFs
- An element can have at most one attribute of type ID
- The ID attribute value of each element in an XML document must be distinct
  - Thus the ID attribute value is an object identifier
- An attribute of type IDREF must contain the ID value of an element in the same document
- An attribute of type IDREFS contains a set of (0 or more) ID values. Each of these values must be the ID value of an element in the same document
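
The two rules (IDs are distinct, every reference resolves) can be sketched as a hand-written check in Python. The tables saying which attribute is an ID and which is an IDREFS stand in for ATTLIST declarations; IID as the instructor ID attribute is an assumption, while course_id and instructors follow the ATTLIST example in the lecture.

```python
import xml.etree.ElementTree as ET

# Stand-ins for ATTLIST declarations (IID is a hypothetical attribute name).
ID_ATTRS = {"instructor": "IID", "course": "course_id"}
IDREFS_ATTRS = {"course": ["instructors"]}


def check_ids(root):
    """Return a list of ID/IDREFS violations found under `root`."""
    errors = []
    ids = set()
    # First pass: collect IDs and report duplicates.
    for elem in root.iter():
        attr = ID_ATTRS.get(elem.tag)
        if attr and attr in elem.attrib:
            value = elem.attrib[attr]
            if value in ids:
                errors.append(f"duplicate ID {value!r}")
            ids.add(value)
    # Second pass: every whitespace-separated reference must match an ID.
    for elem in root.iter():
        for attr in IDREFS_ATTRS.get(elem.tag, []):
            for ref in elem.attrib.get(attr, "").split():
                if ref not in ids:
                    errors.append(f"dangling reference {ref!r}")
    return errors


GOOD = """
<university>
  <instructor IID="10101"><name>Srinivasan</name><salary>65000</salary></instructor>
  <instructor IID="83821"/>
  <course course_id="CS-101" instructors="10101 83821">
    <title>Intro. to Computer Science</title><credits>4</credits>
  </course>
</university>
"""
BAD = GOOD.replace('instructors="10101 83821"', 'instructors="10101 99999"')

good_errors = check_ids(ET.fromstring(GOOD))
bad_errors = check_ids(ET.fromstring(BAD))
print(good_errors, bad_errors)
```

A validating parser performs the same two checks automatically when the attributes are declared as ID, IDREF, or IDREFS in the DTD.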

18 University DTD with Attributes
- University DTD with ID and IDREF attribute types.
  - · · · declarations for title, credits, building, budget, name and salary · · ·

19 XML Data with ID and IDREF attributes
Taylor 100000 Watson 90000
<course course_id="CS-101" dept_name="Comp. Sci" instructors="10101 83821"> Intro. to Computer Science 4 ….
Srinivasan 65000 ….

20 Limitations of DTDs
- No typing of text elements and attributes
  - All values are strings, no integers, reals, etc.
- Difficult to specify unordered sets of subelements
  - Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  - (A | B)* allows specification of an unordered set, but
    - Cannot ensure that each of A and B occurs only once
- IDs and IDREFs are untyped
  - The instructors attribute of a course may contain a reference to another course, which is meaningless
    - instructors attribute should ideally be constrained to refer to instructor elements
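
The (A | B)* limitation can be seen by treating the content model as a regular expression, as in this small Python sketch (A and B are single-letter element names, so the child sequence is simply concatenated).

```python
import re

# Regex equivalent of the DTD content model (A | B)*.
UNORDERED = re.compile(r"(?:A|B)*")


def accepts(children: list[str]) -> bool:
    """Return True if the child sequence matches (A | B)*."""
    return UNORDERED.fullmatch("".join(children)) is not None


# Both orders are accepted, which is the intended effect.
print(accepts(["A", "B"]), accepts(["B", "A"]))
# Repeated and missing children are accepted too, which is not intended.
print(accepts(["A", "A", "B"]), accepts(["A"]))
```

Listing both orders explicitly, as in ((A, B) | (B, A)), enforces exactly one of each, but the number of alternatives grows factorially with the number of subelements.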

21 XML Schema
- XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports
  - Typing of values
    - E.g. integer, string, boolean, etc.
    - Also, constraints on min/max values
  - User-defined, complex types
  - Many more features, including
    - uniqueness and foreign key constraints, inheritance
- XML Schema is itself specified in XML syntax, unlike DTDs
  - More-standard representation, but verbose
- XML Schema is integrated with namespaces
- BUT: XML Schema is significantly more complicated than DTDs.

22 XML Schema Version of Univ. DTD …. … Contd.

23 XML Schema Version of Univ. DTD (Cont.)
….
- Choice of “xs:” was ours -- any other namespace prefix could be chosen
- Element “university” has type “UniversityType”, which is defined separately
  - xs:complexType is used later to create the named complex type “UniversityType”
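
Since an XML Schema is itself an XML document, it can be read with an ordinary XML parser. The schema fragment in this Python sketch is an assumption written for illustration (it is not the slide's schema); the code only inspects it, because the Python standard library cannot validate documents against XML Schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Assumed schema fragment: typed leaf elements plus one named complex type.
schema = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="university" type="UniversityType"/>
  <xs:element name="department">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="dept_name" type="xs:string"/>
        <xs:element name="building" type="xs:string"/>
        <xs:element name="budget" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="UniversityType">
    <xs:sequence>
      <xs:element ref="department" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
""")

# Top-level element declarations and the type each one refers to, if any.
top_level = {e.get("name"): e.get("type") for e in schema.findall(f"{{{XS}}}element")}
# Named complex types defined at the top level of the schema.
named_types = [t.get("name") for t in schema.findall(f"{{{XS}}}complexType")]
# Unlike a DTD, the schema assigns a data type to the budget value.
budget_type = schema.find(f".//{{{XS}}}element[@name='budget']").get("type")
print(top_level, named_types, budget_type)
```

The type name in the university declaration must match the name of the complex type exactly, because XML names are case-sensitive.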

24 More Features of XML Schema
- Attributes specified by xs:attribute tag:
  - adding the attribute use = “required” means value must be specified; default value of attribute use is “optional”
- Key constraint: “department names form a key for department elements under the root university element”
- Foreign key constraint from course to department
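
What the key and foreign-key constraints require of a document can be sketched as a manual check in Python. This mimics the idea of selecting elements and comparing one field of each; it is not the xs:key or xs:keyref syntax, and the Physics course is an invented row added to show a violation.

```python
import xml.etree.ElementTree as ET


def field_values(root, selector: str, field: str) -> list:
    """Collect the text of `field` for every element matched by `selector`."""
    return [elem.findtext(field) for elem in root.findall(selector)]


# Tag names are assumed; the second course is hypothetical test data.
univ = ET.fromstring("""
<university>
  <department><dept_name>Comp. Sci.</dept_name><building>Taylor</building></department>
  <department><dept_name>Biology</dept_name><building>Watson</building></department>
  <course><course_id>CS-101</course_id><dept_name>Comp. Sci.</dept_name></course>
  <course><course_id>PHY-101</course_id><dept_name>Physics</dept_name></course>
</university>
""")

# Key: department names must be unique among department elements.
dept_keys = field_values(univ, "department", "dept_name")
is_key = len(dept_keys) == len(set(dept_keys))

# Foreign key: each course's dept_name must match some department key.
course_refs = field_values(univ, "course", "dept_name")
dangling = [value for value in course_refs if value not in set(dept_keys)]
print(is_key, dangling)
```

Unlike IDREF, such a constraint names the elements it refers to, so a course can only reference a department.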

25 XML Schema
- XML Schema offers several benefits over DTDs
  - It allows the text that appears in elements to be constrained to specific types (numeric, complex, etc.)
  - It allows user-defined types to be created
  - It allows uniqueness and foreign-key constraints
  - It is integrated with namespaces to allow different parts of a document to conform to different schemas.
- Other features we haven’t seen that XML Schema has over DTDs
  - It allows types to be restricted to specialized types, for instance by specifying minimum and maximum values.
  - It allows complex types to be extended by using a form of inheritance.

