2 Chapter 26: XML and Internet Databases
3 Outline: Structured, Semistructured, and Unstructured Data; XML Hierarchical Data Model; XML Documents, DTD, and XML Schema; XML Documents and Databases; XML Querying
4 Structured vs. Semistructured Data. Structured data: e.g., information stored in databases; all records have the same format, as defined in the relational schema. Semistructured data: may have a certain structure, but not all the information collected will have an identical structure.
5 FIGURE 26.1 Representing semistructured data as a graph.
6 FIGURE 26.2 Part of an HTML document representing unstructured data (cf. the company database schema).
7 XML Hierarchical (Tree) Data Model. Problem with HTML documents: they are difficult for programs to interpret automatically because they do not include schema information about the type of data in the documents, and they are inappropriate as intermediate Web documents to be exchanged among various computer sites. Solution: XML documents. Two main structuring concepts: elements and attributes. Cf. HTML: in XML, tag names are defined to describe the meaning of the data elements, rather than to describe how the text is to be displayed (as in HTML).
8 FIGURE 26.3 A complex XML element called <projects>. standalone="yes": schemaless. Correction: <project>. Complex elements: <projects>, <project>, <Worker>. Simple elements: <Name>, <Number>, <SSN>, …
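The distinction between complex and simple elements can be sketched in Python with the standard library's ElementTree. The sample document below is hypothetical data shaped like the <projects> element on this slide (the <Hours> element and all values are assumptions, not copied from Figure 26.3). An element counts as complex here if it contains child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the <projects> element described above.
DOC = """<?xml version="1.0" standalone="yes"?>
<projects>
  <project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Worker>
      <SSN>123456789</SSN>
      <Hours>32.5</Hours>
    </Worker>
  </project>
</projects>"""


def classify(root):
    """Return (complex, simple) sets of tag names.

    An element is treated as complex if it has child elements,
    and as simple if it only holds character data.
    """
    complex_tags, simple_tags = set(), set()
    for elem in root.iter():
        (complex_tags if len(elem) else simple_tags).add(elem.tag)
    return complex_tags, simple_tags


root = ET.fromstring(DOC)
complex_tags, simple_tags = classify(root)
print(sorted(complex_tags))
print(sorted(simple_tags))
```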
9 XML Documents, DTD, and XML Schema. A well-formed XML document is one that follows a few conditions: it starts with an XML declaration (version, …); it follows the tree model, with a single root element; the matching start and end tags of an element must be within the tags of its parent element; it is syntactically correct.
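A quick way to test the syntactic conditions above is to hand the text to an XML parser and see whether it is rejected. The sketch below uses Python's ElementTree, which reports nesting errors and multiple root elements as a ParseError. The helper name is_well_formed is an illustrative assumption, and the check covers only what the parser enforces.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as a single, properly nested XML tree."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<projects><project>X</project></projects>"))  # properly nested
print(is_well_formed("<projects><project>X</projects></project>"))  # end tags cross
print(is_well_formed("<project/><project/>"))                       # two root elements
```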
10 XML Documents, DTD, and XML Schema. A valid XML document is well formed, and in addition the element names used in the start and end tag pairs must follow the structure specified in a separate XML DTD (Document Type Definition) file or XML schema file. Figure 26.4: a sample XML DTD called projects. Occurrence markers: * zero or more; + one or more; ? zero or one; otherwise exactly once. Data type: (#PCDATA), parsed character data.
11 FIGURE 26.4 An XML DTD file called projects. To use the DTD file: store the DTD file in the same file system as the XML document, and begin the document with <?xml version="1.0" standalone="no"?> <!DOCTYPE projects SYSTEM "proj.dtd">
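Python's standard library does not validate against a DTD, so the following is only a hand-rolled sketch of the idea: a DTD content model with the ?, *, and + markers behaves like a regular expression over the sequence of child element names. The two declarations in the comment are assumed for illustration and are not copied from Figure 26.4. CONTENT_MODELS, children_match, and conforms are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Assumed declarations, for illustration only:
#   <!ELEMENT Project (Name, Number, Location, Dept_no?, Workers)>
#   <!ELEMENT Workers (Worker*)>
# Each content model is written as a regex over "tag tag tag " strings.
CONTENT_MODELS = {
    "Project": re.compile(r"Name Number Location (Dept_no )?Workers "),
    "Workers": re.compile(r"(Worker )*"),
}


def children_match(elem):
    """Check the ordered child tags of `elem` against its content model."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return True  # no declaration known: not checked in this sketch
    sequence = "".join(child.tag + " " for child in elem)
    return model.fullmatch(sequence) is not None


def conforms(root):
    """True if every element in the tree satisfies its (assumed) content model."""
    return all(children_match(e) for e in root.iter())


ok = ET.fromstring(
    "<Project><Name>X</Name><Number>1</Number>"
    "<Location>L</Location><Workers/></Project>"
)
swapped = ET.fromstring(
    "<Project><Number>1</Number><Name>X</Name>"
    "<Location>L</Location><Workers/></Project>"
)
print(conforms(ok), conforms(swapped))
```

The second document fails because a DTD fixes the order of child elements, which is one of the limitations listed on the next slide.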
12 DTD Limitations. Data types in DTD are not very general. DTD has its own special syntax and thus requires specialized processors. All DTD elements are always forced to follow the specified ordering of the document, so unordered elements are not permitted. Solution: XML Schema.
13 FIGURE 26.5 An XML schema file called company. Schema namespace. The root element company; also an unnamed complex element. "Department", "Employee", etc. must be named types. The selector "employeeDependent" is an attribute of "Employee", of type "Dependent". The field "dependentName" in "Dependent" must be unique.
14 FIGURE 26.5 (continued) An XML schema file called company. <xsd:unique …> specifies a key constraint for a non-primary-key element. <xsd:key> specifies a primary key. <xsd:keyref> specifies a foreign key; <xsd:selector> refers to the referencing element type; <xsd:field> refers to the referencing attribute.
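What the key and keyref constraints demand of a document can be sketched in Python without a schema processor. The selector picks a set of elements, the field picks the value to compare, a key requires those values to be present and unique, and a keyref requires each referencing value to match some key value. The document and its element names below are assumptions for illustration. The helper functions are hypothetical and do not implement XML Schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; element names are assumed, not taken from Figure 26.5.
DOC = """<company>
  <department><departmentNumber>5</departmentNumber></department>
  <department><departmentNumber>4</departmentNumber></department>
  <employee><employeeSSN>123</employeeSSN><employeeDepartmentNumber>5</employeeDepartmentNumber></employee>
  <employee><employeeSSN>456</employeeSSN><employeeDepartmentNumber>9</employeeDepartmentNumber></employee>
</company>"""


def field_values(root, selector, field):
    """Value of `field` for every element matched by `selector` (None if missing)."""
    return [elem.findtext(field) for elem in root.findall(selector)]


def is_key(values):
    """A key must be present on every selected element and unique among them."""
    return None not in values and len(values) == len(set(values))


def dangling_refs(ref_values, key_values):
    """Referencing values that match no key value (foreign-key violations)."""
    keys = set(key_values)
    return [v for v in ref_values if v not in keys]


root = ET.fromstring(DOC)
dept_keys = field_values(root, "department", "departmentNumber")
emp_refs = field_values(root, "employee", "employeeDepartmentNumber")
print(is_key(dept_keys))
print(dangling_refs(emp_refs, dept_keys))
```

In this sample the department numbers form a valid key, while the second employee refers to department 9, which does not exist.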
15 FIGURE 26.5 (continued) An XML schema file called company. Exercise: Define the element "projectWorker" in the type "Project" as an embedded sub-element. Answer: <xsd:element name="projectWorker" minOccurs="1" maxOccurs="unbounded"><xsd:complexType><xsd:sequence><xsd:element name="SSN" type="xsd:string" /><xsd:element name="hours" type="xsd:float" /></xsd:sequence></xsd:complexType></xsd:element>
16 FIGURE 26.5 (continued) An XML schema file called company
17 XML Documents and Databases: Approaches to Storing XML Documents; Extracting XML Documents from Relational Databases; Breaking Cycles to Convert Graphs into Trees; Other Steps for Extracting XML Documents from Databases
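Extracting an XML document from a relational database amounts to choosing a root relation and nesting the related rows beneath it. The sketch below does this with Python's sqlite3 and ElementTree for a course-rooted hierarchy. The two tables, their columns, and the sample rows are simplified assumptions and not the UNIVERSITY schema from the figures.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified tables held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (number TEXT PRIMARY KEY, name TEXT);
CREATE TABLE section (number INTEGER, course_number TEXT, year INTEGER);
INSERT INTO course VALUES ('CS101', 'Intro to Databases');
INSERT INTO section VALUES (1, 'CS101', 2024);
INSERT INTO section VALUES (2, 'CS101', 2025);
""")


def courses_to_xml(conn):
    """Build a <courses> tree with each course's sections nested inside it."""
    root = ET.Element("courses")
    courses = conn.execute("SELECT number, name FROM course ORDER BY number").fetchall()
    for c_number, c_name in courses:
        course = ET.SubElement(root, "course")
        ET.SubElement(course, "number").text = c_number
        ET.SubElement(course, "name").text = c_name
        # Follow the foreign key from section to course to find the child rows.
        sections = conn.execute(
            "SELECT number, year FROM section WHERE course_number = ? ORDER BY number",
            (c_number,),
        ).fetchall()
        for s_number, s_year in sections:
            section = ET.SubElement(course, "section")
            ET.SubElement(section, "number").text = str(s_number)
            ET.SubElement(section, "year").text = str(s_year)
    return root


xml_root = courses_to_xml(conn)
print(ET.tostring(xml_root, encoding="unicode"))
```

Choosing a different root relation (for example, sections with their course nested inside) would produce a different hierarchical view of the same data.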
18 FIGURE 26.6 An ER schema diagram for a simplified UNIVERSITY database.
19 FIGURE 26.7 Subset of the UNIVERSITY database schema needed for XML document extraction.
20 FIGURE 26.8 Hierarchical (tree) view with COURSE as the root.
21 FIGURE 26.9 XML schema document with COURSE as the root.
22 FIGURE 26.10 Hierarchical (tree) view with STUDENT as the root.
23 FIGURE 26.11 XML schema document with STUDENT as the root.
24 FIGURE 26.12 Hierarchical (tree) view with SECTION as the root.
25 FIGURE 26.13 Converting a graph with cycles into a hierarchical (tree) structure.
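One way to sketch the conversion of a graph with cycles into a tree is a depth-first walk from the chosen root. A node reached along several paths is replicated, and an edge that leads back to a node already on the current path is dropped, which breaks the cycle. The adjacency list below is a hypothetical schema graph, not the one drawn in Figure 26.13, and graph_to_tree is an illustrative name.

```python
def graph_to_tree(graph, root):
    """Unfold `graph` (dict: node -> list of neighbours) into a tree of (node, children)."""

    def build(node, ancestors):
        path = ancestors | {node}
        # Skip neighbours already on the path from the root: following them would loop.
        children = [build(n, path) for n in graph.get(node, ()) if n not in path]
        return (node, children)

    return build(root, frozenset())


def count(tree, name):
    """Number of copies of `name` in the unfolded tree."""
    node, children = tree
    return (node == name) + sum(count(child, name) for child in children)


# Hypothetical schema graph containing the cycle STUDENT -> DEPARTMENT -> STUDENT.
graph = {
    "STUDENT": ["SECTION", "DEPARTMENT"],
    "SECTION": ["COURSE", "INSTRUCTOR"],
    "COURSE": ["DEPARTMENT"],
    "INSTRUCTOR": ["DEPARTMENT"],
    "DEPARTMENT": ["STUDENT"],
}

tree = graph_to_tree(graph, "STUDENT")
print(tree)
print(count(tree, "DEPARTMENT"))
```

In this example DEPARTMENT is replicated once for each path that reaches it, and the edge back to STUDENT is never followed.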
26 XML Query. XPath: Specifying Path Expressions in XML. XQuery: Specifying Queries in XML.
27 FIGURE Some examples of XPath expressions on XML documents that follow the XML schema file COMPANY in Figure 26.5
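Path expressions of this kind can be tried out in Python, with the caveat that ElementTree supports only a limited subset of XPath. Child steps, descendant steps, and text-equality predicates work, but numeric comparisons such as gt do not, so the salary filter below is done in Python. The sample document and its element names are assumptions for illustration, and the XPath strings in the comments are illustrative rather than quoted from the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; names and values are assumed.
DOC = """<company>
  <employee><employeeName>Smith</employeeName><employeeSalary>30000</employeeSalary></employee>
  <employee><employeeName>Wong</employeeName><employeeSalary>80000</employeeSalary></employee>
  <department><departmentName>Research</departmentName></department>
</company>"""

root = ET.fromstring(DOC)  # root is the <company> element

# /company/department : paths are written relative to the root element.
departments = root.findall("department")

# //employeeName : every employeeName element at any depth.
names = [e.text for e in root.findall(".//employeeName")]

# /company/department[departmentName='Research'] : text-equality predicate.
research = root.findall("department[departmentName='Research']")

# /company/employee[employeeSalary gt 70000]/employeeName :
# the numeric comparison is not supported by ElementTree, so filter in Python.
high_paid = [
    e.findtext("employeeName")
    for e in root.findall("employee")
    if float(e.findtext("employeeSalary")) > 70000
]

print(len(departments), names, len(research), high_paid)
```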
28 FIGURE Some examples of XQuery queries on XML documents that follow the XML schema file COMPANY in Figure 26.5.
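Python's standard library has no XQuery engine, but the shape of a simple for/where/return query can be mirrored with a loop over ElementTree nodes. The sketch below is an analogy, not XQuery. The query in the comments is an assumed example (employees earning more than a threshold), and the document, element names, and the function high_earners are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; names and values are assumed.
DOC = """<company>
  <employee>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
    <employeeSalary>30000</employeeSalary>
  </employee>
  <employee>
    <employeeName><firstName>Franklin</firstName><lastName>Wong</lastName></employeeName>
    <employeeSalary>80000</employeeSalary>
  </employee>
</company>"""


def high_earners(root, threshold):
    """Mirror of: for $x in //employee where $x/employeeSalary gt threshold return <res>…</res>."""
    results = []
    for x in root.iter("employee"):                          # for $x in //employee
        if float(x.findtext("employeeSalary")) > threshold:  # where $x/employeeSalary gt threshold
            res = ET.Element("res")                          # return <res> ... </res>
            ET.SubElement(res, "firstName").text = x.findtext("employeeName/firstName")
            ET.SubElement(res, "lastName").text = x.findtext("employeeName/lastName")
            results.append(res)
    return results


root = ET.fromstring(DOC)
results = high_earners(root, 70000)
for res in results:
    print(ET.tostring(res, encoding="unicode"))
```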