XML – Data Model, DTD and Schema


1 XML – Data Model, DTD and Schema
ADVANCED DATABASES
XML – Data Model, DTD and Schema
Khawaja Mohiuddin, Assistant Professor, Department of Computer Sciences, Bahria University (Karachi Campus)
Content for this lecture is taken from Chapter 11 of “Database Systems: Models, Languages …”, 6th Ed., by Elmasri and Navathe (Chapter 12 of “Fundamentals of Database Systems”, 6th Ed., by Elmasri and Navathe).

2 Topics to Cover
- Structured, Semi-structured, and Unstructured Data
- XML Hierarchical (Tree) Data Model
- XML Documents
- XML DTDs
- XML Schema
- Storing and Extracting XML Documents from Databases

3 XML: Extensible Markup Language
- Data sources
  - Databases storing data for Internet applications
- Hypertext documents
  - Common method of specifying contents and formatting of Web pages
  - Static Web pages vs. dynamic Web pages
- The XML data model is based on tree (hierarchical) structures, as compared to the flat relational data model structures
- Data extracted from relational databases can be formatted as XML documents to be exchanged over the Web

4 Structured, Semi-structured, and Unstructured Data
- Structured data
  - Represented in a strict format
  - Example: information stored in databases
- Semi-structured data
  - May have a certain structure, but not all information collected will have identical structure
  - No predefined schema
  - Schema information is mixed in with data values, since each data object can have different attributes that are not known in advance

5 Structured, Semi-structured, and Unstructured Data (cont’d.)
- Semi-structured data (cont’d.)
  - Also referred to as self-describing data
  - May be displayed as a directed graph
  - Labels or tags on directed edges represent:
    - Schema names
    - Names of attributes
    - Object types (or entity types or classes)
    - Relationships
  - Internal nodes represent individual objects or composite attributes
  - Leaf nodes represent actual data values of simple (atomic) attributes
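The directed-graph view can be sketched with nested Python dictionaries and lists. This is only an illustration under assumed data: the project records, their field names, and the helper `leaf_paths` are invented here and do not come from the lecture. Dictionary keys play the role of edge labels, nested containers are internal nodes, and atomic values are leaves.

```python
# Two "project" objects with different attributes: no fixed schema is assumed.
# All values below are invented purely for illustration.
company_projects = [
    {"name": "Product X", "number": 1, "location": "Bellaire",
     "workers": [{"ssn": "123456789", "hours": 32.5}]},
    {"name": "Product Y", "budget": 50000},  # no number, location, or workers
]


def leaf_paths(node, path=()):
    """Yield (edge-label path, value) pairs for every leaf in the graph."""
    if isinstance(node, dict):        # internal node: an object
        for label, child in node.items():
            yield from leaf_paths(child, path + (label,))
    elif isinstance(node, list):      # internal node: a collection of objects
        for child in node:
            yield from leaf_paths(child, path)
    else:                             # leaf node: an atomic data value
        yield path, node


paths = list(leaf_paths({"project": company_projects}))

# The "schema" is recovered from the data itself (self-describing data).
labels = sorted({label for path, _ in paths for label in path})
```

Because the second project carries a `budget` but no `location`, the set of labels can only be discovered by walking the data, which is what "schema information mixed in with data values" means in practice.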

6 Structured, Semi-structured, and Unstructured Data (cont’d.)

7 Structured, Semi-structured, and Unstructured Data (cont’d.)
- Unstructured data
  - Very limited indication of the type of data
  - Example: text document that contains information embedded within it
- HTML tag
  - Text that appears between angled brackets: <...>
- End tag
  - Tag with a slash: </...>

8 Structured, Semi-structured, and Unstructured Data (cont’d.)
- HTML uses a large number of predefined tags
- HTML documents
  - Do not include schema information about the type of data
- Static HTML page
  - All information to be displayed is explicitly spelled out as fixed text in the HTML file

9

10 XML Hierarchical (Tree) Data Model
- Elements and attributes
  - Main structuring concepts used to construct an XML document
- Simple elements
  - Contain data values
- Complex elements
  - Constructed from other elements hierarchically
- XML tag names
  - Describe the meaning of the data elements in the document
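A minimal sketch of the simple/complex distinction, using Python's standard `xml.etree.ElementTree` module. The sample document and the helper `classify` are assumptions made for illustration. An element is treated as complex if it has child elements and as simple otherwise.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are illustrative only.
doc = """<project>
  <name>ProductX</name>
  <number>1</number>
  <worker>
    <ssn>123456789</ssn>
    <hours>32.5</hours>
  </worker>
</project>"""

root = ET.fromstring(doc)


def classify(element):
    """Complex elements contain other elements; simple ones hold a data value."""
    return "complex" if len(element) > 0 else "simple"


# Walk the whole tree and record the kind of every element by tag name.
kinds = {el.tag: classify(el) for el in root.iter()}
```

Here `project` and `worker` come out as complex, while `name`, `number`, `ssn`, and `hours` are simple elements holding data values.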

11

12 XML Hierarchical (Tree) Data Model (cont’d.)
- Called tree model or hierarchical model
- Three main types of XML documents
- Data-centric XML documents
  - Have many small data items that follow a specific structure, and hence may be extracted from a structured database
  - Formatted as XML documents in order to exchange them over or display them on the Web
  - Usually follow a predefined schema that defines the tag names
- Document-centric XML documents
  - Documents with large amounts of text, such as news articles or books
  - Contain few or no structured data elements

13 XML Hierarchical (Tree) Data Model (cont’d.)
- Hybrid XML documents
  - May have parts that contain structured data and other parts that are predominantly textual or unstructured
  - May or may not have a predefined schema
- Schemaless XML documents
  - Do not follow a predefined schema of element names and corresponding tree structure
  - Semi-structured
  - The value of the standalone attribute in the XML declaration is yes:
    <?xml version="1.0" standalone="yes"?>

14 XML Hierarchical (Tree) Data Model (cont’d.)
- XML attributes
  - Describe properties and characteristics of the elements (tags) within which they appear
  - Can be used to hold values of simple data elements; however, this is generally not recommended
  - May reference another element in another part of the XML document
  - It is common to use attribute values in one element as the references; this resembles the concept of foreign keys in relational databases
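The foreign-key analogy can be sketched as follows. The document, the attribute names `id` and `dept`, and the lookup dictionary are assumptions for illustration. The standard-library parser does not enforce the reference, so the code resolves it by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the employee's "dept" attribute refers to the
# "id" attribute of a department element elsewhere in the same document.
doc = """<company>
  <department id="d5"><name>Research</name></department>
  <employee ssn="123456789" dept="d5"><name>Smith</name></employee>
</company>"""

root = ET.fromstring(doc)

# Index departments by their identifying attribute (like a primary key).
departments = {d.get("id"): d for d in root.findall("department")}

# Follow the reference held in the employee's attribute (like a foreign key).
employee = root.find("employee")
dept_name = departments[employee.get("dept")].findtext("name")
```

A dangling reference would surface here as a `KeyError`, which is roughly the check a validating processor performs for ID/IDREF attributes.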

15 XML Documents, DTD, and XML Schema
- Well-formed XML documents
  - Have an XML declaration, which indicates the version of XML being used as well as any other relevant attributes
  - Must follow the syntactic guidelines of the tree data model
    - Must have a single root element
    - Every element must include a matching pair of start and end tags, within the start and end tags of its parent element
  - Can be processed by generic processors that traverse the document and create an internal tree representation
  - Can be schemaless
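As a rough check of these rules, a generic (non-validating) parser can be asked to build the tree and report failure. The helper name `is_well_formed` is an assumption of this sketch. It relies on `xml.etree.ElementTree`, which raises `ParseError` when tags do not nest properly or when there is more than one root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a generic parser can build a tree from the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Single root, properly nested start and end tags.
ok = is_well_formed('<?xml version="1.0" standalone="yes"?><a><b>1</b></a>')

# End tags closed in the wrong order: b is not closed inside a.
bad_nesting = is_well_formed("<a><b>1</a></b>")

# Two top-level elements: no single root.
two_roots = is_well_formed("<a/><b/>")
```

Note that this only tests well-formedness. No schema or DTD is consulted, which is why a well-formed document can still be schemaless.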

16 XML Documents, DTD, and XML Schema (cont’d.)
- DOM (Document Object Model)
  - A standard model with an associated set of API functions
  - Allows programs to manipulate the resulting tree representation corresponding to a well-formed XML document
  - The whole document must be parsed beforehand to convert it to the standard DOM internal data structure representation
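A small sketch of DOM-style processing using Python's standard `xml.dom.minidom` module, one of several DOM implementations. The sample document is invented for illustration. The whole text is parsed first, after which the in-memory tree can be queried and modified.

```python
from xml.dom.minidom import parseString

# The entire document is parsed up front into an in-memory tree.
dom = parseString(
    "<projects>"
    "<project><name>ProductX</name></project>"
    "<project><name>ProductY</name></project>"
    "</projects>"
)

# Query the tree: collect the text of every <name> element.
names = [n.firstChild.data for n in dom.getElementsByTagName("name")]

# Manipulate the tree: append a new <project> with a <name> child.
new_project = dom.createElement("project")
new_name = dom.createElement("name")
new_name.appendChild(dom.createTextNode("ProductZ"))
new_project.appendChild(new_name)
dom.documentElement.appendChild(new_project)

project_count = len(dom.getElementsByTagName("project"))
```

The cost of this convenience is memory: the tree for the full document exists before any processing starts.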

17 XML Documents, DTD, and XML Schema (cont’d.)
- SAX (Simple API for XML)
  - Another API, for processing XML documents on the fly
  - Notifies the processing program through callbacks whenever a start or end tag is encountered
  - Makes it easier to process large documents
  - Allows for processing of streaming XML documents
  - Processes the tags as they are encountered; also known as event-based processing
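The callback style can be sketched with Python's standard `xml.sax` module. The handler class `TagRecorder` and the sample document are assumptions for illustration. The parser calls `startElement` and `endElement` as each tag is read, without building a tree.

```python
import xml.sax


class TagRecorder(xml.sax.ContentHandler):
    """Records every start and end tag in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Callback fired when a start tag is encountered.
        self.events.append(("start", name))

    def endElement(self, name):
        # Callback fired when the matching end tag is encountered.
        self.events.append(("end", name))


handler = TagRecorder()
xml.sax.parseString(
    b"<projects><project><name>ProductX</name></project></projects>",
    handler,
)
events = handler.events
```

Since only the current event is held in memory, the same handler would work on a document far larger than available memory, or on one arriving as a stream.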

18 XML Documents, DTD, and XML Schema (cont’d.)
- Valid XML documents
  - The document must be well formed and must follow a particular schema
  - Start and end tag pairs must follow the structure specified in a separate XML DTD (Document Type Definition) file or XML schema file

19 XML Documents, DTD, and XML Schema (cont’d.)
- XML DTD
  - Data types in DTD are not very general
  - Special syntax
    - Requires specialized processors
  - All DTD elements are always forced to follow the specified ordering of the document
    - Unordered elements are not permitted

20 XML DTD
- First, name of root tag
- Then, elements and their nested structure
  - * after an element name means the element can be repeated zero or more times
  - + after an element name means the element can be repeated one or more times
  - ? after an element name means the element is optional (it appears zero or one time)
  - No symbol after an element name means the element must appear exactly once
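The four occurrence rules can be written down as (minimum, maximum) bounds. This is a hypothetical helper for illustration, not part of any DTD processor. `None` stands for "no upper limit".

```python
def occurrence_bounds(suffix):
    """Map a DTD occurrence suffix to (min, max); None means unbounded."""
    bounds = {
        "*": (0, None),  # zero or more times
        "+": (1, None),  # one or more times
        "?": (0, 1),     # optional: zero or one time
        "": (1, 1),      # no symbol: exactly once
    }
    return bounds[suffix]


def allowed(count, suffix):
    """Check whether an element appearing `count` times satisfies the suffix."""
    low, high = occurrence_bounds(suffix)
    return count >= low and (high is None or count <= high)
```

For example, `allowed(0, "+")` is false because `+` requires at least one occurrence, while `allowed(0, "*")` is true.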

21 XML DTD (cont’d.)
- Type of element is specified via parentheses following the element
  - Parentheses may include names of the children of the element
  - #PCDATA or other data types in parentheses means a leaf node
  - PCDATA (Parsed Character Data) is similar to a string data type
- The list of attributes can be specified via the keyword !ATTLIST
  - The ID type of an attribute means it can be referenced from another attribute whose type is IDREF within another element
  - Attributes can also be used to hold the values of simple data elements of type #PCDATA
- Parentheses can be nested when specifying elements
- A bar symbol ( e1 | e2 ) specifies that either e1 or e2 can appear in the document

22 XML DTD (cont’d.)
<?xml version="1.0" standalone="no"?>
<!DOCTYPE Projects SYSTEM "proj.dtd">
- standalone="no" means the document needs to be checked against a separate DTD document or XML schema document
- The separate DTD document named "proj.dtd" should be stored in the same file system as the XML document
- Alternatively, we could include the DTD document text at the beginning of the XML document itself
- XML DTD has several limitations:
  - Data types are not very general
  - It has its own special syntax and thus requires specialized processors
  - All DTD elements are always forced to follow the specified ordering of the document
- These drawbacks led to the development of XML schema
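The alternative of placing the DTD text at the beginning of the document can be sketched like this. The DTD below is an assumption written for illustration, loosely in the spirit of a Projects document; it is not the lecture's proj.dtd. Python's standard-library parser reads the declarations but does not validate against them, so the code only shows that the combined document is well formed and how the declared structure lines up with the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD embedded as an internal subset of the document.
doc = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Projects [
  <!ELEMENT Projects (Project+)>
  <!ELEMENT Project (Name, Number, Location?, Worker*)>
  <!ATTLIST Project ProjId ID #REQUIRED>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Location (#PCDATA)>
  <!ELEMENT Worker (Ssn, Hours)>
  <!ELEMENT Ssn (#PCDATA)>
  <!ELEMENT Hours (#PCDATA)>
]>
<Projects>
  <Project ProjId="P1">
    <Name>ProductX</Name>
    <Number>1</Number>
    <Worker><Ssn>123456789</Ssn><Hours>32.5</Hours></Worker>
  </Project>
</Projects>"""

# Non-validating parse: the DTD is read but not enforced here.
root = ET.fromstring(doc)
project = root.find("Project")

# Children appear in the order the DTD lists them; Location (?) is omitted.
child_tags = [child.tag for child in project]
project_id = project.get("ProjId")
```

Actual validation against a DTD needs a validating processor, which the Python standard library does not include; third-party libraries exist for that purpose.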

23 XML Schema
- XML schema language
  - Standard for specifying the structure of XML documents
  - Uses the same syntax rules as regular XML documents
    - The same processors can be used on both
- As with XML DTD, XML schema is based on the tree data model, with elements and attributes as the main structuring concepts
- Borrows additional concepts from database and object models, such as keys, references, and identifiers

24

25

26 XML Schema (cont’d.)
XML schema concepts:
- XML Descriptions and XML namespaces
  - <xsd:schema xmlns:xsd="... identifies the specific set of XML schema language elements (tags) being used by specifying a file stored at a Web site location
  - A commonly used standard for XML schema commands
  - Each such definition is called an XML namespace, because it defines the set of commands (names) that can be used
  - The file name is assigned to the variable xsd (XML schema description) using the attribute xmlns (XML namespace), and this variable is used as a prefix to all XML schema commands (tag names)
  - xsd:element or xsd:sequence used later refers to the definitions of the element and sequence tags as defined in the file
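How a prefix such as xsd stands in for a namespace can be sketched with `xml.etree.ElementTree`, which expands every prefixed tag to `{namespace}localname`. The sketch assumes the standard W3C XML Schema namespace identifier `http://www.w3.org/2001/XMLSchema`; the slide's own value is cut off in this transcript, so treat that choice as an assumption. The tiny schema fragment is also invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the standard W3C XML Schema namespace identifier.
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment; "xsd" is just a local name for the namespace.
schema_text = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="department" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

root = ET.fromstring(schema_text)

# The parser replaces the prefix with the full namespace in each tag name.
expanded_root_tag = root.tag

# Find every xsd:element, at any depth, by its expanded name.
element_names = [e.get("name") for e in root.iter("{%s}element" % XSD_NS)]
```

The prefix itself carries no meaning: a document that bound the same namespace to a different prefix would produce identical expanded tag names.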

27 XML Schema (cont’d.)
- Annotations, documentation, language used
  - xsd:annotation and xsd:documentation are used for providing comments and other descriptions in the XML document
  - The attribute xml:lang of the xsd:documentation element specifies the language being used, where en stands for the English language
- Elements and types
  - The name attribute of the xsd:element tag specifies the element name, which is called company for the root element in our example
  - The structure of the company root element is specified in our example as xsd:complexType
  - The xsd:sequence structure of XML schema is used to further specify a sequence of departments, employees, and projects

28 XML Schema (cont’d.)
- First-level elements
  - Elements named employee, department, and project are first-level elements, and each is specified in an xsd:element tag
  - If a tag has only attributes and no further subelements or data within it, it can be ended with the slash symbol (/>) directly instead of having a separate matching end tag; it is then called an empty element
- Element types, minOccurs, and maxOccurs
  - Specify the type and multiplicity of each element in any document that conforms to the schema specifications
  - When a type is specified through the type attribute of an xsd:element, the structure of the element must be described separately, typically using the xsd:complexType element of XML schema. Examples: employee, department, and project elements

29 XML Schema (cont’d.)
- Element types, minOccurs, and maxOccurs (cont’d.)
  - If no type attribute is specified, the element structure can be defined directly following the tag. Example: company root element
  - The minOccurs and maxOccurs attributes are used for specifying lower and upper bounds, similar to the *, +, and ? symbols of XML DTD. If they are not specified, the default is exactly one occurrence
- Keys
  - xsd:unique for specifying unique attributes
    - xsd:selector to identify the element type that contains the unique element
    - xsd:field to identify the element name within it that is unique
    - Examples: departmentNameUnique and projectNameUnique
  - xsd:key for specifying primary keys. Examples: projectNumberKey, departmentNumberKey
  - xsd:keyref for specifying foreign keys. Example: departmentManagerSSNKeyRef
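The multiplicity defaults and a uniqueness constraint can be sketched by reading a schema fragment as ordinary XML. The fragment is an assumption for illustration: the `headquarters` element is invented, the `Department` and `Employee` types are left undefined, and the W3C XML Schema namespace is assumed. The code parses the fragment only; it does not validate anything against it.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"  # assumed W3C namespace

# Hypothetical, incomplete schema fragment (the named types are not defined).
schema_text = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="department" type="Department"
                     minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="employee" type="Employee"
                     minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element name="headquarters" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
    <xsd:unique name="departmentNameUnique">
      <xsd:selector xpath="department"/>
      <xsd:field xpath="departmentName"/>
    </xsd:unique>
  </xsd:element>
</xsd:schema>"""

root = ET.fromstring(schema_text)


def bounds(element):
    """Return (minOccurs, maxOccurs), defaulting each to "1" when absent."""
    return element.get("minOccurs", "1"), element.get("maxOccurs", "1")


sequence = root.find(f"{XSD}element/{XSD}complexType/{XSD}sequence")
occurs = {e.get("name"): bounds(e) for e in sequence}

# The uniqueness rule: which elements it selects and which field is unique.
unique = root.find(f"{XSD}element/{XSD}unique")
unique_rule = (
    unique.get("name"),
    unique.find(f"{XSD}selector").get("xpath"),
    unique.find(f"{XSD}field").get("xpath"),
)
```

In DTD terms, the `department` bounds correspond to `*`, the `employee` bounds to `+`, and the unmarked `headquarters` element to "no symbol" (exactly once).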

30 XML Schema (cont’d.)
- Structures of complex elements
  - xsd:complexType specifies the structures of the complex elements
  - Examples: Department, Employee, Project, and Dependent
  - If there are no key constraints, subelements can be embedded within the parent element definition
- Composite attributes
  - Also specified as complex types
  - Examples: Address, Name, Worker, and WorksOn
  - These could have been directly embedded within their parent elements

31 Storing and Extracting XML Documents from Databases
- Most common approaches for storing and extracting
- Using a DBMS to store the documents as text
  - A relational or object DBMS can be used to store whole XML documents as text fields within the DBMS records or objects
  - Can be used if the DBMS has a special module for document processing
  - Would work for storing schemaless and document-centric XML documents
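A minimal sketch of the store-as-text approach, using Python's built-in `sqlite3` module as a stand-in DBMS. The table name, column names, and sample article are assumptions for illustration. The document is stored whole in a text column and parsed only after it is read back.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
)

# A document-centric XML document, stored as-is in a single text field.
article = (
    "<article><title>XML storage</title>"
    "<para>Mostly free text with little structure.</para></article>"
)
conn.execute("INSERT INTO documents (body) VALUES (?)", (article,))

# Retrieval returns the original text; any structure is recovered by parsing.
(stored,) = conn.execute(
    "SELECT body FROM documents WHERE doc_id = 1"
).fetchone()
title = ET.fromstring(stored).findtext("title")
```

The database itself knows nothing about the document's structure in this sketch, which is why the approach leans on a separate document-processing module for anything beyond storing and fetching.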

32 Storing and Extracting XML Documents from Databases
- Using a DBMS to store document contents as data elements
  - Would work for storing a collection of documents that follow a specific XML DTD or XML schema
  - Since the documents' structure is the same, a relational (or object) database can be designed to store the leaf-level data elements within the XML documents
  - Would require mapping algorithms to design a database schema that is compatible with the XML document structure as specified in the XML schema or DTD, and to recreate the XML documents from the stored data
  - These algorithms can be implemented either as an internal DBMS module or as separate middleware that is not part of the DBMS
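The two directions of the mapping can be sketched for a very simple document structure. The table layout, tag names, and data are assumptions for illustration, and `sqlite3` stands in for the DBMS. Real mapping algorithms must handle nesting, repetition, and optional elements, which this sketch does not.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data-centric document with a fixed, known structure.
doc = (
    "<projects>"
    "<project><name>ProductX</name><number>1</number></project>"
    "<project><name>ProductY</name><number>2</number></project>"
    "</projects>"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# XML to relational: one row per <project>, one column per leaf element.
for p in ET.fromstring(doc).findall("project"):
    conn.execute(
        "INSERT INTO project (number, name) VALUES (?, ?)",
        (int(p.findtext("number")), p.findtext("name")),
    )

# Relational to XML: recreate the document from the stored leaf values.
rebuilt = ET.Element("projects")
for number, name in conn.execute(
    "SELECT number, name FROM project ORDER BY number"
):
    p = ET.SubElement(rebuilt, "project")
    ET.SubElement(p, "name").text = name
    ET.SubElement(p, "number").text = str(number)

rebuilt_text = ET.tostring(rebuilt, encoding="unicode")
```

Only the leaf values are stored; the tag names and nesting live in the mapping code, which is the role the slide assigns to the mapping algorithms.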

33 Storing and Extracting XML Documents from Databases (cont’d.)
- Designing a specialized system for storing native XML data
  - Based on the hierarchical (tree) model
  - Such systems are called native XML DBMSs
  - Would include specialized indexing and querying techniques, and would work for all types of XML documents
  - Could also include data compression techniques to reduce the size of the documents for storage
  - Examples of popular products offering native XML DBMS capability:
    - Tamino by Software AG
    - Dynamic Application Platform of eXcelon
    - Oracle also offers a native XML storage option

34 Storing and Extracting XML Documents from Databases (cont’d.)
- Creating or publishing customized XML documents from preexisting relational databases
  - Since enormous amounts of data are already stored in relational databases, parts of this data may need to be formatted as documents for exchanging or displaying over the Web
  - This approach would use a separate middleware software layer to handle the conversions needed between the XML documents and the relational database
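A rough sketch of what such a middleware layer does: read rows from existing relational tables and arrange them as a nested XML document. The tables, column names, sample rows, and the function `publish_departments` are assumptions for illustration, with `sqlite3` standing in for the relational DBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A small preexisting relational database (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee (
    ssn TEXT PRIMARY KEY,
    lname TEXT,
    dno INTEGER REFERENCES department(dnumber)
);
INSERT INTO department VALUES (5, 'Research');
INSERT INTO department VALUES (4, 'Administration');
INSERT INTO employee VALUES ('123456789', 'Smith', 5);
INSERT INTO employee VALUES ('333445555', 'Wong', 5);
INSERT INTO employee VALUES ('987654321', 'Wallace', 4);
""")


def publish_departments(connection):
    """Build a department-rooted XML view: employees nested under departments."""
    root = ET.Element("departments")
    dept_rows = connection.execute(
        "SELECT dnumber, dname FROM department ORDER BY dnumber"
    ).fetchall()
    for dnumber, dname in dept_rows:
        dept = ET.SubElement(root, "department", number=str(dnumber))
        ET.SubElement(dept, "name").text = dname
        emp_rows = connection.execute(
            "SELECT ssn, lname FROM employee WHERE dno = ? ORDER BY ssn",
            (dnumber,),
        ).fetchall()
        for ssn, lname in emp_rows:
            ET.SubElement(dept, "employee", ssn=ssn).text = lname
    return root


published = publish_departments(conn)
```

The flat foreign-key relationship (employee.dno) becomes parent-child nesting in the output. A different customized view, for example one rooted at employees, could be produced from the same tables by changing only the publishing function.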

35 Conclusion
- Three main types of data: structured, semi-structured, and unstructured
- XML standard
  - Tree-structured (hierarchical) data model
  - XML documents and the languages for specifying the structure of these documents
- There are several options for storing and extracting XML documents from databases

