DTD (Document Type Definition)

Presentation transcript:

DTD (Document Type Definition) Imposing Structure on XML Documents (W3Schools on DTDs)

Motivation
A DTD adds syntactic requirements on top of the well-formedness requirement.
It helps eliminate errors when creating or editing XML documents.
It clarifies the intended semantics.
It simplifies the processing of XML documents.

An Example
In an address book, where can a phone number appear? Under <person>, under <name> or under both?
If we have to check for all possibilities, processing takes longer and it may not be clear to whom a phone belongs.

Document Type Definitions
Document Type Definitions (DTDs) impose structure on XML documents.
There is some relationship between a DTD and a schema, but it is not close, hence the need for additional “typing” systems (XML schemas).
The DTD is a syntactic specification.

Example: An Address Book
  <person>
    <name> Homer Simpson </name>
    <greet> Dr. H. Simpson </greet>
    <addr>1234 Springwater Road </addr>
    <addr> Springfield USA, 98765 </addr>
    <tel> (321) 786 2543 </tel>
    <fax> (321) 786 2544 </fax>
    <tel> (321) 786 2544 </tel>
    <email> homer@math.springfield.edu </email>
  </person>
Constraints:
  name           exactly one name
  greet          at most one greeting
  addr           as many address lines as needed (in order)
  tel and fax    mixed telephones and faxes
  email          as many as needed

Specifying the Structure
  name            specifies a name element
  greet?          specifies an optional (0 or 1) greet element
  name, greet?    specifies a name followed by an optional greet

Specifying the Structure (cont’d)
  addr*           specifies 0 or more address lines
  tel | fax       a tel or a fax element
  (tel | fax)*    0 or more repeats of tel or fax
  email*          0 or more email elements

Specifying the Structure (cont’d)
So the whole structure of a person entry is specified by
  name, greet?, addr*, (tel | fax)*, email*
This is known as a regular expression.
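
To illustrate why the content model above is called a regular expression, it can be translated into an ordinary regular expression over the sequence of child tag names. The following is a minimal Python sketch, not the way any particular XML parser works. The function names and the ";" separator encoding are assumptions made for this example, and only element-content models are handled (no #PCDATA, EMPTY or ANY).

```python
import re

# An element name as it may appear inside a content model (simplified).
NAME = re.compile(r"[A-Za-z_][\w.\-]*")

def compile_content_model(model):
    """Translate a DTD content model into a Python regular expression.

    A sequence of children is encoded as tag names each followed by ";",
    e.g. name, addr, addr becomes "name;addr;addr;".
    """
    # Wrap each element name in a group that also consumes its ";".
    grouped = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", model)
    # A comma is plain concatenation; ?, *, +, | and parentheses carry over.
    return re.compile(re.sub(r"[\s,]", "", grouped))

def children_match(model, child_tags):
    """True if the list of child tag names conforms to the content model."""
    encoded = "".join(tag + ";" for tag in child_tags)
    return compile_content_model(model).fullmatch(encoded) is not None

person_model = "name, greet?, addr*, (tel | fax)*, email*"
homer = ["name", "greet", "addr", "addr", "tel", "fax", "tel", "email"]
print(children_match(person_model, homer))              # True
print(children_match(person_model, ["greet", "name"]))  # False
```

The ";" after every name keeps a tag such as tel from matching the beginning of a longer name.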

Element Type Definition
For each element type E, there is a declaration of the form
  <!ELEMENT E P>
where P is a regular expression, i.e.,
  P ::= EMPTY
      | ANY
      | #PCDATA
      | E’           (an element type)
      | P1, P2       (concatenation)
      | P1 "|" P2    (disjunction)
      | P?           (optional)
      | P+           (one or more occurrences)
      | P*           (the Kleene closure)

Summary of Regular Expressions
  A          the tag (i.e., element) A occurs
  e1, e2     the expression e1 followed by e2
  e*         0 or more occurrences of e
  e?         optional: 0 or 1 occurrences of e
  e+         1 or more occurrences of e
  e1 | e2    either e1 or e2
  (e)        grouping

The Definition of an Element Consists of Exactly One of the Following
A regular expression (as defined earlier).
EMPTY, which means that the element has no content.
ANY, which means that the content can be any mixture of PCDATA and elements defined in the DTD.
Mixed content, which is defined as described on the next slide.
(#PCDATA), which means that the element contains only character data.

The Definition of Mixed Content
Mixed content is described by a repeatable OR group:
  (#PCDATA | element-name | …)*
Inside the group there are no regular expressions, just element names.
#PCDATA must come first, followed by 0 or more element names separated by |.
The group can be repeated 0 or more times.
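
Because the group is repeatable and text is always allowed, checking mixed content reduces to checking that every child element has one of the listed names, in any order and any number of times. A minimal Python sketch using the standard library follows. The function name and the element names p, b and i are hypothetical and not taken from the slides.

```python
import xml.etree.ElementTree as ET

def mixed_content_ok(element, allowed_names):
    """True if every child element of `element` has an allowed name.

    Text between the children is always permitted in mixed content,
    so only the child element names need to be inspected.
    """
    return all(child.tag in allowed_names for child in element)

# Hypothetical declaration: <!ELEMENT p (#PCDATA | b | i)*>
para = ET.fromstring("<p>Call <b>now</b> or <i>later</i>, <b>please</b>.</p>")
print(mixed_content_ok(para, {"b", "i"}))  # True
print(mixed_content_ok(para, {"b"}))       # False
```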

An Address-Book XML Document with an Internal DTD
“Internal” means that the DTD and the XML document are in the same file.
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE addressbook [
    <!ELEMENT addressbook (person*)>
    <!ELEMENT person (name, greet?, address*, (fax | tel)*, email*)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT greet (#PCDATA)>
    <!ELEMENT address (#PCDATA)>
    <!ELEMENT tel (#PCDATA)>
    <!ELEMENT fax (#PCDATA)>
    <!ELEMENT email (#PCDATA)>
  ]>
The name of the DTD is addressbook; it must match the name of the root element.
The syntax of a DTD is not XML syntax.

The Rest of the Address-Book XML Document
  <addressbook>
    <person>
      <name> Jeff Cohen </name>
      <greet> Dr. Cohen </greet>
      <email> jc@penny.com </email>
    </person>
  </addressbook>

Regular Expressions
Each regular expression determines a corresponding finite-state automaton.
Let’s start with a simpler example:
  name, addr*, email
[Figure: finite-state automaton with transitions labeled name, addr and email. A double circle denotes an accepting state.]
This suggests a simple parsing program.
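
The parsing program hinted at here can be sketched as a table-driven automaton for name, addr*, email. This is a minimal Python illustration. The state numbering is an assumption, since the original figure is not available, and the names TRANSITIONS, ACCEPTING and accepts are made up for this example.

```python
# States (numbering assumed): 0 = start, 1 = name seen (addr may repeat),
# 2 = email seen (the accepting state, drawn as a double circle).
TRANSITIONS = {
    (0, "name"): 1,
    (1, "addr"): 1,
    (1, "email"): 2,
}
ACCEPTING = {2}

def accepts(child_tags):
    """Run the automaton over a sequence of child tag names."""
    state = 0
    for tag in child_tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:          # no transition: the sequence is rejected
            return False
    return state in ACCEPTING

print(accepts(["name", "addr", "addr", "email"]))  # True
print(accepts(["name", "addr"]))                   # False
```

Each child element is inspected exactly once, so checking a list of n children takes O(n) steps.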

Another Example
  name, address*, (tel | fax)*, email*
[Figure: finite-state automaton for this expression, with transitions labeled name, address, tel, fax and email.]

Some Things are Hard to Specify
Each employee element should contain name, age and ssn elements in some order.
  <!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | ... )>
Suppose that there were many more fields!

Some Things are Hard to Specify (cont’d)
There are n! different orders of n elements, so the declaration needs n! alternatives.
The size of the declaration is not even polynomial in n.
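
The growth can be made concrete by generating the declaration mechanically. The sketch below is a Python illustration only. The helper names any_order_model and element_declaration are hypothetical, and the output spells out all permutations rather than the elided "..." form used on the slide.

```python
from itertools import permutations
from math import factorial

def any_order_model(names):
    """Build a content model that allows the given elements in any order."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(names)]
    return "(" + " | ".join(alternatives) + ")"

def element_declaration(element, names):
    """Build the full <!ELEMENT ...> declaration for an any-order element."""
    return "<!ELEMENT " + element + " " + any_order_model(names) + ">"

print(element_declaration("employee", ["name", "age", "ssn"]))
# Three fields need 3! = 6 alternatives; ten fields would need:
print(factorial(10))  # 3628800
```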

Specifying Attributes in the DTD
  <!ELEMENT height (#PCDATA)>
  <!ATTLIST height
    dimension CDATA #REQUIRED
    accuracy  CDATA #IMPLIED >
The dimension attribute is required.
The accuracy attribute is optional.
CDATA is the “type” of the attribute: it means “character data,” and the attribute may take any literal string as a value.

The Format of an Attribute Definition
  <!ATTLIST element-name attr-name attr-type default-value>
The default value is given inside quotes.
Attribute types: CDATA, ID, IDREF, IDREFS, …

Summary of Attribute Default Values
  #REQUIRED         the attribute must be included in the element
  #IMPLIED          the attribute is optional
  #FIXED "value"    the given value (inside quotes) is the only possible one
  "value"           the default value of the attribute if none is given
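
The four kinds of default can be read as rules for computing the attributes an application sees for one element. The following Python sketch is one possible reading, not how any specific parser is implemented. The function name, the dictionary layout and the attributes unit and scale are hypothetical; only dimension and accuracy come from the slides. Undeclared attributes are not checked here.

```python
def resolve_attributes(declarations, given):
    """Return the effective attributes of one element, or raise ValueError.

    declarations maps attribute name -> (kind, value), where kind is
    "#REQUIRED", "#IMPLIED", "#FIXED" or "default" (a plain default value).
    """
    effective = dict(given)
    for name, (kind, value) in declarations.items():
        if kind == "#REQUIRED" and name not in given:
            raise ValueError("missing required attribute: " + name)
        if kind == "#FIXED":
            # If present, the attribute must carry exactly the fixed value.
            if given.get(name, value) != value:
                raise ValueError("attribute " + name + " must be " + value)
            effective[name] = value
        elif kind == "default":
            effective.setdefault(name, value)
        # "#IMPLIED": the attribute may simply be absent.
    return effective

# The height declaration from the earlier slide, plus two hypothetical
# attributes (unit, scale) added only to exercise #FIXED and a default.
HEIGHT_ATTLIST = {
    "dimension": ("#REQUIRED", None),
    "accuracy": ("#IMPLIED", None),
    "unit": ("#FIXED", "cm"),
    "scale": ("default", "1"),
}

print(resolve_attributes(HEIGHT_ATTLIST, {"dimension": "length"}))
# {'dimension': 'length', 'unit': 'cm', 'scale': '1'}
```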

Recursive DTDs
Each person should have a father and a mother.
  <!DOCTYPE genealogy [
    <!ELEMENT genealogy (person*)>
    <!ELEMENT person (name, dateOfBirth, person, person)>
    ...
  ]>
(The first person child is the mother, the second is the father.)
What is the problem with this? A parser does not notice it!
This leads to either infinite data or a person that is a descendant of herself.

Recursive DTDs (cont’d)
  <!DOCTYPE genealogy [
    <!ELEMENT genealogy (person*)>
    <!ELEMENT person (name, dateOfBirth, person?, person?)>
    ...
  ]>
(The first person? is the mother, the second is the father.)
What is the problem now? If a person has only a father, how can you tell that the single person child is the father and not the mother?

Using ID and IDREF Attributes
  <!DOCTYPE family [
    <!ELEMENT family (person)*>
    <!ELEMENT person (name)>
    <!ELEMENT name (#PCDATA)>
    <!ATTLIST person
      id       ID     #REQUIRED
      mother   IDREF  #IMPLIED
      father   IDREF  #IMPLIED
      children IDREFS #IMPLIED>
  ]>

IDs and IDREFs
ID attribute: its value is unique within the entire document. An element can have at most one ID attribute. No default or fixed value is allowed:
  #REQUIRED    a value must be provided
  #IMPLIED     a value is optional
IDREF attribute: its value must be the ID value of some element in the document.
IDREFS attribute: its value is a whitespace-separated list of names, each of which is the ID value of some element in the document.
  <person id="p898" father="p332" mother="p336" children="p982 p984 p986">
ID values must be XML names, so they cannot begin with a digit.

Some Conforming Data
  <family>
    <person id="lisa" mother="marge" father="homer">
      <name> Lisa Simpson </name>
    </person>
    <person id="bart" mother="marge" father="homer">
      <name> Bart Simpson </name>
    </person>
    <person id="marge" children="bart lisa">
      <name> Marge Simpson </name>
    </person>
    <person id="homer" children="bart lisa">
      <name> Homer Simpson </name>
    </person>
  </family>

ID References do not Have Types
The attributes mother and father are references to IDs of other elements.
However, those are not necessarily person elements!
The mother attribute is not necessarily a reference to a female person.

An Alternative Specification
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE family [
    <!ELEMENT family (person)*>
    <!ELEMENT person (name, mother?, father?, children?)>
    <!ATTLIST person id ID #REQUIRED>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT mother EMPTY>
    <!ATTLIST mother idref IDREF #REQUIRED>
    <!ELEMENT father EMPTY>
    <!ATTLIST father idref IDREF #REQUIRED>
    <!ELEMENT children EMPTY>
    <!ATTLIST children idrefs IDREFS #REQUIRED>
  ]>

The Revised Data
  <family>
    <person id="marge">
      <name> Marge Simpson </name>
      <children idrefs="bart lisa"/>
    </person>
    <person id="homer">
      <name> Homer Simpson </name>
      <children idrefs="bart lisa"/>
    </person>
    <person id="bart">
      <name> Bart Simpson </name>
      <mother idref="marge"/>
      <father idref="homer"/>
    </person>
    <person id="lisa">
      <name> Lisa Simpson </name>
      <mother idref="marge"/>
      <father idref="homer"/>
    </person>
  </family>

Consistency of ID and IDREF Attribute Values
If an attribute is declared as ID:
  The associated value must be distinct, i.e., different elements (in the given document) must have different values for the ID attribute (no confusion).
  This holds even if the two elements have different element names.
If an attribute is declared as IDREF:
  The associated value must exist as the value of some ID attribute (no dangling “pointers”).
  The same holds for all the values of an IDREFS attribute.
ID, IDREF and IDREFS attributes are not typed.
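
These two consistency rules can be checked with a short program. The Python sketch below uses only the standard library and is a simplification: it does not read the DTD, so the caller states which attribute names are of type ID, IDREF and IDREFS, and it ignores the fact that a real DTD declares attribute types per element type. The function name and error messages are made up for this example.

```python
import xml.etree.ElementTree as ET

def check_references(xml_text, id_attrs, idref_attrs, idrefs_attrs):
    """Return a list of ID/IDREF consistency errors (empty if consistent)."""
    root = ET.fromstring(xml_text)
    errors = []
    seen = set()
    # Pass 1: ID values must be distinct across the whole document.
    for elem in root.iter():
        for attr in id_attrs:
            value = elem.get(attr)
            if value is None:
                continue
            if value in seen:
                errors.append("duplicate ID " + repr(value))
            seen.add(value)
    # Pass 2: every IDREF / IDREFS value must be the ID of some element.
    for elem in root.iter():
        refs = [elem.get(a) for a in idref_attrs if elem.get(a) is not None]
        for attr in idrefs_attrs:
            refs.extend((elem.get(attr) or "").split())
        for ref in refs:
            if ref not in seen:
                errors.append("dangling reference " + repr(ref))
    return errors

FAMILY = """<family>
  <person id="lisa" mother="marge" father="homer"><name>Lisa Simpson</name></person>
  <person id="bart" mother="marge" father="homer"><name>Bart Simpson</name></person>
  <person id="marge" children="bart lisa"><name>Marge Simpson</name></person>
  <person id="homer" children="bart lisa"><name>Homer Simpson</name></person>
</family>"""

print(check_references(FAMILY, {"id"}, {"mother", "father"}, {"children"}))  # []
```

Two passes are used because a reference may point to an element that appears later in the document.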

Adding a DTD to the Document
A DTD can be
  internal: the DTD is part of the document file, or
  external: the DTD and the document are in separate files.
An external DTD may reside
  in the local file system (where the document is), or
  in a remote file system.

Connecting a Document with its DTD
An internal DTD:
  <?xml version="1.0"?>
  <!DOCTYPE db [<!ELEMENT ...> … ]>
  <db> ... </db>
A DTD from the local file system:
  <!DOCTYPE db SYSTEM "schema.dtd">
A DTD from a remote file system:
  <!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">

Well-Formed XML Documents
An XML document (with or without a DTD) is well-formed if
  tags are syntactically correct,
  every start tag has a matching end tag,
  tags are properly nested,
  there is a root tag, and
  a start tag does not have two occurrences of the same attribute.
An XML document must be well formed.
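
Well-formedness can be tested with any XML parser, since a parser must reject a document that violates these rules. A minimal Python sketch with the standard library follows. The function name is_well_formed is an assumption of this example, and the parser used here does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text parses as XML, False if the parser reports an error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b></b></a>"))    # True
print(is_well_formed("<a><b></a></b>"))    # False: tags are not properly nested
print(is_well_formed('<a x="1" x="2"/>'))  # False: attribute x occurs twice
```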

Valid Documents
A well-formed XML document is valid if it conforms to its DTD, that is,
  the document conforms to the regular-expression grammar,
  the types of attributes are correct, and
  the constraints on references are satisfied.