1 DTD (Document Type Definition): Imposing Structure on XML Documents (W3Schools on DTDs)
2 Motivation
A DTD adds syntactic requirements on top of the well-formedness requirement. A DTD:
–helps eliminate errors when creating or editing XML documents
–clarifies the intended semantics
–simplifies the processing of XML documents
3 An Example
In an address book, where can a phone number appear?
–Under one element, under another, or under both?
If we have to check for all possibilities, processing takes longer, and it may not be clear to whom a phone number belongs.
4 Document Type Definitions
Document Type Definitions (DTDs) impose structure on XML documents. There is some relationship between a DTD and a schema, but it is not close; hence the need for additional “typing” systems (XML Schema). The DTD is a syntactic specification.
5 Example: An Address Book
A person entry, annotated with the structure it follows:
–Exactly one name: Homer Simpson
–At most one greeting: Dr. H. Simpson
–As many address lines as needed (in order): 1234 Springwater Road; Springfield USA, 98765
–Mixed telephones and faxes: (321) 786 2543; (321) 786 2544
–Email addresses, as many as needed: firstname.lastname@example.org
6 Specifying the Structure
–name: a name element
–greet?: an optional (0 or 1) greet element
–name, greet?: a name followed by an optional greet
7 Specifying the Structure (cont’d)
–addr*: 0 or more address lines
–tel | fax: a tel or a fax element
–(tel | fax)*: 0 or more repeats of tel or fax
–email*: 0 or more email elements
8 Specifying the Structure (cont’d)
So the whole structure of a person entry is specified by
name, greet?, addr*, (tel | fax)*, email*
This is known as a regular expression (here, over element names).
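Since the content model is a regular expression over element names, one way to check it is to translate it into a Python regular expression and match it against the sequence of child tag names. This is a minimal sketch under stated assumptions, not how DTD validators are built: each child is encoded as `<tag>` so that names cannot run together, and only element names, `,`, `|`, `?`, `*`, `+` and parentheses are handled (no `#PCDATA`, `EMPTY` or `ANY`). The function names are hypothetical.

```python
import re

# Tokens of a DTD content model: element names and the operator characters.
_TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|?*+]")

def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only DTD content model into a Python regex.

    Each child element is encoded as "<name>", so the regex is matched
    against a string such as "<name><greet><addr>".
    """
    parts = []
    for tok in _TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")                 # non-capturing group
        elif tok == ",":
            continue                            # concatenation needs no operator
        elif tok in (")", "|", "?", "*", "+"):
            parts.append(tok)                   # same meaning in both notations
        else:
            # Wrap each name so a following ?, * or + applies to the whole tag.
            parts.append("(?:<" + re.escape(tok) + ">)")
    return re.compile("".join(parts))

def matches(model: str, children: list[str]) -> bool:
    """True if the sequence of child tag names conforms to the model."""
    encoded = "".join(f"<{c}>" for c in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None

person = "name, greet?, addr*, (tel | fax)*, email*"
print(matches(person, ["name", "greet", "addr", "addr", "tel", "fax", "email"]))  # True
print(matches(person, ["name", "tel", "addr"]))  # False: addr after tel
print(matches(person, ["greet", "name"]))        # False: name must come first
```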
9 Element Type Definition
For each element type E, a declaration of the form
<!ELEMENT E P>
where P is a regular expression, i.e.,
P ::= EMPTY | ANY | #PCDATA | E’ | P1, P2 | P1 | P2 | P? | P+ | P*
–E’: element type
–P1, P2: concatenation
–P1 | P2: disjunction
–P?: optional
–P+: one or more occurrences
–P*: the Kleene closure (zero or more occurrences)
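The grammar for P can be mirrored as a small algebraic data type. In this sketch the class names (`Name`, `Seq`, `Alt`, `Opt`, `Plus`, `Star`, `Empty`) are hypothetical, chosen for the example; a list of child names is matched by computing, for each sub-expression, the set of positions where a match starting at position i can end. It covers E’, concatenation, disjunction, `?`, `+`, `*` and EMPTY; `#PCDATA` and ANY are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Name:      # E': a single child element
    tag: str

@dataclass(frozen=True)
class Seq:       # P1, P2
    left: "P"
    right: "P"

@dataclass(frozen=True)
class Alt:       # P1 | P2
    left: "P"
    right: "P"

@dataclass(frozen=True)
class Opt:       # P?
    body: "P"

@dataclass(frozen=True)
class Plus:      # P+
    body: "P"

@dataclass(frozen=True)
class Star:      # P* (Kleene closure)
    body: "P"

@dataclass(frozen=True)
class Empty:     # EMPTY: no children at all
    pass

P = Union[Name, Seq, Alt, Opt, Plus, Star, Empty]

def ends(p: P, kids: list[str], i: int) -> set[int]:
    """All positions j such that p matches kids[i:j]."""
    if isinstance(p, Empty):
        return {i}
    if isinstance(p, Name):
        return {i + 1} if i < len(kids) and kids[i] == p.tag else set()
    if isinstance(p, Seq):
        return {k for j in ends(p.left, kids, i) for k in ends(p.right, kids, j)}
    if isinstance(p, Alt):
        return ends(p.left, kids, i) | ends(p.right, kids, i)
    if isinstance(p, Opt):
        return {i} | ends(p.body, kids, i)
    if isinstance(p, Star):
        reached, frontier = {i}, {i}
        while frontier:  # repeat the body until no new end positions appear
            step = {k for j in frontier for k in ends(p.body, kids, j)}
            frontier = step - reached
            reached |= frontier
        return reached
    if isinstance(p, Plus):  # P+ is P followed by P*
        return {k for j in ends(p.body, kids, i) for k in ends(Star(p.body), kids, j)}
    raise TypeError(f"unknown content model: {p!r}")

def conforms(p: P, kids: list[str]) -> bool:
    """True if the whole child sequence matches p."""
    return len(kids) in ends(p, kids, 0)

# name, greet?, addr*, (tel | fax)*, email*
person = Seq(Name("name"), Seq(Opt(Name("greet")), Seq(Star(Name("addr")),
         Seq(Star(Alt(Name("tel"), Name("fax"))), Star(Name("email"))))))
print(conforms(person, ["name", "addr", "fax", "tel"]))  # True
print(conforms(person, ["name", "greet", "greet"]))      # False
```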
10 Summary of Regular Expressions
–A: the tag (i.e., element) A occurs
–e1, e2: the expression e1 followed by e2
–e*: 0 or more occurrences of e
–e?: optional, 0 or 1 occurrences
–e+: 1 or more occurrences
–e1 | e2: either e1 or e2
–(e): grouping
11 The Definition of an Element Consists of Exactly One of the Following
–A regular expression (as defined earlier)
–EMPTY: the element has no content
–ANY: the content can be any mixture of PCDATA and elements defined in the DTD
–Mixed content, defined as described on the next slide; its simplest form, (#PCDATA), means the content is character data only
12 The Definition of Mixed Content
Mixed content is described by a repeatable OR group
(#PCDATA | element-name | …)*
–Inside the group there are no regular expressions, just element names
–#PCDATA must come first, followed by 0 or more element names separated by |
–The group can be repeated 0 or more times
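Because a mixed-content group allows the listed elements in any order and any number, with text anywhere, checking it reduces to a membership test on each child's tag name. A minimal sketch using the standard `xml.etree.ElementTree`; the function name and the `para`, `b` and `i` elements are hypothetical.

```python
import xml.etree.ElementTree as ET

def conforms_to_mixed(elem: ET.Element, allowed: set[str]) -> bool:
    """Check an element against the mixed content model (#PCDATA | a | b ...)*.

    Text may appear anywhere, so only the names of child elements are checked.
    With an empty `allowed` set this is the model (#PCDATA): text only.
    """
    return all(child.tag in allowed for child in elem)

para = ET.fromstring("<para>Some <b>bold</b> and <i>italic</i> text</para>")
print(conforms_to_mixed(para, {"b", "i"}))  # True
print(conforms_to_mixed(para, {"b"}))       # False: <i> is not allowed
print(conforms_to_mixed(para, set()))       # False: (#PCDATA) forbids child elements
```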
13 An Address-Book XML Document with an Internal DTD
–The name of the DTD is addressbook
–“Internal” means that the DTD and the XML document are in the same file
–The syntax of a DTD is not XML syntax
14 The Rest of the Address-Book XML Document
A second person entry: Jeff Cohen (name), Dr. Cohen (greeting), email@example.com (email)
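A non-validating parser still reads an internal DTD. This sketch uses the standard library's `xml.parsers.expat` to report the DTD's name and the element types declared in the internal subset. The DTD text is an assumption pieced together from the address-book slides (the transcript lost the original declarations), and expat does not check the document against these declarations.

```python
import xml.parsers.expat

# Hypothetical internal DTD for the address book; the original slide text was lost.
DOC = """<?xml version="1.0"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, greet?, addr*, (tel | fax)*, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>
  <!ELEMENT addr (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<addressbook>
  <person>
    <name>Jeff Cohen</name>
    <greet>Dr. Cohen</greet>
    <email>email@example.com</email>
  </person>
</addressbook>
"""

def read_internal_dtd(text: str) -> tuple[str, list[str]]:
    """Return the DTD name and the element types declared in the internal subset."""
    found = {"name": None, "elements": []}
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["name"] = name

    def on_element_decl(name, model):
        found["elements"].append(name)  # `model` is expat's parsed content model

    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(text, True)
    return found["name"], found["elements"]

print(read_internal_dtd(DOC))
```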
15 Regular Expressions
Each regular expression determines a corresponding finite-state automaton. Let’s start with a simpler example:
name, addr*, email
(Figure: automaton with transitions labeled name, addr and email)
This suggests a simple parsing program. A double circle denotes an accepting state.
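One possible automaton for name, addr*, email has a start state, a state reached after name (with an addr loop), and an accepting state reached by email; it turns directly into a short parsing loop. A minimal sketch, assuming the children arrive as a list of tag names; the function and state names are made up for this example.

```python
def parse_name_addrs_email(kids: list[str]) -> bool:
    """Hand-coded automaton for the content model  name, addr*, email."""
    state = "start"
    for tag in kids:
        if state == "start" and tag == "name":
            state = "after_name"   # may now loop on addr or finish with email
        elif state == "after_name" and tag == "addr":
            state = "after_name"   # addr* loops back to the same state
        elif state == "after_name" and tag == "email":
            state = "accept"       # the double-circled accepting state
        else:
            return False           # no transition: reject immediately
    return state == "accept"

print(parse_name_addrs_email(["name", "addr", "addr", "email"]))  # True
print(parse_name_addrs_email(["name", "email", "addr"]))          # False
print(parse_name_addrs_email(["name"]))                            # False
```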
16 Another Example
name, address*, (tel | fax)*, email*
(Figure: automaton with transitions labeled name, address, tel, fax and email)
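For larger content models it is easier to store the automaton as a transition table. This sketch encodes one deterministic automaton for name, address*, (tel | fax)*, email*; the numbered states are our own labels, not taken from the slide. Because everything after name is optional, every state reached after name is accepting.

```python
# One DFA for  name, address*, (tel | fax)*, email*
# State 0: start; 1: after name/address; 2: inside the tel/fax run; 3: inside the email run.
TRANSITIONS = {
    (0, "name"): 1,
    (1, "address"): 1, (1, "tel"): 2, (1, "fax"): 2, (1, "email"): 3,
    (2, "tel"): 2, (2, "fax"): 2, (2, "email"): 3,
    (3, "email"): 3,
}
ACCEPTING = {1, 2, 3}

def run_dfa(kids: list[str]) -> bool:
    """Run the table-driven automaton over a sequence of child tag names."""
    state = 0
    for tag in kids:
        state = TRANSITIONS.get((state, tag))
        if state is None:   # missing entry = transition to a dead state
            return False
    return state in ACCEPTING

print(run_dfa(["name", "address", "tel", "fax", "tel", "email"]))  # True
print(run_dfa(["name", "tel", "address"]))                          # False
```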
17 Some Things are Hard to Specify
Each employee element should contain name, age and ssn elements in some order. A DTD can express this only by listing the possible orders as alternatives. Suppose that there were many more fields!
18 Some Things are Hard to Specify (cont’d)
Suppose there were many more fields! There are n! different orders of n elements, so the content model grows as n!, which is not even polynomial in n.
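The blow-up is easy to see by generating the content model mechanically: listing every order of n elements as an alternative yields n! sequences. A minimal sketch using `itertools.permutations`; the function name is hypothetical.

```python
import itertools
import math

def any_order_model(names: list[str]) -> str:
    """Build a DTD content model accepting the given elements in any order."""
    alternatives = ["(" + ", ".join(p) + ")" for p in itertools.permutations(names)]
    return "(" + " | ".join(alternatives) + ")"

print(any_order_model(["name", "age", "ssn"]))
# ((name, age, ssn) | (name, ssn, age) | (age, name, ssn) | (age, ssn, name) | (ssn, name, age) | (ssn, age, name))

for n in (3, 5, 10):
    print(n, math.factorial(n))  # number of alternatives needed for n fields
```

For n = 10 the model already needs 3,628,800 alternatives. Strictly, XML 1.0 also asks for deterministic content models, which this naive alternation is not (several alternatives begin with the same element); factoring out common prefixes fixes that but does not avoid the factorial growth.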
19 Specifying Attributes in the DTD
–The dimension attribute is required
–The accuracy attribute is optional
–CDATA is the “type” of the attribute; it means “character data” and may take any literal string as a value
20 The Format of an Attribute Definition
<!ATTLIST element-name attribute-name attribute-type default-value>
The default value is given inside quotes.
Attribute types:
–CDATA
–ID, IDREF, IDREFS
–…
21 Summary of Attribute Default Values
–#REQUIRED: the attribute must be included in the element
–#IMPLIED: the attribute is optional, and there is no default value
–#FIXED “value”: the given value (inside quotes) is the only possible one
–“value”: the default value of the attribute if none is given
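The effect of the four kinds of default can be sketched as a function that checks an element's attributes and fills in defaults, roughly as a validating parser would. The declarations are written as a Python dict because the transcript's ATTLIST text was lost: `dimension` (required) and `accuracy` (optional) come from the slide, while the element name `measurement`, the `unit` and `system` attributes, and their values are hypothetical. Undeclared attributes are not checked here.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations, mirroring
#   <!ATTLIST measurement dimension CDATA #REQUIRED
#                         accuracy  CDATA #IMPLIED
#                         unit      CDATA "cm"
#                         system    CDATA #FIXED "metric">
# Each entry is (kind, value); value is None for #REQUIRED and #IMPLIED.
DECLS = {
    "dimension": ("#REQUIRED", None),
    "accuracy": ("#IMPLIED", None),
    "unit": ("default", "cm"),
    "system": ("#FIXED", "metric"),
}

def apply_attribute_defaults(elem: ET.Element, decls: dict) -> dict[str, str]:
    """Return the element's effective attributes, or raise ValueError."""
    attrs = dict(elem.attrib)
    for name, (kind, value) in decls.items():
        given = attrs.get(name)
        if kind == "#REQUIRED" and given is None:
            raise ValueError(f"missing required attribute {name!r}")
        if kind == "#FIXED" and given is not None and given != value:
            raise ValueError(f"{name!r} must be {value!r}")
        if kind in ("#FIXED", "default") and given is None:
            attrs[name] = value  # the parser supplies the declared value
        # #IMPLIED: nothing to check and nothing to fill in
    return attrs

m = ET.fromstring('<measurement dimension="10"/>')
print(apply_attribute_defaults(m, DECLS))
# {'dimension': '10', 'unit': 'cm', 'system': 'metric'}
```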
22 Recursive DTDs
What is the problem with this? A parser does not notice it!
Each person should have a father and a mother. This leads either to infinite data or to a person who is a descendant of herself.
23 Recursive DTDs (cont’d)
What is now the problem with this? If a person has only a father, how can you tell that the person has a father and does not have a mother?
25 IDs and IDREFs
ID attribute: unique within the entire document.
–An element can have at most one ID attribute.
–No default (or fixed default) value is allowed; the default must be one of:
#REQUIRED: a value must be provided
#IMPLIED: a value is optional
IDREF attribute: its value must be the ID value of some element in the document.
IDREFS attribute: its value is a set, and each member of the set is the ID value of some element in the document.
26 Some Conforming Data
Person elements for Lisa Simpson, Bart Simpson, Marge Simpson and Homer Simpson
27 ID References do not Have Types
The attributes mother and father are references to IDs of other elements. However, those are not necessarily person elements! The mother attribute is not necessarily a reference to a female person.
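Since a DTD cannot say that mother must point to a female person, such a check has to happen in the application after parsing. A minimal sketch with `xml.etree.ElementTree`; `mother` and `father` follow the slide, while the `id` and `gender` attributes, the function name and the sample document are hypothetical. The sample has its references swapped, yet every IDREF resolves, so a DTD alone would accept it.

```python
import xml.etree.ElementTree as ET

DOC = """<family>
  <person id="marge" gender="female">Marge Simpson</person>
  <person id="homer" gender="male">Homer Simpson</person>
  <person id="bart" gender="male" mother="homer" father="marge">Bart Simpson</person>
</family>"""

def wrongly_typed_refs(root: ET.Element) -> list[str]:
    """Report mother/father references that do not point to a suitable person."""
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    expected = {"mother": "female", "father": "male"}
    problems = []
    for person in root.iter("person"):
        for attr, gender in expected.items():
            ref = person.get(attr)
            if ref is None:
                continue
            target = by_id.get(ref)
            if target is None or target.tag != "person" or target.get("gender") != gender:
                problems.append(f"{person.get('id')}.{attr} -> {ref}")
    return problems

print(wrongly_typed_refs(ET.fromstring(DOC)))
# ['bart.mother -> homer', 'bart.father -> marge']
```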
29 The Revised Data
Entries for Marge Simpson, Homer Simpson, Bart Simpson and Lisa Simpson
30 Consistency of ID and IDREF Attribute Values
If an attribute is declared as ID:
–The associated value must be distinct, i.e., different elements (in the given document) must have different values for the ID attribute (no confusion), even if the two elements have different element names
If an attribute is declared as IDREF:
–The associated value must exist as the value of some ID attribute (no dangling “pointers”)
–Similarly for all the values of an IDREFS attribute
ID, IDREF and IDREFS attributes are not typed.
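The two consistency rules can be sketched as a post-parse check. Python's standard-library parsers do not validate against a DTD, so this example takes the ID, IDREF and IDREFS attribute names as a hypothetical Python-side declaration instead of reading them from the DTD; the function name, the `pet` element and the `owners` attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

def check_id_refs(root: ET.Element, id_attrs: set[str],
                  idref_attrs: set[str], idrefs_attrs: set[str]) -> list[str]:
    """Return ID/IDREF consistency errors (an empty list if consistent)."""
    errors: list[str] = []
    ids: set[str] = set()
    elements = list(root.iter())
    for e in elements:                     # pass 1: IDs are unique document-wide
        for a in id_attrs & set(e.attrib):
            value = e.get(a)
            if value in ids:               # regardless of the element name
                errors.append(f"duplicate ID {value!r}")
            ids.add(value)
    for e in elements:                     # pass 2: every reference must resolve
        refs = [e.get(a) for a in idref_attrs & set(e.attrib)]
        for a in idrefs_attrs & set(e.attrib):
            refs.extend(e.get(a).split())  # IDREFS: whitespace-separated names
        errors.extend(f"dangling reference {r!r}" for r in refs if r not in ids)
    return errors

doc = ET.fromstring(
    '<family>'
    '<person id="homer">Homer</person>'
    '<person id="marge">Marge</person>'
    '<person id="bart" father="homer" mother="marge">Bart</person>'
    '<person id="lisa" father="homer" mother="maggie">Lisa</person>'
    '<pet id="homer" owners="bart lisa">Snowball</pet>'
    '</family>')
print(check_id_refs(doc, {"id"}, {"father", "mother"}, {"owners"}))
# ["duplicate ID 'homer'", "dangling reference 'maggie'"]
```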
31 Adding a DTD to the Document
A DTD can be
–internal: the DTD is part of the document file, or
–external: the DTD and the document are in separate files
An external DTD may reside
–in the local file system (where the document is), or
–in a remote file system
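32 Connecting a Document with its DTD
–An internal DTD: <!DOCTYPE root-element [ … ]> followed by the document
–A DTD from the local file system: <!DOCTYPE root-element SYSTEM "file-name">
–A DTD from a remote file system: <!DOCTYPE root-element SYSTEM "URL">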
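A parser reports which kind of DTD connection a document uses. This sketch relies on the `StartDoctypeDeclHandler` of the standard library's `xml.parsers.expat`, whose arguments include the system identifier (the file name or URL of an external DTD) and a flag telling whether an internal subset is present. The function name, file name and URL are hypothetical; expat does not fetch the external DTD here, it only reports where it is.

```python
import xml.parsers.expat

def doctype_info(text: str) -> dict:
    """Report the DOCTYPE name, external system identifier and internal-subset flag."""
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info.update(name=name, system=system_id, internal=bool(has_internal_subset))

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    return info

internal = '<!DOCTYPE addressbook [<!ELEMENT addressbook (#PCDATA)>]><addressbook/>'
local = '<!DOCTYPE addressbook SYSTEM "addressbook.dtd"><addressbook/>'
remote = '<!DOCTYPE addressbook SYSTEM "http://www.example.com/addressbook.dtd"><addressbook/>'
for doc in (internal, local, remote):
    print(doctype_info(doc))
```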
33 Well-Formed XML Documents
An XML document (with or without a DTD) is well-formed if
–Tags are syntactically correct
–Every start tag has a matching end tag (or is an empty-element tag)
–Tags are properly nested
–There is exactly one root element
–A start tag does not have two occurrences of the same attribute
An XML document must be well-formed.
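Python's `xml.etree.ElementTree` does not validate against a DTD, but it does reject documents that are not well-formed by raising `ET.ParseError`. A minimal sketch that exercises the conditions listed above; the function name and sample documents are made up.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "well-formed": "<a><b>text</b></a>",
    "missing end tag": "<a><b>text</a>",
    "improper nesting": "<a><b></a></b>",
    "no single root": "<a/><b/>",
    "duplicate attribute": '<a x="1" x="2"/>',
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```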
34 Valid Documents
A well-formed XML document is valid if it conforms to its DTD, that is,
–The document conforms to the regular-expression grammar,
–The types of attributes are correct, and
–The constraints on references are satisfied
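Putting the pieces together, validity adds grammar and attribute checks on top of well-formedness. This is a deliberately small sketch, not a real validator: the `addressbook` declarations are hypothetical and hand-translated into Python regexes over child tags encoded as `<tag>`, required attributes are listed in a dict, text content is ignored, and reference checks are omitted.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD, hand-translated: each content model is a regex over child tags
# encoded as "<tag>"; "" means no child elements (text-only or EMPTY).
CONTENT = {
    "addressbook": r"(?:<person>)*",
    "person": r"<name>(?:<greet>)?(?:<addr>)*(?:<tel>|<fax>)*(?:<email>)*",
    "name": "", "greet": "", "addr": "", "tel": "", "fax": "", "email": "",
}
REQUIRED_ATTRS = {"person": {"id"}}  # hypothetical: <!ATTLIST person id ID #REQUIRED>

def validate(text: str) -> list[str]:
    """Return validity errors; raises ET.ParseError if not well-formed."""
    root = ET.fromstring(text)  # well-formedness is checked first
    errors = []
    for e in root.iter():
        if e.tag not in CONTENT:
            errors.append(f"undeclared element <{e.tag}>")
            continue
        kids = "".join(f"<{c.tag}>" for c in e)
        if re.fullmatch(CONTENT[e.tag], kids) is None:
            errors.append(f"<{e.tag}> content does not match its content model")
        for a in sorted(REQUIRED_ATTRS.get(e.tag, set()) - set(e.attrib)):
            errors.append(f"<{e.tag}> lacks required attribute {a!r}")
    return errors

doc = """<addressbook>
  <person id="p1"><name>Homer Simpson</name><tel>(321) 786 2543</tel></person>
  <person><greet>Dr. Cohen</greet><name>Jeff Cohen</name></person>
</addressbook>"""
for err in validate(doc):
    print(err)
```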