4 The structure of the orders.xml document
(+ means one or more, ? means optional, [ ] marks an optional attribute)
customers
  customer+   attributes: custID, [custType]
    name      attribute: [title]
    address
    phone
    email?
    orders
      order+  attributes: orderID, orderBy
        orderDate
        items
          item+   attributes: itemPrice, [itemQty]
Notes:
- The customers element must have at least one customer child.
- A customer must have a custID, name, address, and phone, and may have a custType, title, and email.
- An orders element is used to group one or more separate orders placed by a customer.
- The orders element must have at least one order child.
- The items element must have at least one item child.
6 DTD and A Valid Document
An XML document can be validated using either DTDs (Document Type Definitions) or schemas.
A DTD is a collection of rules that define the content and structure of an XML document.
A DTD can be used to:
- enforce a specific data structure
- ensure all required elements are present
- prevent undefined elements from being used
- specify the use of attributes and define their possible values
7 Declaring a DTD
A DTD is declared in a DOCTYPE statement. It has to be added to the document prolog, after the XML declaration and before the document's root element.
While there can only be one DTD per XML document, it can be divided into two parts:
- An internal subset is placed within the same XML document.
- An external subset is located in a separate file.
8 To declare an internal DTD subset
    <!DOCTYPE root [declarations]>
where root is the name of the document's root element.
An example:
    <!DOCTYPE customers
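As a rough illustration of where the internal subset sits, here is a minimal sketch of a complete document. The element declarations and the sample content are assumptions made for this example only; they are not the rules of the actual orders document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The DOCTYPE names the root element (customers) and holds the
     declarations between the square brackets -->
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (#PCDATA)>
]>
<customers>
  <customer>Lea Ziegler</customer>
</customers>
```

The DOCTYPE comes after the XML declaration and before the root element, as the prolog rule requires.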
9 To declare an external DTD subset
External subsets have two types of locations: system and public. For a system DTD:
    <!DOCTYPE root SYSTEM "uri_ExternalFile">
An example:
    <!DOCTYPE customers SYSTEM "rules.dtd">
10 To declare an external DTD subset
The syntax of the DOCTYPE declaration using a public identifier:
    <!DOCTYPE root PUBLIC "id" "uri">
where id is a public identifier that, like a namespace URI, acts as a unique name for the DTD, and uri is the location of the DTD file.
An example:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "
11 Using External & Internal DTDs
The real power of XML comes from an external DTD that can be shared among many documents.
If a document contains both an internal and an external subset, the internal subset takes precedence over the external subset if there is a conflict between the two.
This way, the external subset can define basic rules for all the documents, and the internal subset can define the rules specific to each document.
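A minimal sketch of combining both subsets follows. It assumes a hypothetical shared file rules.dtd whose content is described in the comment; the memo element and the owner entity are names invented for this illustration. The internal subset is read first, and the first declaration of an entity is the one that counts, which is how the document-specific value wins.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Assumption: the shared file rules.dtd declares the memo element as
     parsed character data and declares an entity named owner with the
     value "Default owner" -->
<!DOCTYPE memo SYSTEM "rules.dtd" [
  <!-- Internal subset: this declaration of owner is seen first,
       so it overrides the one in rules.dtd for this document only -->
  <!ENTITY owner "Lea Ziegler">
]>
<memo>Prepared by &owner;</memo>
```

Other documents that point at the same rules.dtd without an internal subset would keep the shared default.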
13 Declaring Document Elements
In a valid document, every element must be declared in the DTD.
The syntax of an element declaration is:
    <!ELEMENT element content-model>
where element is the name of the element and content-model specifies what type of content the element contains.
The element name is case sensitive.
14 Five values for content-model
- ANY - No restrictions on the element's content
- EMPTY - The element cannot store any content
- #PCDATA - The element can only contain parsed character data
- Elements - The element can only contain child elements
- Mixed - The element contains both parsed character data and child elements
15 <!ELEMENT element ANY>
An example:
    <!ELEMENT product ANY>
All of the following satisfy the above declaration:
    <product>PLBK70 Painted Lady Breeding Kit</product>
    <product type="Painted Lady Breeding Kit" />
    <product>
      <name>PLBK70</name>
      <type>Painted Lady Breeding Kit</type>
    </product>
16 <!ELEMENT element EMPTY>
An example:
    <!ELEMENT img EMPTY>
The following would satisfy the above declaration:
    <img />
17 <!ELEMENT element (#PCDATA)>
An example:
    <!ELEMENT name (#PCDATA)>
would permit the following element in an XML document:
    <name>Lea Ziegler</name>
A PCDATA element may contain plain text. The "parsed" part means that markup in it is parsed instead of being displayed as raw text. It also means that entity references are replaced.
A PCDATA element does not allow child elements.
18 <!ELEMENT parent (children)>
    <!ELEMENT customer (phone)>
The customer element can contain only a single child element, named phone.
The following would be invalid:
    <customer>
      <name>Lea Ziegler</name>
      <phone> </phone>
    </customer>
19 <!ELEMENT customer (name, phone, email)>
Specifying an element sequence:
    <!ELEMENT parent (child1, child2, ...)>
where child1, child2, ... is the order in which the child elements must appear within the parent element.
    <!ELEMENT customer (name, phone, email)>
indicates the document below is invalid, because the required email element is missing:
    <customer>
      <name>Lea Ziegler</name>
      <phone>(813) </phone>
    </customer>
20 Specifying an element choice
    <!ELEMENT parent (child1 | child2 | ...)>
where child1, child2, ... are the possible child elements of the parent element.
    <!ELEMENT customer (name | company)>
allows the customer element to contain either the name element or the company element.
    <!ELEMENT customer ((name | company), phone, email)>
indicates that the customer element must have three child elements: either name or company, followed by phone and then email.
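To make the choice rule concrete, here is a small sketch of a document that validates against a choice followed by a sequence. The sample names and phone numbers are placeholders, and the two-child content model is a simplified assumption rather than the declaration used in the actual orders document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!-- exactly one of name or company, followed by phone -->
  <!ELEMENT customer ((name | company), phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<customers>
  <customer>
    <name>Lea Ziegler</name>
    <phone>555-0100</phone>
  </customer>
  <customer>
    <company>Example School District</company>
    <phone>555-0101</phone>
  </customer>
</customers>
```

A customer holding both a name and a company, or holding phone before name, would be rejected by a validating parser.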
21 Modifying Symbols
DTDs use a modifying symbol to specify the number of occurrences of each element:
- ? allows zero or one of the item
- + allows one or more of the item
- * allows zero or more of the item
There is no symbol for an exact count. If you want to specify that an element contains exactly three child elements, you have to enter the sequence (child, child, child) into the declaration.
22 Modifying Symbols
    <!ELEMENT customers (customer+)>
the customers element must contain at least one element named customer
    <!ELEMENT order (orderDate, items)+>
the (orderDate, items) sequence can be repeated one or more times within each order element
    <!ELEMENT customer (name, address, phone, email?, orders)>
the customer element contains zero or one email element
23 Declaring child elements
- The customers element can contain one or more customer elements.
- The customer element has the following child elements: name, address, phone (optional), and orders.
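The same modifying symbols apply further down the tree. The following is a minimal sketch for the orders branch only, assuming that orderDate and item hold plain text; the date and item codes are illustrative values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE orders [
  <!-- orders must contain at least one order -->
  <!ELEMENT orders (order+)>
  <!-- each order holds one orderDate followed by one items element -->
  <!ELEMENT order (orderDate, items)>
  <!ELEMENT orderDate (#PCDATA)>
  <!-- items must contain at least one item -->
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
]>
<orders>
  <order>
    <orderDate>2024-01-15</orderDate>
    <items>
      <item>MBL25</item>
      <item>PLBK70</item>
    </items>
  </order>
</orders>
```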
24 Working with Mixed Content
Mixed content elements contain both parsed character data and child elements. The syntax is:
    <!ELEMENT parent (#PCDATA | child1 | child2 | … )*>
The parent element can contain character data or any number of the specified child elements, or it can contain no content at all.
It is better not to work with mixed content if you want a tightly structured document.
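A short sketch of a mixed content declaration in use is shown below. The element names desc, b, and i are hypothetical and chosen only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE desc [
  <!-- #PCDATA comes first in the group, and the group ends with * -->
  <!ELEMENT desc (#PCDATA | b | i)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<desc>Painted Lady <b>classroom</b> breeding kit, <i>70 larvae</i></desc>
```

Because the content model cannot constrain the order or number of b and i children, an empty desc element or one with ten b children would validate just as well, which is why mixed content suits loosely structured text better than data records.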
26 Declaring Element Attributes
Add an attribute-list declaration to the document's DTD to accomplish the following:
- list the names of all of the attributes associated with a specific element
- specify the data type of each attribute
- indicate whether each attribute is required or optional
- provide a default value for each attribute, if necessary
27 Attributes used in orders.xml
Element    Attribute   Required   Default Value(s)
customer   custID      Yes        None
customer   custType    No         "home", "school", or "business"
name       title       No         "Mr.", "Mrs.", "Ms."
order      orderID     Yes        None
order      orderBy     Yes        None
item       itemPrice   Yes        None
item       itemQty     No         "1"
28 Declaring Attributes in a DTD
    <!ATTLIST element attribute1 type1 default1
                      attribute2 type2 default2
                      attribute3 type3 default3 … >
or
    <!ATTLIST element attribute1 type1 default1>
    <!ATTLIST element attribute2 type2 default2>
    <!ATTLIST element attribute3 type3 default3>
where:
- element is the name of the element associated with the attributes
- attribute is the name of an attribute
- type is the attribute's data type
- default indicates whether the attribute is required and whether it has a default value
29 Declaring Attribute Names
An attribute-list declaration can be placed anywhere within the document type declaration, although it is easier to maintain if it is located adjacent to the declaration for the element with which it is associated.
30 Attribute Types
Attribute values can consist only of character data, but you can control the format of those characters:
- CDATA - character data
- Enumerated list - a list of possible attribute values
- ID - a unique text string
- IDREF - a reference to an ID value
- ENTITY - a reference to an external unparsed entity
- ENTITIES - a list of entities separated by white space
- NMTOKEN - an accepted XML name
- NMTOKENS - a list of XML names separated by white space
31 CDATA can contain any character except those reserved by XML
    <!ATTLIST element attribute CDATA default>
    <!ATTLIST item itemPrice CDATA>
    <!ATTLIST item itemQty CDATA>
Any of the following attribute values are allowed:
    <item itemPrice="29.95"> </item>
    <item itemPrice="$29.95"> </item>
    <item itemPrice="£29.95"> </item>
32 Enumerated Types: Attributes that are limited to a set of possible values
    <!ATTLIST element attribute (value1 | value2 | value3 | ...) default>
where value1, value2, ... are the allowed values.
    <!ATTLIST customer custType (home | business | school)>
Any custType attribute whose value is not "home", "school", or "business" causes parsers to reject the document as invalid.
33 Tokenized Types
Tokenized types are character strings that follow certain rules (known as tokens) for format and content.
DTDs support four kinds of tokens: IDs, ID references, name tokens, and entities.
34 <!ATTLIST customer custID ID>
The ID token is used when an attribute value must be unique within the document.
    <!ATTLIST customer custID ID>
This declaration ensures each customer will have a unique ID.
The following elements would not be valid because the same custID value is used more than once:
    <customer custID="Cust021"> ... </customer>
    <customer custID="Cust021"> ... </customer>
35 <!ATTLIST element attribute IDREF default>
An attribute declared as an IDREF token must have a value equal to the value of an ID attribute located somewhere in the same document. This enables an XML document to contain cross-references between one element and another.
    <!ATTLIST order orderBy IDREF>
When a validating XML parser encounters this attribute, it searches the XML document for an ID value that matches the value of the orderBy attribute. If it doesn't find one, it rejects the document as invalid.
36 An attribute contains a list of ID references
    <!ATTLIST customer orders IDREFS>
    <!ATTLIST order orderID ID>

    <customer orders="OR3413 OR3910 OR5310"> ... </customer>
    ...
    <order orderID="OR3413"> ... </order>
    <order orderID="OR3910"> ... </order>
    <order orderID="OR5310"> ... </order>
37 Specifying attribute IDs and IDREFs
- Each custID value must be unique in the document.
- Each orderBy value must reference an ID value somewhere in the document.
38 Attribute Defaults
There are four possible defaults:
- #REQUIRED: The attribute must appear with every occurrence of the element.
- #IMPLIED: The attribute is optional.
- An optional default value: A validating XML parser will supply the default value if one is not specified.
- #FIXED: The attribute is optional. If an attribute value is specified, it must match the default value.
39 Attribute Defaults: examples
    <!ATTLIST customer custID ID #REQUIRED>
A customer ID is required for every customer.
    <!ATTLIST customer custType (school | home | business) #IMPLIED>
If an XML parser encounters a customer element without a custType attribute, it assumes a blank value for the attribute.
    <!ATTLIST item itemQty CDATA "1">
Assume a value of "1" for itemQty if it's missing.
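Putting attribute types and defaults together, the following sketch declares the customer, order, and item attributes in one internal subset. The element nesting is deliberately simplified (an assumption for brevity, not the full orders.xml structure), and the prices are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customers [
  <!-- simplified element structure, assumed for this sketch -->
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (order+)>
  <!ELEMENT order (item+)>
  <!ELEMENT item (#PCDATA)>

  <!-- custID must be present and unique; custType is optional -->
  <!ATTLIST customer custID   ID                        #REQUIRED
                     custType (home | business | school) #IMPLIED>
  <!-- orderBy must match some ID value in the document -->
  <!ATTLIST order orderID ID    #REQUIRED
                  orderBy IDREF #REQUIRED>
  <!-- itemQty falls back to "1" when it is left out -->
  <!ATTLIST item itemPrice CDATA #REQUIRED
                 itemQty   CDATA "1">
]>
<customers>
  <customer custID="cust201" custType="home">
    <order orderID="or10311" orderBy="cust201">
      <item itemPrice="29.95">MBL25</item>
      <item itemPrice="12.50" itemQty="2">PLBK70</item>
    </order>
  </customer>
</customers>
```

For the first item, a validating parser reports itemQty as "1" even though the attribute is not written in the document.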
41 DTDs and Namespaces
You can work with namespace prefixes, applying a validation rule to the element's qualified name.
    <!ELEMENT cu:phone (#PCDATA)>
Any namespace declarations in a document must also be included in the DTD for the document to be valid. This is usually done using a fixed data type for the namespace's URI.
    <!ATTLIST cu:customers xmlns:cu CDATA #FIXED " ">
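A minimal sketch of a namespace-aware DTD follows. The namespace URI http://example.com/customers is a placeholder invented for this example, as are the simplified content model and the phone number.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cu:customers [
  <!-- the DTD matches on the qualified name, prefix included -->
  <!ELEMENT cu:customers (cu:phone+)>
  <!ELEMENT cu:phone (#PCDATA)>
  <!-- the namespace declaration is treated as an ordinary attribute
       whose value is fixed to the (hypothetical) namespace URI -->
  <!ATTLIST cu:customers xmlns:cu CDATA #FIXED "http://example.com/customers">
]>
<cu:customers xmlns:cu="http://example.com/customers">
  <cu:phone>555-0100</cu:phone>
</cu:customers>
```

Note that a DTD has no notion of namespaces: a document that binds the same URI to a different prefix would fail validation against these declarations.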
42 Validating a Document with XMLSpy
The Web is an excellent source for validating parsers, including Web sites to which you can upload your XML document to have it validated for free.
XMLSpy is an XML development environment created by Altova, which is used for designing and editing professional applications involving XML, XML Schema, and other XML-based technologies.
44 Introducing Entities
XML supports the following built-in entities: &amp; &lt; &gt; &apos; &quot;
If you have a long text string that will be repeated throughout your XML document, avoid data entry errors by placing the text string in its own entity.
You can create your own customized set of entities corresponding to text strings, such as product descriptions, that you want referenced by the XML document.
45 Working with General Entities
A general entity is an entity that references content to be used within an XML document. That content can be either parsed or unparsed.
A parsed entity references text that can be readily interpreted or parsed by an application reading the XML document.
An entity that references content that is either nontextual or cannot be interpreted by an XML parser is an unparsed entity. One example of an unparsed entity is an entity that references a graphic image file.
46 Working with General Entities
The content referenced by an entity can be placed either within the DTD or in an external file.
- Internal entities reference content found in the DTD.
- External entities reference content from external files.
47 Internal Parsed Entities
    <!ENTITY entity "value">
where entity is the name assigned to the entity and value is the entity's value, which must be well-formed XML text.
Examples:
    <!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
    <!ENTITY MBL25 "<desc>Monarch Butterfly, 6-12 larvae</desc>">
The & and % characters are not allowed as part of an entity's value. Use &amp; to include the & symbol, if necessary.
48 External Parsed Entities
For longer text strings, place the content in an external file. To create an external parsed entity, use:
    <!ENTITY entity SYSTEM "uri">
where uri is the URI of the external file containing the entity value.
In the declaration:
    <!ENTITY MBL25 SYSTEM "description.xml">
an entity named "MBL25" gets its value from the description.xml file.
49 Referencing a General Entity
After an entity is declared, it can be referenced anywhere within the document. The syntax is:
    &entity;
For example,
    <item>&MBL25;</item>
is interpreted as
    <item>Monarch Butterfly, 6-12 larvae</item>
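A small self-contained sketch of declaring and then referencing a general entity is shown below. The single-element document is an assumption made to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!-- internal parsed entity: name MBL25, value is the product description -->
  <!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
]>
<!-- a parser replaces the reference with the entity's value -->
<item>&MBL25;</item>
```

The entity name is case sensitive and the trailing semicolon is required; a reference to an undeclared name is an error.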
50 Declare parsed entities in the codes.dtd file
Declare parsed entities in the codes.dtd file for the product codes in the orders.xml document. Each declaration gives the entity name followed by the entity value:
    <!ENTITY BF100P "Butterfly farm pop-up self erecting portable greenhouse">
    <!ENTITY BFGK10 "Field of Dreams backyard butterfly garden kit">
    <!ENTITY HME100 "Hummingbird Hawkmoth (Manduca Sexta), 100 eggs">
    <!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
    <!ENTITY MP12 "Monarch Pupae (Danaus Plexippus), 12 pupae">
    <!ENTITY MWT15 "Giant Milkweed Tree (Calotropis Ssp.), 1 crown flower">
    <!ENTITY PLBK70 "Painted Lady classroom breeding kit, 70 larvae">
51 Parameter Entities
Parameter entities are used to store the content of a DTD. For internal parameter entities, the syntax is:
    <!ENTITY % entity "value">
For external parameter entities, the syntax is:
    <!ENTITY % entity SYSTEM "uri">
Once a parameter entity has been declared, you can add a reference to it within the DTD using:
    %entity;
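As a rough sketch of an internal parameter entity, the example below stores one complete declaration and then inserts it by reference. The entity name itemDecl and the two-element structure are hypothetical. In the internal subset, a parameter-entity reference may only stand where a whole declaration could stand, which is why the entity holds a full declaration rather than a fragment of one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE items [
  <!-- internal parameter entity holding a piece of the DTD itself -->
  <!ENTITY % itemDecl "<!ELEMENT item (#PCDATA)>">
  <!ELEMENT items (item+)>
  <!-- the reference is replaced by the stored declaration -->
  %itemDecl;
]>
<items>
  <item>BFGK10</item>
</items>
```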
53 Add a parameter entity to the DTD within the orders.xml file to load the contents of the codes.dtd file
    <!DOCTYPE customers [
    ...
    <!ENTITY % itemCodes SYSTEM "codes.dtd">
    %itemCodes;
    ]>
    <customers>
    ...
    <orders>
    <order orderID="or10311" orderBy="cust201">
    </order>
Notes:
- The ENTITY declaration creates a parameter entity pointing to the code in the codes.dtd file.
- %itemCodes; is the reference to the itemCodes parameter entity.
54 Inserting general entities
[Figure: the orders.xml document with a reference to the BFGK10 general entity]
58 Conditional Sections
    <![ keyword [ declarations ]]>
where keyword is INCLUDE for a section of declarations that you want parsers to interpret, and IGNORE for declarations that you want parsers to pass over.
    <![IGNORE[
      <!ELEMENT Magazine (Name)>
      <!ATTLIST Magazine Publisher CDATA #REQUIRED>
      <!ELEMENT Name (#PCDATA)>
    ]]>
    <![INCLUDE[
      <!ELEMENT Book (Title, Author)>
      <!ATTLIST Book Pages CDATA #REQUIRED>
      <!ELEMENT Title (#PCDATA)>
      <!ELEMENT Author (#PCDATA)>
    ]]>
59 Conditional Sections using a parameter entity
    <!ENTITY % UseFullDTD "IGNORE">
    <![ %UseFullDTD; [
      <!ELEMENT Magazine (Name)>
      <!ATTLIST Magazine Publisher CDATA #REQUIRED>
      <!ELEMENT Name (#PCDATA)>
    ]]>
By changing the value of UseFullDTD from IGNORE to INCLUDE, you can add any conditional section that uses this entity reference to the document's DTD. Thus, you can switch multiple sections in the DTD off and on by editing a single line in the file. This is most useful when several conditional sections are scattered throughout a very long DTD.
Conditional sections can be applied only to external DTDs.
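One common way to flip the switch for a single document, without editing the shared file, is sketched below. It assumes the conditional section shown on this slide is stored in a hypothetical external file named magazine.dtd; the publisher and magazine names are placeholders. The technique relies on the internal subset being read first, so that its declaration of UseFullDTD is the one that takes effect.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Assumption: magazine.dtd declares UseFullDTD as "IGNORE" and wraps
     the Magazine, Publisher, and Name declarations in a conditional
     section controlled by that parameter entity -->
<!DOCTYPE Magazine SYSTEM "magazine.dtd" [
  <!-- seen before the external subset, so this value wins -->
  <!ENTITY % UseFullDTD "INCLUDE">
]>
<Magazine Publisher="Example Press">
  <Name>Butterfly Monthly</Name>
</Magazine>
```

Documents that omit the internal declaration keep the default from magazine.dtd, so the section stays ignored for them.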
60 Working with Unparsed Entities
For a DTD to be able to validate either binary data (images, video) or character data that is not well formed, you need to work with unparsed entities.
Because an XML parser cannot work with this type of data directly, a DTD needs to include instructions for how to treat the unparsed entity.
To declare an unparsed entity, you must first declare a notation for the data type used in the entity, and then associate that notation with the unparsed entity.
61 Declaring a notation
    <!NOTATION notation SYSTEM "uri">
where notation is the name of the notation and uri is a system location that defines the data type or a program that can work with the data type.
For example, the following declares a notation named "jpeg" that points to the application paint.exe:
    <!NOTATION jpeg SYSTEM "paint.exe">
You could also use the MIME type value:
    <!NOTATION jpeg SYSTEM "image/jpeg">
62 Associating a notation with an unparsed entity
    <!ENTITY entity SYSTEM "uri" NDATA notation>
where entity is the name of the entity, uri is the system location of a file containing the unparsed data, and notation is the name of the notation that defines the data type.
For example, the following declaration creates an unparsed entity named BF100PIMG that references the graphic image file bf100p.jpg:
    <!ENTITY BF100PIMG SYSTEM "bf100p.jpg" NDATA jpeg>
63 Adding the image attribute to an XML document
Once you have created an entity to reference unparsed data, that entity can be associated with attribute values by using the ENTITY data type in the attribute declaration. For example:
    <!ATTLIST item image ENTITY #REQUIRED>
With this declaration added, you could then add the image attribute to an XML document, using the BF100PIMG entity as the attribute's value:
    <item image="BF100PIMG">
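The three steps (notation, unparsed entity, ENTITY attribute) can be combined in one internal subset, as in this minimal sketch. The single item element and its text content are simplifying assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!-- 1. declare a notation for the data type -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- 2. declare the unparsed entity and tie it to the notation -->
  <!ENTITY BF100PIMG SYSTEM "bf100p.jpg" NDATA jpeg>
  <!-- 3. let item carry an attribute whose value is an entity name -->
  <!ATTLIST item image ENTITY #REQUIRED>
]>
<item image="BF100PIMG">Butterfly farm pop-up greenhouse</item>
```

The attribute value is the bare entity name, without the & and ; used for parsed entity references. The parser does not load the image; it only passes the file location and notation to the application.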
64 Validating Standard Vocabularies
To validate a document used with a standard vocabulary, you have to access an external DTD located on a Web server or rely upon a DTD built into your XML parser.
For example, to validate an XHTML document against the XHTML 1.0 strict standard, add:
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "
    <html>
    </html>