Document Type Definitions (DTD): Basic Valid XML. Dr. Awad Khalil, Computer Science Department, AUC.


1 Document Type Definitions (DTD): Basic Valid XML
Dr. Awad Khalil, Computer Science Department, AUC

2 Content
• Why DTDs.
• How to write simple DTDs, and the benefits of using them.
• How to use DTDs.

3 Why Do We Need DTDs?
• Well-formed XML data is guaranteed to use proper XML syntax and to have the properly nested (hierarchical) tree structure that is common to all XML data.
• This may be sufficient information for relatively static internal applications, particularly if the XML data is computer-generated and computer-consumed.
• The XML structural information and the logic to process it are usually hard-coded separately within the sending and receiving applications, from a common specification.
• This can be an efficient, high-performance approach to handling XML data in certain limited circumstances. For example, an internal corporate application might use well-formed XML as a data transfer mechanism between two different database management systems (DBMSs). Using XML as a transfer syntax would decouple the two DBMSs from each other, so that the transfer could just as easily be directed to a third DBMS without needing to create yet another point-to-point transfer program.
• However, when there is no formal description of the XML data, it is difficult to describe or modify the structure of that data, since its structure and content constraints are buried within the application code. Any changes to the data structure must be made simultaneously in both the sending and receiving applications, and in the separate technical documentation as well.

4 XML Application Requirements
In addition to ensuring that XML data is simply well-formed, most XML applications will also need to:
• Describe document structure, preferably in a rigorous and formal manner.
• Communicate document structure to other applications and people.
• Check that the required elements are present, and prompt the author for their inclusion if they aren't.
• Check that no disallowed elements are included, and prevent the author from using them.
• Enforce element content, tree structure, and element attribute values.
• Provide default values for unspecified attribute values.
• Use standard document formats and data structures.

5 One Solution: The Document Type Definition
• A solution to these requirements is based on separating the XML data description from individual applications, which allows all cooperating applications to share a single description of the data. This description of data is known as the XML vocabulary.
• A group of XML documents that share a common XML vocabulary is known as a document type, and each individual document that conforms to a document type is a document instance.
• The XML 1.0 specification provides a standardized means of describing XML document types: the Document Type Definition (DTD).
• DTDs are sets of declarations which can either be incorporated within the XML document containing the data, or exist as a separate document. They define the rules that set out how a document should be structured, what elements should be included, what kind of data may be included, and what default values to use.

6 Valid XML
• Well-formed XML documents are those that comply with the basic syntax and structural rules of the XML 1.0 specification.
• Valid XML documents are well-formed documents that also comply with syntax, structural, and other rules as defined in a DTD.
• Multiple documents and applications can share DTDs. Having a central description of the XML data and a standardized validation method lets us move both data description and validation code out of numerous individual applications.
• The data description code becomes the DTD, and the validation code is already present (and optimized) in the validating XML parser. This greatly simplifies our application code, and thus improves both performance and reliability.
• Valid XML is also preferable to simple well-formed XML for most document-oriented data. Being able to define such rules will become more important as we exchange, process, and display XML in a wider environment, such as in a B2B or e-commerce scenario.
• Using DTDs will allow us not only to determine that XML documents follow the syntax rules of the XML specification, but also that they follow our own rules regarding content and structure.
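To make the distinction between well-formed and valid concrete, here is a minimal sketch. The element names Note, To, and Body are hypothetical and are not part of the BookCatalog vocabulary. The document below is well-formed, but a validating parser should report it as not valid, because the DTD requires a Body element after To.

```xml
<?xml version="1.0"?>
<!-- Hypothetical vocabulary: Note, To, and Body are illustrative names only -->
<!DOCTYPE Note [
  <!ELEMENT Note (To, Body)>
  <!ELEMENT To   (#PCDATA)>
  <!ELEMENT Body (#PCDATA)>
]>
<!-- Well-formed (every tag nests and closes) but not valid: Body is missing -->
<Note>
  <To>Distributor</To>
</Note>
```

Adding a Body element after To would make the same document valid as well as well-formed.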

7 Validating Parsers
To ensure that an XML document is not just well-formed but valid as well, we need a validating parser, for example:
• fxp: http://www.informatik.uni-tier.de/~neumann/fxp
• Java Project X Tr2: http://developer.java.sun.com/developer/products/xml
• MSXML "preview": http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp
• rxp: http://www.cogsci.ed.ac.uk/~richard/rxp.html
• STG Validator: http://www.stg.brown.edu/service/xmlvalid
• XJParser: http://xdev.datachannel.com/downloads/xjparser
• XML4C / XML4J: http://www.alphaworks.ibm.com/formula/xml
• XML for Java v2: http://technet.oracle.com/tech/xml/parser_java2
• Xerces-C++: http://xml.apache.org/xerces-c/index.html
• Xerces-J: http://xml.apache.org/xerces-j/index.html

8 Sharing DTDs
Shared DTDs are a very powerful aspect of XML.
• Shared DTDs are the basis for many XML vocabularies. Using a shared data description greatly simplifies the process of creating and maintaining an XML vocabulary. It can also make any application code simpler, and thus more reliable and easier to maintain.
• With a shared DTD, there is only one place where we need to make modifications to the vocabulary's data description, instead of three (the specification, the sending application, and the receiving application).
• Having standardized XML vocabularies for common things (such as bibliographic information, for example) allows developers to reuse existing DTDs, saving the cost of developing custom DTDs.
• Custom DTDs isolate their users and applications from others that might otherwise be able to share commonly formatted documents and data.
• Shared DTDs are the foundation of XML data interchange and reuse.

9 An Example: A Book Catalog
• The BookCatalog DTD is an example of a shared XML vocabulary that can be used by a publisher to communicate with its distributors, retailers, and other interested parties.
• An XML-aware browser will allow users to learn about future publications, read reviews of existing books, or even order books directly from the catalog.
Why XML?
• A single common standard syntax
• Easily shared vocabularies
• Standard methods and tools for transforming data
• Utilization of existing Internet protocols such as HTTP
Why DTDs?
• Provide a formal and complete definition of an XML vocabulary
• Are sharable descriptions of the structure of an XML document
• Are a way to validate specific instances of XML documents and constrain their content
• Are restricted to one DTD per document instance
Other Alternatives to DTDs
• XML-Data
• XML-Data-Reduced
• XML Schemas (well on their way to becoming a formal W3C recommendation)

10 The Basic BookCatalog Data Model
Before we can create a DTD, we first need to develop a data model that describes the BookCatalog vocabulary and its grammar:
• Catalog: A document header describing the rest of the document
• Publisher: Vendor of Books, employer of Authors
• Author: Creator of Books, employee of Publisher
• Book: Creation of Author and Publisher

11 Basic BookCatalog Document (BookCatalogBasic.xml)
<BookCatalog>
The Wrox BookCatalog ('Basic' version)
2000-05-23
Wrox Press Ltd.
www.wrox.com
Arden House
1102 Warwick Road
Acocks Green
Birmingham
England
B27 6BH
UK
Wrox Press Inc.
www.wrox.com
29 S LaSalle St, Suite 520
Chicago
IL
USA
60603
1-861003-11-0
Professional XML
Didier Martin, Mark Birbeck, Michael Kay, Brian Loesgen, Jon Pinnock, Steven Livingstone, Peter Stark, Kevin Williams, Richard Anderson, Stephen Mohr, David Baliles, Bruce Peat, Nikola Ozu, Jon Duckett, Peter Jones, Karli Watson
The complete practical encyclopedia for XML today.
1169
$49.99
Internet
Internet Programming
XML

12 Internal versus External DTDs
• Only one DTD can be associated with a given XML document.
• The DTD may be divided into two parts:
• The internal subset, which is the part of the DTD included within the document.
• The external subset, which is the set of declarations that are located in a separate document (might be a database record or a .dtd file).
• There is no requirement that a DTD uses a particular subset. A DTD might be contained entirely within the document (an internal subset), with no external subset, or a document may simply refer to an external subset and contain no DTD declarations of its own.
• So, there are three possible forms of DTD:
• An internal subset DTD
• An external subset DTD
• A combined DTD, using both the internal and external subsets
• DTD declarations in the internal subset have priority over those in the external subset.
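A combined DTD can be sketched as follows. The file name greeting.dtd, the element Greeting, and the entity who are all hypothetical, and the external file is assumed to hold the two declarations quoted in the comment. Because the internal subset is processed first, its declaration of who is the one that takes effect.

```xml
<?xml version="1.0"?>
<!-- Assumption: a hypothetical file greeting.dtd sits next to this document and contains
       <!ELEMENT Greeting (#PCDATA)>
       <!ENTITY who "external subset">
-->
<!DOCTYPE Greeting SYSTEM "greeting.dtd" [
  <!-- Internal subset: read before the external subset, so this declaration wins -->
  <!ENTITY who "internal subset">
]>
<Greeting>Hello from the &who;</Greeting>
```

Under these assumptions, a parser that reads both subsets would expand the entity reference to "internal subset". This first-declaration-wins behaviour applies to entity and attribute-list declarations; an element type may be declared only once across both subsets.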

13 Associating a DTD with an XML Document (DOCTYPE)
• Each XML document can be associated with one, and only one, DTD using a single DOCTYPE declaration, via the following basic structure:
• Document_element: the name of the document element. This name is required and connects the DTD to the entire element tree.
• DTDs are always associated with the document element.
• source: used to associate an external subset with a document (SYSTEM or PUBLIC). In the following example, the parser will attempt to retrieve the DTD from the specified URL at Wrox's website:
http://www.wrox.com/DTDs/BookCatalog.dtd
• A document residing on the same web server as the DTD could also access the DTD directly:
file://DTDs/BookCatalog.dtd
• Using the PUBLIC keyword allows a non-specific reference to the DTD via a URI:
<!DOCTYPE BookCatalog PUBLIC "PublishingConsorium/BookCatalog" "http://www.wrox.com/DTDs/BookCatalog.dtd">
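As a hedged sketch of how the pieces fit together, the document below uses the SYSTEM form with the URL quoted on the slide, and shows the PUBLIC form as a commented alternative, since a document may carry only one DOCTYPE declaration. The content of the BookCatalog element is left out here.

```xml
<?xml version="1.0"?>
<!-- SYSTEM form: the parser fetches the external subset from the given URL -->
<!DOCTYPE BookCatalog SYSTEM "http://www.wrox.com/DTDs/BookCatalog.dtd">
<!-- PUBLIC form (an alternative to the line above, not an addition to it):
     <!DOCTYPE BookCatalog PUBLIC "PublishingConsorium/BookCatalog"
                                  "http://www.wrox.com/DTDs/BookCatalog.dtd">
     In XML, the public identifier is always followed by a system identifier,
     which the parser can fall back on if it cannot resolve the public one. -->
<BookCatalog>
  <!-- element content as required by the DTD is omitted in this sketch -->
</BookCatalog>
```

An internal subset, if present, goes in square brackets just before the closing ">" of the DOCTYPE declaration.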

14 An Example: DOCTYPE Declaration (Linking a DTD to a Document)
1. Let's create a simple DTD and demonstrate linking it to two different document instances. Let's stipulate that this DTD requires a conforming document to have exactly three empty elements, named X, Y, Z, with the elements appearing in XYZ order. We create the following BookCatalogTrivial.dtd:
2. Now, we create two example documents:
• BookCatalogTrivial.xml:
• BookCatalogAbsentDTD.xml:
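The slide's own listings are not available in this transcript, so the following is only one plausible reconstruction of the stipulated rules, not the original files. The document element name BookCatalog is a guess; X, Y, and Z come from the description above.

```xml
<?xml version="1.0"?>
<!-- Sketch of BookCatalogTrivial.xml. Assumes BookCatalogTrivial.dtd is in the same
     directory and contains these four declarations (the root name is a guess):
       <!ELEMENT BookCatalog (X, Y, Z)>
       <!ELEMENT X EMPTY>
       <!ELEMENT Y EMPTY>
       <!ELEMENT Z EMPTY>
-->
<!DOCTYPE BookCatalog SYSTEM "BookCatalogTrivial.dtd">
<BookCatalog>
  <X/>
  <Y/>
  <Z/>
</BookCatalog>
```

Under these declarations, swapping the order of two of the empty elements, omitting one, or putting text inside one would make the document not valid, although it would remain well-formed.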

15 Basic DTD Markup
• DTD declarations are delimited with the XML tag delimiters ("<" and ">").
• All DTD declarations are indicated by the use of the exclamation mark ("!") followed by a keyword and its specific parameters.
• There are four basic keywords used in DTD declarations:
• ELEMENT: Declares an XML element type name and its permissible sub-elements ("children").
• ATTLIST: Declares XML element attribute names, plus permissible and/or default attribute values.
• ENTITY: Declares special character references, text macros (much like a C/C++ #define statement), and other repetitive content from external sources (like a C/C++ #include).
• NOTATION: Declares external non-XML content (for example, binary image data) and the external application that handles that content.

16 Element Type (ELEMENT) Declarations
• Elements are described using the element type declaration, which can have one of two different forms:
• The category (content category) and content_model (content model) parameters describe what kind of content (if any) may appear within elements of the given name.
• There are five categories of element content:
• ANY: Element type may contain any well-formed XML data.
• EMPTY: Element type may not contain any text or child elements; only element attributes are permitted.
• element: Element type contains only child elements; no additional text is permitted within the element type.
• mixed: Element type may contain text and/or child elements.
• PCDATA: Element type may contain text (character data) only.
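The two forms in XML 1.0 are `<!ELEMENT name category>` (for ANY and EMPTY) and `<!ELEMENT name (content_model)>`. The sketch below shows one declaration per category; every element name in it is hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- All element names here are hypothetical, one per content category -->
<!DOCTYPE Demo [
  <!ELEMENT Demo     (Anything, Marker, Wrapper, Para, Label)>
  <!ELEMENT Anything ANY>                  <!-- ANY: text and any declared elements -->
  <!ELEMENT Marker   EMPTY>                <!-- EMPTY: no content, attributes only -->
  <!ELEMENT Wrapper  (Label)>              <!-- element content: children only -->
  <!ELEMENT Para     (#PCDATA | Label)*>   <!-- mixed content: text and/or Label -->
  <!ELEMENT Label    (#PCDATA)>            <!-- PCDATA: text only -->
]>
<Demo>
  <Anything>free text <Label>and a child</Label></Anything>
  <Marker/>
  <Wrapper><Label>only children here</Label></Wrapper>
  <Para>Text mixed with a <Label>child</Label> element.</Para>
  <Label>plain text</Label>
</Demo>
```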

17 Content Models
• Content models are used to describe the structure and content of a given element type. This content may be:
• Character data (PCDATA)
• One or more child element types (element-only content)
• A combination of the two (mixed content)
• The key difference between element content and mixed content is the use of the #PCDATA keyword. If present, the content model is either mixed or PCDATA. The absence of this keyword indicates element-only content.
• Examples: Character data can include entity references (e.g. &amp; or &lt;).
Some element content.....some other childish content.....yet another child's content..
• Child elements may be constrained to appear in a specific sequence (sequence list).
• In a mixed content model, child elements are constrained to character data plus a simple list of valid child element types, without any sequence or choice specifications.
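The three kinds of content model can be sketched in one small document. The vocabulary (Entry, Name, Contact, Email, Phone, Note, Em) is hypothetical; the point is the comma for a sequence list, the vertical bar for a choice list, and the `(#PCDATA | ...)*` shape that mixed content must take.

```xml
<?xml version="1.0"?>
<!-- Hypothetical vocabulary used only to illustrate content models -->
<!DOCTYPE Entry [
  <!ELEMENT Entry   (Name, Contact, Note)>   <!-- sequence list: this exact order -->
  <!ELEMENT Contact (Email | Phone)>         <!-- choice list: exactly one of the two -->
  <!ELEMENT Note    (#PCDATA | Em)*>         <!-- mixed content: text and Em in any order -->
  <!ELEMENT Name    (#PCDATA)>
  <!ELEMENT Email   (#PCDATA)>
  <!ELEMENT Phone   (#PCDATA)>
  <!ELEMENT Em      (#PCDATA)>
]>
<Entry>
  <Name>Jane Doe</Name>
  <Contact><Email>jane@example.com</Email></Contact>
  <Note>Prefers <Em>email</Em> &amp; short messages.</Note>
</Entry>
```

Note that the mixed model cannot say how many Em elements appear or where; it only lists which child types are allowed alongside text.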

18 Content Models (Cont'd)
• Consider the following DTD:
• A document instance that conforms to this declaration: Mr John Q Public
• We could use a choice of specific empty elements to replace the two text-containing elements:
• The conforming document instance would now become: John Q Public

19 Content Models (Cont'd)
• Cardinality operators define how many child elements may appear in a content model:
• (none): The absence of a cardinality operator character indicates that one, and only one, instance of the child element is allowed (required).
• ?: Zero or one child element (optional singular element).
• *: Zero or more child elements (optional elements).
• +: One or more child elements (required elements).
• Consider the following DTDs:
• Here are some conforming instances: John Q P Public Jane Doe Madonna
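The DTDs from this slide did not survive in the transcript. As an illustration only, here is one set of declarations under which the three names mentioned above would all conform; the element names People, FirstName, MiddleName, and LastName are assumptions, not the slide's originals.

```xml
<?xml version="1.0"?>
<!-- Hypothetical declarations; the slide's own DTD is not available -->
<!DOCTYPE People [
  <!ELEMENT People     (PersonName+)>   <!-- + : one or more -->
  <!-- no operator: exactly one FirstName; * : zero or more; ? : zero or one -->
  <!ELEMENT PersonName (FirstName, MiddleName*, LastName?)>
  <!ELEMENT FirstName  (#PCDATA)>
  <!ELEMENT MiddleName (#PCDATA)>
  <!ELEMENT LastName   (#PCDATA)>
]>
<People>
  <PersonName>
    <FirstName>John</FirstName>
    <MiddleName>Q</MiddleName>
    <MiddleName>P</MiddleName>
    <LastName>Public</LastName>
  </PersonName>
  <PersonName>
    <FirstName>Jane</FirstName>
    <LastName>Doe</LastName>
  </PersonName>
  <PersonName>
    <FirstName>Madonna</FirstName>
  </PersonName>
</People>
```

With these declarations, a PersonName that starts with LastName, or that has two LastName elements, would not be valid.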

20 Content Models (Cont'd)
• Consider the following DTDs:
• Here are some non-conforming ("not valid") instances: Smith Bob Jane Doe Madonna Cicone

21 An Example: ELEMENT Declaration
• Consider the following BookCatalogBasic.dtd:

22 An Example: ELEMENT Declaration (BookCatalogBasic.xml)
<BookCatalog>
The Wrox BookCatalog ('Basic' version)
2000-05-23
Wrox Press Ltd.
www.wrox.com
Arden House
1102 Warwick Road
Acocks Green
Birmingham
England
B27 6BH
UK
Wrox Press Inc.
www.wrox.com
29 S LaSalle St, Suite 520
Chicago
IL
USA
60603
1-861003-11-0
Professional XML
Didier Martin, Mark Birbeck, Michael Kay, Brian Loesgen, Jon Pinnock, Steven Livingstone, Peter Stark, Kevin Williams, Richard Anderson, Stephen Mohr, David Baliles, Bruce Peat, Nikola Ozu, Jon Duckett, Peter Jones, Karli Watson
The complete practical encyclopedia for XML today.
1169
$49.99
Internet
Internet Programming
XML

23 Attribute (ATTLIST) Declarations
• Element attributes are described using the attribute-list declaration (ATTLIST), with the standard syntax:
• Example:
<!ATTLIST Book
  isbn CDATA #REQUIRED
  title CDATA #REQUIRED
  author CDATA #REQUIRED
  pages CDATA #IMPLIED
  price CDATA #IMPLIED >
Attribute Types
• CDATA: Character data (simple data string).
• Enumerated value(s): Attribute value must be one of a series that is explicitly defined in the DTD.
• ID: Attribute value is the unique identifier for this element instance.
• IDREF: A reference to the element with an ID attribute that has the same value as that of the IDREF.
• IDREFS: A list of IDREFs delimited by white space.
• NMTOKEN: A name token (a text string that conforms to the XML name rules, except that the first character of the name may be any valid name character).
• ENTITY: The name of an unparsed entity declared in the DTD.
• ENTITIES: A list of ENTITY names delimited by white space.
• NOTATION: Attribute value must be a notation type that is explicitly declared elsewhere in the DTD.
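In XML 1.0, each attribute definition inside an ATTLIST declaration is a triple of attribute name, type, and default. The sketch below wraps the Book example from this slide in a complete document; declaring Book as EMPTY is an assumption made only to keep the example short.

```xml
<?xml version="1.0"?>
<!-- General shape: <!ATTLIST element_name attribute_name type default ...> -->
<!-- Assumption: Book is declared EMPTY here purely for brevity -->
<!DOCTYPE Book [
  <!ELEMENT Book EMPTY>
  <!ATTLIST Book
    isbn   CDATA #REQUIRED
    title  CDATA #REQUIRED
    author CDATA #REQUIRED
    pages  CDATA #IMPLIED
    price  CDATA #IMPLIED>
]>
<Book isbn="1-861003-11-0" title="Professional XML" author="Didier Martin" price="$49.99"/>
```

The pages attribute is omitted in the instance, which is allowed because it is declared #IMPLIED; leaving out isbn, title, or author would make the document not valid.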

24 ATTLIST: Examples
• A DTD:
• A conforming instance: …
• A DTD:
<!ATTLIST Person perID ID #REQUIRED >
• A conforming instance: John Q Public Acme XML Writers jqpublic@notmail.com John, Jr. is a swell fellow, son of John, Sr.
• A DTD:
<!ATTLIST PersonName
  honorific (Mr | Ms | Dr | Rev) #IMPLIED
  suffix (Jr | Sr | III) #IMPLIED >
• A conforming instance: John Q Public

25 ATTLIST (Cont'd)
• The value of an ID attribute must be a legal XML name, unique within a document, and use the #IMPLIED or #REQUIRED default values. There may only be one ID attribute for each element type.
• The value of an IDREF attribute must be a legal XML name, and must match the value of an ID attribute within the same document. Multiple IDREFs to the same ID are permitted.
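The ID and IDREF rules can be sketched with a cut-down catalog. The root element Library and the content models are hypothetical; the attribute names perID, bookID, and authors and the identifier values follow the style of the BookCatalog examples elsewhere in this deck, while the editor attribute is an illustrative addition.

```xml
<?xml version="1.0"?>
<!-- Hypothetical, simplified vocabulary for illustrating ID / IDREF / IDREFS -->
<!DOCTYPE Library [
  <!ELEMENT Library (Person+, Book+)>
  <!ELEMENT Person  (#PCDATA)>
  <!ELEMENT Book    (#PCDATA)>
  <!ATTLIST Person perID ID #REQUIRED>
  <!ATTLIST Book
    bookID  ID     #REQUIRED
    authors IDREFS #REQUIRED
    editor  IDREF  #IMPLIED>
]>
<Library>
  <Person perID="DDR_MRTN">Didier Martin</Person>
  <Person perID="JN_DCKT">Jon Duckett</Person>
  <!-- Each IDREF value must match some ID value in this same document -->
  <Book bookID="PRFSNL_XML" authors="DDR_MRTN" editor="JN_DCKT">Professional XML</Book>
</Library>
```

A validating parser should reject the document if two elements share an ID value, or if authors or editor names an identifier that no element carries.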

26 Attribute Defaults
• #REQUIRED: The attribute must appear in every instance of the element.
• #IMPLIED: The attribute is optional.
• #FIXED (plus default value): The attribute may or may not appear in the document. If the attribute does appear, it must match the default value; if it doesn't appear, the parser may supply the default value.
• Default value(s), written as a quoted literal with no keyword: The attribute may or may not appear in the document. If the attribute does appear, it may be any value that matches those in the ATTLIST declaration; if it doesn't appear, the parser may supply the default value.
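All four kinds of default can appear in one ATTLIST declaration, as in this sketch. The attributes language and binding are hypothetical additions for illustration, and Book is again assumed to be EMPTY.

```xml
<?xml version="1.0"?>
<!-- isbn: #REQUIRED, must be present.  pages: #IMPLIED, optional with no default.
     language: #FIXED, may be omitted but can only ever be "en".
     binding: literal default, "soft" unless the instance says otherwise.
     The attributes language and binding are hypothetical. -->
<!DOCTYPE Book [
  <!ELEMENT Book EMPTY>
  <!ATTLIST Book
    isbn     CDATA         #REQUIRED
    pages    CDATA         #IMPLIED
    language CDATA         #FIXED "en"
    binding  (hard | soft) "soft">
]>
<Book isbn="1-861003-11-0"/>
```

For this instance, a parser that reads the declarations would typically report language="en" and binding="soft" to the application even though neither is written in the document, while pages is simply absent.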

27 An Example: BookCatalogExpanded.dtd
<!ATTLIST Publisher
  pubID ID #REQUIRED
  isbn CDATA "??????" >

28 BookCatalogExpanded.dtd (Cont'd)
<!ATTLIST Person
  perID ID #REQUIRED
  role (AU | ED | AE | IL | RV | unknown) #REQUIRED >
<!ATTLIST PersonName
  honorific (Mr. | Ms. | Dr. | Rev.) #IMPLIED
  suffix (Jr. | Sr. | I | II | III | IV | V | VI | VII | VIII) #IMPLIED >
<!ATTLIST Book
  bookID ID #REQUIRED
  isbn CDATA #REQUIRED
  publisher IDREF #REQUIRED
  authors IDREFS #REQUIRED
  editors IDREFS #REQUIRED
  imprint IDREF #IMPLIED
  pubDate CDATA "2000"
  pages CDATA "????"
  level IDREF #IMPLIED >

29 BookCatalogExpanded.xml
<BookCatalog>
<Catalog>
The Wrox BookCatalog ('Expanded' version)
2000-05-23
</Catalog>
<Publishers>
Wrox Press Ltd.
feedback@wrox.com
www.wrox.com
Arden House
1102 Warwick Road
Acocks Green
Birmingham
England
B27 6BH
UK
Programmer to Programmer Beginning Instant Professional
Wrox Press Inc.
29 S LaSalle St, Suite 520
Chicago
IL
60603
USA

30 BookCatalogExpanded.xml (Cont'd)
<Persons>
Didier P H Martin Talva Corp. Didier PH Martin has worked with computers for 21 years.
Dianne Parker Wrox Press Ltd.
David Hunter
Jon Duckett Wrox Press Ltd.
Jonathan Pinnock jon@jpassoc.co.uk Hertfordshire England UK
Karli Watson Wrox Press Ltd.

31 BookCatalogExpanded.xml (Cont'd)
Lisa Stephenson Wrox Press Ltd.
Mark Birbeck Mark has been a professional programmer for 18 years.
Nikola Ozu Nikola Ozu is a consultant who lives in Wyoming.
Peter Jones Wrox Press Ltd.
Stephen Mohr Omicron Consulting
Steven Livingstone

32 BookCatalogExpanded.xml (Cont'd)
<Subjects>
Internet
Internet Programming
XML (eXtensible Markup Language)
</Subjects>
<Books>
<Book bookID="PRFSNL_XML" isbn="1-861003-11-0" publisher="WRX_PRS_LTD" authors="DDR_MRTN JNTHN_PNCK MRK_BRBCK STPHN_MHR NKL_OZ" editors="JN_DCKT PTR_JNS KRL_WTSN" imprint="PRG2PRG" pubDate="2000-01" pages="1169" level="PRO">
Professional XML
The complete practical encyclopedia for XML today.
49.99 35.99 74.95
<Book bookID="BEGNNG_XML" isbn="1-861003-41-2" publisher="WRX_PRS_LTD" authors="DVD_HNTR JNTHN_PNCK NKL_OZ" editors="DN_PRKR LS_STPHNSN" imprint="PRG2PRG" level="BEG">
Beginning XML
The best practical tutorial for XML.
39.99 28.99 59.95

33 Limitations of DTDs
• DTDs are not extensible (unlike XML itself).
• Only one DTD may be associated with each document.
• DTDs do not work well with XML namespaces.
• Very weak data typing.

34 Thank you

