XML DTD Transparency No. 1 XML Document Type Definitions (DTDs)
XML DTD Transparency No. 2 Objectives: The purpose of using schemas. The schema languages: DTD, XML Schema, RELAX NG. Regular expressions: a formalism commonly used in schema languages for defining schemas.
XML DTD Transparency No. 3 XML Languages and schemas. XML language: a set of XML documents used in a domain that share a common set of elements and attributes, e.g. XHTML, MathML, SVG, CML, RecipeML. Schema: a formal definition of the syntax of an XML language; it defines the collection of elements that can be used in the language, together with all possible attributes and contents of each element. Schema language: a notation (or language) for writing schemas.
XML DTD Transparency No. 5 General Requirements for designing a schema language Expressiveness Efficiency Comprehensibility
XML DTD Transparency No. 6 Regular Expressions. Commonly used in schema languages to describe sequences of characters or elements.
Σ: an alphabet (typically Unicode characters or element names)
σ ∈ Σ matches the string σ
α? matches zero or one α
α* matches zero or more α’s
α+ matches one or more α’s
α β matches any concatenation of an α and a β
α | β matches the union of α and β
XML DTD Transparency No. 7 Examples A regular expression describing integers: 0 | -?(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)* A regular expression describing the valid contents of table elements in XHTML:
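The integer expression above can be tried out directly. A minimal sketch in Python, assuming the standard re module; the name is_integer is made up for illustration, and the character classes [1-9] and [0-9] abbreviate the digit alternations written out on the slide.

```python
import re

# 0 | -?(1|...|9)(0|...|9)*  written with character classes
INTEGER = re.compile(r"0|-?[1-9][0-9]*")

def is_integer(text):
    """Return True if the whole string matches the integer expression."""
    return INTEGER.fullmatch(text) is not None

for sample in ["0", "-42", "1900", "007", "-0", ""]:
    print(repr(sample), is_integer(sample))
```

Note that the expression deliberately rejects leading zeros and "-0".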
XML DTD Transparency No. 8 DTD - Table of Contents Introduction to DTD An introduction to the XML Document Type Definition. DTD - XML Building Blocks What XML building blocks are defined in a DTD. DTD Elements How to define the elements of an XML document using DTD. DTD Attributes How to define the legal attributes of XML elements using DTD. DTD Entities How to define XML entities using DTD.
XML DTD Transparency No. 9 DTD – Document Type Definition. XML DTD is a subset of the DTD formalism from SGML and a part of XML 1.0. It is a starting point for the development of more expressive schema languages. It considers elements, attributes, and character data only; processing instructions and comments are mostly ignored because they are semantically not part of a document.
XML DTD Transparency No. 10 Checking Validity with DTD. A DTD processor (also called a validating XML parser): parses the input document (which includes checking well-formedness); checks the root element name; for each element, checks its contents and attributes; checks uniqueness and referential constraints (ID / IDREF(S) attributes).
XML DTD Transparency No. 11 Internal subset (of a DTD) This is an XML document with a Document Type Definition: Tove Jani Reminder Don't forget me this weekend! The DTD is interpreted like this: !ELEMENT note (in line 3) defines the element "note" as having four elements: "to,from,heading,body". and so on.....
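A minimal sketch of such a document in Python, assuming that each of the four child elements holds only character data (the slide says only "and so on" for their declarations). The standard-library parser used here reads the internal subset and checks well-formedness, but it does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction: the four children are declared as (#PCDATA).
NOTE = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE)
# Collect (element name, text) pairs in document order.
children = [(child.tag, child.text) for child in root]
print(root.tag, children)
```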
XML DTD Transparency No. 12 External subset (of a DTD) This is the same XML document with an external DTD: Tove Jani Reminder Don't forget me this weekend!
XML DTD Transparency No. 13 note.dtd This is a copy of the file "note.dtd" containing the Document Type Definition:
XML DTD Transparency No. 14 Why use a DTD? A DTD gives people a common format for interchanging data, provides an application-independent way of sharing data, and lets us verify that an XML document we produced or received from the outside world is valid.
XML DTD Transparency No. 15 2.8 Document Type Declaration (cont’d) Document Type Definition
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
[28a] DeclSep ::= PEReference | S
[28b] intSubset ::= ( markupdecl | DeclSep )*
[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
Notes: 1. DTD = internal subset + external subset. 2. The internal subset is defined by intSubset; the external subset is defined by an external entity specified by ExternalID.
XML DTD Transparency No. 17 Document Type Declarations. Associates a DTD schema with the instance document. 1. Contains both internal and external subsets ... 2. External subset only ... 3. Internal subset only ... system identifier (a URI), public identifier.
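XML DTD Transparency No. 18 2.8 Example XML documents
<?xml version="1.0"?>
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting>
The system identifier "hello.dtd" gives the URI of a DTD for the document. The declarations can also be given locally, as in this example:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>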
XML DTD Transparency No. 19 XML building blocks (content part) (The content parts of) XML documents are made up of the following building blocks: Elements, Tags -- Start Tag, End Tag -- Attributes, PCDATA, CDATA Section Processing Instruction, Comment Entities, Discussed in the previous lecture.
XML DTD Transparency No. 20 Entities (reviewed). Entities are used to define common text, like macros in programming languages. Entity references are references to entities. Format: if xxx is an entity name, then &xxx; is its entity reference; e.g., &nbsp; is used to insert an extra space in an HTML document. Entities are expanded when a document is parsed by an XML parser. The following entities are predefined in XML:
Entity reference : Character
&lt; : <
&gt; : >
&amp; : &
&quot; : "
&apos; : '
More about entities later.
XML DTD Transparency No. 21 DTD – Element Declaration. Declares an element that may occur in the document. Format: <!ELEMENT name contentspec>. Types of element content: EMPTY -- no content; ANY -- no restriction on content; MIXED -- character data only, or character data mixed with elements; ElementOnly -- elements only.
XML DTD Transparency No. 22 EMPTY element content Declare an element with empty content format: Example: Valid Instances:
XML DTD Transparency No. 23 ANY Element content. Declares an element that can contain any combination of elements and text data. Declared with the ‘ANY’ keyword: Example: Valid instances (with respect to E1 only): begin middle fff end dddd
XML DTD Transparency No. 24 Elements with MIXED contents Two cases: 1. Elements that can only contain text contents 2. Elements allowing text as well as element contents Example: (X) 1. no star 2. #PCDATA placed in wrong position Valid Instances: ddd cd ttt #PCDATA must appear first!.
XML DTD Transparency No. 25 Elements that can contain element contents only. Issue: how to declare the possible sequences of content elements. Solution: regular expressions over element names. Definition: 1. CP ::= (name | choice | seq) (‘+’ | ‘*’ | ‘?’)? 2. choice ::= a list of two or more CPs separated by ‘|’ and enclosed by ‘(’ and ‘)’. 3. seq ::= a list of one or more CPs separated by ‘,’ and enclosed by ‘(’ and ‘)’. ElementOnly elements:
XML DTD Transparency No. 26 Recursive definition of CP, seq and choice: Basis: if α is a name, then α, α?, α+, α* are CPs (content particles). --- basic CP. Closure: if α is a seq or choice, then α, α?, α+, α* are CPs. If α1, α2, …, αn (n > 1) are CPs, then (α1 | α2 | … | αn) is a choice. If α1, α2, …, αn (n > 0) are CPs, then (α1, α2, …, αn) is a seq. α is a children if α is a non-basic CP (i.e., a CP that is not a basic CP). Examples of children: Illegal :, Legal :,,
XML DTD Transparency No. 27 More examples (X) (0) (x, 1-ambiguous) Rewritten as … (E1, (E2 | (E3,E2)))> (0)
XML DTD Transparency No. 28 3.2 Grammar of Element Type Declaration
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>' [VC: Unique Element Type Declaration]
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
Examples:
XML DTD Transparency No. 29 Official Grammar of ElementOnly content Models
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
where each Name is the type of an element which may appear as a child. Examples: (x) Note: (x) (0) [49,50] has an additional VC: Proper Group/PE Nesting
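Because an element-only content model is a regular expression over element names, it can be checked with an ordinary regex engine. A minimal sketch in Python, assuming the model contains only names, parentheses, ',', '|', '?', '*' and '+'; the function names are hypothetical, and this is not how a validating parser is actually implemented.

```python
import re

_NAME = re.compile(r"[A-Za-z_][\w.\-]*")

def compile_content_model(model):
    """Translate an element-only content model into a Python regex.

    Each element name becomes a group matching "name "; ',' (sequence)
    becomes plain concatenation; '|', '?', '*', '+' and parentheses keep
    their meaning.
    """
    compact = re.sub(r"\s+", "", model)
    pattern = _NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", compact)
    return re.compile(pattern.replace(",", ""))

def children_match(model, child_names):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in child_names)
    return compile_content_model(model).fullmatch(sequence) is not None

print(children_match("(E1, (E2 | (E3, E2)))", ["E1", "E3", "E2"]))
print(children_match("(E1, (E2 | (E3, E2)))", ["E1", "E3"]))
```

A backtracking regex engine accepts ambiguous models as well, so this sketch does not enforce the determinism (1-unambiguity) requirement that DTD content models must satisfy.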
XML DTD Transparency No. 31 Attribute Definition ELEMENT declarations prescribe each element type that can appear in a document and define the permissible content of each element type. Ex: To prescribe all attributes that can appear in the start tag of an element type, we use ATTLIST declaration.
XML DTD Transparency No. 32 ATTLIST declaration To define permissible attributes associated with book element, we use: 1. 2. 1. and 2. can be merged as : 3. Format: Note: Attributes have a name, a type, a default-value and belong to an element.
XML DTD Transparency No. 33 Attribute types
Type : Meaning
CDATA : The value is character data.
(v1 | v2 | … | vk) : The value must be one of the listed name tokens v1 … vk.
ID : The value is a unique ID.
IDREF : The value is a reference to an ID.
IDREFS : The value is a list of IDREFs.
NMTOKEN : The value is a valid XML name token.
NMTOKENS : The value is a list of name tokens.
ENTITY : The value is the name of an (unparsed) entity.
ENTITIES : The value is a list of (unparsed) entity names.
NOTATION (v1 | v2 | … | vk) : The value must be one of the listed notation names v1 … vk.
XML DTD Transparency No. 34 Attribute default values
Default value : Meaning
"v1" : The attribute has the default value v1; the value can be overridden in the document.
#REQUIRED : The attribute must be given explicitly in the document.
#IMPLIED : The attribute does not have to appear in the document.
#FIXED "v1" : The attribute value is fixed to v1 and cannot be overridden in the document; if specified in the document, its value must be "v1".
XML DTD Transparency No. 35 Attributes with default value EX1: XML elements: Ex2: Below are equivalent XML Elements: …
XML DTD Transparency No. 36 #IMPLIED attribute Syntax: Ex: instance: 1. 2. Both 1 and 2 are valid but they are not equivalent.
XML DTD Transparency No. 38 #FIXED “value” attributes Syntax: Ex: Instances: 1. … 2. … 3. … (x) Notes: 1. and 2. are equivalent. 3. is invalid.
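The three kinds of default declaration can be observed with Python's standard-library parser, which reads ATTLIST declarations in the internal subset and supplies declared defaults. A minimal sketch; the element and attribute names (order, item, unit, currency, version) are invented for illustration, and the parser does not validate, so a conflicting #FIXED value would not be reported here.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD: one #IMPLIED, one defaulted and one #FIXED attribute.
DOC = """<!DOCTYPE order [
  <!ELEMENT order (item*)>
  <!ELEMENT item EMPTY>
  <!ATTLIST item
      unit     CDATA #IMPLIED
      currency CDATA "USD"
      version  CDATA #FIXED "1.0">
]>
<order><item/><item currency="EUR" unit="kg"/></order>"""

items = ET.fromstring(DOC).findall("item")
for item in items:
    # Defaulted attributes appear even though the start tag omits them.
    print(sorted(item.attrib.items()))
```

The first item receives currency and version from the DTD but no unit, which is the difference between a default value and #IMPLIED.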
XML DTD Transparency No. 39 Official Grammar of Attribute-List Declarations
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
XML attribute types are classified into three kinds: string type (CDATA); enumerated types (name tokens or notations); tokenized types (ID, IDREF, IDREFS, NMTOKEN, …).
XML DTD Transparency No. 40 3.3.1 Attribute Types
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
Notes: ID, IDREF and IDREFS are used for cross references. ENTITY and ENTITIES refer to external unparsed objects. NMTOKEN and NMTOKENS restrict the attribute value to be one or more Nmtokens.
XML DTD Transparency No. 41 ID and IDREF(S) If an attribute is of ID type, the value of every occurrence of this attribute must be unique among all ID attribute values of the whole document. Ex: Instances: name=“p1” and sid=“p1” violate ID constraint.
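The ID constraint spans every ID-typed attribute in the document, whatever the attribute is called. A minimal sketch in Python, assuming the caller supplies which attribute of each element type is declared as ID (the standard-library parser does not expose attribute types); the element names person and student and the function duplicate_ids are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(root, id_attrs):
    """Return the ID values that occur more than once in the document.

    id_attrs maps an element name to the name of its ID-typed attribute.
    """
    values = []
    for element in root.iter():
        attr = id_attrs.get(element.tag)
        if attr is not None and element.get(attr) is not None:
            values.append(element.get(attr))
    return sorted(v for v, count in Counter(values).items() if count > 1)

doc = ET.fromstring(
    '<db><person name="p1"/><student sid="p1"/><student sid="s2"/></db>'
)
print(duplicate_ids(doc, {"person": "name", "student": "sid"}))
```

Here name="p1" and sid="p1" clash even though they are different attributes on different element types.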
XML DTD Transparency No. 42 Notation A notation in XML is a name used to identify a specific type of (non-xml) data like ppt, pdf, word, jpeg, gif, etc. Each notation must be declared and is associated with a system identifier and/or public identifier. We may limit the value of an attribute to be a notation name from a list of declared notation names
XML DTD Transparency No. 43 3.3.1 Enumerated Attribute Types
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
NotationType [58] is used to limit the attribute value to one of the listed notation names.
XML DTD Transparency No. 47 White Space and End-of-line Handling. White space: the special attribute xml:space is used to indicate whether white space should be preserved. Every XML document must be normalized for end-of-line before parsing, in order to eliminate differences between operating systems:
#xD#xA --> #xA // \r\n is replaced by \n
#xD --> #xA // a \r not followed by \n is replaced by \n
This is done before parsing.
XML DTD Transparency No. 48 2.12 Language Identification. The special (reserved) attribute xml:lang may be inserted in documents to specify the language used inside an element. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by [IETF RFC 1766], "Tags for the Identification of Languages". Example:
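End-of-line normalization amounts to two replacements applied in the right order. A minimal sketch in Python; the function name normalize_line_ends is made up for illustration.

```python
def normalize_line_ends(text):
    """Apply XML end-of-line handling: CR LF and lone CR both become LF."""
    # Replace the two-character sequence first so that it yields one LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")

print(repr(normalize_line_ends("a\r\nb\rc\nd")))
```

Replacing the pair before the lone carriage return matters: doing it the other way round would turn each \r\n into two line feeds.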
XML DTD Transparency No. 49 2.12 Language Identifications The quick brown fox jumps over the lazy dog. What colour is it? What color is it? 君不見黃河之水天上來, 奔流到海不復回。 君不見高堂明鏡悲白髮, 朝如青絲暮成雪。 人生得意須盡歡，莫使金樽空對月。 天生我材必有用，千金散盡還復來。 烹羊宰牛且為樂，會須一飲三百杯。岑夫子，丹丘生，將進酒，杯莫停。 與君歌一曲，請君為我傾耳聽。鐘鼓饌玉何足貴，但願長醉不願醒。 古來聖賢皆寂寞，唯有飲者留其名。 陳王昔時宴平樂，斗酒十千恣歡謔。主人何為言少錢，徑須沽取對君酌。 五花馬，千金裘，呼兒將出換美酒，與爾同銷萬古愁。
XML DTD Transparency No. 51 DTD-Entities. Entities are used to define shortcuts to common text, like macros in programming languages. Entity references are references to entities: if name is the name of an entity, then &name; (or %name;, but not both) is its reference. Entities can be declared internal (content in the same document as the declaration) or external (content external to the declaration). Two more classifications later.
XML DTD Transparency No. 52 Internal Entity Declaration Syntax: DTD Example: XML example: &p1; &birthday; Equivalent to : Peter 2/12/2000
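Internal general entities declared in the internal subset are expanded by the parser before the application sees the text. A minimal sketch in Python, assuming the entities p1 and birthday stand for "Peter" and "2/12/2000" as the expansion above suggests; the wrapping element name person is hypothetical.

```python
import xml.etree.ElementTree as ET

# Entity values taken from the expansion on the slide; "person" is invented.
DOC = """<!DOCTYPE person [
  <!ELEMENT person (#PCDATA)>
  <!ENTITY p1 "Peter">
  <!ENTITY birthday "2/12/2000">
]>
<person>&p1; &birthday;</person>"""

person = ET.fromstring(DOC)
print(person.text)
```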
XML DTD Transparency No. 54 Some large DTD examples: XHTML 1.0 (XHTML 1.0 DTD), DocBook 5.0 (docbook.dtd), SVG 1.1 (SVG 1.1 DTD)
XML DTD Transparency No. 55 Structure of XML Documents Logical Structure Elements Character data Physical Structure Entities Document Unit Sub-unit Document entity External parsed entity External unparsed entity
XML DTD Transparency No. 56 4. Physical Structures. An XML document may consist of one or many storage units called entities; each entity has content, is identified by name, and may have an associated URI. Each XML document has one entity called the document entity, which is the starting point for the XML processor, may contain the whole document, and is the only kind of entity without a name. Entities may be either parsed or unparsed. Unparsed: not to be analyzed by XML processors; used for non-XML data (e.g. an image file).
XML DTD Transparency No. 57 Properties of an entity. Entity name: every entity but the document entity has a name. Entity reference: if xxx is the name of an entity, then &xxx; (or %xxx;) is its entity reference. Content: replacement text: the text to be substituted for all occurrences of its reference; entity value: the literal value appearing in an entity declaration. Internal or external: external: the content comes from external files; internal: the content is part of the declaration.
XML DTD Transparency No. 58 General or parameter: general: to be referenced and expanded in the document region; parameter: to be expanded in the DTD region, hence references can appear in the DTD region only. Parsed or unparsed: parsed: part of an XML document; unparsed: non-XML data, or XML data not intended to be processed by the XML parser. Unparsed entities are always external and general. Note: since unparsed entities must be general and external, there are only 5 kinds of entities.
XML DTD Transparency No. 59 Parsed entity and unparsed entity. An unparsed entity is a resource whose contents are not to be processed by the XML processor. It has an associated notation, identified by name; it must be an external entity (with a public and/or system identifier); it is referenced by its entity name (instead of by an entity reference), occurring only in the value of ENTITY or ENTITIES attributes. Parsed entities are entities whose contents need to be processed by the XML processor. They are referenced by using entity references; their contents are referred to as their replacement text.
XML DTD Transparency No. 60 Examples external general parsed entity. internal general parsed entity internal parameter parsed entity external general unparsed entity. Note: Notation and unparsed entity are rarely used in practice.
XML DTD Transparency No. 61 Example of unparsed entity usage … … ]>... … … 。 Type of cover1
XML DTD Transparency No. 62 General entity and parameter entity. Parameter entities are parsed entities for use in the grammar (DTD); they are referenced by the form %name;. General entities are entities for use in the document content; they are sometimes simply called entities and are referenced by the form &name;. Comparison: they use different syntax in the DTD for their definition, use different forms of reference, and are recognized in different contexts (grammar vs. data).
XML DTD Transparency No. 63 Examples external general parsed entity. internal general parsed entity internal parameter parsed entity external parameter parsed entity. Notes: All parameter entities are parsed entities Parameter entities carry grammar information. General entities carry data contents.
XML DTD Transparency No. 64 4.1 Character and Entity References. A character reference refers to a specific character in the ISO/IEC 10646 character set. Character Reference
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
XML DTD Transparency No. 65 4.1 Character and Entity References (cont’d) Entity Reference
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';'
[69] PEReference ::= '%' Name ';'
XML DTD Transparency No. 66 4.2 Entity Declarations
[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue (internal entity) | (ExternalID NDataDecl?) (external entity; an unparsed entity if NDataDecl is present)
[74] PEDef ::= EntityValue | ExternalID
Notes: 1. General entities can only be referenced in the non-DTD region. 2. Parameter entities are referenced in the DTD.
XML DTD Transparency No. 68 4.2.1 Internal Entities. An entity defined by an EntityValue is called an internal entity. The content of the entity is given in the declaration; there is no separate physical storage object. Some processing of entity and character references in the literal entity value may be required to produce the correct replacement text. An internal entity is always a parsed entity. Example of an internal entity declaration:
XML DTD Transparency No. 69 4.2.2 External Entities. If the entity is not internal, it is an external entity. External Entity Declaration
[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
[76] NDataDecl ::= S 'NDATA' S Name [VC: Notation Declared]
If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity. [VC: Notation Declared]: the Name must match the declared name of a notation. SystemLiteral is called the entity's system identifier, which is a URI. PubidLiteral is called the entity's public identifier, which the XML processor may use to produce an alternative URI.
XML DTD Transparency No. 70 Examples of external entity declaration
XML DTD Transparency No. 71 4.3 Parsed Entities 4.3.1 The Text Declaration. External parsed entities may each begin with a text declaration. Text Declaration
[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
Note: if present, it must be placed at the beginning of the external parsed entity.
XML DTD Transparency No. 72 4.3.2 Well-formed Parsed Entities. The document entity is well-formed if it matches the production labeled document. An external general parsed entity is well-formed if it matches the production labeled extParsedEnt. All external parameter entities are well-formed by definition. Well-Formed External Parsed Entity
[78] extParsedEnt ::= TextDecl? content
XML DTD Transparency No. 73 4.3.2 Well-Formed Parsed Entities (cont’d). An internal general parsed entity is well-formed if its replacement text matches the production labeled content. All internal parameter entities are well-formed by definition. A consequence of well-formedness in entities: the logical and physical structures in an XML document are properly nested; i.e., no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another (whatever begins in an entity must also end in it).
XML DTD Transparency No. 75 4.3.3 Character Encoding in Entities. External parsed entities may use different encodings for their characters. All XML processors must support UTF-8 and UTF-16. An entity in any other encoding must declare it in its text declaration. Encoding Declaration
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Encoding name contains only Latin characters */
Examples:
XML DTD Transparency No. 76 4.4 XML Processor Treatment of Entities and References. The contexts in which character references, entity references, unparsed entity names and notation names may appear: 1. Reference in Content [43]. 2. Reference in Attribute Value [10]. 3. [Name] Occurs as Attribute Value. 4. Reference in Entity Value [9]. 5. Reference in DTD [28a].
XML DTD Transparency No. 77 Context in which entity or character references may occur. 1. Reference in Content: as a reference in content. Ex: He said: &WhatHeSaid; 2. Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue. ex: 3. Occurs as Attribute Value: as a Name, not a reference, appearing as the value of an attribute declared as type ENTITY, ENTITIES or NOTATION.
XML DTD Transparency No. 78 4.4 Context in which entity or character references may occur ex: … 4. Reference in Entity Value : as a reference in rule EntityValue. ex: 5. Reference in DTD : as a reference in internal or external subsets of the DTD, but outside of any EntityValue or AttValue. ex: %manyElements;
XML DTD Transparency No. 79 Example : Contexts in which entities or entity references occur ]> … &gEnty; 1 … &ReferenceInContent; 1 …
XML DTD Transparency No. 80 4.4 Summary on entities. Internal vs. external: internal ==> content given in the declaration; external ==> content obtained outside the declaration. ex1: ex2: ex3: General vs. parameter entities: general ==> used in the document instance; parameter ==> used in the document type declaration (DTD). ex: ex1 ==> general; ex2 ==> PE. Parsed vs. unparsed entities: parsed ==> the XML processor will parse it ==> ex1, ex2; unparsed ==> the XML processor need not parse it ==> ex3. Note: unparsed entities must be general and external.
XML DTD Transparency No. 81 4.5 Construction of Internal Entity Replacement Text Two forms of the entity's value of an internal entity. literal entity value : the quoted string actually present in the entity declaration, corresponding to the non-terminal EntityValue. replacement text : the content of the entity, after replacement of character references and parameter-entity references. Notes: 1. General-entity references in literal entity value are not expanded to produce replacement text. 2. It is the replacement text of the entity that is substituted for every occurrence of its entity reference.
XML DTD Transparency No. 83 Rules: from internal entity value to replacement text (-| marks the current reading position)
1. normal character (c matches [^%&"'], or is a ' or " treated as data): -|c → c-|
2. character reference (Included): -|&#xxxx; → ch(xxxx)-|
3. parameter entity reference (Included in Literal): -|%pe; → -|rptxt'(pe)
4. general entity reference (Bypassed): -|&ge; → &ge;-|
If -|α →* β-|, then define rptxt(α) = β.
Notations: ch(xxxx): the character with code point #xxxx. rptxt(entity): the replacement text of a general or parameter entity. rptxt'(e): rptxt(e) with ' and " treated as normal literals.
XML DTD Transparency No. 84 Contents of entities literal entity valuereplacement text internal parsed (general/paramter) entity quoted string ( ) defined by the rules of EntityValue rptxt( ) external parsed (genral/parameter) entity whole text in the entity same as entity value with optional text declaration stripped.
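The construction of replacement text from a literal entity value can be sketched as a small scanner. This is a minimal illustration in Python under stated assumptions, not a full implementation: the function name replacement_text and the parameter_entities dictionary (mapping a parameter-entity name to its replacement text) are hypothetical, and error handling for undeclared entities is omitted.

```python
import re

# Character references and parameter-entity references; "&name;" is not
# matched on purpose, because general-entity references are bypassed.
_REF = re.compile(
    r"&#x(?P<hex>[0-9a-fA-F]+);|&#(?P<dec>[0-9]+);|%(?P<pe>[A-Za-z_:][\w.:\-]*);"
)

def replacement_text(literal, parameter_entities=None):
    """Build the replacement text of an internal entity from its literal value."""
    pes = parameter_entities or {}
    out = []
    rest = literal
    while rest:
        m = _REF.match(rest)
        if m is None:
            # Normal character or bypassed '&': copy it through.
            out.append(rest[0])
            rest = rest[1:]
        elif m.group("pe") is not None:
            # Included in literal: the PE text is scanned again in place.
            rest = pes[m.group("pe")] + rest[m.end():]
        else:
            # Character reference: the character is emitted, not rescanned.
            hex_digits = m.group("hex")
            code = int(hex_digits, 16) if hex_digits is not None else int(m.group("dec"))
            out.append(chr(code))
            rest = rest[m.end():]
    return "".join(out)

print(replacement_text("An ampersand (&#38;#38;) or (&amp;amp;)"))
print(replacement_text("He said %YN;", {"YN": '"Yes"'}))
```

The asymmetry is visible in the loop: a character reference moves the read position past the produced character, while a parameter-entity reference leaves it in front of the inserted text.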
XML DTD Transparency No. 85 4.4.2 Included. An entity is said to be included when its replacement text is retrieved and processed, in place of the reference itself, as though it were part of the document at the location where the reference was recognized. The replacement text may contain character data (and markup if it is a general entity), which must be recognized in the usual way. Rules (note the asymmetry between character and non-character inclusion):
-|&#dddd; → ch(dddd)-| // char inclusion
-|&ge; → -|rptxt(ge) // ge inclusion
-|%pe; → -|rptxt(pe) // pe inclusion; not used in XML processing
XML DTD Transparency No. 86 Example Ex: ==>-| &AC; ==>-|The &W3C; Advisory Council ==>The -|&W3C; Advisory Council ==>The -|WWW Consortium Advisory Council ==> So, if then e1 has attribute at1 with value “aaa The WWW Consortium Advisory Councilzzz”.
XML DTD Transparency No. 87 3.3.3 Attribute-value normalization. When: after end-of-line processing but before the value is passed to the application. Steps: 0. End-of-line processing (line ends become #xA). Initially nv = "" (the normalized value). 1. Repeat until the end of input: for a character reference, append the referenced character to the normalized value (e.g., &#38; → '&'); for an entity reference, include it: recursively apply step 1 to the replacement text of the entity; for a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value; otherwise (any other character), append the character to the normalized value. 2. If the attribute type is not CDATA, remove leading and trailing spaces and replace sequences of space (#x20) characters by a single space (#x20) character. Notes: 1. Character and entity references are not treated equally. 2. White space characters are normalized to spaces.
XML DTD Transparency No. 88 Rules: attribute value normalization -- a.k.a. from AttValue to normalized attribute value (-| marks the current reading position)
1. normal char (c matches [^<&"'] and is not white space, or is a ' or " treated as data): -|c → c-|
2. char reference (included): -|&#xxxx; → ch(xxxx)-|
3. (internal) general entity reference (included in literal): -|&ge; → -|rptxt'(ge)
4. white space, where w is one of (#x20, #xD, #xA, #x9) and ␣ is a space: -|w → ␣-|
If -|α →* β-|, then define nv1(α) = β. If the type is CDATA, nv(α) = nv1(α); otherwise nv(α) = nv1(α) with leading/trailing spaces removed and sequences of space (#x20) characters replaced by a single space (#x20) character.
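The normalization steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name normalize_attribute_value is hypothetical, entities maps general-entity names to their replacement text, end-of-line processing is assumed to have happened already, and well-formedness errors (such as '<' in the value) are not checked.

```python
import re

def normalize_attribute_value(raw, entities=None, cdata=True):
    """Normalize an attribute value as written between the quotes."""
    entities = entities or {}
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "&":
            end = raw.index(";", i)
            ref = raw[i + 1:end]
            if ref.startswith("#x"):
                # Character reference: append the character itself.
                out.append(chr(int(ref[2:], 16)))
            elif ref.startswith("#"):
                out.append(chr(int(ref[1:])))
            else:
                # Entity reference: recursively process the replacement text.
                out.append(normalize_attribute_value(entities[ref], entities))
            i = end + 1
        elif ch in " \t\r\n":
            # Literal white space becomes a single space character.
            out.append(" ")
            i += 1
        else:
            out.append(ch)
            i += 1
    value = "".join(out)
    if not cdata:
        # Tokenized and enumerated types: trim and collapse #x20 runs only.
        value = re.sub(" +", " ", value.strip(" "))
    return value

ents = {"d": "\r", "a": "\n", "da": "\r\n"}
print(repr(normalize_attribute_value("&d;&d;A&a;&#x20;&a;B&da;", ents)))
print(repr(normalize_attribute_value("&d;&d;A&a;&#x20;&a;B&da;", ents, cdata=False)))
```

A carriage return that arrives through an entity is turned into a space, whereas one written as a character reference survives, which is the unequal treatment noted above.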
XML DTD Transparency No. 89 Examples
<!ENTITY d "&#xD;"> => rptxt(d) = [cr], since -|&#xD; → [cr]-|
<!ENTITY a "&#xA;"> => rptxt(a) = [lf], since -|&#xA; → [lf]-|
<!ENTITY da "&#xD;&#xA;"> => rptxt(da) = [cr][lf], since -|&#xD;&#xA; →* [cr][lf]-|
Attribute specification : a is CDATA : a is NMTOKEN(S)
att="[cr][lf][cr][lf]xyz" (end-of-line processing gives [lf][lf]xyz) : "[ ][ ]xyz" : "xyz" (after normalizing for a non-CDATA type)
att="&d;&d;A&a;&#x20;&a;B&da;" : "[ ][ ]A[ ][ ][ ]B[ ][ ]" : "A B"
att="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;" : [cr][cr]A[lf][lf]B[cr][lf] : [cr][cr]A[lf][lf]B[cr][lf]
XML DTD Transparency No. 90 4.4.5 Included in literal. Same as Included, except that a single or double quote character in the replacement text is always treated as a normal data character and will not terminate the literal. Additional rules: -|' → '-| and -|" → "-|. Example: this is well-formed:
<!ENTITY % YN '"Yes"' >
<!ENTITY WhatHeSaid "He said %YN;" >
while this is not:
<!ENTITY EndAttr "27'" >
<element attribute='a-&EndAttr;>
XML DTD Transparency No. 91 4.4.8 Included as PE. Same as ‘Included’, but the replacement text is enlarged by the attachment of one leading and one following space (#x20) character. Rule: -|%pe; → -|␣rptxt(pe)␣, where ␣ is a space. ex: 1. 2. 2. is equ. to. instead of
XML DTD Transparency No. 92 4.4 XML Processor Treatment of Entities and References
XML DTD Transparency No. 93 4.6 Predefined Entities. Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.
1. <!ENTITY lt "&#38;#60;"> // double escaping is required for < and &
2. <!ENTITY amp "&#38;#38;"> // to give well-formed replacement text
3. <!ENTITY gt "&#62;"> // double escaping is harmless but
4. <!ENTITY apos "&#39;"> // not needed for >, ' and "
5. <!ENTITY quot "&#34;">
ex: The string "-|AT&amp;T;" ==> "AT-|&#38;T;" ==> "AT&-|T;". If 2. were defined as "&#38;", then "-|AT&amp;T;" ==> "AT-|&T;" ==> error.
XML DTD Transparency No. 94 From content to the next character data in the content (-| marks the current reading position)
1. normal character (c matches [^&<]): -|c → c-| // after EOL processing
2. character reference (Included): -|&#xxxx; → ch(xxxx)-|
3. (internal or external) general entity reference (Included): -|&ge; → -|rptxt(ge)
4. beginning of markup (end of character data): -|<
If -|α →* β-|, or -|α →* β-|< …, then define nxt(α) = β. Notation: nxt(cnt): the next character data of cnt, where cnt is a text that satisfies the grammar rule content.
XML DTD Transparency No. 95 4.7 Notation Declarations. Notations identify by name the format of unparsed entities, e.g., GIF, JPEG, DOC, BMP, … Notation Declarations
[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
[83] PublicID ::= 'PUBLIC' S PubidLiteral
4.8 Document Entity: serves as the root of the entity tree and a starting point for an XML processor. Unlike other entities, the document entity has no name and might well appear on a processor input stream without any identification at all.
XML DTD Transparency No. 96 Appendix D. Expansion of Entity and Character References
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >
==> ENTITY example has value (replacement text): <p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;amp;).</p>
A reference in the document to "&example;" causes the text to be reparsed: ==> a p element with content: An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).
XML DTD Transparency No. 97 D. More complex example
1 <?xml version='1.0'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '&#37;zz;'>
5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
line 4 => xx has value "%zz;"
line 5 => zz has value "<!ENTITY tricky "error-prone" >"
line 6 => %xx; => %zz; => <!ENTITY tricky "error-prone" > is declared
line 8 => element test has content: "This sample shows a error-prone method."
XML DTD Transparency No. 98 3.4 Conditional Sections. Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them. Conditional Section
[61] conditionalSect ::= includeSect | ignoreSect
[62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
[64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)
Note: nested conditional sections are allowed.
XML DTD Transparency No. 99 3.4 Conditional Sections Example: