XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

1 XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

2 XML DTD Transparency No. 2 Objectives: the purpose of using schemas; the schema languages DTD, XML Schema, and RELAX NG; regular expressions, a formalism commonly used in schema languages for defining schemas.

3 XML DTD Transparency No. 3 XML Languages and schemas. XML language: a set of XML documents used in a domain that share a common set of elements and attributes, e.g. XHTML, MathML, SVG, CML, RecipeML. Schema: a formal definition of the syntax of an XML language; it defines the collection of elements that can be used in the language, together with all possible attributes and contents of each element. Schema language: a notation (or language) for writing schemas.

4 XML DTD Transparency No. 4 Validation [Figure: a schema processor takes an instance document and a schema as input; a valid document yields a normalized instance document, an invalid one yields an error message.]

5 XML DTD Transparency No. 5 General requirements for designing a schema language: expressiveness, efficiency, comprehensibility.

6 XML DTD Transparency No. 6 Regular Expressions. Commonly used in schema languages to describe sequences of characters or elements. Σ: an alphabet (typically Unicode characters or element names); σ ∈ Σ matches the string σ; α? matches zero or one α; α* matches zero or more α's; α+ matches one or more α's; α β matches any concatenation of an α and a β; α | β matches the union of α and β.

7 XML DTD Transparency No. 7 Examples A regular expression describing integers: 0 | -?(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)* A regular expression describing the valid contents of table elements in XHTML:
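The table content model mentioned above can be written directly as a DTD element declaration. The sketch below follows the declaration in the XHTML 1.0 Strict DTD as best recalled; treat it as an illustration rather than a verbatim quotation of the slide.

```xml
<!-- Content model of the XHTML table element, written as a regular
     expression over element names: ',' is concatenation, '|' is union,
     and '?', '*', '+' are the usual repetition operators. -->
<!ELEMENT table
     (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
```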

8 XML DTD Transparency No. 8 DTD - Table of Contents. Introduction to DTD: an introduction to the XML Document Type Definition. DTD - XML Building Blocks: what XML building blocks are defined in a DTD. DTD Elements: how to define the elements of an XML document using DTD. DTD Attributes: how to define the legal attributes of XML elements using DTD. DTD Entities: how to define XML entities using DTD.

9 XML DTD Transparency No. 9 DTD – Document Type Definition. XML DTD is a subset of the DTD formalism from SGML and is part of XML 1.0. It is a starting point for the development of more expressive schema languages. It considers elements, attributes, and character data only; processing instructions and comments are mostly ignored because they are semantically not part of a document.

10 XML DTD Transparency No. 10 Checking Validity with DTD. A DTD processor (also called a validating XML parser): parses the input document (which includes checking well-formedness); checks the root element name; for each element, checks its contents and attributes; checks uniqueness and referential constraints (ID / IDREF(S) attributes).

11 XML DTD Transparency No. 11 Internal subset (of a DTD). This is an XML document with a Document Type Definition: Tove Jani Reminder Don't forget me this weekend! The DTD is interpreted like this: !ELEMENT note (in line 3) defines the element "note" as having four elements: "to,from,heading,body", and so on.
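The markup of this example did not survive in the transcript. A minimal sketch consistent with the description (the note declaration on line 3, four children carrying the surviving text) might look as follows; declaring the four children as (#PCDATA) is an assumption.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to      (#PCDATA)>
  <!ELEMENT from    (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body    (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```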

12 XML DTD Transparency No. 12 External subset (of a DTD) This is the same XML document with an external DTD: Tove Jani Reminder Don't forget me this weekend!

13 XML DTD Transparency No. 13 note.dtd This is a copy of the file "note.dtd" containing the Document Type Definition:
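A sketch of the external-subset variant, assuming the same note vocabulary as before: the document only names the DTD file, and note.dtd holds the bare declarations. The (#PCDATA) child declarations are an assumption, and the two files are shown in one listing for brevity.

```xml
<!-- The document: the DOCTYPE points at the external subset by system identifier -->
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

<!-- The file note.dtd: declarations only, with no DOCTYPE wrapper around them -->
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to      (#PCDATA)>
<!ELEMENT from    (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body    (#PCDATA)>
```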

14 XML DTD Transparency No. 14 Why use a DTD? A DTD gives people a common format for interchanging data and provides an application-independent way of sharing data. An application can use a DTD to verify that an XML document it produced, or received from the outside world, is valid.

15 XML DTD Transparency No. 15 Document Type Declaration (cont'd)
Document Type Definition
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
[28a] DeclSep ::= PEReference | S
[28b] intSubset ::= ( markupdecl | DeclSep )*
[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
Notes: 1. DTD = internal subset + external subset. 2. The internal subset is defined by intSubset; the external subset is defined by an external entity specified by ExternalID.

16 XML DTD Transparency No. 16 External Subset
[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
External Subset
[30] extSubset ::= TextDecl? extSubsetDecl
[31] extSubsetDecl ::= ( markupdecl | conditionalSect | DeclSep )*
cf.: [28b] intSubset ::= ( markupdecl | DeclSep )*

17 XML DTD Transparency No. 17 Document Type Declarations. A document type declaration associates a DTD schema with the instance document. It takes one of three forms: 1. both internal and external subsets; 2. external subset only; 3. internal subset only. The external subset is located by a system identifier (a URI) and, optionally, a public identifier.

18 XML DTD Transparency No Example XML documents Hello, world! The system identifier "hello.dtd" gives the URI of a DTD for the document. The declarations can also be given locally, as in this example: Hello, world!
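The two documents described above lost their markup in the transcript. The sketch below shows both forms; the element name greeting is taken from the corresponding example in the XML 1.0 Recommendation and is an assumption here, since only "Hello, world!" and "hello.dtd" survive in the slide text.

```xml
<!-- Form 1: declarations live in an external DTD found via the system identifier -->
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting>

<!-- Form 2: the same document with the declarations given locally (internal subset) -->
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
```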

19 XML DTD Transparency No. 19 XML building blocks (content part). The content parts of XML documents are made up of the following building blocks: elements; tags (start tag, end tag); attributes; PCDATA; CDATA sections; processing instructions; comments; entities. These were discussed in the previous lecture.

20 XML DTD Transparency No. 20 Entities (reviewed). Entities are used to define common text, like macros in programming languages. Entity references are references to entities. Format: if xxx is an entity name, then &xxx; is its entity reference; e.g., &nbsp; is used to insert an extra space in an HTML document. Entities are expanded when a document is parsed by an XML parser. The following entities are predefined in XML:
Entity reference   Character
&lt;               <
&gt;               >
&amp;              &
&quot;             "
&apos;             '
More about entities later.

21 XML DTD Transparency No. 21 DTD – Element Declaration. Declares an element which may occur in the document. Format: <!ELEMENT name contentspec>. Types of element contents: EMPTY: no contents; ANY: no restriction on contents; MIXED: allows character data only, or character data plus elements; ElementOnly: allows elements only.
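A minimal sketch with one declaration per content type. All element names except E1 (which appears on a later slide) are hypothetical and chosen only for illustration.

```xml
<!ELEMENT br    EMPTY>               <!-- EMPTY: instances are <br/> or <br></br> -->
<!ELEMENT E1    ANY>                 <!-- ANY: any mix of text and declared elements -->
<!ELEMENT title (#PCDATA)>           <!-- MIXED, character data only -->
<!ELEMENT para  (#PCDATA | em)*>     <!-- MIXED, character data plus elements -->
<!ELEMENT em    (#PCDATA)>
<!ELEMENT memo  (title, para+)>      <!-- ElementOnly: a regular expression over names -->
```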

22 XML DTD Transparency No. 22 EMPTY element content Declare an element with empty content format: Example: Valid Instances:

23 XML DTD Transparency No. 23 ANY Element content. Declares an element that can contain any combination of elements and text data. Declared with the 'ANY' keyword: Example: Valid instances (with respect to E1 only): begin middle fff end dddd

24 XML DTD Transparency No. 24 Elements with MIXED contents. Two cases: 1. elements that can only contain text contents; 2. elements allowing text as well as element contents. Example: invalid declarations (X): 1. no star; 2. #PCDATA placed in the wrong position. Valid instances: ddd cd ttt. #PCDATA must appear first!
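The declarations for this slide were lost; the following sketch illustrates the two cases and the two mistakes named above. The element names are hypothetical; the instance reuses the surviving sample text.

```xml
<!-- Case 1: text only -->
<!ELEMENT name (#PCDATA)>

<!-- Case 2: text mixed with b and i elements, in any order and number -->
<!ELEMENT para (#PCDATA | b | i)*>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>

<!-- Not allowed: the group (#PCDATA | b | i) without the trailing star -->
<!-- Not allowed: the group (b | #PCDATA | i)* because #PCDATA is not first -->

<!-- A valid instance of para -->
<para>ddd <b>cd</b> ttt</para>
```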

25 XML DTD Transparency No. 25 Elements that can contain element contents only. Issue: how to declare the possible sequences of content elements. Solution: regular expressions over element names. Definition: 1. CP ::= (name | choice | seq) ('+' | '*' | '?')? 2. choice ::= a list of two or more CPs separated by '|' and enclosed by '(' and ')'. 3. seq ::= a list of one or more CPs separated by ',' and enclosed by '(' and ')'. ElementOnly elements:

26 XML DTD Transparency No. 26 Recursive definition of CP, seq and choice. Basis: if α is a name, then α, α?, α+, α* are CPs (content particles); these are the basic CPs. Closure: if α is a seq or choice, then α, α?, α+, α* are CPs. If α1, α2, …, αn (n > 1) are CPs, then (α1 | α2 | … | αn) is a choice. If α1, α2, …, αn (n > 0) are CPs, then (α1, α2, …, αn) is a seq. α is a children if α is a non-basic CP (i.e., a CP that is not a basic CP). Examples of children: Illegal: , Legal: ,,
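The legal and illegal examples on this slide were lost. The sketch below, with hypothetical element names, shows what the definition allows: the content model of an element-only declaration must be a parenthesised choice or seq, optionally followed by ?, * or +.

```xml
<!-- Legal children: each content model is a non-basic CP -->
<!ELEMENT a (b)>
<!ELEMENT c (b)*>
<!ELEMENT d (b | e)+>
<!ELEMENT f (b, (e | g)?, h*)>

<!-- Illegal as children: a basic CP such as the bare name b, or b* without
     parentheses, written directly after the element name -->
```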

27 XML DTD Transparency No. 27 More examples (X) (0) (x, 1-ambiguous) Rewritten as … (E1, (E2 | (E3,E2)))> (0)
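The surviving fragment suggests a non-deterministic (1-ambiguous) content model being rewritten into a deterministic one. A sketch under that assumption, where the element name E and the ambiguous original are guesses reconstructed from the rewritten form:

```xml
<!-- Assumed original, 1-ambiguous: after reading E1 the parser cannot tell,
     without lookahead, which branch it is in:
       ((E1, E2) | (E1, E3, E2))                                         -->

<!-- Deterministic rewrite, as shown on the slide: factor out the common E1 -->
<!ELEMENT E (E1, (E2 | (E3, E2)))>
```

XML 1.0 asks for deterministic content models for compatibility with SGML, which is why the factored form is preferred.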

28 XML DTD Transparency No. 28 Grammar of Element Type Declaration
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>' [VC: Unique Element Type Declaration]
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
Examples:

29 XML DTD Transparency No. 29 Official Grammar of ElementOnly content Models
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
where each Name is the type of an element which may appear as a child. Examples: (x) Note: (x) (0). [49] and [50] have an additional VC: Proper Group/PE Nesting.

30 XML DTD Transparency No. 30 Grammar of Mixed Content
Mixed-content Declaration
[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'
Ex: (x) (0)

31 XML DTD Transparency No. 31 Attribute Definition ELEMENT declarations prescribe each element type that can appear in a document and define the permissible content of each element type. Ex: To prescribe all attributes that can appear in the start tag of an element type, we use ATTLIST declaration.

32 XML DTD Transparency No. 32 ATTLIST declaration To define permissible attributes associated with book element, we use: and 2. can be merged as : 3. Format: Note: Attributes have a name, a type, a default-value and belong to an element.
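The ATTLIST declarations for the book element were lost in the transcript. A sketch of the idea, with hypothetical attribute names: two separate declarations, then the merged form, which follows the general format <!ATTLIST element-name attribute-name attribute-type default-value>.

```xml
<!ELEMENT book (#PCDATA)>

<!-- 1. and 2.: one ATTLIST declaration per attribute -->
<!ATTLIST book isbn CDATA #REQUIRED>
<!ATTLIST book lang CDATA "en">

<!-- Alternative: the same two attribute definitions merged into one declaration -->
<!ATTLIST book
          isbn CDATA #REQUIRED
          lang CDATA "en">
```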

33 XML DTD Transparency No. 33 Attribute types
Type                          Meaning
CDATA                         The value is character data.
(v1 | v2 | … | vk)            The value must be one of the listed name tokens v1 … vk.
ID                            The value is a unique id.
IDREF                         The value is a reference to an id.
IDREFS                        The value is a list of IDREFs.
NMTOKEN                       The value is a valid XML name token.
NMTOKENS                      The value is a list of name tokens.
ENTITY                        The value is an (unparsed) entity.
ENTITIES                      The value is a list of (unparsed) entities.
NOTATION (v1 | v2 | … | vk)   The value must be one of the listed notation names v1 … vk.

34 XML DTD Transparency No. 34 Attribute-default value
default-value   Meaning
"v1"            The attribute has the default value v1; its value can be overridden in the document.
#REQUIRED       The attribute must be given explicitly in the document.
#IMPLIED        The attribute does not have to appear in the document.
#FIXED "v1"     The attribute value is fixed to v1 and cannot be overridden in the document; if specified in the document, its value must be "v1".

35 XML DTD Transparency No. 35 Attributes with default value EX1: XML elements: Ex2: Below are equivalent XML Elements: …

36 XML DTD Transparency No. 36 #IMPLIED attribute Syntax: Ex: instance: Both 1 and 2 are valid but they are not equivalent.

37 XML DTD Transparency No. 37 #REQUIRED attribute Syntax: Ex: instances:

38 XML DTD Transparency No. 38 #FIXED “value” attributes Syntax: Ex: Instances: 1. … 2. … 3. … (x) Notes: 1. and 2. are equivalent. 3. is invalid.
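The examples for the four kinds of attribute default were lost. The sketch below puts all four into one hypothetical payment element; every name and value is an assumption made for illustration.

```xml
<!ELEMENT payment EMPTY>
<!ATTLIST payment
          type     (check | cash) "cash"
          ref      CDATA          #IMPLIED
          amount   CDATA          #REQUIRED
          currency CDATA          #FIXED "USD">

<!-- Valid; a validating parser reports type="cash" and currency="USD" as well -->
<payment amount="10"/>

<!-- Valid and equivalent to the previous instance: the defaults are spelled out -->
<payment amount="10" type="cash" currency="USD"/>

<!-- Valid but not equivalent: the IMPLIED attribute ref is present here only -->
<payment amount="10" ref="A-17"/>

<!-- Invalid: currency conflicts with its FIXED value -->
<payment amount="10" currency="EUR"/>

<!-- Invalid: the REQUIRED attribute amount is missing -->
<payment type="check"/>
```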

39 XML DTD Transparency No. 39 Official Grammar of Attribute-List Declarations
Attribute-list Declaration
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
XML attribute types are classified into three kinds: the string type (CDATA); enumerated types (name tokens or notations); and tokenized types (ID, IDREF, IDREFS, NMTOKEN, …).

40 XML DTD Transparency No. 40 Attribute Types
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
Notes: ID, IDREF and IDREFS are used for cross references; ENTITY and ENTITIES refer to external unparsed objects; NMTOKEN and NMTOKENS restrict the attribute value to one or more Nmtokens.

41 XML DTD Transparency No. 41 ID and IDREF(S) If an attribute is of ID type, the value of every occurrence of this attribute must be unique among all ID attribute values of the whole document. Ex: Instances: name=“p1” and sid=“p1” violate ID constraint.
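A sketch of the ID constraint described above. The attribute names name and sid come from the slide; the element names and the friend attribute are hypothetical.

```xml
<!ELEMENT people (person | student)*>
<!ELEMENT person  EMPTY>
<!ELEMENT student EMPTY>
<!ATTLIST person  name   ID    #REQUIRED>
<!ATTLIST student sid    ID    #REQUIRED
                  friend IDREF #IMPLIED>

<people>
  <person name="p1"/>
  <!-- Invalid: "p1" is already in use as an ID value, even though the
       attribute and the element type are different -->
  <student sid="p1"/>
  <!-- Valid: a fresh ID, and an IDREF pointing at an existing ID -->
  <student sid="s2" friend="p1"/>
</people>
```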

42 XML DTD Transparency No. 42 Notation A notation in XML is a name used to identify a specific type of (non-xml) data like ppt, pdf, word, jpeg, gif, etc. Each notation must be declared and is associated with a system identifier and/or public identifier. We may limit the value of an attribute to be a notation name from a list of declared notation names

43 XML DTD Transparency No. 43 Enumerated Attribute Types
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
[58] is used to limit the attribute value to one of the listed notation names.

44 XML DTD Transparency No. 44 Enumerated attribute values Syntax: Ex: instances:

45 XML DTD Transparency No. 45 NOTATION attribute values. Syntax: Ex: Note: each notation (pdf, ps, gif, …) must be declared in advance, using a <!NOTATION …> declaration, before it can be used. Instances:
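A sketch combining an enumerated attribute and a NOTATION attribute. The element name, attribute names, and system identifiers are hypothetical. The element is not declared EMPTY, because XML 1.0 disallows NOTATION attributes on EMPTY elements for compatibility.

```xml
<!-- Notations are declared by name, each with a system (and/or public) identifier -->
<!NOTATION gif SYSTEM "image/gif">
<!NOTATION pdf SYSTEM "application/pdf">

<!ELEMENT doc (#PCDATA)>
<!ATTLIST doc
          status (draft | final)      "draft"
          format NOTATION (gif | pdf) #IMPLIED>

<doc status="final" format="pdf">report</doc>   <!-- valid -->
<doc status="review">report</doc>               <!-- invalid: review is not listed -->
```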

46 XML DTD Transparency No. 46 Grammar of Attribute Defaults [60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue) Ex:

47 XML DTD Transparency No. 47 White Space and End-of-line Handling. White space: the special attribute xml:space is used to indicate whether (markup) spaces should be preserved. End-of-line: every XML document must be normalized for end-of-line before parsing, in order to eliminate differences between operating systems: #xD#xA --> #xA and #xD --> #xA, i.e. \r\n or a lone \r is replaced by \n. This is done before parsing.

48 XML DTD Transparency No. 48 Language Identification. The reserved attribute xml:lang may be inserted in documents to specify the language used inside an element. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by [IETF RFC 1766], "Tags for the Identification of Languages". Example:
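A sketch of xml:lang in use, assuming a hypothetical element p. The English sentences are the sample text from the XML 1.0 Recommendation; the declaration shows that a valid document must declare the attribute like any other.

```xml
<!ELEMENT p (#PCDATA)>
<!ATTLIST p xml:lang CDATA #IMPLIED>

<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
```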

49 XML DTD Transparency No Language Identifications The quick brown fox jumps over the lazy dog. What colour is it? What color is it? 君不見黃河之水天上來, 奔流到海不復回。 君不見高堂明鏡悲白髮, 朝如青絲暮成雪。 人生得意須盡歡,莫使金樽空對月。 天生我材必有用,千金散盡還復來。 烹羊宰牛且為樂,會須一飲三百杯。岑夫子,丹丘生,將進酒,杯莫停。 與君歌一曲,請君為我傾耳聽。鐘鼓饌玉何足貴,但願長醉不願醒。 古來聖賢皆寂寞,唯有飲者留其名。 陳王昔時宴平樂,斗酒十千恣歡謔。主人何為言少錢,徑須沽取對君酌。 五花馬,千金裘,呼兒將出換美酒,與爾同銷萬古愁。

50 XML DTD Transparency No. 50 君不见黄河之水天上来,奔流到海不复回。 君不见高堂明镜悲白发,朝如青丝暮成雪。 人生得意须尽欢,莫使金樽空对月。 天生我材必有用, 千金散尽还复来。 烹羊宰牛且为乐,会须一饮三百杯。 岑夫子,丹丘生,将进酒,君莫停。 与君歌一曲, 请君为我侧耳听。 钟鼓馔玉不足贵,但愿长醉不愿醒。 古来圣贤皆寂寞,惟有饮者留其名。 陈王昔时宴平乐, 斗酒十千恣欢谑。 主人何为言少钱,径须沽取对君酌。 五花马,千金裘,呼儿将出换美酒,与尔同销万古愁。

51 XML DTD Transparency No. 51 DTD-Entities. Entities are used to define shortcuts to common text, like macros in programming languages. Entity references are references to entities: if name is an entity name, then &name; (or %name;, but not both) is its reference. Entities can be declared internal (content in the same document as the declaration) or external (content external to the declaration). Two more classifications follow later.

52 XML DTD Transparency No. 52 Internal Entity Declaration Syntax: DTD Example: XML example: &p1; &birthday; Equivalent to : Peter 2/12/2000
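The declarations for this example were lost, but the entity names (p1, birthday) and their expansions (Peter, 2/12/2000) survive. A sketch consistent with them, where the element names are hypothetical:

```xml
<!DOCTYPE person [
  <!ELEMENT person (name, born)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT born (#PCDATA)>
  <!-- Syntax: <!ENTITY entity-name "entity-value"> -->
  <!ENTITY p1       "Peter">
  <!ENTITY birthday "2/12/2000">
]>
<person><name>&p1;</name><born>&birthday;</born></person>
<!-- After entity expansion this is equivalent to:
     <person><name>Peter</name><born>2/12/2000</born></person> -->
```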

53 XML DTD Transparency No. 53 External Entity Declaration. Advantage: reuse; more modular. Syntax: DTD Example: XML example: &writer;&copyright;

54 XML DTD Transparency No. 54 Some large DTD Examples XHTML 1.0 XHTML 1.0 DTD DocBook 5.0 DTD docbook.dtd SVG 1.1 svg 1.1 dtd

55 XML DTD Transparency No. 55 Structure of XML Documents. Logical structure: elements, character data. Physical structure: entities. [Figure labels: document, unit, sub-unit, document entity, external parsed entity, external unparsed entity.]

56 XML DTD Transparency No. 56 Physical Structures. An XML document may consist of one or many storage units called entities; each has content, is identified by name, and may have an associated URI. Each XML document has one entity called the document entity: it is the starting entity for the XML processor, may contain the whole document, and is the only kind of entity without a name. Entities may be either parsed or unparsed. Unparsed entities are not analyzed by XML processors and are used for non-XML data (e.g. an image file).

57 XML DTD Transparency No. 57 Properties of an entity. Entity name: every entity but the document entity has a name. Entity reference: if xxx is the name of an entity, then &xxx; (or %xxx;) is its entity reference. Content: the replacement text is the text to be substituted for all occurrences of its reference; the entity value is the literal value appearing in an entity declaration. Internal or external: external ⇒ content from external files; internal ⇒ content is part of its declaration.

58 XML DTD Transparency No. 58 General or parameter: general ⇒ referenced and expanded in the document region; parameter ⇒ expanded in the DTD region, and hence its references can appear in the DTD region only. Parsed or unparsed: parsed ⇒ part of an XML document; unparsed ⇒ non-XML data, or XML data not intended to be processed by the XML parser. Unparsed entities are always external and general. Note: since unparsed entities must be general and external, there are only 5 kinds of entities.

59 XML DTD Transparency No. 59 Parsed entity and unparsed entity An unparsed entity is a resource whose contents are not to be processed by XML processor. has an associated notation, identified by name. must be an external entity (with publicId and/or SystemId) referenced by [entity] name (instead of entity reference) occurring only in the value of ENTITY or ENTITIES attributes. Parsed entities are entities whose contents need to be processed by XML Processor. referenced by using entity references. contents are referred to as its replacement text;

60 XML DTD Transparency No. 60 Examples external general parsed entity. internal general parsed entity internal parameter parsed entity external general unparsed entity. Note: Notation and unparsed entity are rarely used in practice.

61 XML DTD Transparency No. 61 Example of unparsed entity usage … ]>... … … 。 Type of cover1
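Only the entity name cover1 survives from this example. A sketch of how an unparsed entity is typically declared and used, with a hypothetical notation, file name, element, and attribute:

```xml
<!DOCTYPE book [
  <!NOTATION gif SYSTEM "image/gif">
  <!-- NDATA makes cover1 an unparsed entity whose format is the notation gif -->
  <!ENTITY cover1 SYSTEM "cover1.gif" NDATA gif>
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book cover ENTITY #IMPLIED>
]>
<!-- The attribute value is the entity name cover1, not the reference &cover1; -->
<book cover="cover1">...</book>
```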

62 XML DTD Transparency No. 62 General entity and parameter entity. Parameter entities are parsed entities for use in the grammar (DTD); they are referenced by the form %name;. General entities are entities for use in the document content, sometimes simply called entities; they are referenced by the form &name;. Comparison: the two kinds use different declaration syntax in the DTD, use different forms of reference, and are recognized in different contexts (grammar vs. data).

63 XML DTD Transparency No. 63 Examples external general parsed entity. internal general parsed entity internal parameter parsed entity external parameter parsed entity. Notes: All parameter entities are parsed entities Parameter entities carry grammar information. General entities carry data contents.

64 XML DTD Transparency No. 64 Character and Entity References. A character reference refers to a specific character in the ISO/IEC 10646 character set. Character Reference [66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

65 XML DTD Transparency No. 65 Character and Entity References (cont'd)
Entity Reference
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';'
[69] PEReference ::= '%' Name ';'

66 XML DTD Transparency No. 66 Entity Declarations
Entity Declaration
[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue | ( ExternalID NDataDecl? )
[74] PEDef ::= EntityValue | ExternalID
Notes: 1. General entities can only be referenced in the non-DTD region. 2. Parameter entities are referenced in the DTD. In [73], EntityValue [9] gives an internal entity, ExternalID an external entity, and ExternalID followed by NDataDecl an unparsed entity.

67 XML DTD Transparency No. 67 Review of important literals
[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

68 XML DTD Transparency No. 68 Internal Entities. An entity defined by an EntityValue is called an internal entity: the content of the entity is given in the declaration, and there is no separate physical storage object. Some processing of entity and character references in the literal entity value may be required to produce the correct replacement text. An internal entity is always a parsed entity. Example of an internal entity declaration:

69 XML DTD Transparency No. 69 External Entities. If the entity is not internal, it is an external entity.
External Entity Declaration
[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
[76] NDataDecl ::= S 'NDATA' S Name [VC: Notation Declared]
If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity. [VC: Notation Declared]: the Name must match the declared name of a notation. SystemLiteral is called the entity's system identifier, which is a URI. PubidLiteral is called the entity's public identifier, which the XML processor may use to produce an alternative URI.

70 XML DTD Transparency No. 70 Examples of external entity declaration
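The declarations for this slide were lost. The XML 1.0 Recommendation illustrates the same three cases with the declarations below; whether the slide used exactly these is an assumption.

```xml
<!-- External parsed entity, located by system identifier only -->
<!ENTITY open-hatch
         SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">

<!-- External parsed entity with a public identifier plus a system identifier -->
<!ENTITY open-hatch
         PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
         "http://www.textuality.com/boilerplate/OpenHatch.xml">

<!-- External unparsed entity: NDATA names the (declared) notation gif -->
<!ENTITY hatch-pic
         SYSTEM "../grafix/OpenHatch.gif"
         NDATA gif >
```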

71 XML DTD Transparency No. 71 Parsed Entities: The Text Declaration. External parsed entities may each begin with a text declaration.
Text Declaration
[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
Note: if present, the text declaration must be placed at the beginning of the external parsed entity.

72 XML DTD Transparency No Well-formed Parsed Entities The document entity is well-formed if it matches the production labeled document [1]. An external general parsed entity is well-formed if it matches the production labeled extParsedEnt [78]. All external parameter entities are well-formed by definition. Well-Formed External Parsed Entity [78] extParsedEnt ::= TextDecl? content

73 XML DTD Transparency No. 73 Well-Formed Parsed Entities (cont'd). An internal general parsed entity is well-formed if its replacement text matches the production labeled content [43]. All internal parameter entities are well-formed by definition. A consequence of well-formedness in entities: the logical and physical structures in an XML document are properly nested; i.e., no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another (whatever begins in an entity must also end in it).

74 XML DTD Transparency No. 74 Examples (x) “ > (x) ” > (x) &e1;&e2;test&e4; &e3;test&e4; test ” > (o) &cnt; ” > (o) (o) &ok1; &ok2;
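The declarations behind these examples were lost; only the entity names survive. The sketch below illustrates the nesting rule with hypothetical entity values: a replacement text must match the production content on its own.

```xml
<!-- Well-formed: markup that begins inside the entity also ends inside it -->
<!ENTITY ok1 "<b>bold</b> text">
<!ENTITY ok2 "plain text followed by &ok1;">

<!-- Not well-formed: the b element would begin in e1 and end in e2, so a
     document that references these two entities is not well-formed -->
<!ENTITY e1 "<b>begins here">
<!ENTITY e2 "ends here</b>">
```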

75 XML DTD Transparency No. 75 Character Encoding in Entities. External parsed entities may use different encodings for their characters. All XML processors must support UTF-8 and UTF-16; an entity in any other encoding must declare it in its text declaration.
Encoding Declaration
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Encoding name contains only Latin characters */
Examples:
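The examples were lost in the transcript. The XML 1.0 Recommendation gives the following two encoding declarations, shown here as alternatives rather than as one file; each would be the first thing in its own external parsed entity.

```xml
<?xml encoding='UTF-8'?>     <!-- optional here, since UTF-8 must always be supported -->
<?xml encoding='EUC-JP'?>    <!-- needed, since EUC-JP is neither UTF-8 nor UTF-16 -->
```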

76 XML DTD Transparency No. 76 XML Processor Treatment of Entities and References. The contexts in which character references, entity references, unparsed entity names and notation names may appear: 1. Reference in Content [43]; 2. Reference in Attribute Value [10]; 3. Occurs as Attribute Value, as a Name [10]; 4. Reference in Entity Value [9]; 5. Reference in DTD [28a].

77 XML DTD Transparency No. 77 Context in which entity or character references may occur. 1. Reference in Content: as a reference in content. Ex: He said: &WhatHeSaid; 2. Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue. Ex: 3. Occurs as Attribute Value: as a Name, not a reference, appearing as the value of an attribute declared as type ENTITY, ENTITIES or NOTATION.

78 XML DTD Transparency No Context in which entity or character references may occur ex: … 4. Reference in Entity Value : as a reference in rule EntityValue. ex: 5. Reference in DTD : as a reference in internal or external subsets of the DTD, but outside of any EntityValue or AttValue. ex: %manyElements;

79 XML DTD Transparency No. 79 Example : Contexts in which entities or entity references occur ]> … &gEnty; 1 … &ReferenceInContent; 1 …

80 XML DTD Transparency No. 80 Summary on entities. Internal vs. external: internal ==> content given in the declaration; external ==> content obtained from outside the declaration. ex1: ex2: ex3: General vs. parameter entities: general ==> used in the document instance; parameter ==> used in the document type declaration (DTD). ex: ex1 ==> general; ex2 ==> PE. Parsed vs. unparsed entities: parsed ==> the XML processor will parse it ==> ex1, ex2; unparsed ==> the XML processor need not parse it ==> ex3. Note: unparsed entities must be general and external.

81 XML DTD Transparency No Construction of Internal Entity Replacement Text Two forms of the entity's value of an internal entity. literal entity value : the quoted string actually present in the entity declaration, corresponding to the non-terminal EntityValue. replacement text : the content of the entity, after replacement of character references and parameter-entity references. Notes: 1. General-entity references in literal entity value are not expanded to produce replacement text. 2. It is the replacement text of the entity that is substituted for every occurrence of its entity reference.

82 XML DTD Transparency No Example => Entity book has replacement text: “La Peste: Albert Camus, © 1947 Éditions Gallimard. &rights;” Note: No forward reference for PE is permitted. Hence entity ‘book’ could not be put before ‘pub’ entity.
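The declarations that produce this replacement text were lost. They match the example in the XML 1.0 Recommendation, reproduced below as best recalled. The parameter-entity reference %pub; sits inside another declaration, which XML 1.0 permits only in the external subset.

```xml
<!ENTITY % pub    "&#xc9;ditions Gallimard" >
<!ENTITY   rights "All rights reserved" >
<!ENTITY   book   "La Peste: Albert Camus,
&#xA9; 1947 %pub;. &rights;" >
<!-- In the literal value of book, the character reference &#xA9; and the
     parameter-entity reference %pub; are expanded; the general-entity
     reference &rights; is left as it is (bypassed). -->
```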

83 XML DTD Transparency No. 83 Rules: from internal entity value to replacement text (the marker -| is the current read position)
1. normal character (c matches [^'&"%] or is a data ' or "): α -| c β ⇒ α c -| β
2. character reference (Included): α -| &#xxxx; β ⇒ α ch(xxxx) -| β
3. parameter entity reference (Included in Literal): α -| %pe; β ⇒ α -| rptxt'(pe) β
4. general entity reference (Bypassed): α -| &ge; β ⇒ α &ge; -| β
If -| α ⇒* β -|, then define rptxt(α) = β.
Notations: ch(xxxx): the character data with code point #xxxx. rptxt(entity): the replacement text of a general or parameter entity. rptxt'(e): rptxt(e) with ' and " treated as normal literals.

84 XML DTD Transparency No. 84 Contents of entities
Entity kind | literal entity value | replacement text
internal parsed (general/parameter) entity | quoted string (α) defined by the rules of EntityValue | rptxt(α)
external parsed (general/parameter) entity | whole text in the entity | same as the entity value with the optional text declaration stripped

85 XML DTD Transparency No. 85 Included. An entity is said to be included when its replacement text is retrieved and processed, in place of the reference itself, as though it were part of the document at the location where the reference was recognized. The replacement text may contain character data (and markup, if it is a general entity), which must be recognized in the usual way. Rules (note the asymmetry between character and non-character inclusion):
α -| &#dddd; β ⇒ α ch(dddd) -| β // char inclusion
α -| &ge; β ⇒ α -| rptxt(ge) β // ge inclusion
α -| %pe; β ⇒ α -| rptxt(pe) β // pe inclusion; not used in XML processing

86 XML DTD Transparency No. 86 Example Ex: ==>-| &AC; ==>-|The &W3C; Advisory Council ==>The -|&W3C; Advisory Council ==>The -|WWW Consortium Advisory Council ==> So, if then e1 has attribute at1 with value “aaa The WWW Consortium Advisory Councilzzz”.
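The declarations behind this trace can be read off from it: AC expands to "The &W3C; Advisory Council" and W3C to "WWW Consortium". A sketch that reproduces the attribute value quoted above; declaring e1 as EMPTY with a CDATA attribute is an assumption.

```xml
<!DOCTYPE e1 [
  <!ENTITY W3C "WWW Consortium">
  <!ENTITY AC  "The &W3C; Advisory Council">
  <!ELEMENT e1 EMPTY>
  <!ATTLIST e1 at1 CDATA #IMPLIED>
]>
<e1 at1="aaa &AC;zzz"/>
<!-- Including &AC; and then &W3C; gives the attribute value
     "aaa The WWW Consortium Advisory Councilzzz" -->
```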

87 XML DTD Transparency No. 87 Attribute-value normalization. When: after end-of-line processing but before the value is passed to the application. Steps: 0. End-of-line processing (#xD#xA and a lone #xD become #xA). Initially nv = "" (the normalized value). 1. Repeat until end of input: for a character reference, append the referenced character to the normalized value (e.g., &#38; ⇒ '&'); for an entity reference, include it, i.e. recursively apply step 1 to the replacement text of the entity; for a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value; otherwise (any other character), append the character to the normalized value. 2. If the attribute type is not CDATA, remove leading and trailing spaces and replace each sequence of space (#x20) characters by a single space (#x20) character. Notes: 1. Character references and entity references are not treated equally. 2. White space characters are normalized to spaces.

88 XML DTD Transparency No. 88 Rules: attribute value normalization, a.k.a. from AttValue to normalized attribute value
1. normal char (c matches [^'&"<] - S or is a data ' or "): α -| c β ⇒ α c -| β
2. char reference (included): α -| &#xxxx; β ⇒ α ch(xxxx) -| β
3. (internal) ge reference (included in literal): α -| &ge; β ⇒ α -| rptxt'(ge) β
4. white space, where w is one of (#x20, #xD, #xA, #x9) and ␣ is a space: α -| w β ⇒ α ␣ -| β
If -| α ⇒* β -|, then define nv1(α) = β. If the type is CDATA, nv(α) = nv1(α); otherwise nv(α) = nv1(α) with leading and trailing spaces removed and each sequence of space (#x20) characters replaced by a single space (#x20) character.

89 XML DTD Transparency No. 89 Examples => rptxt(d) = [cr] since -|  [cr]-| => rptxt(a) = [lf]since -|  [lf]-| => rptxt(da) = [cr][lf] since -|  [cr]-|  [cr][lf] -| Attribute speca is CDATAa is NMTOKEN(S) att=“ xyz” “[ ][ ]xyz” “xyz” [cr][lf][cr][lf]xyz  [lf][lf]xyz  [][]xyz  xyz EndOfLine-processing normalize non-CDATA type att= &d;&d;A&a;&a;B&da;"“[][]A[][]B[][]”“A[]B” att= " A B " [cr][cr]A[lf][lf] B [cr][lf][cr]A[lf][lf]B [cr][lf]
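A sketch of the entity-based row of this example. The three entities and the attribute specification follow the slide (rptxt(d) = [cr], rptxt(a) = [lf], rptxt(da) = [cr][lf]); the element name e is hypothetical.

```xml
<!DOCTYPE e [
  <!ENTITY d  "&#xD;">
  <!ENTITY a  "&#xA;">
  <!ENTITY da "&#xD;&#xA;">
  <!ELEMENT e EMPTY>
  <!ATTLIST e att CDATA #IMPLIED>
]>
<e att="&d;&d;A&a;&a;B&da;"/>
<!-- att declared as CDATA: each white space character reached through an
     entity reference becomes one space, giving "  A  B  " -->
<!-- att declared as NMTOKENS instead: leading and trailing spaces are then
     removed and runs of spaces collapsed, giving "A B" -->
```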

90 XML DTD Transparency No. 90 Included in literal. Same as Included, except that a single or double quote character in the replacement text is always treated as a normal data character and will not terminate the literal. Additional rules: α -| ' β ⇒ α ' -| β and α -| " β ⇒ α " -| β. Example: this is well-formed: while this is not:
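The two examples were lost. The XML 1.0 Recommendation illustrates the same rule with the declarations below, reproduced as best recalled; whether the slide used exactly these is an assumption.

```xml
<!-- Well-formed: the double quotes brought in by %YN; are plain data and do
     not end the entity value of WhatHeSaid -->
<!ENTITY % YN '"Yes"' >
<!ENTITY WhatHeSaid "He said %YN;" >

<!-- Not well-formed: the single quote inside &EndAttr; is plain data too, so
     the attribute value below is never closed -->
<!ENTITY EndAttr "27'" >
<element attribute='a-&EndAttr;>
```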

91 XML DTD Transparency No. 91 Included as PE. Same as Included, but the replacement text is enlarged by the attachment of one leading and one following space (#x20) character. Rule: α -| %pe; β ⇒ α -| ␣ rptxt(pe) ␣ β, where ␣ is a space. ex: is equ. to. instead of

92 XML DTD Transparency No XML Processor Treatment of Entities and References

93 XML DTD Transparency No. 93 Predefined Entities. Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data. Declarations of the predefined entities: 1. lt (<) and 2. amp (&): double escaping is required for < and & to give well-formed replacement text; 3. gt (>), 4. apos (') and 5. quot ("): double escaping is harmless but not needed for >, ' and ". Ex: with amp declared as "&#38;#38;", the string "-|AT&amp;T;" ==> "AT-|&amp;T;" ==> "AT&-|T;". If 2. were instead declared as "&#38;", then "-|AT&amp;T;" ==> "AT-|&T;" ==> error.
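The five numbered declarations were lost in the transcript. The XML 1.0 Recommendation gives them as follows, in the same order as the notes above:

```xml
<!ENTITY lt     "&#38;#60;">   <!-- 1. double escaping required -->
<!ENTITY gt     "&#62;">       <!-- 3. single escaping is enough -->
<!ENTITY amp    "&#38;#38;">   <!-- 2. double escaping required -->
<!ENTITY apos   "&#39;">       <!-- 4. -->
<!ENTITY quot   "&#34;">       <!-- 5. -->
```

With the double escape, the replacement text of amp is the five characters &#38; and, when that text is included and reparsed, the character reference yields a literal & that is treated as data.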

94 XML DTD Transparency No. 94 From content to the next character data in the content
1. normal character (c matches [^&<]), after EOL processing: α -| c β ⇒ α c -| β
2. character reference (Included): α -| &#xxxx; β ⇒ α ch(xxxx) -| β
3. (internal or external) general entity reference (Included): α -| &ge; β ⇒ α -| rptxt(ge) β
4. begin of markup (end of char data): α -| < β
If -| α ⇒* β -|, or the derivation stops at β -| < γ, then define nxt(α) = β. Notation: nxt(cnt): the next character data of cnt, which is a text satisfying the grammar rule content.

95 XML DTD Transparency No. 95 Notation Declarations. Notations identify by name the format of unparsed entities, e.g., GIF, JPEG, DOC, BMP, …
Notation Declarations
[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
[83] PublicID ::= 'PUBLIC' S PubidLiteral
4.8 Document Entity: serves as the root of the entity tree and a starting point for an XML processor. Unlike other entities, the document entity has no name and might well appear on a processor input stream without any identification at all.

96 XML DTD Transparency No. 96 Appendix D. Expansion of Entity and Character References. Declaration: <!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" > ==> ENTITY example has value (replacement text): <p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;amp;).</p> A reference in the document to "&example;" causes the text to be reparsed: ==> a p element with the content: An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).

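97 XML DTD Transparency No. 97 D. More complex example
1 <?xml version='1.0'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '&#37;zz;'>
5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
line 4 => xx has value "%zz;". line 5 => zz has value "<!ENTITY tricky "error-prone" >". line 6 => %xx; => %zz; => the general entity tricky is declared. line 8 => element test has content: "This sample shows a error-prone method."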

98 XML DTD Transparency No. 98 Conditional Sections. Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.
Conditional Section
[61] conditionalSect ::= includeSect | ignoreSect
[62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
[64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)
Note: nested conditional sections are allowed.

99 XML DTD Transparency No Conditional Sections Example:
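The example was lost in the transcript. The XML 1.0 Recommendation illustrates conditional sections with the declarations below, reproduced as best recalled; they must appear in the external subset, and whether the slide used exactly this example is an assumption.

```xml
<!-- Swap the two keyword values to switch between the draft and final models -->
<!ENTITY % draft 'INCLUDE' >
<!ENTITY % final 'IGNORE' >

<![%draft;[
<!ELEMENT book (comments*, title, body, supplements?)>
]]>
<![%final;[
<!ELEMENT book (title, body, supplements?)>
]]>
```

Because the keyword is supplied through a parameter entity, a single declaration change selects which book content model is in force.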
