XML for Information Management – Day 4: Logical and Physical Structure of XML Documents Airi Salminen XML for Information Management University of Erlangen-Nuremberg.


1 XML for Information Management – Day 4: Logical and Physical Structure of XML Documents
University of Erlangen-Nuremberg, Computational Linguistics
Instructor: Professor Airi Salminen, http://users.jyu.fi/~airi/
12.1.-16.1.2009

2 Outline
Day 4: Logical and Physical Structure of XML Documents
1. Components of the logical structure
2. XML documents as trees
3. Entity types
4. Entity declarations and references
5. XML processor treatment of entity references
6. Motivations for the use of entities

3 1. Components of the logical structure
declarations
elements
comments
processing instructions

4 1. Components of the logical structure
document ::= prolog element Misc*
prolog: declarations, comments, processing instructions
element: elements, comments, processing instructions
Misc: comments, processing instructions
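The production above can be observed with a parser. The following is a minimal sketch, assuming Python's standard `xml.dom.minidom` module and a hypothetical document (the `poem` element and the `app-hint` processing instruction are invented for illustration). It lists the kinds of top-level nodes in document order.

```python
from xml.dom import minidom

# Hypothetical document: a comment in the prolog, the root element, and a
# processing instruction in the trailing Misc* part.
DOC = """<?xml version="1.0"?>
<!-- prolog comment -->
<poem><line>first</line></poem>
<?app-hint refresh?>"""

NODE_KINDS = {
    minidom.Node.COMMENT_NODE: "comment",
    minidom.Node.ELEMENT_NODE: "element",
    minidom.Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def top_level_kinds(xml_text):
    """Return the kinds of the document's top-level nodes in document order."""
    document = minidom.parseString(xml_text)
    return [NODE_KINDS[node.nodeType] for node in document.childNodes]


print(top_level_kinds(DOC))
```

The XML declaration itself does not show up as a node; only the comment, the single root element, and the trailing processing instruction do.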

5 1. Components of the logical structure
Declarations (bracketed numbers are production numbers in the XML specification):
‣ XML declaration [23]
‣ document type declaration [28]
‣ markup declaration [29]
    element type declaration [45], attribute list declaration [52]: to constrain the logical structure
    entity declaration [70], notation declaration [82]: to constrain the physical structure
‣ encoding declaration [80]
‣ standalone document declaration [32]
‣ text declaration [77]

6 1. Components of the logical structure
Typical element type declarations:
mixed content defined
element content defined
empty element defined

7 1. Components of the logical structure
empty element defined: two forms of the element are allowed in a well-formed document, a start-tag immediately followed by an end-tag, or an empty-element tag.

8 1. Components of the logical structure
element content: definition by content models with metasymbols
*    iteration (none or more)
+    iteration (once or more)
|    alternatives
?    optional
,    successive
( )  grouping
#PCDATA is not accepted in the content model!
Example from XHTML 1.0 Strict DTD:
<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
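A content model is in effect a regular expression over child element names. The sketch below is an illustration only, not how a validating XML processor is implemented: it translates a model into a Python regex and checks a sequence of child names against it. The helper names `content_model_to_regex` and `children_match` are invented here, and the sketch handles neither `#PCDATA`, `EMPTY`, `ANY`, nor the determinism rule for content models.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.\-]*")

# Content model of <table> from the XHTML 1.0 Strict DTD (quoted on the slide).
TABLE_MODEL = "(caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))"


def content_model_to_regex(model):
    """Translate a DTD element-content model into a regex over child names."""
    compact = "".join(model.split())  # drop all whitespace
    # Each element name becomes a group matching that name plus one space.
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # ',' means "followed by", which is plain concatenation in a regex;
    # '|', '?', '*', '+', '(' and ')' already mean the same thing.
    return re.compile(pattern.replace(",", ""))


def children_match(model, child_names):
    """True if the sequence of child element names satisfies the model."""
    encoded = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(encoded) is not None


print(children_match(TABLE_MODEL, ["caption", "col", "col", "thead", "tbody"]))
print(children_match(TABLE_MODEL, ["tbody", "tr"]))
```

The second call fails because the model allows either one or more `tbody` elements or one or more `tr` elements as direct children, not a mixture.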

9 1. Components of the logical structure
mixed content: definition has basically two forms
(#PCDATA)
(#PCDATA | e1 | … | en)*
#PCDATA is always included in the content specification and comes first in the list of alternatives.
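As an illustration of the second form, here is a small sketch with a hypothetical `para` element whose mixed content allows `em` and `code` children (all three names are assumptions, not taken from the slides). It uses Python's `xml.etree.ElementTree`, which does not validate, to list character data and child elements in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical document type: para has mixed content, #PCDATA comes first.
DOC = """<!DOCTYPE para [
<!ELEMENT para (#PCDATA | em | code)*>
<!ELEMENT em (#PCDATA)>
<!ELEMENT code (#PCDATA)>
]>
<para>Use <code>#PCDATA</code> <em>first</em> in the list.</para>"""


def mixed_content_items(element):
    """Return character data and child element names in document order."""
    items = []
    if element.text:
        items.append(("text", element.text))
    for child in element:
        items.append(("element", child.tag))
        if child.tail:  # character data following the child element
            items.append(("text", child.tail))
    return items


for kind, value in mixed_content_items(ET.fromstring(DOC)):
    print(kind, repr(value))
```

Character data and elements interleave freely, which is why a mixed-content declaration can only list the allowed element types and cannot constrain their order or number.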

10 1. Components of the logical structure
Attribute list declarations are used
to define the set of attributes pertaining to a given element type
to establish type constraints for these attributes
to provide default values for attributes

11 1. Components of the logical structure
Parts of an attribute list declaration:
element type
attribute name
attribute type: string
constraint: the attribute must be specified for all elements of type poem

12 1. Components of the logical structure
Defining constraints
[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)
#REQUIRED: the attribute must be provided in all elements of the given type
#IMPLIED: the attribute can be provided in an element; no default value is provided
AttValue: a default value is given between single or double quotes
#FIXED AttValue: instances of the attribute must match the given default value
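The effect of default values can be seen with a non-validating parser. The sketch below assumes Python's `xml.etree.ElementTree`, which is built on Expat and reads attribute defaults from the internal DTD subset; the `poems` document and its attribute names are hypothetical. Being non-validating, the parser would not report a missing `#REQUIRED` attribute or a mismatched `#FIXED` value.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations covering the four kinds of DefaultDecl.
DOC = """<!DOCTYPE poems [
<!ELEMENT poems (poem*)>
<!ELEMENT poem (#PCDATA)>
<!ATTLIST poem
    author   CDATA #REQUIRED
    source   CDATA #IMPLIED
    lang     CDATA "en"
    version  CDATA #FIXED "1.0">
]>
<poems><poem author="Murasaki Shikibu">first poem</poem><poem author="Anonymous" lang="fi">second poem</poem></poems>"""

poems = ET.fromstring(DOC)
first, second = list(poems)

# Defaulted attributes appear as if they had been written in the start-tag;
# the #IMPLIED attribute 'source' stays absent because it has no default.
print(first.attrib)
print(second.attrib)
```

In the first element `lang` and `version` are supplied by the parser; in the second, the explicitly given `lang="fi"` overrides the default.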

13 1. Components of the logical structure
Attribute types
[54] AttType ::= StringType | TokenizedType | EnumeratedType
string type: CDATA
tokenized types:
    ID: names that uniquely identify elements
    IDREF, IDREFS: references to ID type identifiers
    ENTITY, ENTITIES: entity names
    NMTOKEN, NMTOKENS: text tokens consisting of characters accepted in names
enumerated types:
    NOTATION: identifies notations
    enumeration

14 1. Components of the logical structure
<!DOCTYPE text [
<!ATTLIST line id ID #REQUIRED seeline IDREFS #IMPLIED>
]>
This is the first line
This is the second line, but look at the first too
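The element markup of this example did not survive in the transcript. The sketch below is a hypothetical reconstruction: the `ELEMENT` declarations and the identifier values `l1` and `l2` are assumptions, while the `ATTLIST` declaration and the line texts come from the slide. It shows how an application might follow `IDREFS` values to the elements they point at.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction; the ids "l1" and "l2" are invented.
DOC = """<!DOCTYPE text [
<!ELEMENT text (line+)>
<!ELEMENT line (#PCDATA)>
<!ATTLIST line id      ID     #REQUIRED
               seeline IDREFS #IMPLIED>
]>
<text>
<line id="l1">This is the first line</line>
<line id="l2" seeline="l1">This is the second line, but look at the first too</line>
</text>"""


def resolve_idrefs(root, id_attr="id", ref_attr="seeline"):
    """Map each referring element's id to the texts of the elements it cites."""
    by_id = {el.get(id_attr): el for el in root.iter() if el.get(id_attr)}
    resolved = {}
    for el in root.iter():
        refs = el.get(ref_attr, "").split()  # IDREFS: whitespace-separated
        if refs:
            resolved[el.get(id_attr)] = [by_id[ref].text for ref in refs]
    return resolved


print(resolve_idrefs(ET.fromstring(DOC)))
```

A validating processor would additionally check that every `id` value is unique and that every `IDREFS` token matches some `ID` in the document; this sketch simply raises `KeyError` on a dangling reference.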

15 2. XML documents as trees
She smelled like trees.
XML-aware web browsers support the visualization of the hierarchic structure: example

16 2. XML documents as trees
The XML specification defines a concrete syntax for XML documents. W3C has defined four slightly different abstract models to describe the abstract syntax of XML documents:
XML Information Set
DOM model
XPath 1.0 model
XQuery 1.0 and XPath 2.0 data model
Analysis of differences in the models: Salminen, A., & Tompa, F.W. (2001). Requirements for XML document database systems. Proc. of the ACM Symposium on Document Engineering (DocEng '01) (pp. 85-94). New York: ACM Press.

17 2. XML documents as trees
This life of ours would not cause you sorrow
if you thought of it as like
the mountain cherry blossoms
which bloom and fade in a day.

18 2. XML documents as trees
Node types of XPath 1.0 [tree diagram of the poem document]
Root node
Element nodes: poem, line
Attribute nodes: Author = Murasaki Shikibu, born = 974
Text nodes: "This life of ours would not cause you sorrow if you thought of it as like", "the mountain cherry blossoms", "which bloom and fade in a day."
Comment node: The poem is translated from Japanese by Kenneth Rexroth
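The tree on this slide can be approximated in code. The sketch below uses a hypothetical reconstruction of the poem markup (the division into three `line` elements and the placement of the comment are assumptions) and counts nodes by type with `xml.dom.minidom`. DOM node types are used as stand-ins for the XPath 1.0 ones; XPath 1.0 also has namespace and processing instruction nodes, which this document does not need.

```python
from collections import Counter
from xml.dom import minidom

# Hypothetical markup for the poem; written without whitespace between
# elements so that no whitespace-only text nodes appear.
DOC = (
    '<poem Author="Murasaki Shikibu" born="974">'
    "<!-- The poem is translated from Japanese by Kenneth Rexroth -->"
    "<line>This life of ours would not cause you sorrow "
    "if you thought of it as like</line>"
    "<line>the mountain cherry blossoms</line>"
    "<line>which bloom and fade in a day.</line>"
    "</poem>"
)

XPATH_NAMES = {
    minidom.Node.DOCUMENT_NODE: "root",
    minidom.Node.ELEMENT_NODE: "element",
    minidom.Node.ATTRIBUTE_NODE: "attribute",
    minidom.Node.TEXT_NODE: "text",
    minidom.Node.COMMENT_NODE: "comment",
}


def count_nodes(node, counts=None):
    """Walk the tree and count nodes per (XPath-style) node type."""
    counts = Counter() if counts is None else counts
    counts[XPATH_NAMES[node.nodeType]] += 1
    attributes = getattr(node, "attributes", None)
    if attributes is not None:  # only element nodes carry attributes
        for index in range(attributes.length):
            counts[XPATH_NAMES[attributes.item(index).nodeType]] += 1
    for child in node.childNodes:
        count_nodes(child, counts)
    return counts


print(dict(count_nodes(minidom.parseString(DOC))))
```

Note that the root node is distinct from the `poem` element: the root is the parent of the document element and of any top-level comments and processing instructions.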

19 3. Entity types
The physical structure of XML documents consists of entities. An entity is a unit recognized by the XML processor. The content of an entity is text or some other kind of data.

20 3. Entity types
3-dimensional categorization:
parsed entities -- unparsed entities
internal entities -- external entities
general entities -- parameter entities

21 3. Entity types
parsed entity: intended to be parsed by the XML processor; content consists of marked-up text
unparsed entity: not intended to be parsed by the XML processor; content can be any kind of data

22 3. Entity types
internal entity: name and value given in an entity declaration; always a parsed entity
external entity: not internal; parsed or unparsed

23 3. Entity types
general entity: used in elements and attributes; parsed or unparsed; internal or external
parameter entity: used in the document type definition; always parsed; internal or external

24 3. Entity types
Alternatives
parsed    internal  parameter
parsed    internal  general
parsed    external  parameter
parsed    external  general
unparsed  external  general
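The five alternatives follow from the two "always parsed" rules stated for internal entities and parameter entities. A small sketch that derives them by filtering all eight combinations (the function and variable names are illustrative only):

```python
from itertools import product


def allowed(parsing, location, usage):
    """Apply the two rules from the definitions of entity types."""
    if location == "internal" and parsing != "parsed":
        return False  # internal entities are always parsed
    if usage == "parameter" and parsing != "parsed":
        return False  # parameter entities are always parsed
    return True


ALTERNATIVES = [
    combo
    for combo in product(
        ("parsed", "unparsed"), ("internal", "external"), ("general", "parameter")
    )
    if allowed(*combo)
]

for combo in ALTERNATIVES:
    print(" ".join(combo))
```

The only unparsed combination that survives is the external general entity, which is why unparsed entities are always referenced from attribute values rather than from the DTD.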

25 3. Entity types
[diagram: input to and output from the XML processor]
INPUT FILES for XML processing: root entity, external subset of DTD, other files intended for XML processing
INTERNAL ENTITIES: name and textual content given in DTD
UNPARSED ENTITIES: files not intended for XML processing but referred to by entity references in the INPUT FILES
The XML processor gives the application information about: elements and attributes, comments, processing instructions, character data, namespaces, notations and locations of unparsed entities

26 4. Entity declarations and references
EntityDecl ::= GEDecl | PEDecl
GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
EntityDef ::= EntityValue | (ExternalID NDataDecl?)
PEDef ::= EntityValue | ExternalID
EntityValue: entity definition for internal entity
ExternalID: entity definition for external entity

27 4. Entity declarations and references
internal entity: name and value ( = literal value) given

28 4. Entity declarations and references
external entity: name and system identifier (possibly together with public identifier) given, for an unparsed entity also notation
<!ENTITY virtuaaliyliopistouutiset SYSTEM "http://virtuaaliyliopisto.jyu.fi/kotisivut/sisalto/etusivu/newsfeed.xml">
Declarations from XHTML specification (http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html):
<!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent">
<!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "xhtml-special.ent">
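To see how a processor reports the different kinds of declarations, the sketch below uses the `EntityDeclHandler` callback of Python's `xml.parsers.expat`. The external declarations are the ones quoted on the slide; the value given to the internal entity `JY` and the `news` root element are assumptions made for the example. No external entity is fetched, since the document never references one.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE news [
<!ENTITY JY "University of Jyvaskyla">
<!ENTITY virtuaaliyliopistouutiset SYSTEM "http://virtuaaliyliopisto.jyu.fi/kotisivut/sisalto/etusivu/newsfeed.xml">
<!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent">
]>
<news>&JY;</news>"""


def collect_entity_declarations(xml_text):
    """Record what the parser reports for each entity declaration."""
    declarations = {}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        declarations[name] = {
            "kind": "parameter" if is_parameter else "general",
            # Internal entities carry a value; external ones an identifier.
            "location": "internal" if value is not None else "external",
            "value": value,
            "system_id": system_id,
            "public_id": public_id,
        }

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return declarations


for name, info in collect_entity_declarations(DOC).items():
    print(name, info["kind"], info["location"])
```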

29 4. Entity declarations and references
Unparsed entity: the declaration gives a notation name.
The notation must have been declared.

30 4. Entity declarations and references
References to parameter entities: %Shape; %HTMLsymbol;
References to parsed general entities: &JY; &virtuaaliyliopistouutiset;
Reference to an unparsed general entity: the entity name is given as an attribute value. The type of the attribute has to be ENTITY or ENTITIES.

31 4. Entity declarations and references
In addition to entity references, XML documents may contain character references.
A character reference refers to a specific character of Unicode.
It provides a decimal or hexadecimal representation of the character's code point in Unicode.
Example: &#34; (decimal) or &#x22; (hexadecimal) for the character "
A one-character entity can be defined with a character reference as its value.
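A brief sketch of both ideas, assuming Python's `xml.etree.ElementTree`: decimal and hexadecimal character references in element content, and a one-character entity whose value is a character reference. The entity name `auml` and the `q` element are chosen for illustration.

```python
import xml.etree.ElementTree as ET

# &#34; and &#x22; both denote the quotation mark (code point 34 = 0x22);
# &#228; and &#xE4; both denote a-umlaut (code point 228 = 0xE4).
DOC = """<!DOCTYPE q [
<!ENTITY auml "&#228;">
]>
<q>&#34;P&auml;iv&#xE4;&#x22;</q>"""

text = ET.fromstring(DOC).text
print(text)
print([hex(ord(ch)) for ch in text])
```

The parser resolves the references before the application sees the text, so the application cannot tell whether a character was typed directly, given as a character reference, or supplied through an entity.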

32 4. Entity declarations and references
Where can an entity or character reference occur?
reference to              can occur in
parameter entity          ‣ document type definition
parsed general entity     ‣ element content
                          ‣ attribute value (either in the start-tag or in the attribute definition)
                          ‣ entity value
unparsed general entity   ‣ attribute value (either in the start-tag or in the attribute definition)
character                 ‣ element content
                          ‣ attribute value (either in the start-tag or in the attribute definition)
                          ‣ entity value

33 5. XML processor treatment of entity references
References to unparsed entities
A validating processor makes the identifiers for the entities and associated notations available to the application.
Example text (Finnish): Seisoin ikkunassa ja nauroin. Ihana puu. Ihana pesä. ("I stood at the window and laughed. Lovely tree. Lovely nest.")
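What "making the identifiers available" can look like is sketched below with Python's `xml.parsers.expat`. Expat is non-validating, so the sketch only illustrates the information flow, not validation. The `story` and `picture` elements, the `file` attribute, the `nest` entity and the `jpeg` notation are all hypothetical; only the Finnish sentence comes from the slide.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE story [
<!NOTATION jpeg SYSTEM "image/jpeg">
<!ENTITY nest SYSTEM "nest.jpg" NDATA jpeg>
<!ELEMENT story (#PCDATA | picture)*>
<!ELEMENT picture EMPTY>
<!ATTLIST picture file ENTITY #REQUIRED>
]>
<story>Seisoin ikkunassa ja nauroin. <picture file="nest"/></story>"""


def unparsed_entity_references(xml_text):
    """Collect (element, entity system id, notation, notation system id)."""
    notations = {}   # notation name -> system identifier
    entities = {}    # unparsed entity name -> (system identifier, notation)
    references = []

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_entity(name, is_parameter, value, base,
                  system_id, public_id, notation):
        if notation is not None:  # an NDATA part marks an unparsed entity
            entities[name] = (system_id, notation)

    def on_start(tag, attributes):
        # Simplification: attribute types are not tracked; any value naming
        # a declared unparsed entity is treated as an ENTITY reference.
        for value in attributes.values():
            if value in entities:
                system_id, notation = entities[value]
                references.append(
                    (tag, system_id, notation, notations.get(notation))
                )

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return references


print(unparsed_entity_references(DOC))
```

The processor never opens `nest.jpg`; it only hands the application the identifiers, and the application decides what to do with data in the named notation.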

34 5. XML processor treatment of entity references
References to parsed entities
Dealing with two kinds of entity values:
literal value: the character string written between quotes in the entity definition
replacement text: derived by replacing the character references and parameter entity references in the literal value by their character values and replacement texts, respectively
The XML processor replaces the entity reference by its replacement text.

35 5. XML processor treatment of entity references
entity declaration: <!ENTITY rhyme1 "Ole aina iloinen niin kuin pikku varpunen">
entity reference: &rhyme1;
replacement text = literal value: Ole aina iloinen niin kuin pikku varpunen (Finnish: "Always be cheerful, like a little sparrow")
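The replacement can be watched with a parser. In this sketch, assuming Python's `xml.etree.ElementTree`, the `rhyme1` declaration is the one from the slide, while the `rhymes` and `rhyme` elements and the second entity `marked` are hypothetical. `marked` shows that a general entity reference inside a literal value is left alone when the replacement text is built and is expanded only when the entity is included in content, where its replacement text is parsed as markup.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE rhymes [
<!ENTITY rhyme1 "Ole aina iloinen niin kuin pikku varpunen">
<!ENTITY marked "<rhyme>&rhyme1;</rhyme>">
]>
<rhymes><rhyme>&rhyme1;</rhyme>&marked;</rhymes>"""

root = ET.fromstring(DOC)

# The first <rhyme> is written in the document; the second one comes from
# the replacement text of &marked;, which the parser reads as markup.
texts = [rhyme.text for rhyme in root.findall("rhyme")]
print(texts)
```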

36 5. XML processor treatment of entity references
Declarations from XHTML specification (http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html):
<!ENTITY % coreattrs "id ID #IMPLIED class CDATA #IMPLIED style %StyleSheet; #IMPLIED title %Text; #IMPLIED">
literal value of coreattrs: id ID #IMPLIED class CDATA #IMPLIED style %StyleSheet; #IMPLIED title %Text; #IMPLIED
replacement text of coreattrs: id ID #IMPLIED class CDATA #IMPLIED style CDATA #IMPLIED title CDATA #IMPLIED

37 5. XML processor treatment of entity references
Exercise 10 (Course Text, Chapter 5)
Entity declaration from XHTML Strict DTD:
<!ENTITY % Block "(%block; | form | %misc;)*">
What is the (a) literal value, (b) replacement text of entity Block?
(a) literal value: (%block; | form | %misc;)*

38 5. XML processor treatment of entity references
Other entity declarations needed from the DTD:
Declarations from XHTML specification (http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html):
<!ENTITY % block "p | %heading; | div | %lists; | %blocktext; | fieldset | table">

39 5. XML processor treatment of entity references
Deriving the replacement text of Block: references to parameter entities in the literal value (%block; | form | %misc;)* are replaced by their replacement texts.
Literal value of block: p | %heading; | div | %lists; | %blocktext; | fieldset | table
Replacement text of block: p | h1| h2| h3| h4| h5| h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table
Literal value of misc: noscript | %misc.inline;
Replacement text of misc: noscript | ins | del | script
Replacement text of Block: (p | h1| h2| h3| h4| h5| h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table | form | noscript | ins | del | script )*
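The derivation above is a recursive substitution, which can be sketched in a few lines. The literal values of `Block`, `block` and `misc` are the ones given on the slides; those of `heading`, `lists`, `blocktext` and `misc.inline` are assumptions inferred from the replacement texts shown above. The function name `replacement_text` is illustrative, and the sketch ignores character references and does not guard against recursive declarations.

```python
import re

PE_REFERENCE = re.compile(r"%([A-Za-z_:][\w.:\-]*);")

LITERAL_VALUES = {
    "Block": "(%block; | form | %misc;)*",
    "block": "p | %heading; | div | %lists; | %blocktext; | fieldset | table",
    "misc": "noscript | %misc.inline;",
    # Assumed literal values, inferred from the replacement texts above.
    "heading": "h1|h2|h3|h4|h5|h6",
    "lists": "ul | ol | dl",
    "blocktext": "pre | hr | blockquote | address",
    "misc.inline": "ins | del | script",
}


def replacement_text(name, literal_values):
    """Replace each parameter entity reference by its replacement text."""
    return PE_REFERENCE.sub(
        lambda match: replacement_text(match.group(1), literal_values),
        literal_values[name],
    )


print(replacement_text("misc", LITERAL_VALUES))
print(replacement_text("Block", LITERAL_VALUES))
```

Inside an entity value the replacement text of a parameter entity is inserted as is, without surrounding spaces, which is why plain string substitution models this step adequately.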

40 6. Motivations for the use of entities
The use of entities supports:
use of non-textual data (audio, graphics, etc.) in XML documents (but such data can also be added in stylesheets)
modularization of documents
consistency
multiuse of definitions
adding semantic information by informative entity names and comments attached to entity declarations

