XML for Information Management – Day 4
Airi Salminen
University of Erlangen-Nuremberg, Computational Linguistics
Instructor: Professor Airi Salminen, http://users.jyu.fi/~airi/
26.4.-30.4.2010

Outline
1. Entity types
2. Entity declarations and references
3. XML processor treatment of entity references
4. Motivations for the use of entities
5. XML family of languages

1. Entity types
The physical structure of an XML document consists of entities. An entity is a unit recognized by the XML processor; its content is text or some other kind of data.

Entities are categorized along three dimensions:
‣ parsed entities -- unparsed entities
‣ internal entities -- external entities
‣ general entities -- parameter entities

‣ Parsed entity: intended to be parsed by the XML processor; its content consists of marked-up text.
‣ Unparsed entity: not intended to be parsed by the XML processor; its content can be any kind of data.

‣ Internal entity: name and value are given in an entity declaration; always a parsed entity.
‣ External entity: any entity that is not internal; parsed or unparsed.

‣ General entity: used in elements and attributes; parsed or unparsed; internal or external.
‣ Parameter entity: used in the document type definition; always parsed; internal or external.

Alternatives:
‣ parsed, internal, parameter
‣ parsed, internal, general
‣ parsed, external, parameter
‣ parsed, external, general
‣ unparsed, external, general
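The five alternatives follow from two constraints given in the definitions: an internal entity is always parsed, and a parameter entity is always parsed. A minimal Python sketch that enumerates the combinations under those constraints (the function name `valid_entity_kinds` is a hypothetical helper, not part of any library):

```python
from itertools import product


def valid_entity_kinds():
    """Enumerate the (parsing, storage, usage) combinations allowed for XML entities."""
    kinds = []
    for parsing, storage, usage in product(
        ("parsed", "unparsed"), ("internal", "external"), ("general", "parameter")
    ):
        # Internal entities and parameter entities are always parsed,
        # so an unparsed entity can only be external and general.
        if parsing == "unparsed" and (storage == "internal" or usage == "parameter"):
            continue
        kinds.append((parsing, storage, usage))
    return kinds


for kind in valid_entity_kinds():
    print(", ".join(kind))
```

Of the eight possible combinations, three are excluded, which leaves the five listed above.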

INPUT FILES for XML processing: root entity, external subset of the DTD, other files intended for XML processing.
INTERNAL ENTITIES: name and textual content given in the DTD.
UNPARSED ENTITIES: files not intended for XML processing but referred to by entity references in the INPUT FILES.
The XML processor passes to the application information about:
‣ elements and attributes
‣ comments
‣ processing instructions
‣ character data
‣ namespaces
‣ notations and locations of unparsed entities

2. Entity declarations and references
EntityDecl ::= GEDecl | PEDecl
GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
EntityDef ::= EntityValue | (ExternalID NDataDecl?)
PEDef ::= EntityValue | ExternalID
In EntityDef and PEDef, EntityValue is the entity definition for an internal entity; the alternative with ExternalID is the entity definition for an external entity.

Internal entity: the declaration gives a name and a value (= literal value).

External entity: the declaration gives a name and a system identifier (possibly together with a public identifier); for an unparsed entity, also a notation.
<!ENTITY virtuaaliyliopistouutiset SYSTEM "http://virtuaaliyliopisto.jyu.fi/kotisivut/sisalto/etusivu/newsfeed.xml">
Declarations from the XHTML specification (http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html):
<!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent">
<!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "xhtml-special.ent">
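As a rough illustration of how the form of a declaration determines the entity type, the following Python sketch uses the standard-library `xml.parsers.expat` module and its `EntityDeclHandler` callback. The sample document and the entity names `uni`, `shape`, `news` and `logo` are invented for this example; expat is a non-validating processor and does not fetch the external files here.

```python
import xml.parsers.expat

# Hypothetical document: one declaration of each kind in the internal DTD subset.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY uni "University of Jyvaskyla">
  <!ENTITY % shape "(circle | square)">
  <!ENTITY news SYSTEM "newsfeed.xml">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<doc/>"""


def classify(doc):
    """Return (name, general/parameter, internal/external, parsed/unparsed) tuples."""
    found = []

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        found.append((
            name,
            "parameter" if is_param else "general",
            # expat passes the declared value for internal entities, None otherwise.
            "internal" if value is not None else "external",
            # A notation name is present only for unparsed (NDATA) entities.
            "unparsed" if notation is not None else "parsed",
        ))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(doc, True)
    return found


for row in classify(DOC):
    print(row)
```

The declaration with a quoted value is reported as internal, the ones with `SYSTEM` as external, and only the one with `NDATA` as unparsed.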

Unparsed entity: the declaration also gives a notation name. The notation must have been declared.

References to parameter entities: %Shape; %HTMLsymbol;
References to parsed general entities: &JY; &virtuaaliyliopistouutiset;
Reference to an unparsed general entity: the entity name is given as the value of an attribute. The type of the attribute has to be ENTITY or ENTITIES.

In addition to entity references, XML documents may contain character references. A character reference:
‣ refers to a specific character of Unicode
‣ provides a decimal or hexadecimal representation of the character's code point in Unicode
Example: a reference to the character "
A one-character entity can also be defined.
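A small Python sketch of the decimal and hexadecimal forms, using the standard-library `xml.etree.ElementTree` parser. The element name `q` is arbitrary; `&#34;` and `&#x22;` both refer to U+0022, the quotation mark.

```python
import xml.etree.ElementTree as ET

# &#34; is the decimal form and &#x22; the hexadecimal form of the same code point.
elem = ET.fromstring("<q>&#34;&#x22;</q>")
decimal_char, hex_char = elem.text

print(decimal_char == hex_char)      # both references yield the same character
print(hex(ord(decimal_char)))        # code point of that character
```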

Where can an entity or character reference occur?
A reference to a parameter entity can occur in:
‣ the document type definition
A reference to a parsed general entity can occur in:
‣ element content
‣ an attribute value (either in the start-tag or in the attribute definition)
‣ an entity value
A reference to an unparsed general entity can occur in:
‣ an attribute value (either in the start-tag or in the attribute definition)
A character reference can occur in:
‣ element content
‣ an attribute value (either in the start-tag or in the attribute definition)
‣ an entity value
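The three places where a reference to a parsed general entity can occur can be tried out with a short Python sketch. The entity name `JY` appears on the slides, but its value here, the entity `place`, and the `note` element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: &JY; is referenced in an entity value, in an
# attribute value of a start-tag, and (through &place;) in element content.
DOC = """<!DOCTYPE note [
  <!ENTITY JY "University of Jyvaskyla">
  <!ENTITY place "&JY;, Finland">
]>
<note org="&JY;">Greetings from &place;</note>"""

note = ET.fromstring(DOC)
print(note.get("org"))   # reference in an attribute value, expanded
print(note.text)         # reference in element content, expanded
```

The parser underlying `ElementTree` expands internal general entities declared in the internal DTD subset, so the application sees only the replacement texts.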

3. XML processor treatment of entity references
References to unparsed entities: a validating processor makes the identifiers for the entities and associated notations available to the application.
Example document text: Seisoin ikkunassa ja nauroin. Ihana puu. Ihana pesä. (I stood at the window and laughed. A lovely tree. A lovely nest.)
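A sketch of what "making the identifiers available to the application" can look like in practice, using the callbacks of Python's standard-library `xml.parsers.expat`. Note that expat is a non-validating processor, so this only illustrates the reporting, not validation. The document, the entity `tree`, the notation `jpeg` and the attribute `picture` are invented for the example.

```python
import xml.parsers.expat

# Hypothetical document with one unparsed entity referenced from an ENTITY attribute.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE poem [
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY tree SYSTEM "tree.jpg" NDATA jpeg>
  <!ELEMENT poem (#PCDATA)>
  <!ATTLIST poem picture ENTITY #IMPLIED>
]>
<poem picture="tree">Ihana puu.</poem>"""


def collect(doc):
    """Collect notation and unparsed-entity identifiers plus element attributes."""
    notations, unparsed, attributes = {}, {}, {}

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed(name, base, system_id, public_id, notation_name):
        unparsed[name] = (system_id, notation_name)

    def on_start(tag, attrs):
        attributes.update(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed
    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return notations, unparsed, attributes


notations, unparsed, attributes = collect(DOC)
# The attribute value is only the entity name; the application resolves it.
system_id, notation_name = unparsed[attributes["picture"]]
print(system_id, notation_name, notations[notation_name])
```

The processor never reads `tree.jpg`; it only hands the system identifier and the notation to the application, which decides what to do with the file.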

References to parsed entities
There are two kinds of entity values:
‣ literal value: the character string written between quotes in the entity definition
‣ replacement text: derived from the literal value by replacing character references with their character values and parameter entity references with their replacement texts
The XML processor replaces the entity reference by the entity's replacement text.

Entity declaration:
<!ENTITY rhyme1 " Ole aina iloinen niin kuin pikku varpunen ">
When the quoted entity value itself contains the same kind of quotation marks, the XML processor is not able to parse the declaration: quotes inside the quotes are a problem.

Entity declaration:
<!ENTITY rhyme1 " Ole aina iloinen niin kuin pikku varpunen ">
Here the replacement text = the literal value.
Entity reference: &rhyme1;
Result: Ole aina iloinen niin kuin pikku varpunen

Entity declaration with character references:
<!ENTITY rhyme1 " Ole aina iloinen niin kuin pikku varpunen ">
Literal value: Ole aina iloinen niin kuin pikku varpunen
Replacement text: Ole aina iloinen niin kuin pikku varpunen
Entity reference: &rhyme1;
Result: Ole aina iloinen niin kuin pikku varpunen
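The difference between literal value and replacement text becomes visible once the character references are written out. In the Python sketch below, the position of the quotation marks inside the rhyme is an assumption made for illustration, as is the `poem` element; the quotes are written as `&#34;` so that they do not end the quoted entity value.

```python
import xml.etree.ElementTree as ET

# Literal value:    &#34;Ole aina iloinen&#34; niin kuin pikku varpunen
# Replacement text: "Ole aina iloinen" niin kuin pikku varpunen
DOC = """<!DOCTYPE poem [
  <!ENTITY rhyme1 "&#34;Ole aina iloinen&#34; niin kuin pikku varpunen">
]>
<poem>&rhyme1;</poem>"""

poem = ET.fromstring(DOC)
print(poem.text)   # the entity reference has been replaced by the replacement text
```

The character references are resolved when the declaration is processed, so the application never sees `&#34;`, only the quotation marks.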

Declarations from the XHTML specification (http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html):
<!ENTITY % coreattrs "id ID #IMPLIED class CDATA #IMPLIED style %StyleSheet; #IMPLIED title %Text; #IMPLIED">
Literal value of coreattrs: id ID #IMPLIED class CDATA #IMPLIED style %StyleSheet; #IMPLIED title %Text; #IMPLIED
Replacement text of coreattrs: id ID #IMPLIED class CDATA #IMPLIED style CDATA #IMPLIED title CDATA #IMPLIED

Exercise
Entity declaration from the XHTML Strict DTD:
<!ENTITY % Block "(%block; | form | %misc;)*">
What is the (a) literal value and (b) replacement text of entity Block?
(a) literal value: (%block; | form | %misc;)*

Other entity declarations needed from the DTD (declarations from the XHTML specification, http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html):
<!ENTITY % block "p | %heading; | div | %lists; | %blocktext; | fieldset | table">

Deriving the replacement text of Block: references to parameter entities in the literal value (%block; | form | %misc;)* are replaced by their replacement texts.
Literal value of block: p | %heading; | div | %lists; | %blocktext; | fieldset | table
Replacement text of block: p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table
Literal value of misc: noscript | %misc.inline;
Replacement text of misc: noscript | ins | del | script
Replacement text of Block: (p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table | form | noscript | ins | del | script)*
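The derivation above can be sketched as a small recursive substitution in Python. This is an illustration of the idea, not how an XML processor is implemented: the helper `replacement_text` and the dictionary `LITERAL` are hypothetical, and the values of `heading`, `lists`, `blocktext` and `misc.inline` are inferred from the replacement texts shown above, with spacing normalized.

```python
import re

# Literal values of the parameter entities involved.
LITERAL = {
    "heading": "h1 | h2 | h3 | h4 | h5 | h6",
    "lists": "ul | ol | dl",
    "blocktext": "pre | hr | blockquote | address",
    "misc.inline": "ins | del | script",
    "misc": "noscript | %misc.inline;",
    "block": "p | %heading; | div | %lists; | %blocktext; | fieldset | table",
    "Block": "(%block; | form | %misc;)*",
}

# A parameter entity reference has the form %Name;
PE_REF = re.compile(r"%([A-Za-z_:][\w.\-:]*);")


def replacement_text(name, literals):
    """Replace every parameter entity reference in the literal value, recursively."""
    return PE_REF.sub(
        lambda match: replacement_text(match.group(1), literals),
        literals[name],
    )


print(replacement_text("misc", LITERAL))
print(replacement_text("Block", LITERAL))
```

Entity names are case-sensitive, so `block` and `Block` are two different entities, which is why both can appear as keys.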

4. Motivations for the use of entities
The use of entities supports:
‣ use of non-textual data (audio, graphics, etc.) in XML documents (but such data can also be added in stylesheets)
‣ modularization of documents
‣ consistency
‣ multiuse of definitions
‣ adding semantic information by informative entity names and comments attached to entity declarations

5. XML family of languages
The specification of XML 1.0 was just the first step in the development of languages for the management of data on the Web.
‣ W3C (World Wide Web Consortium) develops specifications to support the use of the Web; the specifications are publicly available at http://www.w3.org/TR/
‣ Development is systematic
‣ The development process is specified and published

Phases of the W3C development process:
‣ Working Draft: represents work in progress.
‣ Candidate Recommendation: has received significant review from its immediate technical community; explicit call for implementation and technical feedback.
‣ Proposed Recommendation: represents consensus in the development group; proposed to the Advisory Committee for review.
‣ Recommendation: represents consensus within W3C; widespread implementation encouraged.

XML family = XML + XML-related languages
A. Salminen, XML Family of Languages. Overview and Classification. http://users.jyu.fi/~airi/xmlfamily.html

XML-related languages fall into the following categories:
‣ XML accessory: intended for wide use to extend the capabilities of XML
‣ XML transducer: intended for transducing some input XML data into some output form
‣ XML application: intended for some special application domain; defines constraints for XML data on the domain

XML Accessory
‣ additional rules extending the capabilities specified in XML
‣ intended for wide use
‣ development primarily at W3C
‣ for realizing the modularization principle of W3C: keep XML itself small and as stable as possible
‣ most important: XML Names, XML Schema, XPath, XLink

W3C Recommendations for XML Accessories:
Language | Purpose | Recommendation
XML Names | Qualifying element and attribute names | 1999, 2004, 2006, 2009
XML Stylesheet | Associating style sheets with an XML document | 1999
XPath | Addressing parts of XML documents | 1999, 2007
XML Schema | Constraining a class of XML documents | 2001, 2004
XLink | To create and describe links | 2001
XML Base | A base URI service | 2001
XPointer | Fragment identifiers especially for URI references | 2003
xml:id | Attribute xml:id in XML documents | 2005
ITS | Mechanism to support internationalization and localization of content | 2007

XML Transducer
‣ to convert XML input data (a document, part of a document, a set of documents) into output
‣ associated with a processing model
‣ active development at W3C
‣ most important: CSS, XSL, XSLT, XQuery

W3C Recommendations for XML Transducers:
Language | Purpose | Recommendation
CSS | Rendering | (1996), 1998
XSLT | Transformation | 1999, 2007
Canonical XML | Canonicalization | 2001, 2002
XSL | Rendering | 2001, 2006
XInclude | Merging | 2004, 2006
XQuery | Querying | 2007

XML Application
‣ defines constraints for a class of XML data on a particular application domain
‣ usually defined by a DTD or some other schema language
‣ development work both at W3C and outside
‣ examples from W3C: SMIL, RDF, XHTML

XML Applications developed at W3C for:
‣ Non-textual Data
‣ Web Publishing
‣ Metadata and Semantic Web
‣ Web Communication and Services

W3C Recommendations for non-textual data:
Language | Purpose | Recommendation
SMIL (Synchronized Multimedia Integration Language) | Integrating a set of independent multimedia objects into a synchronized multimedia presentation | 1998, 2001, 2005
MathML (Mathematical Markup Language) | Mathematical notation, especially for enabling the encoding of mathematical material for the Web | 1999, 2001, 2003
Ruby Annotation | Markup for ruby, short annotations alongside the base text, typically used in East Asian documents | 2001
SMIL Animation | Animation functionality in XML documents | 2001
SVG | To describe two-dimensional vector and mixed vector/raster graphics | 2001, 2003
VoiceXML (Voice Extensible Markup Language) | To describe audio dialogs and thus support interactive voice response applications on the Web | 2004, 2007
SSML (Speech Synthesis Markup Language) | To assist generation of synthetic speech in Web and other applications | 2004
EMMA (Extensible MultiModal Annotation markup language) | To enable Web access using multimodal interfaces | 2009

W3C Recommendations for Web publishing:
Language | Purpose | Recommendation
XHTML | Reformulation of HTML 4.0 in XML, specified by three document types: Strict, Transitional, Frameset | 1999, 2000, 2002
XHTML Modularization | Defining XHTML elements and attributes in a set of modules | 2001
XHTML Basic | The minimal core of XHTML | 2000
XML Events | To represent asynchronous occurrences, such as mouse clicks, in XHTML or in other XML markup | 2003
XForms | For Web forms allowing online interaction between human users and software, to be used in XHTML or in other XML markup | 2003, 2006
XHTML-Print | Simple XHTML suitable for printing from mobile devices as well as for display | 2006

W3C Recommendations for Semantic Web:
Language | Purpose | Recommendation
RDF (Resource Description Framework) | A model and XML-based language for metadata describing Web resources | 1999, 2004
RDF Schema | To define RDF vocabularies | 2004
OWL (Web Ontology Language) | Publishing and sharing ontologies | 2004
WebCGM XCF | Metadata for WebCGM pictures | 2007
GRDDL (Gleaning Resource Descriptions from Dialects of Languages) | Markup for declaring that an XML document includes RDF compatible data | 2007
SPARQL | Query language for RDF | 2008
POWDER | Metadata to describe a group of resources | 2009

W3C Recommendations for Web communication and services:
Language | Purpose | Recommendation
P3P (Platform for Privacy Preferences) | To enable Web sites to express their practices for collecting and using data about the users of the sites | 2002
XML-Signature | Associating digital objects by digital signatures in XML format | 2002
XML Encryption | Encrypting data and representing the result in XML | 2002
SOAP (Simple Object Access Protocol) | Rules to exchange structured and typed information between peers in a decentralized, distributed environment | 2003, 2007
CC/PP (Composite Capabilities/Preference Profiles) | A format for how a client device tells an origin server about its user agent profile | 2004
XKMS (XML Key Management Specification) | Protocol for distributing and registering public keys | 2005
WSDL (Web Services Description Language) | To describe Web services | 2007
SML | Service modeling | 2009

For more information: A. Salminen, XML Family of Languages. Overview and Classification. http://users.jyu.fi/~airi/xmlfamily.html

