1 Module 2 XML Basics (XML, Namespaces, Usage scenarios, DTDs)

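2 History: SGML vs. HTML vs. XML. SGML (roots in IBM's GML of the late 1960s; ISO standard in 1986), HTML (1990), XML (work started 1996; W3C Recommendation 1998), XHTML (2000).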

3 Why XML? HTML is meant to be interpreted by browsers and shown on the screen to a human. There is a desire to separate the “content” from the “presentation”: the presentation has to please the human eye, while the content can be interpreted by machines, and for machines presentation is a handicap. Hence: semantic markup of the data.

4 Information about a book in HTML: the page simply displays “Politics of experience by Ronald Laing, published in 1967”, followed by an “Item number:” field.

5 The same information in XML: the title (Politics of experience) and the author's first name (Ronald) and last name (Laing) each become elements. The information is (1) decoupled from presentation, then (2) chopped into smaller pieces, and then (3) marked up with semantic meaning, so that it can be processed by machines. Like HTML, however, XML is only a syntax, not a logical, abstract data model.
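A minimal sketch of the same book as XML, parsed with Python's standard xml.etree.ElementTree. The element names (book, title, author, firstname, lastname, year) are assumptions chosen for illustration, since the slide's original markup is not preserved; the point is that each semantically marked-up piece can be pulled out by name instead of being scraped from display text.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the slide's book; element names are illustrative only.
BOOK_XML = """<book>
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
  <year>1967</year>
</book>"""

def book_summary(xml_text: str) -> dict:
    """Extract the marked-up pieces into a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "author": f"{root.findtext('author/firstname')} {root.findtext('author/lastname')}",
        "year": root.findtext("year"),
    }

print(book_summary(BOOK_XML))
```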

6 XML key concepts: documents, elements, attributes, namespace declarations, text, comments, processing instructions. Most of these were inherited from SGML (and are therefore shared with HTML); namespaces were introduced with XML.

7 The key concepts of XML, shown on the book example (Politics of experience; Ronald Laing): a document consists of elements, attributes and text. Elements form a nested structure, i.e. a conceptual tree. Order is important. Values are only “characters”, not integers, etc.
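The two properties stressed here, document order and character-only content, can be observed directly with ElementTree. This is a sketch under the assumption of a hypothetical book markup; note that turning "1967" into a number is left to the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup (element names are illustrative assumptions).
doc = ET.fromstring(
    "<book><title>Politics of experience</title>"
    "<author>Ronald Laing</author><year>1967</year></book>"
)

# The document is a tree; children keep their document order.
child_tags = [child.tag for child in doc]
print(child_tags)  # ['title', 'author', 'year']

# All content is character data: the year arrives as a str, not an int.
year_text = doc.findtext("year")
print(type(year_text).__name__)  # str
year = int(year_text)  # converting to a number is the application's job
```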

8 Elements. Elements are enclosed in tags: a begin tag such as <tag> and an end tag such as </tag>. An element without content, <tag/>, is shorthand for <tag></tag>. Elements can be nested (the slide's example nests elements containing “Wilde” and “Wutz”), and repeated subelements can implement multisets. Order is important! Documents must be well-formed: improperly nested or unclosed tags, such as <a><b></a></b>, are forbidden!
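A short sketch of two of these rules with Python's ElementTree: the empty-element shorthand produces the same tree as the long form, and repeated subelements keep their order. The element names empty, person and name are hypothetical; only the values Wilde and Wutz come from the slide.

```python
import xml.etree.ElementTree as ET

# <empty/> is shorthand for <empty></empty>: both parse to the same tree.
short_form = ET.fromstring("<empty/>")
long_form = ET.fromstring("<empty></empty>")
print(ET.tostring(short_form) == ET.tostring(long_form))  # True

# Repeated subelements behave like a multiset, and their order is preserved.
person = ET.fromstring("<person><name>Wilde</name><name>Wutz</name></person>")
names = [n.text for n in person.findall("name")]
print(names)  # ['Wilde', 'Wutz']
```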

9 Attributes. Attributes are associated with elements; an element may carry only attributes and no content. Attribute names must be unique within an element (no multisets): an element with two attributes of the same name is illegal! What is the difference between a nested element and an attribute? Are attributes useful? Modeling decision: should “name” be an attribute or a subelement of a person? What about “age”?
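To make the modeling decision concrete, here is a sketch of the same person modeled both ways, plus the parser rejecting a duplicated attribute name. The person markup and values (Dana, 30) are hypothetical; ElementTree is Python's standard-library parser.

```python
import xml.etree.ElementTree as ET

# Modeling choice 1: name and age as attributes (hypothetical markup).
as_attrs = ET.fromstring('<person name="Dana" age="30"/>')
# Modeling choice 2: the same facts as subelements.
as_elems = ET.fromstring("<person><name>Dana</name><age>30</age></person>")

print(as_attrs.get("name"), as_elems.findtext("name"))  # Dana Dana

# Attribute names must be unique within an element; the parser rejects duplicates.
try:
    ET.fromstring('<person name="a" name="b"/>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False
print(duplicate_ok)  # False
```

Attributes cannot repeat or contain further structure, whereas a subelement could later grow (e.g. a name split into first and last name); that is one argument for subelements.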

10 Text and Mixed Content. Text appears in element content, e.g. a title element containing “The politics of experience”. Text can also be mixed with other subelements, e.g. the same title text interleaved with marked-up subelements: this is mixed content. Mixed content is very useful for “document” data; the need does not arise in “data” processing, which deals only with entities and relationships. People speak in sentences, not in entities and relationships: XML can preserve the structure of natural language while adding semantic markup that machines can interpret.
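A sketch of how a parser exposes mixed content, assuming a hypothetical emph element inside the title. ElementTree stores text before the first child in .text and text following a child in that child's .tail; itertext() reassembles the sentence in document order.

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with a subelement (markup is illustrative).
title = ET.fromstring("<title>The <emph>politics</emph> of experience</title>")

print(repr(title.text))     # 'The '
print(repr(title[0].text))  # 'politics'
print(repr(title[0].tail))  # ' of experience'
# itertext() walks the text pieces in document order and recovers the sentence.
print("".join(title.itertext()))  # The politics of experience
```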

11 A continuous spectrum between natural language, semi-structured data, and structured data. 1. Dana said that the book entitled “The politics of experience” is really excellent! 2. and 3. The book entitled “The politics of experience” is really excellent!, with progressively more of the sentence marked up (e.g. the title as an element). Finally, fully structured data: Dana, The politics of experience, excellent, each as a separate marked-up value.

12 CDATA sections. Sometimes we would like to preserve the original characters and not interpret them as markup. CDATA sections are not parsed as XML: wrapping markup such as <message>Hello, world!</message> in <![CDATA[ ... ]]> keeps it as plain characters.
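A sketch of the effect with Python's ElementTree, using a hypothetical code wrapper element: the content of the CDATA section becomes ordinary text, no child elements are created, and on output the characters are escaped rather than wrapped in CDATA again.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, "<" and ">" are literal characters, not markup.
doc = ET.fromstring("<code><![CDATA[<message>Hello, world!</message>]]></code>")
print(len(doc))  # 0: no child elements were created
print(doc.text)  # <message>Hello, world!</message>

# The parser does not remember the CDATA wrapper; serialization escapes the text.
print(ET.tostring(doc, encoding="unicode"))
```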

13 Comments, PIs, Prolog. Comments use the same syntax as in HTML: <!-- ... -->. Processing instructions contain no data; their interpretation is left to the processor. Syntax: <?target content?>, e.g. <?Pause 10secs?>, where “Pause” is the target and “10secs” is the content. Targets starting with “xml” are reserved; the prolog uses the XML declaration <?xml version="1.0" encoding="..." standalone="..."?>. standalone="yes" declares that the document does not depend on markup declarations outside the document itself (such as an external DTD). The encoding is usually a Unicode encoding such as UTF-8 (the default) or UTF-16.
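ElementTree discards comments and processing instructions by default, so this sketch uses the standard-library xml.dom.minidom, which keeps them as nodes. The document content (slides, the comment text, the title) is hypothetical; the <?Pause 10secs?> instruction follows the slide's example.

```python
from xml.dom import minidom

# Hypothetical document: XML declaration (prolog), a comment and a PI inside the root.
SOURCE = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    b"<slides><!-- speaker notes --><?Pause 10secs?><title>XML Basics</title></slides>"
)

doc = minidom.parseString(SOURCE)  # minidom keeps comments and PIs as nodes
seen = []
for node in doc.documentElement.childNodes:
    if node.nodeType == node.COMMENT_NODE:
        seen.append(("comment", node.data.strip()))
    elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        # A PI carries only a target and an uninterpreted content string.
        seen.append(("pi", node.target, node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        seen.append(("element", node.tagName))
print(seen)
```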

14 Whitespace handling. Whitespace is any continuous sequence of space, tab, carriage-return and line-feed characters. The special attribute xml:space (values "default" or "preserve") tells applications how to treat it. Human-readable XML (with whitespace) spreads the book example (The politics of experience, Ronald Laing) over indented lines; (efficient) machine-readable XML omits the whitespace between tags. Performance improvement: about a factor of 2.
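A sketch of what indentation costs a parser: it shows up as whitespace-only text and tail strings. The helper strip_indentation is hypothetical (not a standard-library function) and assumes the whitespace is insignificant, i.e. no xml:space="preserve" is in effect; it does not check for that attribute.

```python
import xml.etree.ElementTree as ET

PRETTY = """<book>
  <title>The politics of experience</title>
  <author>Ronald Laing</author>
</book>"""
COMPACT = "<book><title>The politics of experience</title><author>Ronald Laing</author></book>"

pretty, compact = ET.fromstring(PRETTY), ET.fromstring(COMPACT)
pretty_text_before = pretty.text

# Indentation becomes whitespace-only text (before the first child) and tails.
print(repr(pretty_text_before))  # '\n  '
print(repr(compact.text))        # None

def strip_indentation(elem):
    """Drop whitespace-only text/tail strings (hypothetical helper)."""
    for e in elem.iter():
        if e.text is not None and not e.text.strip():
            e.text = None
        if e.tail is not None and not e.tail.strip():
            e.tail = None
    return elem

same = ET.tostring(strip_indentation(pretty)) == ET.tostring(compact)
print(same)  # True
```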

15 Language declaration. The special attribute xml:lang identifies the natural language of an element's content, e.g. <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>, <p xml:lang="en-GB">What colour is it?</p>, <p xml:lang="en-US">What color is it?</p>.
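The xml: prefix is predeclared and bound to the fixed URI http://www.w3.org/XML/1998/namespace, so ElementTree reports xml:lang under that URI. A sketch, assuming a hypothetical doc wrapper element; it reads only the attribute on each p and ignores the inheritance of xml:lang by descendants.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # fixed namespace of the xml: prefix

doc = ET.fromstring(
    "<doc>"
    '<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>'
    '<p xml:lang="en-GB">What colour is it?</p>'
    '<p xml:lang="en-US">What color is it?</p>'
    "</doc>"
)

by_lang = {p.get(f"{{{XML_NS}}}lang"): p.text for p in doc.iter("p")}
print(by_lang["en-GB"])  # What colour is it?
```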

16 Uniform Resource Identifiers on the Web: URLs, URIs, IRIs. URL (Uniform Resource Locator): a dereferenceable identifier on the Web; the target of a URL is typically an HTML page (virtual or materialized). URI (Uniform Resource Identifier): a general-purpose key to resources on the Web. It uniquely identifies a resource; the target is not necessarily an HTML file and can be anything (schema, table, file, entity, object, tuple, person, physical item, etc.). The lifetime and scope of this “key” are user dependent. IRI (Internationalized Resource Identifier): allows characters beyond ASCII (Chinese, Arabic, Japanese, etc.). URLs, URIs and IRIs are all strings, often very long ones.

17 Namespaces. Motivation: integration of data from diverse sources, and therefore of different XML vocabularies (aka namespaces). Each “vocabulary” has a unique key, identified by a URI/IRI. The same local name from different vocabularies can have a different meaning and a different structure associated with it. Qualified names (QNames) attach a “name” to its “vocabulary”, for the named nodes of an XML document (elements and attributes; processing-instruction targets may not contain a colon and are not qualified). QName ::= triple (URI, [prefix:], localname). The binding (prefix, URI) is introduced in an element's start tag; afterwards only the prefix is used, not the long URI. The prefix is optional (default namespaces). Prefix and local name are separated by “:”.

18 Namespaces (cont.). Namespace declarations look like attributes, written xmlns:prefix="URI" or xmlns="URI" (default namespace); they bind the prefix to the URI. The scope of a declaration is the entire element where it is declared: the element itself, its attributes and its subtrees. Example: <p:book xmlns:p="URI">content</p:book>.
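A sketch of prefix binding with Python's ElementTree, which replaces each prefix by the URI it is bound to and writes names as {URI}localname. The URI http://example.org/books is a hypothetical vocabulary key. The second document uses a different prefix for the same URI and yields identical names: the identity of a name is the URI, not the prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary URI; any unique URI/IRI string would do.
DOC = """<b:book xmlns:b="http://example.org/books">
  <b:title>The politics of experience</b:title>
  <b:author>Ronald Laing</b:author>
</b:book>"""

root = ET.fromstring(DOC)
print(root.tag)  # {http://example.org/books}book
print([child.tag for child in root])  # the declaration's scope covers the subtree

# Lookups go through the URI; the prefix in the query is our own choice.
ns = {"bk": "http://example.org/books"}
print(root.findtext("bk:title", namespaces=ns))  # The politics of experience

# A different prefix bound to the same URI yields the same qualified name.
other = ET.fromstring('<x:book xmlns:x="http://example.org/books"/>')
print(other.tag == root.tag)  # True
```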

19 Default namespaces. A default namespace is declared without a prefix: <a xmlns="URI"> ... </a>. It applies to the names of the declaring element and its subelements, not to attributes: unprefixed attributes are in no namespace.
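A sketch of that asymmetry, assuming a hypothetical URI http://example.org/v1 and hypothetical attributes id and size: both element names pick up the default namespace, while the unprefixed attributes stay unqualified.

```python
import xml.etree.ElementTree as ET

DOC = '<a xmlns="http://example.org/v1" id="1"><b size="large"/></a>'
root = ET.fromstring(DOC)

print(root.tag)     # {http://example.org/v1}a
print(root[0].tag)  # {http://example.org/v1}b
# Unprefixed attributes are NOT in the default namespace.
print(root.attrib)     # {'id': '1'}
print(root[0].attrib)  # {'size': 'large'}
```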

20 Example: Namespaces. DQ1 defines “dish” for china (tableware), with diameter, volume, decor, ... DQ2 defines “dish” for satellites, with diameter, frequency. How many “dishes” are there? Better ask: “How many dishes in the DQ1 sense are there?” or “How many dishes in the DQ2 sense are there?”

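21 Example: Namespaces. A gs:dish element (china, e.g. with the decor “Meissner”) and a sat:dish element (satellite, e.g. with a frequency given in MHz) share the local name “dish” but belong to different vocabularies.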
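The “how many dishes?” question can be answered precisely once names are namespace-qualified. A sketch with hypothetical URIs for DQ1 and DQ2, a hypothetical inventory wrapper and a made-up frequency value; counting by local name alone mixes both vocabularies.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs for the vocabularies DQ1 (china) and DQ2 (satellites).
GS = "http://example.org/dq1"
SAT = "http://example.org/dq2"

DOC = f"""<inventory xmlns:gs="{GS}" xmlns:sat="{SAT}">
  <gs:dish><gs:decor>Meissner</gs:decor></gs:dish>
  <gs:dish><gs:decor>Meissner</gs:decor></gs:dish>
  <sat:dish><sat:frequency unit="MHz">12000</sat:frequency></sat:dish>
</inventory>"""

root = ET.fromstring(DOC)
all_dishes = sum(1 for e in root.iter() if e.tag.endswith("}dish"))  # ambiguous question
china = len(root.findall(f"{{{GS}}}dish"))                           # DQ1 dishes only
satellite = len(root.findall(f"{{{SAT}}}dish"))                      # DQ2 dishes only
print(all_dishes, china, satellite)  # 3 2 1
```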

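22 Mixing Several Namespaces. One element can declare several namespaces, e.g. <gs:dish xmlns:gs="..." xmlns:uom="...">, combining the dish vocabulary with a second vocabulary bound to the prefix uom (the URIs are not preserved here); its content includes “Meissner”. An element written without a prefix, and with no default namespace in scope, has an unqualified element name.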

23 Example XML data: XHTML (browser/presentation), RSS (blogs), UBL (Universal Business Language), Health Level Seven, HL7 (medical data), XBRL (financial data), digital photography metadata (XMP), XMI (metadata), XQueryX (programs), XForms (forms), SOAP (message envelopes), Microsoft Office, e.g. PowerPoint in XML (documents).

24 XHTML

25 RSS, blogs. Example: the feed of XML.com, whose description reads “XML.com features a rich mix of information and services for the XML community.”
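A sketch of reading a feed, assuming the RSS 2.0 structure (rss, channel, item, title, link, description). The channel title and description echo the slide; the links and item titles are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 feed; links and item titles are hypothetical.
FEED = """<rss version="2.0">
  <channel>
    <title>XML.com</title>
    <link>http://example.org/</link>
    <description>XML.com features a rich mix of information and services for the XML community.</description>
    <item><title>First post</title><link>http://example.org/1</link></item>
    <item><title>Second post</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""

channel = ET.fromstring(FEED).find("channel")
print(channel.findtext("title"))  # XML.com
headlines = [item.findtext("title") for item in channel.findall("item")]
print(headlines)  # ['First post', 'Second post']
```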

26 UBL (Universal Business Language). Vocabulary definitions for: ApplicationResponse, AttachedDocument, BillOfLading, Catalogue, CatalogueDeletion, CatalogueItemSpecificationUpdate, CataloguePricingUpdate, CatalogueRequest, CertificateOfOrigin, CreditNote, DebitNote, DespatchAdvice, ForwardingInstructions, FreightInvoice, Invoice, Order, OrderCancellation, OrderChange, OrderResponse, OrderResponseSimple, PackingList, Quotation, ReceiptAdvice, Reminder, RemittanceAdvice, RequestForQuotation, SelfBilledCreditNote, SelfBilledInvoice, Statement, TransportationStatus, Waybill.

27 Health Level Seven (HL7). Medical information that is exchanged between hospitals, patients, doctors, pharmacies and insurance companies.

28 XBRL (financial information). Goal: facilitate the exchange of business and financial performance information between companies, governments, insurance companies, banks, etc. Mandated by law in many countries.

29 Extensible Metadata Platform (XMP). Used in PDF, photography and photo-editing applications. Particular schemas define basic properties useful for recording the history of a resource as it passes through multiple processing steps: from being photographed, scanned, or authored as text, through photo-editing steps (such as cropping or color adjustment), to assembly into a final image. XMP allows each software program or device along the way to add its own information to a digital resource, which can then be retained in the final digital file.

30 Microsoft Office in XML. Office 2003 could import and export documents as XML; Office 2007 models documents NATIVELY in XML. Examples of vocabularies and schemas: WordprocessingML (the XML file format for Word 2003), SpreadsheetML (Excel 2003), FormTemplate XML schemas (InfoPath 2003) and DataDiagramingML (Visio 2003).

31 Forms on the Web in XML: XML Forms (XForms).

32 Programs and queries in XML. XQuery, the XML query language, has an XML representation (XQueryX). Programs and queries are thus also DATA, blurring the distinction between data, metadata and code. In the slide's example, the parts of the query, such as the functions distinct and document, the descendant-or-self axis and the name test author, each become XML elements.

33 SOAP and Web Services. Web services are a popular way of exchanging information between applications: XML exchanged over HTTP, with a specific protocol (SOAP). The slide's example message carries a reference (uuid:093a2da1-…), a date-and-time value and the name “Åke Jógvan Øyvind”, showing that non-ASCII text travels unchanged.
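A sketch of building and re-reading a minimal envelope with ElementTree. It assumes the SOAP 1.2 envelope namespace http://www.w3.org/2003/05/soap-envelope; the application namespace, the element name and the helper make_envelope are hypothetical. It only shows the XML shape (Envelope, Header, Body) and a non-ASCII round trip, not an HTTP exchange.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://example.org/reservation"             # hypothetical application vocabulary

ET.register_namespace("env", SOAP_ENV)
ET.register_namespace("m", APP_NS)

def make_envelope(name: str) -> bytes:
    """Wrap an application payload in Envelope/Header/Body (hypothetical helper)."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    ET.SubElement(env, f"{{{SOAP_ENV}}}Header")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    passenger = ET.SubElement(body, f"{{{APP_NS}}}name")
    passenger.text = name
    return ET.tostring(env, encoding="utf-8")

message = make_envelope("Åke Jógvan Øyvind")
# Round trip: the non-ASCII name survives serialization and parsing.
parsed = ET.fromstring(message)
print(parsed.find(f"{{{SOAP_ENV}}}Body/{{{APP_NS}}}name").text)
```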

34 The need for XML “schemas”. Unlike many other data formats, XML is very flexible: elements can be nested in arbitrary ways. We can start by writing the XML data, with no need for an a priori schema design (contrast relational databases or Java classes). However, schemas are necessary to: facilitate the writing of applications that process the data; constrain the data that is correct for a certain application; and establish a priori agreements between parties with respect to the data being exchanged. Schema: a model of the data, consisting of structural definitions, type definitions and defaults.

35 History and role of XML schema languages. Several standard schema languages exist: DTDs, XML Schema, RELAX NG. Schema languages such as XML Schema and RELAX NG were designed after, and orthogonally to, XML itself: schemas and data are completely decoupled in XML. Data can exist with or without schemas, or with multiple schemas. Schema evolution rarely forces the data to evolve. Schemas can be designed before the data or extracted from the data (DataGuides, Stanford). This makes XML the right choice for manipulating semi-structured data, rapidly evolving data, or highly customizable data.

36 DTDs. Inherited from SGML and part of the original XML 1.0 specification, DTDs describe the “grammar” of the XML file: element declarations (how elements are allowed to nest within each other, by rules and constraints); attribute lists (which attributes are allowed on which element); some constraints on the values of elements and attributes; and which element is the root of the XML file. Checking these structural constraints is DTD validation (valid vs. invalid documents). DTDs were very useful for a while but have largely been superseded because of several major limitations.

37 Declaring the structure of elements. A grammar describes the structure of each element: <!ELEMENT name content-model>. Content models are built from subelements (identified by name) or #PCDATA, with the combinators “+” for one or more, “*” for zero or more, “?” for zero or one, “,” for concatenation (sequence), and “|” for choice. #PCDATA: only textual content allowed. EMPTY: the element must be empty. ANY: allows any content.
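Because the combinators are those of regular expressions, an element-only content model can be checked by translating it into a regex over the sequence of child names. This is a sketch of that idea, not how any particular validating parser is implemented; the helper names are hypothetical and #PCDATA (mixed content) is not handled.

```python
import re

def content_model_to_regex(model: str) -> str:
    """Translate an element-only DTD content model into a Python regex.

    Each child name becomes the token "name " (name plus a space), so a
    sequence of children can be matched as one string.
    """
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model)
    parts = []
    for tok in tokens:
        if tok == "(":
            parts.append("(?:")           # non-capturing group
        elif tok == ",":
            continue                      # sequence = plain concatenation
        elif tok in (")", "|", "?", "*", "+"):
            parts.append(tok)             # same meaning in regex syntax
        else:
            parts.append(f"(?:{re.escape(tok)} )")
    return "".join(parts)

def matches_content_model(model: str, children: list) -> bool:
    """Check a list of child element names against the content model."""
    pattern = content_model_to_regex(model)
    return re.fullmatch(pattern, "".join(name + " " for name in children)) is not None

print(content_model_to_regex("(title, author+, year?)"))
print(matches_content_model("(title, author+, year?)", ["title", "author", "author"]))  # True
print(matches_content_model("(title, author+, year?)", ["author", "title"]))            # False
```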

38 Example DTD for recipes

39 Defining the attribute lists. Structure: <!ATTLIST element-name attribute-name type default>. CDATA means ordinary character data. #REQUIRED or #IMPLIED state whether the attribute is mandatory or optional. A default value is possible.

40 Attributes (cont.). #REQUIRED: the document must specify a value for the attribute. #IMPLIED: the attribute is optional and there is no default value. "value": the default value, used if no other value is specified. #FIXED "value": the default value, used if no other value is specified; if a value is specified, it must be the fixed value.

41 Major attribute types. CDATA: normal text content. ID: the value is unique within the document; an element has at most one attribute of this type; no default values are allowed. IDREF, IDREFS: references to other elements within the document; an IDREFS value is a list of IDs separated by spaces.

42 ID and IDREF attributes
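Python's ElementTree does not read DTDs, so it does not know which attributes are IDs. This sketch emulates what a validating parser would check if a DTD declared id as an ID attribute and topic/participants as IDREF/IDREFS; the document and all attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<library>
  <person id="p1"><name>Ronald Laing</name></person>
  <person id="p2"><name>Dana</name></person>
  <book id="b1"><title>The politics of experience</title></book>
  <discussion topic="b1" participants="p1 p2"/>
</library>"""

root = ET.fromstring(DOC)

# ID semantics: values must be unique within the whole document.
index = {}
for elem in root.iter():
    ident = elem.get("id")
    if ident is not None:
        if ident in index:
            raise ValueError(f"duplicate ID {ident!r}")
        index[ident] = elem

def resolve_idrefs(value: str) -> list:
    """IDREF(S) semantics: whitespace-separated IDs that must all exist."""
    return [index[ref] for ref in value.split()]  # KeyError = dangling reference

discussion = root.find("discussion")
participants = resolve_idrefs(discussion.get("participants"))
print([p.findtext("name") for p in participants])  # ['Ronald Laing', 'Dana']
print(resolve_idrefs(discussion.get("topic"))[0].findtext("title"))
```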

43 Attributes list example
<!ATTLIST ingredient
    name    CDATA #REQUIRED
    amount  CDATA #IMPLIED
    unit    CDATA #IMPLIED>
<!ATTLIST nutrition
    protein       CDATA #REQUIRED
    carbohydrates CDATA #REQUIRED
    fat           CDATA #REQUIRED
    calories      CDATA #REQUIRED
    alcohol       CDATA #IMPLIED>

44 Mixed content in DTDs. Mixing #PCDATA with other subelements in a declaration means that the content can be “mixed”; such a declaration must take the form (#PCDATA | name1 | name2 | ...)*. Example content: “some text”, then some emphasized text, “blah”, then some bold text.

45 Declarations of DTDs. No DTD (well-formed documents only). DTD inside the document: <!DOCTYPE root [ declarations ]>. DTD external, specified by URI: <!DOCTYPE root SYSTEM "URI">. DTD external, by public name plus URI: <!DOCTYPE root PUBLIC "name" "URI"> (in SGML the URI is optional; in XML it is required). DTD inside the document plus external: <!DOCTYPE root SYSTEM "URI" [ declarations ]>.

46 Correctness of XML documents. Well-formed documents satisfy the basic XML constraints, e.g. properly nested tags and unique attribute names. Valid documents additionally satisfy the DTD's structural constraints. Non-well-formed XML documents cannot be processed; non-valid documents can still be processed (queried, transformed, etc.).
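Well-formedness is what any XML parser checks, so a sketch of a checker is simply “does it parse?”. The helper is_well_formed is hypothetical and uses ElementTree, which does not perform DTD validation; checking validity needs a validating parser (for example the third-party lxml library's etree.DTD class).

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the text parses as XML (well-formedness only, no DTD validity)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "nested": "<a><b></b></a>",
    "overlapping": "<a><b></a></b>",
    "unclosed": "<a><b></a>",
    "two roots": "<a/><b/>",
    "unquoted attribute": "<a x=1/>",
}
for label, text in samples.items():
    print(f"{label:20} {is_well_formed(text)}")
```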

47 Limitations of DTDs. DTDs describe only the “grammar” of the XML file, not the detailed structure and/or types. This grammatical description has some obvious shortcomings: we cannot express that a “length” element must contain a non-negative number (constraints on the type of the value of an element or attribute); that the “unit” element should only be allowed when “amount” is present (co-occurrence constraints); or that the “comment” element should be allowed to appear anywhere (schema flexibility).
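Without a richer schema language, such rules end up in application code. A sketch of two of them in Python, assuming a hypothetical recipe fragment; it models unit and amount as attributes of ingredient, as in the attribute-list example on slide 43, rather than as elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe fragment; "salt" and "length" deliberately break the rules.
DOC = """<recipe>
  <ingredient name="flour" amount="2" unit="cup"/>
  <ingredient name="salt" unit="pinch"/>
  <length>-5</length>
</recipe>"""

def check(recipe: ET.Element) -> list:
    """Return violations of two rules a DTD cannot state (sketch)."""
    problems = []
    for ing in recipe.iter("ingredient"):
        # Co-occurrence: unit only makes sense together with amount.
        if ing.get("unit") is not None and ing.get("amount") is None:
            problems.append(f"{ing.get('name')}: unit without amount")
    for length in recipe.iter("length"):
        # Value type: length must be a non-negative number.
        try:
            ok = float(length.text) >= 0
        except (TypeError, ValueError):
            ok = False
        if not ok:
            problems.append(f"length {length.text!r} is not a non-negative number")
    return problems

print(check(ET.fromstring(DOC)))
```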

48 Good Schema design principles. The XML schema language shall be: more expressive than XML DTDs; expressed in XML; self-describing; usable by a wide variety of applications that employ XML; straightforwardly usable on the Internet; optimized for interoperability; simple enough to be implemented with modest design and runtime resources; coordinated with relevant W3C specs.

49 Recapitulation. XML as inheriting from the Web history: SGML, HTML, XHTML, XML. XML key concepts: documents, elements, attributes, text; order, nested structure, textual information; namespaces. XML usage scenarios: financial, medical, metadata, blogs, etc. DTDs and the need for describing the “structure” of an XML file. Next: XML Schemas.