
More XML, Chapter 6: DTD (Document Type Definition)

DTD: document type definition A DTD is written in a grammar based on EBNF (extended BNF) and can be used to specify the allowable elements and attributes for an XML document. There is currently a move away from DTDs toward Schema. Schema documents use XML (not BNF) syntax. Some parsers can check an XML document against its DTD and determine whether it is valid. These are called validating parsers. A document that is syntactically correct but does not conform to its DTD is well-formed but not valid. Non-validating parsers can’t check documents against their DTD and can thus only determine whether a document is well-formed.
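The distinction between well-formed and valid can be sketched with the validating parser that ships with the JDK. This is only an illustration under assumptions: the note/body DTD and the class name WellFormedVsValid are hypothetical and do not come from the slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedVsValid {
    // Hypothetical internal subset: a note must contain exactly one body.
    static final String DTD =
        "<!DOCTYPE note [\n"
      + "  <!ELEMENT note ( body )>\n"
      + "  <!ELEMENT body ( #PCDATA )>\n"
      + "]>\n";

    static final String VALID = DTD + "<note><body>hello</body></note>";
    // Tags balance, so this is well-formed, but the required body is missing.
    static final String INVALID = DTD + "<note></note>";

    /** Parses with DTD validation switched on and returns the validity errors. */
    static List<String> validityErrors(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors are recoverable
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // A document that is not even well-formed ends up here.
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("valid document errors:   " + validityErrors(VALID));
        System.out.println("invalid document errors: " + validityErrors(INVALID));
    }
}
```

Both documents parse without a fatal error because both are well-formed; only the second should produce entries in the error list.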

Document Type Declaration A document type declaration in the XML document prolog specifies a DTD appearing within or outside the document. These parts are referred to as the internal and external subsets. <!DOCTYPE thingy [ <!ELEMENT thingy ( #PCDATA )> ]> declares a dtd called thingy with one element in the internal subset. PCDATA refers to “parsed character data”, meaning the reserved characters < and & within the PCDATA will be treated as markup. The parentheses contain the content specification for the element.

MS XML validator We can check an xml document for adherence to an external DTD using MS XML validator. Here’s the xml: Welcome to XML! And here’s the DTD:

MS Validating parser can validate against schema or dtd

Invalid xml In the next slide we use the MS XML validator to check an xml (appearing below) like intro.xml but missing the message element:

If xml doc does not match dtd/schema

Sequences, pipes and occurrences The comma indicates a sequence in which elements must appear. For a class element, a sequence of prof and student indicates the order and number of elements making up a class: one prof and one student, in that order. A content specification may list any number of elements. The pipe (|) indicates that just one of the choices must be selected. +, * and ? indicate the frequency of element occurrences: + means 1 or more occurrences, * means 0 or more occurrences, and ? means 0 or 1 occurrence. Following student with + might be appropriate for a class DTD, meaning just one professor and one or more students.
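A small sketch of the sequence and occurrence rules, assuming a hypothetical declaration <!ELEMENT class ( prof, student+ )> (the element names follow the slide's wording, but the exact declaration is an assumption). It validates three candidate documents with the JDK's parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ClassRoster {
    // Assumed DTD: one prof, then one or more students, in that order.
    static final String DTD =
        "<!DOCTYPE class [\n"
      + "  <!ELEMENT class ( prof, student+ )>\n"
      + "  <!ELEMENT prof ( #PCDATA )>\n"
      + "  <!ELEMENT student ( #PCDATA )>\n"
      + "]>\n";

    /** Returns true when the body conforms to the DTD above. */
    static boolean isValid(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors.isEmpty();
    }

    public static void main(String[] args) {
        // One prof followed by two students satisfies ( prof, student+ ).
        System.out.println(isValid(
            "<class><prof>A</prof><student>B</student><student>C</student></class>"));
        // Wrong order: the comma fixes the sequence.
        System.out.println(isValid(
            "<class><student>B</student><prof>A</prof></class>"));
        // No student: + requires at least one.
        System.out.println(isValid("<class><prof>A</prof></class>"));
    }
}
```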

example The content specification says that donuts consists of 0 or 1 jelly, 0 or more lemon, and then either 1 or more of crème or sugar, or a glazed. A legal markup for this would be grape sour real sour chocolate
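The donut rule can be written as one nested content model. The sketch below assumes the declaration <!ELEMENT donuts ( jelly?, lemon*, ( ( creme | sugar )+ | glazed ) )>, reconstructed from the prose description (with crème spelled creme), and the assignment of the sample values to elements is a guess.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DonutBox {
    // Assumed content model, rebuilt from the slide's description.
    static final String DTD =
        "<!DOCTYPE donuts [\n"
      + "  <!ELEMENT donuts ( jelly?, lemon*, ( ( creme | sugar )+ | glazed ) )>\n"
      + "  <!ELEMENT jelly ( #PCDATA )>\n"
      + "  <!ELEMENT lemon ( #PCDATA )>\n"
      + "  <!ELEMENT creme ( #PCDATA )>\n"
      + "  <!ELEMENT sugar ( #PCDATA )>\n"
      + "  <!ELEMENT glazed ( #PCDATA )>\n"
      + "]>\n";

    /** Returns true when the body conforms to the donuts content model. */
    static boolean isValid(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors.isEmpty();
    }

    public static void main(String[] args) {
        // One jelly, two lemons, then the glazed alternative.
        System.out.println(isValid("<donuts><jelly>grape</jelly><lemon>sour</lemon>"
            + "<lemon>real sour</lemon><glazed>chocolate</glazed></donuts>"));
        // The ( creme | sugar )+ alternative, with no jelly or lemon at all.
        System.out.println(isValid("<donuts><sugar>a</sugar><creme>b</creme></donuts>"));
        // glazed and sugar together: only one branch of the choice may be taken.
        System.out.println(isValid("<donuts><glazed>a</glazed><sugar>b</sugar></donuts>"));
    }
}
```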

The dtd and xml Pastry.dtd: Pastry.xml grape sour real sour chocolate

In validator: files are in myexamples directory

Pastry.xml in xml validator

content specification An element may contain one or more child elements as content. Content specification types describe non-element content. These consist of ANY, EMPTY and mixed content. Empty elements do not contain character data or child elements. An element declared EMPTY could be marked up as a start tag immediately followed by its end tag. Recall that the shorthand /> may be used to close an empty element. The ? and + indicators can’t be used with mixed content, and an element containing only PCDATA needs no indicator. In mixed content, PCDATA must be listed first. An element of type ANY may contain any content, including PCDATA, elements, or combinations of elements and PCDATA. It may also be empty.

Mixed content <!ELEMENT mymessage ( #PCDATA | message )*> declares mymessage to have mixed content. PCDATA must be listed first in mixed content. The * means mymessage may contain nothing or any number of occurrences of PCDATA and message elements. This would be legal markup: here is an example of the dtd above this is a message and another
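A sketch of how a validating parser treats mixed content, assuming the declaration <!ELEMENT mymessage ( #PCDATA | message )*> and a message element holding only text. The sample bodies are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class MixedContent {
    // Assumed DTD: text and message elements may be interleaved freely.
    static final String DTD =
        "<!DOCTYPE mymessage [\n"
      + "  <!ELEMENT mymessage ( #PCDATA | message )*>\n"
      + "  <!ELEMENT message ( #PCDATA )>\n"
      + "]>\n";

    /** Returns true when the body conforms to the mixed-content declaration. */
    static boolean isValid(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors.isEmpty();
    }

    public static void main(String[] args) {
        // Text and message elements in any order and number.
        System.out.println(isValid(
            "<mymessage>some text <message>one</message> more <message>two</message></mymessage>"));
        // The * also allows no content at all.
        System.out.println(isValid("<mymessage/>"));
        // An element that is not listed in the mixed-content declaration.
        System.out.println(isValid("<mymessage><other>x</other></mymessage>"));
    }
}
```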

Internal dtd An xml document is standalone if it does not reference an external subset. <!DOCTYPE format [ ]> This is a simple formatted sentence. I have tried bold. I have tried italic. Now what?

In ms xml validator

Element group Above, a courselist contains a single department followed by any number of coursenumber, coursedescription pairs. What does the following mean?

Attribute specification An attribute specification specifies an attribute list for an element via an ATTLIST declaration: <!ATTLIST x y CDATA #REQUIRED> Here, y is a required attribute of element x. y may contain any character data (except <, ‘, “ and &). CDATA in an attribute declaration has a different meaning than a CDATA section in an XML document, where only ]]> (the end delimiter) may not appear.

Using attributes <!DOCTYPE myMessage [ ]> Welcome to XML!

Document with attributes in MS validator

Attribute defaults Page authors can specify default values for attributes. The keywords are #IMPLIED, #REQUIRED and #FIXED. –An implied attribute, if missing, can be replaced by any value the application using the document wishes. –A required attribute must appear or the document is not valid. –A fixed attribute must have the specific value provided. Declaring attribute zip of element address as #FIXED with value “13820” specifies that zip can only have the value “13820”; an application processing an XML document with an address element missing attribute zip would be passed this default zip value.
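The #FIXED behaviour described above can be observed through the DOM. This sketch assumes the declaration <!ATTLIST address zip CDATA #FIXED "13820"> (the CDATA type and the element content are assumptions) and shows both the default being supplied and a conflicting value being rejected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class FixedZipDefault {
    // Assumed DTD: zip is fixed to 13820.
    static final String DTD =
        "<!DOCTYPE address [\n"
      + "  <!ELEMENT address ( #PCDATA )>\n"
      + "  <!ATTLIST address zip CDATA #FIXED \"13820\">\n"
      + "]>\n";

    /** Parses with validation on, collecting validity errors in the given list. */
    static Document parse(String xml, final List<String> errors) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** The zip value the application sees for the given address markup. */
    static String zipSeen(String body) {
        Document doc = parse(DTD + body, new ArrayList<String>());
        return doc.getDocumentElement().getAttribute("zip");
    }

    static boolean isValid(String body) {
        List<String> errors = new ArrayList<>();
        parse(DTD + body, errors);
        return errors.isEmpty();
    }

    public static void main(String[] args) {
        // zip is absent in the markup, yet the parser passes the fixed value on.
        System.out.println(zipSeen("<address>main office</address>"));
        // A different zip contradicts the #FIXED declaration.
        System.out.println(isValid("<address zip=\"99999\">main office</address>"));
    }
}
```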

Attributes Attribute types may be CDATA (strings), tokenized or enumerated. Strings have no constraints beyond prohibiting <, &, ’ and “. Entity references must be used for these. Tokenized types impose constraints on attribute values, such as which characters are permitted in the value. An enumerated attribute has a restricted value range: it can only take on one of the values listed in the attribute declaration.

tokenized attribute 4 tokenized types exist: –ID –IDREF –ENTITY –NMTOKEN (IDREF, ENTITY and NMTOKEN also have the list forms IDREFS, ENTITIES and NMTOKENS). ID uniquely identifies an element. IDREF attributes point to elements with an ID attribute. A validating parser verifies that each ID value referenced by an IDREF is in the document. Using the same value for multiple ID attributes is an error. Declaring attributes of type ID to be #FIXED is an error.

Using ID and IDREF attributes <!DOCTYPE bookstore [ ]> 2 to 4 days 1 day Java How to Program 3rd edition. C How to Program 3rd edition. C++ How to Program 3rd edition.

In MS Validator Use URL: 20programming/validate_js.htm with file examples\ch06\IDExample.xml

ID example

id example: internal subset <!DOCTYPE bookstore [ ]>

Idexample.xml continued 2 to 4 days 1 day Java How to Program 3rd edition. C How to Program 3rd edition. C++ How to Program 3rd edition.

remarks An ID attribute’s value must begin with a letter, underscore or colon; anything else is an error. Declaring more than one attribute of type ID for an element is an error. Referencing a value as an ID that is not defined in the document is an error.
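The ID and IDREF rules can be checked with a validating parser. The DTD below is a simplified, hypothetical stand-in for the bookstore example (the attribute names shipID and shippedBy and the content models are assumptions), not the slide's original file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class IdRefCheck {
    // Assumed DTD: each shipping method has an ID; a book may refer to one.
    static final String DTD =
        "<!DOCTYPE bookstore [\n"
      + "  <!ELEMENT bookstore ( shipping+, book+ )>\n"
      + "  <!ELEMENT shipping ( #PCDATA )>\n"
      + "  <!ATTLIST shipping shipID ID #REQUIRED>\n"
      + "  <!ELEMENT book ( #PCDATA )>\n"
      + "  <!ATTLIST book shippedBy IDREF #IMPLIED>\n"
      + "]>\n";

    /** Returns the validity errors reported for the given bookstore markup. */
    static List<String> errorsFor(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    static String store(String secondShipId, String shippedBy) {
        return "<bookstore>"
             + "<shipping shipID=\"s1\">2 to 4 days</shipping>"
             + "<shipping shipID=\"" + secondShipId + "\">1 day</shipping>"
             + "<book shippedBy=\"" + shippedBy + "\">Java How to Program</book>"
             + "</bookstore>";
    }

    public static void main(String[] args) {
        System.out.println(errorsFor(store("s2", "s2"))); // IDs unique, reference resolves
        System.out.println(errorsFor(store("s2", "s3"))); // s3 is never defined as an ID
        System.out.println(errorsFor(store("s1", "s1"))); // s1 is used for two ID attributes
    }
}
```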

IDExample2.xml (note s3 shippedBy value) 2 to 4 days 1 day Java How to Program 3rd edition. C How to Program 3rd edition. C++ How to Program 3rd edition.

IDExample2.xml in Validator

Entities As we saw in chapter 5 entity references in an xml document are replaced by the entity values found in the dtd. We saw this for lang.xml and lang.dtd where assoc and text entities were replaced with Arabic script. Here is another example. Entity city is replaced.

entityexample.xml <!DOCTYPE database [ ]> Deitel & Associates, Inc.

entityexample.xml

Here line 7 <NOTATIO… indicates that an application may wish to run IE and load tour.html to handle unparsed entities. Line 8 declares an entity named city which refers to the external document tour.html. NDATA in this line indicates that the content of this entity is not XML and supplies the name of the notation (html) for this entity.

ENTITIES The ENTITIES keyword can be used in a dtd to indicate that an attribute has multiple entities for its value. Declaring attribute file with type ENTITIES specifies that file must contain one or more entity names, separated by whitespace. In conforming markup, the value of file would name animations, graphics and tables, which are entities declared in a dtd. The NMTOKEN type is more restrictive: its value may contain only letters, digits, periods, underscores, hyphens and colons. A value containing spaces does not conform because spaces are not allowed. The NMTOKENS attribute type would allow multiple string tokens separated by blanks.

Enumerated attribute types An enumerated attribute type declares a list of possible values. An attribute must be assigned a value from this list in order to conform to the dtd. Enumerated values are separated with the pipe (|). <!ATTLIST person gender ( M | F ) "F"> allows a person to have gender M or F, with default “F”. <!ATTLIST person gender ( M | F ) #IMPLIED> does not supply a default and would permit an application to process a person with no gender in whatever way it liked.
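The difference between an enumerated attribute with a default and one declared #IMPLIED shows up in what the application receives. The sketch assumes the two person/gender declarations described above; the helper names and sample markup are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class EnumeratedGender {
    // Builds an internal subset around one gender declaration (assumed shape).
    static String doctype(String genderDecl) {
        return "<!DOCTYPE person [\n"
             + "  <!ELEMENT person ( #PCDATA )>\n"
             + "  <!ATTLIST person gender " + genderDecl + ">\n"
             + "]>\n";
    }

    static final String WITH_DEFAULT = doctype("( M | F ) \"F\"");
    static final String IMPLIED = doctype("( M | F ) #IMPLIED");

    /** Parses with validation on and returns the root element. */
    static Element parse(String xml, final List<String> errors) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** The gender the application sees, or null when none is supplied. */
    static String genderSeen(String dtd, String body) {
        Element person = parse(dtd + body, new ArrayList<String>());
        return person.hasAttribute("gender") ? person.getAttribute("gender") : null;
    }

    static boolean isValid(String dtd, String body) {
        List<String> errors = new ArrayList<>();
        parse(dtd + body, errors);
        return errors.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(genderSeen(WITH_DEFAULT, "<person>Pat</person>")); // default applies
        System.out.println(genderSeen(IMPLIED, "<person>Pat</person>"));      // nothing supplied
        System.out.println(isValid(WITH_DEFAULT, "<person gender=\"X\">Pat</person>"));
    }
}
```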

Enumerated attribute types NOTATION is also an enumerated attribute type. For example, a NOTATION attribute named language can specify that language must be assigned the value Java or C, with C as the default. The notation for C might be specified as

conditional.xml Conditional sections provide the flexibility of including or excluding declarations. They enable us to check XML documents against different sets of dtd requirements. The keywords INCLUDE and IGNORE specify included and excluded declarations: <![INCLUDE[ ]]> directs the parser to include the declaration of element name. Conditional sections may also be used with entities.

Conditional.dtd <![ %accept; [ ]]> <![ %reject; [ ]]>

Conditional.xml Chairman

discussion Entities %accept and %reject have values “IGNORE” and “INCLUDE”. The percent symbol indicates that they are parameter entities, which may only be used inside the dtd in which they are declared. Conditional sections, and parameter-entity references inside declarations, may only appear in the external subset. Thus the author may create entities specific to the dtd (not the xml document).
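A sketch of conditional sections driven by parameter entities. Everything here is assumed for illustration: the DTD text, the choice of %accept; as INCLUDE and %reject; as IGNORE, and the element names. Because conditional sections belong in the external subset, the DTD is handed to the parser through an EntityResolver instead of a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ConditionalSections {
    // Hypothetical external subset: only the %accept; section is included.
    static final String EXTERNAL_DTD =
        "<!ENTITY % accept \"INCLUDE\">\n"
      + "<!ENTITY % reject \"IGNORE\">\n"
      + "<![ %accept; [ <!ELEMENT message ( approved )> ]]>\n"
      + "<![ %reject; [ <!ELEMENT message ( reason )> ]]>\n"
      + "<!ELEMENT approved EMPTY>\n"
      + "<!ELEMENT reason ( #PCDATA )>\n";

    /** Validates the body against the external DTD text above. */
    static boolean isValid(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Answer every external entity request with the in-memory DTD.
            builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader(EXTERNAL_DTD)));
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            String xml = "<!DOCTYPE message SYSTEM \"conditional.dtd\">\n" + body;
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors.isEmpty();
    }

    public static void main(String[] args) {
        // Matches the included declaration of message.
        System.out.println(isValid("<message><approved/></message>"));
        // Matches only the ignored declaration, so it should be rejected.
        System.out.println(isValid("<message><reason>late</reason></message>"));
    }
}
```

Swapping the two entity values would flip which document is accepted, which is the flexibility the conditional sections are meant to provide.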

conditional.xml

Chairman conditional.xml

conditional.dtd <![ %accept; [ ]]> <![ %reject; [ ]]>

Whitespace Whitespace is preserved or normalized depending on the context in which it appears. A text example (whitespace.xml) uses a Java program (Tree.java from chapter 9) to demonstrate when whitespace is preserved and when it is normalized. The file is available at classdir\examples\ch09\tree.java

running Tree.java on whitespace.xml... java src in notes
C:\Java\j2sdk1.4.1_01\bin>java Tree yes whitespace.xml
URL: file:C:/Java/j2sdk1.4.1_01/bin/whitespace.xml
[ document root ]
+-[ element : whitespace ]
 +-[ ignorable ]
 +-[ element : hasCDATA ]
  +-[ attribute : cdata ] " simple cdata "
 +-[ ignorable ]
 +-[ element : hasID ]
  +-[ attribute : id ] "i20"
 +-[ ignorable ]
 +-[ element : hasNMTOKEN ]
  +-[ attribute : nmtoken ] "hello"
 +-[ ignorable ]

Java tree output continued
 +-[ element : hasEnumeration ]
  +-[ attribute : enumeration ] "true"
 +-[ ignorable ]
 +-[ element : hasMixed ]
  +-[ text ] " "
  +-[ text ] " This is text."
  +-[ text ] " "
  +-[ text ] " "
  +-[ element : hasCDATA ]
   +-[ attribute : cdata ] " simple cdata"
  +-[ text ] " "
  +-[ text ] " This is some additional text."
  +-[ text ] " "
  +-[ text ] " "
 +-[ ignorable ]
[ document end ]
C:\Java\j2sdk1.4.1_01\bin>

whitespace.xml: dtd and content <!DOCTYPE whitespace [ <!ELEMENT whitespace ( hasCDATA, hasID, hasNMTOKEN, hasEnumeration, hasMixed )> <!ATTLIST hasEnumeration enumeration ( true | false ) #REQUIRED> ]>

whitespace.xml continued This is text. This is some additional text.
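The attribute part of the whitespace behaviour can be reproduced in a few lines. The sketch below uses a cut-down, hypothetical version of the whitespace DTD (only hasCDATA and hasNMTOKEN, both declared EMPTY) and reads the attribute values back through the DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class WhitespaceNormalization {
    // Reduced stand-in for whitespace.xml; both attribute values are padded with spaces.
    static final String XML =
        "<!DOCTYPE whitespace [\n"
      + "  <!ELEMENT whitespace ( hasCDATA, hasNMTOKEN )>\n"
      + "  <!ELEMENT hasCDATA EMPTY>\n"
      + "  <!ATTLIST hasCDATA cdata CDATA #REQUIRED>\n"
      + "  <!ELEMENT hasNMTOKEN EMPTY>\n"
      + "  <!ATTLIST hasNMTOKEN nmtoken NMTOKEN #REQUIRED>\n"
      + "]>\n"
      + "<whitespace>\n"
      + "  <hasCDATA cdata=\" simple cdata \"/>\n"
      + "  <hasNMTOKEN nmtoken=\" hello \"/>\n"
      + "</whitespace>";

    /** Returns the value of the named attribute on the first such element. */
    static String attr(String element, String name) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // read and apply the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            Element e = (Element) doc.getElementsByTagName(element).item(0);
            return e.getAttribute(name);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // CDATA attributes keep their leading and trailing spaces.
        System.out.println("[" + attr("hasCDATA", "cdata") + "]");
        // Tokenized attributes are trimmed before the application sees them.
        System.out.println("[" + attr("hasNMTOKEN", "nmtoken") + "]");
    }
}
```

This mirrors the Tree output, where the cdata value keeps its surrounding spaces while the nmtoken value is reported as "hello".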

Tree.java slide 1
import java.io.*;
import org.xml.sax.*; // for HandlerBase class
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
public class Tree extends HandlerBase {
   private int indent = 0; // indentation counter
   // returns the spaces needed for indenting
   private String spacer( int count )
   {
      String temp = "";
      for ( int i = 0; i < count; i++ )
         temp += " ";
      return temp;
   }
   // method called before parsing
   // it provides the document location
   public void setDocumentLocator( Locator loc )
   {
      System.out.println( "URL: " + loc.getSystemId() );
   }

Tree.java slide 2
   // method called at the beginning of a document
   public void startDocument() throws SAXException
   {
      System.out.println( "[ document root ]" );
   }
   // method called at the end of the document
   public void endDocument() throws SAXException
   {
      System.out.println( "[ document end ]" );
   }
   // method called at the start tag of an element
   public void startElement( String name,
      AttributeList attributes ) throws SAXException
   {
      System.out.println( spacer( indent++ ) +
         "+-[ element : " + name + " ]");
      if ( attributes != null )
         for ( int i = 0; i < attributes.getLength(); i++ )
            System.out.println( spacer( indent ) +
               "+-[ attribute : " + attributes.getName( i ) +
               " ] \"" + attributes.getValue( i ) + "\"" );
   }

Tree.java slide 3
   // method called at the end tag of an element
   public void endElement( String name ) throws SAXException
   {
      indent--;
   }
   // method called when a processing instruction is found
   public void processingInstruction( String target,
      String value ) throws SAXException
   {
      System.out.println( spacer( indent ) +
         "+-[ proc-inst : " + target + " ] \"" + value + "\"" );
   }
   // method called when characters are found
   public void characters( char buffer[], int offset,
      int length ) throws SAXException
   {
      if ( length > 0 ) {
         String temp = new String( buffer, offset, length );
         System.out.println( spacer( indent ) +
            "+-[ text ] \"" + temp + "\"" );
      }
   }
   // method called when ignorable whitespace is found
   public void ignorableWhitespace( char buffer[],
      int offset, int length )
   {
      if ( length > 0 ) {
         System.out.println( spacer( indent ) + "+-[ ignorable ]" );
      }
   }

Tree.java slide 4
   // method called on a non-fatal (validation) error
   public void error( SAXParseException spe )
      throws SAXParseException
   {
      // treat non-fatal errors as fatal errors
      throw spe;
   }
   // method called on a parsing warning
   public void warning( SAXParseException spe )
      throws SAXParseException
   {
      System.err.println( "Warning: " + spe.getMessage() );
   }

Tree.java slide 5
   // main method
   public static void main( String args[] )
   {
      boolean validate = false;
      if ( args.length != 2 ) {
         System.err.println( "Usage: java Tree [validate] " +
            "[filename]\n" );
         System.err.println( "Options:" );
         System.err.println( " validate [yes|no] : " +
            "DTD validation" );
         System.exit( 1 );
      }
      if ( args[ 0 ].equals( "yes" ) )
         validate = true;
      SAXParserFactory saxFactory =
         SAXParserFactory.newInstance();
      saxFactory.setValidating( validate );
      try {
         SAXParser saxParser = saxFactory.newSAXParser();
         saxParser.parse( new File( args[ 1 ] ), new Tree() );
      }
      catch ( SAXParseException spe ) {
         System.err.println( "Parse Error: " + spe.getMessage() );
      }
      catch ( SAXException se ) {
         se.printStackTrace();
      }
      catch ( ParserConfigurationException pce ) {
         pce.printStackTrace();
      }
      catch ( IOException ioe ) {
         ioe.printStackTrace();
      }
      System.exit( 0 );
   }
}

Day planner example continued

planner.xml Doctor's appointment Physics class at BH291C Independence Day General Meeting in room 32-A Party at Joe's Financial Meeting in room 14-C

planner.dtd

HW this section
1. Make a dtd and a conforming xml file. Make your example non-trivial (that means elements should have attributes, etc.), but feel free to copy and modify examples given in class or your text. Check your work in the MS Validator.
2. You may also need to download the Xerces parser (you’ll need it at some point this semester) and install it as per the documentation that accompanies it.
3. Save tree.java to your java directory. Make sure it compiles and runs. See step 4 below.
4. For step 3, you will need to download JAXP from