XML Validation I DTDs Robin Burke ECT 360 Winter 2004.

1 XML Validation I DTDs Robin Burke ECT 360 Winter 2004

2 Outline History Grammars / Regular expressions DTDs elements attributes entities Declarations

3 Validation Why bother?

4 The idea Language consists of terminals a, b, c Set of productions beginning with non-terminals A, B, C rules specifying how to generate sequences of terminals

5 Example A → aB A → aBA B → b generates strings ababab etc.
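
The derivation on this slide can be traced mechanically. What follows is a minimal Python sketch, assuming the three productions above are the whole grammar. The function names `expand_a` and `expand_b` are illustrative and do not come from the slides.

```python
def expand_b() -> str:
    """Apply B -> b, the only production for B."""
    return "b"


def expand_a(repeats: int) -> str:
    """Derive a string from A: use A -> aBA (repeats - 1) times, then A -> aB once."""
    if repeats < 1:
        raise ValueError("A always produces at least one 'ab' pair")
    if repeats == 1:
        return "a" + expand_b()                      # A -> aB
    return "a" + expand_b() + expand_a(repeats - 1)  # A -> aBA


strings = [expand_a(n) for n in range(1, 4)]
print(strings)
```

Each extra use of A → aBA adds one more "ab" pair, which is why the language is "ab" repeated one or more times.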

6 Grammar Can be used to efficiently parse a language basis of all modern programming language parsing since Algol-60 Java Language Specification is completely in EBNF grammar

7 Grammar XML grammar-based syntax adheres to EBNF SGML SGML had a more complex language definition syntax HTML is defined the SGML way

8 Regular expressions Language for expressing patterns Basic components pattern elements optional element = ? repetition (1 or more) = + repetition (0 or more) = * choice = | grouping = ( ) sequence = ,

9 Examples (a, b)* all strings "ab" "abab" etc. (a | b | c)+, q, (b | c)* aaqb bq bqcccccccc
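
These patterns can be checked with Python's `re` module. The sketch below assumes single-letter terminals and translates the slide notation by dropping the "," sequence operator, since Python expresses sequence by juxtaposition. The helper names are hypothetical. The second pattern is tried with both (b | c)* and (b, c)* as its final group, because only the choice form accepts all three sample strings.

```python
import re


def to_python_regex(model: str) -> str:
    """Translate slide-style notation to Python `re` syntax.

    Sequence is written by juxtaposition in Python, so "," and any
    whitespace are dropped. Assumes single-letter terminals.
    """
    return re.sub(r"[\s,]", "", model)


def matches(model: str, text: str) -> bool:
    """True if the whole of `text` is generated by `model`."""
    return re.fullmatch(to_python_regex(model), text) is not None


samples = ["aaqb", "bq", "bqcccccccc"]
with_choice = [matches("(a | b | c)+, q, (b | c)*", s) for s in samples]
with_sequence = [matches("(a | b | c)+, q, (b, c)*", s) for s in samples]
print(with_choice, with_sequence)
```

Note that (a, b)* also accepts the empty string, because * allows zero repetitions.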

10 Note Regular expressions are different in different applications Perl Javascript XML Schemas DTDs only support ?+*|,()

11 EBNF EBNF is a more compact version of BNF it uses regular expressions to simplify grammar expression A → aB A → aBA turns into A → aB(A)? only one production per non-terminal allowed
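
Because each non-terminal has exactly one production, an EBNF grammar maps naturally onto one function per non-terminal. The following is a hedged sketch of a recognizer for A → aB(A)? with B → b, carried over from the earlier example. The function names and the use of -1 as a failure marker are choices made for this illustration.

```python
def parse_b(text: str, pos: int) -> int:
    """B -> b. Return the position after B, or -1 if B does not match."""
    return pos + 1 if text.startswith("b", pos) else -1


def parse_a(text: str, pos: int = 0) -> int:
    """A -> a B (A)?. Return the position after A, or -1 if A does not match."""
    if not text.startswith("a", pos):
        return -1
    pos = parse_b(text, pos + 1)
    if pos == -1:
        return -1
    # (A)? is optional: keep the longer match if a further A follows.
    after_optional = parse_a(text, pos)
    return after_optional if after_optional != -1 else pos


def accepts(text: str) -> bool:
    """True if the entire string is derived from A."""
    return parse_a(text) == len(text)


print([accepts(s) for s in ["ab", "ababab", "aba", ""]])
```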

12 DTDs Use EBNF to specify structure of XML documents Plus attributes entities Syntax holdover from SGML Ugly

13 DTD Syntax Content model contains the RHS of the production rule Example <!ELEMENT name (firstName, lastName)>
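
One way to see that a content model is a regular expression over child element names is to check a parsed element against it. The sketch below is not a DTD validator. It assumes an element-only model (no #PCDATA, ANY, or EMPTY), and the helpers `model_to_regex` and `children_match` are hypothetical names introduced here.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model: str) -> str:
    """Turn an element-only content model into a regex over child names."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching that name plus a space.
    named = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # "," means sequence, which a regex expresses by juxtaposition.
    return named.replace(",", "")


def children_match(element: ET.Element, model: str) -> bool:
    """Check the element's direct children against the content model."""
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(model_to_regex(model), child_names) is not None


MODEL = "(firstName, lastName)"  # RHS of <!ELEMENT name (firstName, lastName)>
good = ET.fromstring(
    "<name><firstName>Jane</firstName><lastName>Doe</lastName></name>"
)
bad = ET.fromstring(
    "<name><lastName>Doe</lastName><firstName>Jane</firstName></name>"
)
print(children_match(good, MODEL), children_match(bad, MODEL))
```

The second document fails only because the children are in the wrong order, which is exactly what the "," sequence operator enforces.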

14 DTD Syntax cont'd Not XML <! begins a declaration No "content" Empty elements not indicated with />

15 Simple content models Content can be any text #PCDATA Content can be anything at all (useful for debugging) ANY Element has no content EMPTY

16 Example Jane Doe A John Doe A-

17 Example Jane Doe A John Doe A- Wayne Doe I Alien abduction

18 Mixed content Legal to have a content model with text and element data President Meets with Congress <![CDATA[ The President met with Congressional leaders today in an effort to jump-start faltering budget negotiations. Sources described the mood of the meeting as "cordial". ]]>

19 CDATA? Forgot to mention last week Content that appears here will not be parsed Can include arbitrary text including <, &, etc. Only restriction: the content cannot contain the termination sequence ]]>
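
A short check of this behavior using Python's standard `xml.etree.ElementTree` parser. The element name `note` and its text are placeholders, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, "<" and "&" are plain characters, not markup.
doc = "<note><![CDATA[if (a < b && b < c) { use <tags> freely }]]></note>"
note = ET.fromstring(doc)

print(note.text)   # the raw text, delivered unparsed
print(len(note))   # number of child elements
```

The parser reports no child elements: `<tags>` was never treated as a tag.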

20 Mixed content, cont'd Mixed content makes handling XML complex, but it is necessary for many applications

21 Recursion Unlike grammars recursive formulation ≠ repetition Difference between

22 Restriction The grammar cannot be ambiguous A → (a, b) | (a, c) this makes the parser implementation difficult Usually easy to make non-ambiguous A → a, (b | c)
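
The rewrite on this slide changes how a parser decides which branch to take, not which sequences are legal. The sketch below enumerates every string over a, b, c up to a small length and confirms that both forms accept the same set. It assumes single-letter terminals and uses Python `re` patterns as stand-ins for the two content models.

```python
import re
from itertools import product

# (a, b) | (a, c): after reading "a" the parser cannot yet tell which branch it is in.
AMBIGUOUS = re.compile(r"(ab)|(ac)")
# a, (b | c): the first symbol is common, and the choice is made on the second.
FACTORED = re.compile(r"a(b|c)")


def accepted(pattern: re.Pattern, max_len: int = 3) -> set:
    """All strings over {a, b, c} up to max_len that the pattern fully matches."""
    result = set()
    for length in range(max_len + 1):
        for letters in product("abc", repeat=length):
            text = "".join(letters)
            if pattern.fullmatch(text):
                result.add(text)
    return result


print(accepted(AMBIGUOUS) == accepted(FACTORED))
```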

23 Attribute lists Declared separately from elements can be anywhere in the DTD Specification includes name of the element name of the attribute attribute type default

24 Attribute types Character data CDATA different from XML CDATA section! Enumerated (yes|no) ID must be unique in the document IDREF must refer to an id in the document NMTOKEN a restriction of CDATA to single "word" Also IDREFS and NMTOKENS

25 Default declaration #REQUIRED #IMPLIED means optional Value this becomes the default #FIXED value provided

26 Examples <!ATTLIST img src CDATA #REQUIRED alt CDATA #REQUIRED align (left|right|center) "left" id ID #IMPLIED > <!ATTLIST timestamp time-zone NMTOKEN #IMPLIED>
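
To see a default declaration take effect, the `img` attribute list can be placed in an internal DTD subset and parsed with Python's `xml.parsers.expat`. This is a sketch under stated assumptions: the `<!ELEMENT img EMPTY>` line, the sample attribute values, and the helper name `parse_attributes` are all made up for the illustration. expat does not validate, so it will not complain about a missing #REQUIRED attribute, but it does supply declared default values.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE img [
  <!ELEMENT img EMPTY>
  <!ATTLIST img
    src   CDATA #REQUIRED
    alt   CDATA #REQUIRED
    align (left|right|center) "left"
    id    ID    #IMPLIED>
]>
<img src="logo.png" alt="Logo"/>"""


def parse_attributes(document: str) -> dict:
    """Map each element name to the attributes the parser reports for it."""
    found = {}

    def on_start(name, attrs):
        found[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return found


attributes = parse_attributes(DOC)["img"]
print(attributes)
```

The document never mentions `align`, yet the parser reports it with the declared default. The #IMPLIED `id` attribute is simply absent.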

27 Entities Like macros content to be inserted indicated with &name; Predefined general entities &amp; &lt; essential part of XML User-defined general entities &disclaimer;

28 Entities, cont'd Parameter entities can also be used to simplify DTD creation or to combine DTDs indicated with a % More on this next week

29 Defining general entities Example <!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
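
The macro-like behavior can be observed by declaring the entity in an internal DTD subset and letting a parser expand the reference. A minimal sketch with Python's `xml.etree.ElementTree`. The `book` and `legal` element names are hypothetical, and only the entity declaration comes from the slide.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE book [
  <!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
]>
<book><legal>&disclaimer;</legal></book>"""

book = ET.fromstring(DOC)
legal_text = book.find("legal").text  # the reference is replaced by the entity's text
print(legal_text)
```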

30 Unparsed data What about non-text data? images, audio files In XML we define a notation create a name and associate an application suggestion to the application how to interpret the unparsed data not part of parsing operation

31 Using Notation Example declares the jpeg notation Example
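
The slide's own examples did not survive in this transcript. As a stand-in, the sketch below shows what a parser reports for a notation declaration and an unparsed entity that uses it. Every name and value in the DTD here (`catalog`, `logo`, `logo.jpg`, and the `image/jpeg` identifier) is invented for illustration, and only the notation name `jpeg` echoes the slide. It uses the declaration handlers of Python's `xml.parsers.expat`.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE catalog [
  <!ELEMENT catalog EMPTY>
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY logo SYSTEM "logo.jpg" NDATA jpeg>
]>
<catalog/>"""


def collect_declarations(document: str):
    """Return the notations and unparsed entities declared in the DTD."""
    notations = {}
    unparsed = {}

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        unparsed[name] = (system_id, notation_name)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(document, True)
    return notations, unparsed


notations, unparsed = collect_declarations(DOC)
print(notations, unparsed)
```

The parser only passes these names along. It never opens `logo.jpg`, which matches the point that interpreting unparsed data is left to the application.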

32 Notation, cont'd Note that the content is defined in the DTD not the document binary data embedded in XML document Not that useful in practice more likely to use URLs

33 Typical Example... Now it is up to the application to do something appropriate with the src attribute

34 A better solution Use XLink We'll talk about this later

35 DTD limitations Not in XML need a special parser for the DTD No content type restrictions #PCDATA can be anything Element names must be globally unique cannot reuse a common term at different places in the document course-name professor-name

36 DTD benefits Relatively easy to write and understand wait until you see XML Schema! Possible to modularize and combine DTDs more next week

37 Next week More DTDs Modularization and parameterization on-line reading Beginning Schemas 4.1-4.30

38 Lab
