.opennet Technologies An Introduction to XML


1 .opennet Technologies An Introduction to XML
Fall Semester MW 5:00 pm - 6:20 pm CENTRAL (not Indiana) Time Geoffrey Fox and Bryan Carpenter PTLIU Laboratory for Community Grids Computer Science, Informatics, Physics Indiana University Bloomington IN 47404 2/18/2019 xmlintrofall01

2 Outline of Introduction to XML
The two drivers for XML:
  A better way of specifying documents that is more powerful than HTML and easier to understand than SGML
  XML as an object structure for totally general entities
Basic XML: well-formed and valid XML
Examples of use of XML
XML Syntax (see details on next page)
Further presentations on:
  XML Schema
  XML based Document Object Model and style sheets
  Transforming XML documents: XSLT
  Searching XML documents
Applications of XML: Dublin Core, RDF, SVG, SOAP

3 Contents: XML Syntax
XML Prolog and Processing Instructions
Namespaces
What is a DTD and allowed declarations
Content Models for Elements
Entities: Internal, External, General, Character and Parameter
INCLUDE and IGNORE
Attribute values and types
NOTATIONS
Unparsed Entities
Example
XML Schema in overview (another presentation gives details)

4 Overview of HTML HTML = Hypertext Markup Language
the lingua franca of the World Wide Web
HTML is a simple language well suited for hypertext, multimedia and the display of small and reasonably simple documents
HTML 2.0 spec completed in Nov 95
HTML+ and HTML 3.0 never released
HTML 3.2 (Jan 97) added tables, applets, and other capabilities (approximately 70 tags); this is what most people are familiar with today
HTML 4.0 spec released in Dec 97
XHTML (XML version of HTML 4.0) released January 2000 as a W3C recommendation

5 W3C Process
The Web Consortium has a highly effective process for initiating and refining standards for the web.
The agreed standards for protocols and APIs are as critical to the success of the web as are technologies.
The process to define standards involves moving from Working Draft to Last Call Working Draft to Candidate Recommendation to Proposed Recommendation and finally to Recommendation.
The standards discussed here are quite recent:
  XML Schema became a recommendation in May
  SVG (2D Vector Graphics done in XML, relevant for scientific visualization) became a recommendation in September
  XQUERY (a proposed way of searching XML data structures/documents) is currently a working draft dated June

6 Motivations for XML as a better HTML
Limitations of HTML:
Extensibility: HTML does not allow users to specify their own tags or attributes in order to parameterize or otherwise semantically qualify their data.
Structure: HTML does not support the specification of deep structures needed to represent database schema or object-oriented hierarchies.
Validation: HTML does not support the kind of language specification that allows applications to check data for structural validity when it is imported.

7 XML in the HTML world
XML = eXtensible Markup Language (name suggests documents not objects)
XML is a subset of SGML (Standard Generalized Markup Language) but, unlike the latter, XML is specifically designed for the web
Specification of W3C: and lots of links like
XML 1.0 in February 98, with continuing refinements
How XML fits into the new HTML world:
  XML describes the logical structure of the document.
  CSS (Cascading Style Sheets) or another style language describes the visual presentation of the document.
  The DOM (Document Object Model) allows scripting languages, such as JavaScript, to access document objects.
  DHTML (Dynamic HTML) allows a dynamic presentation of the document.

8 Logical vs. Visual Design
The logical design of a document (content) should be separate from its visual design (presentation).
Separation of logical and visual (rendering) design:
  promotes sound typography
  encourages better writing
  is more flexible
  allows the same “knowledge/information” (defined in XML) to be displayed on PCs, PDAs, Braille devices etc.
XML can be used to define the logical design, while XSL (Extensible Style Language) is used to define the visual design (usually by mapping XML into HTML).

9 What is SGML?
SGML = Standard Generalized Markup Language, defined as an ISO (not W3C) standard (ISO 8879) in 1986
An SGML document carries with it a grammar called a Document Type Definition (DTD).
The DTD defines the tags and the meaning of those tags
DTD syntax is not very nice
Presentation is governed by a style sheet written in the Document Style Semantics and Specification Language (DSSSL)
Note that HTML is a fixed SGML application, a hard-wired set of about 70 tags and 50 attributes, and does not need to have a DTD for each HTML instance.

10 SGML Example
A simple SGML document with embedded DTD:
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT O O (p*,BIGP*)>
    <!ELEMENT p - O (#PCDATA)>
    <!ELEMENT BIGP - O (#PCDATA)>
    ]>
    <DOCUMENT>
    <p>Welcome to
    <BIGP>XML Style!
    </DOCUMENT>

11 SGML Example (cont’d)
A corresponding DSSSL style sheet:
    <!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN">
    (root (make simple-page-sequence))
    (element p (make paragraph))
    (element BIGP (make paragraph font-size: 24pt space-before: 12pt))
XSL is a simplification of DSSSL, just as XML simplifies SGML

12 XML as a simple SGML
XML is also an SGML application, but since XML is extensible (XML can be considered a metalanguage), every valid XML document must be accompanied by its DTD
XML is a compromise between the non-extensible, limited capabilities of HTML and the full power and complexity of SGML
XML offers “80% of the benefits of SGML for 20% of its complexity”
XML designers tried to leave out all the SGML that would be rarely used on the web
Note that the XML specification is 30 pages and the SGML specification is 500 pages.
XML allows you to define your own tags and to describe nested hierarchies of information.

13 Some Global Concepts
We are defining objects, possibly just for documents as in SGML
We need to define “object templates” or their structure
  This is the class for Java
  This is the DTD for SGML and XML, or the Schema for XML
We have instances of objects
  XML files or Java objects, with some way (optional for XML) of specifying their template
We need to transform objects
  We can do this with “real software”, i.e. read the object into a program, interpret it and spit it out in a different form
  We can use a specialized transformation language with some control data: this is DSSSL plus stylesheet for SGML; XSLT plus stylesheet for XML; browser plus CSS stylesheet for HTML

14 XML Design Goals
1) XML shall be usable over the Internet
2) XML shall support a variety of applications
3) XML shall be compatible with SGML
4) It shall be easy to write programs that process XML documents
5) Optional features in XML shall be kept to the absolute minimum, ideally zero
6) XML documents should be human-legible and reasonably clear
7) Design of XML should be prepared quickly
8) Design of XML shall be formal and concise
9) XML documents shall be easy to create
10) Terseness in XML markup is of minimal importance

15 Features of XML I
The documents are stored in plain text and thus can be transferred and processed anywhere.
Inline reusability: documents can be composed of many pieces
Unifying principles make it easily acceptable:
  “everything is a tree”
  UNICODE for different languages
XML documents enable several types of uses:
  traditional data processing: XML documents can be the data interchange medium
  document-driven programming
  archiving

16 Features of XML II
It is important to remember that XML is a markup language, not a programming language.
XSL can be viewed as a way of programming data whose structure is defined in XML
The M in XML is Markup, reflecting its origin in the “publication” community, with markup specifying layout of the document, fonts to use etc.
XML’s most important use is not this original one but specifying abstract data structures, equivalent to structures in C++, classes in Java or entity relationships in the database world

17 Origins of XML
First draft of XML spec released by W3C in Nov 96 (four other drafts published in 1997)
The first XML parser (written in Java) released by Microsoft in July 97
Microsoft released version 1.8 of its XML parser (which supports XML 1.0) in Jan 98
W3C finalized the XML 1.0 spec in Feb 98
First XML-aware beta versions of Netscape and IE5.0 released in June 98
Sun announced Java Standard Extension for XML (XML API) in March 99
W3C ongoing effort as discussed

18 “Hello World!” in XML
An XML document with external DTD:
    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello World!</greeting>
An XML document with embedded DTD:
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE greeting [
    <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello World!</greeting>

19 XML and Related Acronyms
Document Type Definition (DTD), which defines the tags and their relationships; to be replaced (IMHO) by XML Schema
Extensible Style Language (XSL) style sheets, which specify the presentation of the document
Cascading Style Sheets (CSS), a less powerful presentation technology without tag mapping capability
XPATH, which specifies location in a document
XLINK and XPOINTER, which define link-handling details
Resource Description Framework (RDF), document metadata
Document Object Model (DOM), API for converting the document to a tree object in your program for processing and updating
Simple API for XML (SAX), “serial access” protocol, fast-to-execute protocol for processing a document on the fly
XML Namespaces, for an environment of multiple sets of XML tags
XHTML, a definition of HTML tags for XML documents (which are then just HTML documents)
XML Schema, offers a better alternative to DTD

20 Document Type Definition
The DTD specifies the logical structure of the document; it is a formal grammar describing document syntax and semantics
The DTD does not describe the physical layout of the document; this is left to the style sheets and the scripts
It is no mean task to write a DTD, so most users will adopt predefined DTDs (or can write an XML document without a DTD).
DTDs can be written in separate files to facilitate re-use.
Content-providers, industries and other groups can collaborate to define sets of tags: the essence of “any” field (physics, music …) is captured in a domain specific DTD/Schema
XML documents are valid if they are consistent with a specified DTD or Schema

21 XML must be “well-formed”
For the data contained in an XML document to be parsed correctly, its markup must be well-formed, meaning in part that properly nested and non-abbreviated starting and ending tags are used.
This well-formedness provides a well defined encapsulation mechanism allowing designated sections of the data to be accessed programmatically.
Current HTML browsers allow rule violations, but XML is strict, which is essential for many (robust) applications
If XML were just used to render, then sloppiness would be allowable, but as XML is aimed at capturing object structure or information, we cannot have errors interpreted unpredictably by parsers
Well-formed is less restrictive than valid
XML documents must be well-formed; the user can decide if they need to be valid
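As a minimal sketch of this strictness, the fragment below asks a parser whether a string is well-formed. Python and its standard xml.etree.ElementTree module are assumptions of this example (the slides do not prescribe a language), and the helper name is_well_formed is hypothetical. The parser used here checks well-formedness only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<greeting>Hello World!</greeting>"
bad = "<greeting>Hello World!"  # end tag omitted, which HTML browsers often tolerate

good_result = is_well_formed(good)
bad_result = is_well_formed(bad)
print(good_result, bad_result)
```

An application can use such a check as a gate: a document that fails it is rejected outright rather than repaired by guesswork.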

22 Character Data in XML CDATA and PCDATA
XML documents are made up of markup and CDATA (character data)
PCDATA is text obtained from parsing the document and processing markup as necessary
“markup” includes Tags, Entity references, Character references, Comments, CDATA Section delimiters, DTD declarations and Processing Instructions
XML allows you to specify chunks of text which may contain “reserved characters/strings” with an ugly syntax
    <![CDATA[ <ignored>Anything </ignored> ]]>
Maybe this will be replaced by alternatives based on ideas like mail attachments – see

23 Characters in XML
We can choose the character encoding, such as UTF-8 (8-bit code units, identical to ASCII for the ASCII characters), UTF-16 (16-bit Unicode code units, as used by Java) or even UCS-4, which uses 32 bits for each character. UTF-8 is what a parser assumes if nothing else is declared.
This is specified in the xml processing instruction in the document prolog.
You can use character reference markup:
  &#x03C0; is the character reference for π, with the hexadecimal Unicode (ISO/IEC 10646) code wrapped in &#x .. ; syntax
  &#960; is also π, using the decimal form of the Unicode code
One can use the five built-in entity references:
  &amp; for &
  &apos; for '
  &gt; for >
  &lt; for <
  &quot; for "
We will later see how to define arbitrary entity references
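The character and entity references can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree module (an assumption of this example, not something the slides use), and the element name c is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# The same character written as a hexadecimal and as a decimal character reference.
hex_ref = ET.fromstring("<c>&#x03C0;</c>").text
dec_ref = ET.fromstring("<c>&#960;</c>").text

# The five built-in entity references, resolved by the parser.
builtin = ET.fromstring(
    "<c>&lt;tag&gt; &amp; &apos;single&apos; &quot;double&quot;</c>"
).text

print(hex_ref == dec_ref)
print(builtin)
```

Both references resolve to the single character U+03C0, and the built-in entities come back as the literal characters they stand for.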

24 White Space in XML
XML by default treats spaces, tabs, line feeds and carriage returns “just” as white space.
Thus <greeting>Hello World!</greeting> and the same element written with extra spaces or line breaks between the words are treated as identical
This is similar to HTML.
One can overrule this using the attribute xml:space with syntax
    <greeting xml:space="preserve">Hello World!</greeting>
This attribute must be defined in the DTD with
    <!ATTLIST greeting xml:space (default|preserve) 'preserve' >
which defines element greeting to allow an attribute xml:space which can take the values default or preserve, with the latter as default
If you specify xml:space, then it holds not only for the given element but for all those contained within it.

25 XML Example
Another example which could be used for URL exchanges between network capable applications:
    <LINK>
      <TITLE>XML Recommendation</TITLE>
      <URL> </URL>
      <DESCRIPTION> The official XML spec from W3C </DESCRIPTION>
    </LINK>

26 XML Example (cont’d) A document may have many such links:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <?xml-stylesheet type="text/css" href="fred.css" ?>
    <DOCUMENT>
      <LINKS>
        <LINK>…</LINK>
        <LINK>…</LINK>
        …
      </LINKS>
    </DOCUMENT>
Here we have also added prolog processing instructions.
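To illustrate how a network application might consume such a list of links, here is a minimal sketch using Python's standard xml.etree.ElementTree module (the language choice is an assumption of this example). The URL values and the second link are hypothetical placeholders, not content from the slides.

```python
import xml.etree.ElementTree as ET

# A small instance of the DOCUMENT/LINKS structure; the URLs are placeholders.
doc = """<?xml version="1.0" standalone="yes"?>
<DOCUMENT>
  <LINKS>
    <LINK>
      <TITLE>XML Recommendation</TITLE>
      <URL>http://example.org/placeholder</URL>
      <DESCRIPTION>The official XML spec from W3C</DESCRIPTION>
    </LINK>
    <LINK>
      <TITLE>Second link</TITLE>
      <URL>http://example.org/another</URL>
      <DESCRIPTION>Hypothetical entry</DESCRIPTION>
    </LINK>
  </LINKS>
</DOCUMENT>"""

root = ET.fromstring(doc)

# Walk the tree and collect (title, url) pairs from every LINK element.
links = [(link.findtext("TITLE"), link.findtext("URL")) for link in root.iter("LINK")]

for title, url in links:
    print(title, url)
```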

27 XML Prolog and Processing Instructions
Every XML file starts with the prolog, giving information about the document.
The minimal prolog identifies it as an xml document
    <?xml version="1.0"?>
The prolog may also include the encoding and whether it is a standalone document:
    <?xml version="1.0" encoding="ISO " standalone="yes"?>
If it is not standalone, it may specify external “entities” which may be named in the document, or an external DTD
An XML file may also contain more general processing instructions for the application processing the document:
    <?target instructions ?>
where target is the name of the application.
Only <?xml … ?> is understood by all XML processors
Specification of a stylesheet by <?xml-stylesheet .. ?> is common

28 XML Prolog and Comments
The Prolog can contain:
  Processing Instructions
  DTD Specifications (we have illustrated these and will discuss them in detail later)
  Comments
Comments have the same form anywhere in the XML document and are just like comments in HTML
    <!--This is the Prolog and <tag> Lousy Course</tag> is not treated as a tag-->
You cannot have -- inside comments, but <tag> </tag> is not treated as markup

29 XML tag structure
In XML terminology, a pair of start and end tags is an element.
XML documents must have a strict hierarchical structure.
All start tags must have an end tag.
Any element must be properly nested within another.
    <LI> XML requires <B><I>proper nesting</I></B>.</LI>
is well formed
    <LI> XML requires <B><I>proper nesting</I></LI>.</B>
would be rejected by an XML parser
Empty tags (no content except perhaps attributes) are allowed as elements in XML documents.
An empty tag is a start and end tag together and is identified by a trailing / after the tag name. So in XHTML one uses <br/> for the empty break tag. (So empty tags with no attributes are “flags”)
A start tag and end tag with nothing in-between can also be considered an empty tag.
    <IMG SRC="face.gif"></IMG>
XML tags are case-sensitive (<H1> is not the same as <h1>).
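The nesting, empty-tag and case rules can be tried out with a short sketch. It assumes Python's standard xml.etree.ElementTree module (not something the slides use); the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested_ok = parses("<LI> XML requires <B><I>proper nesting</I></B>.</LI>")
crossed_ok = parses("<LI> XML requires <B><I>proper nesting</I></LI>.</B>")

# <br/> and <br></br> describe the same empty element: no text, no children.
short_form = ET.fromstring("<br/>")
long_form = ET.fromstring("<br></br>")
empty_forms_equal = (
    short_form.text is None
    and long_form.text is None
    and len(short_form) == 0
    and len(long_form) == 0
)

# Tag names are case-sensitive, so <H1> cannot be closed by </h1>.
mixed_case_ok = parses("<H1>title</h1>")

print(nested_ok, crossed_ok, empty_forms_equal, mixed_case_ok)
```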

30 Document is a Single Tree
XML documents allow only one root element. So it must be
    <?xml version="1.0" ?>
    <rootoftree> ……… </rootoftree>
And not
    <?xml version="1.0" ?>
    <rootoftree> ……… </rootoftree>
    <rootoftree> ……… </rootoftree>
So there is only one tree in each document

31 XML Attributes I
Tags can have any number of attributes (which must be declared inside the DTD)
All attribute values must be within single or double quotes.
    <FONT COLOR="#FF00CC"> quoted attribute </FONT>
If you have a double quote inside an attribute value, then either
  use &quot; for the inside quote as in quote="&quot;"
  enclose the attribute value in single quotes as in quote='"'
Each attribute can only appear once in a given element
One can choose (matter of taste) between
    <person name="Fox" role="teacher" ></person>
and
    <person><name>Fox</name><role>teacher</role></person>
Note you can repeat elements but you cannot repeat attributes to represent multiple occurrences
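A brief sketch of these attribute rules, assuming Python's standard xml.etree.ElementTree module (an assumption of this example). The element name q and the second name value are hypothetical.

```python
import xml.etree.ElementTree as ET

font = ET.fromstring('<FONT COLOR="#FF00CC"> quoted attribute </FONT>')
color = font.get("COLOR")

# Two ways of putting a double quote inside an attribute value.
with_entity = ET.fromstring('<q quote="&quot;"/>').get("quote")
with_single_quotes = ET.fromstring("<q quote='\"'/>").get("quote")

# An attribute may appear only once in an element; the parser rejects a repeat.
try:
    ET.fromstring('<person name="Fox" name="Carpenter"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

# Elements, unlike attributes, can be repeated.
person = ET.fromstring("<person><name>Fox</name><name>Carpenter</name></person>")
names = [n.text for n in person.findall("name")]

print(color, with_entity, with_single_quotes, duplicate_accepted, names)
```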

32 XML Attributes II
Note that with a DTD (this changes with Schema), all element and attribute values are text, not numbers, and so must be “converted” by the application to the intended form
So
    <item>weekdays<quantity>5</quantity></item>
or
    <item quantity="5" >weekdays</item>
returns the string “5”, not the number 5, for quantity
xml:lang is a useful attribute (in the xml Namespace) which can be used (as always, if declared in DTD or Schema as an allowed attribute) to specify language
    <text xml:lang="en">Good English</text>
    <text xml:lang="x-youth" >Coolio,Wax On, Wax Off, Dude</text>
xml:lang can take values from an official vocabulary (such as en above, which is ISO 639) or your private code starting with x-
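The need for the application to convert text values can be sketched as follows, assuming Python's standard xml.etree.ElementTree module (not part of the slides). Note that this parser reports the xml:lang attribute under the reserved XML namespace name.

```python
import xml.etree.ElementTree as ET

as_element = ET.fromstring("<item>weekdays<quantity>5</quantity></item>")
as_attribute = ET.fromstring('<item quantity="5">weekdays</item>')

# Either way the parser hands back the string "5"; conversion is up to the application.
raw_from_element = as_element.findtext("quantity")
raw_from_attribute = as_attribute.get("quantity")
total = int(raw_from_element) + int(raw_from_attribute)

# xml:lang lives in the predefined xml namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"
text = ET.fromstring('<text xml:lang="en">Good English</text>')
lang = text.get("{" + XML_NS + "}lang")

print(repr(raw_from_element), repr(raw_from_attribute), total, lang)
```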

33 XML Names and NMTOKEN
Name Characters are letters, digits, hyphens, underscores, colons or full stops.
An NMTOKEN is any collection of Name Characters
NMTOKENS is any list of NMTOKENs separated by white space (space, tab, newline etc.)
Case is significant: PERSON and person are distinct names
Attribute and Element names must be (a subset of) NMTOKEN with restrictions:
  Names cannot begin with a digit
  Names cannot begin with xml (or any variant obtained by case changes); the system will use this prefix
  Colons are ONLY to be used in Namespaces (currently an informal rule only)

34 CDATA Sections
CDATA sections allow you to include unparsed characters in a document
    <![CDATA[ <ignored>Anything </ignored> ]]>
In this example the ignored tag is not processed by the XML parser
Unfortunately you must guarantee that there is no ]]> string in the text between <![CDATA[ and ]]>
    <script language="JavaScript">
    <![CDATA[
    var fred = 0;
    if( fred < 10) { document.writeln("> and < here are NOT parsed"); }
    ]]>
    </script>
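A minimal sketch of how a parser treats CDATA sections, assuming Python's standard xml.etree.ElementTree module (an assumption of this example). The wrapper element name note is hypothetical.

```python
import xml.etree.ElementTree as ET

# Markup-like text inside a CDATA section is delivered as plain character data.
note = ET.fromstring("<note><![CDATA[ <ignored>Anything </ignored> ]]></note>")
has_children = len(note) > 0

script = ET.fromstring(
    """<script language="JavaScript"><![CDATA[
var fred = 0;
if (fred < 10) { document.writeln("> and < here are NOT parsed"); }
]]></script>"""
)

print(has_children)
print(note.text)
print(script.text)
```

The ignored tag never becomes an element: it is part of the text of note, and the bare < and > in the script survive unchanged.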

35 XML Namespaces I
This is an extension to XML adopted January 1999 at
Namespaces address the problem that attributes cannot be repeated; more fundamentally they provide a subroutine or library capability to XML
Suppose you had a DTD with <student> and <teacher> and you wanted to write
    <student><name>you</name></student>
    <teacher><name>me<special>Prof</special></name></teacher>
This is invalid unless <name> is identical in structure for both teacher and student, as each element in the tree must have a unique structure.
We can get round it by using <studentname> and <teachername>, but this is not so satisfactory, especially if you get this conflict by joining two different sets of tags together
This is seen in XHTML when you could add MathML, SMIL, SVG tags ….

36 XML Namespaces II
So we use the new syntax xmlns= to define an XML Namespace
The value of xmlns is hopefully a useful URL/URI telling you about the tags. However this is not required.
Microsoft in its cunning way uses in Office web export:
    <xml xmlns:v="urn:schemas-microsoft-com:vml"
         xmlns:o="urn:schemas-microsoft-com:office:office"
         xmlns:p="urn:schemas-microsoft-com:office:powerpoint">
and teaches Internet Explorer to understand these obscure “universal resource names” for the VML, Office and PowerPoint Namespaces respectively.
xmlns is an attribute which can be used in any element (depending on the parser you may need to declare this as an allowed attribute in the DTD)
    <student xmlns="studentdtd"><name> ….

37 XML Namespaces III
And when we come to teacher use
    <bigboss:teacher xmlns:bigboss="teacherdtd"><bigboss:name> ….
In the above, we made student elements the default
We can more symmetrically write
    <university xmlns:bigboss="teacherdtd" xmlns:downtrodden="studentdtd" >
    <downtrodden:student><downtrodden:name>you </downtrodden:name></downtrodden:student>
    ……..
    <bigboss:teacher><bigboss:name>me </bigboss:name></bigboss:teacher>
    </university>
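The effect of the two prefixes can be seen from a namespace-aware parser. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module (not something the slides use); the short prefixes t and s in the lookup table are hypothetical and deliberately differ from those in the document, since only the namespace value matters.

```python
import xml.etree.ElementTree as ET

doc = """<university xmlns:bigboss="teacherdtd" xmlns:downtrodden="studentdtd">
  <downtrodden:student><downtrodden:name>you</downtrodden:name></downtrodden:student>
  <bigboss:teacher><bigboss:name>me</bigboss:name></bigboss:teacher>
</university>"""

root = ET.fromstring(doc)

# The parser replaces each prefix by its namespace value, written {namespace}localname.
tags = [child.tag for child in root]

# Lookups use their own prefix table mapped to the same namespace values.
ns = {"t": "teacherdtd", "s": "studentdtd"}
teacher_name = root.findtext("t:teacher/t:name", namespaces=ns)
student_name = root.findtext("s:student/s:name", namespaces=ns)

print(tags)
print(teacher_name, student_name)
```

The two name elements no longer clash because they are distinct qualified names, {teacherdtd}name and {studentdtd}name.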

38 Document Type Definition
A powerful feature of XML that provides a formal set of rules to define a document structure
Defines the elements that may be used, and dictates where they may be applied in relation to each other; therefore specifies the document hierarchy and granularity
Comprises a set of declarations that define a document structure tree
Declarations are stored either at the top of each document that must conform to the rules or, alternatively and more usually, in separate data files referred to by a special instruction at the top of each document.
Although formally optional, it is required by many XML tools
Schema are in many ways more elegant but DTD will teach us syntax of XML!

39 Document Type Definition
Each DTD element must either be a container element, or be empty (a place holder).
Container elements may contain text, child elements, or a mixture of both.
The DTD also specifies the names of attributes, and dictates which elements they may appear in.
For each attribute it specifies whether it is optional or required.
It gives the list of possible values for an enumerated attribute
Comparing XML with Java, a DTD corresponds to the class and an XML file to an object, an instance of a class
Files that obey XML syntax rules are well formed
Files consistent with the DTD are valid
One can “punt” and specify ANY for document structure
This implies the file can have any elements and tags and there is no validation needed, but all elements still need to be declared (see later example)

40 DTD definitions
A DTD allows you to create new tags by writing grammar rules which the tags must obey.
The rules specify which tags and attributes are valid and their context.
You specify order and number of times each element can appear
A DTD element declaration looks like:
    <!ELEMENT person(name, *)>
ELEMENT is the type
person is the element declaration
(name, *) is the element content model
name and are the children of person and define the hierarchy of the document.
must follow name in file
Note that this is called a grammar rule because it could have been written in BNF:
    person ::= (name, *)

41 Document Type Definition I
A DTD consists of a set of declarations
Each declaration must use the markup format <!…>, and can only use one of the following keywords:
  ELEMENT (tag definition)
  ATTLIST (attribute definitions)
  ENTITY (entity definition)
  NOTATION (data type notation definition)
  COMMENT (same format as already described)
The declarations should appear inside a <!DOCTYPE document declaration which is at its simplest
    <!DOCTYPE Rootname [ DTD Declarations starting with one for element Rootname ]>

42 Document Type Definition II
Notice that the first element declared in a DTD must be the same Rootname which is the first argument of the DOCTYPE declaration
There are several types of DTD:
Internal:
    <!DOCTYPE Rootname [DTD Declarations]>
External in one of two forms:
    <!DOCTYPE Rootname SYSTEM URL >
    <!DOCTYPE Rootname PUBLIC Identifier URL >
And Mixed:
    <!DOCTYPE Rootname SYSTEM URL [DTD Local Declarations]>
    <!DOCTYPE Rootname PUBLIC Identifier URL [DTD Local Declarations]>
Normally one uses the SYSTEM type of external DTD

43 XML <LINKS> Example of External DTD File
The URL can be a file “something.dtd” in the same directory as the XML file, which is a typical relative address, or a full URL such as “
Here is a DTD for the earlier example of a tree DOCUMENT with the ability to define <LINKS>:
    <!ELEMENT DOCUMENT (LINKS)>
    <!-- Any Comment with usual syntax -->
    <!ELEMENT LINKS (LINK)*>
    <!ELEMENT LINK (TITLE,URL,DESCRIPTION)>
    <!ELEMENT TITLE (#PCDATA)>
    <!ELEMENT URL (#PCDATA)>
    <!ELEMENT DESCRIPTION (#PCDATA)>
PCDATA stands for “parsed character data”
Note the external file starts with an <!ELEMENT declaration and does not have a <!DOCTYPE declaration

44 XML Example using LINKS DTD
Now store this DTD in a file (links.dtd) and write an XML document based on this DTD as follows:
    <?xml version="1.0"?>
    <!DOCTYPE DOCUMENT SYSTEM "links.dtd">
    <DOCUMENT>
      <LINKS>
        <LINK>…</LINK>
        <LINK>…</LINK>
        …
      </LINKS>
    </DOCUMENT>
This is an instance (object) based on the class defined by the DOCUMENT DTD
Instance and “class definition” (links.dtd) are stored in the same directory

45 Document Type Definition Summary
Declarations are grouped within a DTD and can be fully contained in the file as below
    <!DOCTYPE Rootname [
    <!--The DTD for tree Rootname appears here e.g. -->
    <!ELEMENT person (name, *, link?) >
    ……….
    ]>
One can store a DTD in a separate file with syntax
    <!DOCTYPE Rootname SYSTEM URL >
where URL is an absolute or relative location. Examples are:
    <!DOCTYPE Rootname SYSTEM "EXTRNL.DTD" >
    <!DOCTYPE Rootname SYSTEM " >
In mixed format
    <!DOCTYPE MYDTD SYSTEM "EXTRNL.DTD" [
    <!-- Some of MYDTD appears here augmenting or modifying declarations in external file -->
    <!ELEMENT person (name, *, link?) >
    ]>
formally you can modify a declaration in the external file (i.e. the internal declaration takes precedence) but not all XML parsers allow this

46 Examples of Official DOCTYPE Declarations
W3C asks you to use for XHTML:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" " >
They say that documents using the MathML DTD should contain a doctype declaration of the form:
    <!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" " >
The URL (URI) may be changed to that of a local copy of the DTD if required. So an alternative is:
    <!DOCTYPE math SYSTEM "mathml2.dtd" >
If a namespace prefix is being used, so that for example the document element is:
    <mml:math xmlns:mml=" ... </mml:math>
then the prefix must be declared in the local subset of the DTD, as follows:
    <!DOCTYPE mml:math PUBLIC "-//W3C//DTD MathML 2.0//EN" " [
    <!ENTITY % MATHML.prefixed "INCLUDE">
    <!ENTITY % MATHML.prefix "mml">
    ]>

47 Public Identifier in DOCTYPE
If you use a PUBLIC keyword in the DTD then this is followed by an FPI, or formal public identifier, of the form
    standard//group//type//language
standard is - if you are defining it yourself; + if approved by a non-standards body; or the name of the standard if it exists
group is the name of the person or group that invented the DTD, e.g. Geoffrey Fox or W3C
type represents the name of the DTD including a version number
language is a 2 character language abbreviation
This is exemplified on the previous foil by PUBLIC "-//W3C//DTD MathML 2.0//EN"

48 Element Declarations: EMPTY
Keyword ELEMENT introduces a new element
    <!ELEMENT NAME CONTENT_MODEL>
The element name must begin with a letter, and may additionally contain digits and some punctuation, i.e. ‘.’, ‘-’, ‘_’, and ‘:’, as we described earlier under NMTOKEN
If an element can hold no child elements, and also no text, then it is known as an empty element and denoted by EMPTY for CONTENT_MODEL
This seems trivial but it isn’t, because the presence or absence of this element in an XML file can be used as a flag
As an example we can find several in HTML, such as HR and IMG, which never have children and include no text. Here we would write
    <!ELEMENT HR EMPTY>
and then <HR/> or <HR></HR> generates a horizontal line
EMPTY elements can have attributes, such as the SRC attribute in <IMG/> to specify the source of the image.

49 Element Declarations: ANY
An element declared to have a content of ANY may contain all of the other elements declared in the DTD
This is not quite the same as no DTD for the file
    <!DOCTYPE fred [
    <!ELEMENT fred ANY >
    ]>
    <fred>
    <people>Me and You</people>
    <people>Them</people>
    </fred>
gets an error due to the presence of the <people> tag
Adding <!ELEMENT people ANY > inside the DTD declaration produces a valid document.
Go to and paste files into their textbox to see this

50 Element Declaration Content Model I
    <!ELEMENT elementname Content_Model >
The Content_Model is either a collection of child elements or parsed character data or a mixture
A Content_Model is bounded by brackets, and contains at least one token.
When a Content_Model contains more than one content token, the child elements are controlled using two logical connector operators: the sequence connector ‘,’ and the choice connector ‘|’
    <!ELEMENT element1 (a, b, c)>
indicates a is followed by element b, which in turn is followed by c.
    <!ELEMENT element2 (a | b | c)>
indicates that exactly one of a, b or c can be selected.
Combinations are possible: (a,b,(c|d)), or ((a,b,c) | d)

51 Element Declaration Content Model II
Quantity indicators can also be used.
  ‘?’ indicates an element is optional and cannot repeat
  ‘+’ indicates an element is required and may repeat
  ‘*’ indicates an element is optional, and also repeatable
  no indicator (default) indicates the element must be present exactly once
Document text is indicated by the keyword #PCDATA (Parsable Character Data)
    <!ELEMENT emph (#PCDATA|sub|super)*>
    <!ELEMENT sub (#PCDATA)>
    <!ELEMENT super (#PCDATA)>
    <emph>H<sub>2</sub>O is water.</emph>
Note that if no quantity is indicated, the element MUST appear, and if the ‘,’ sequence connector is used, one must preserve order
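Because a content model is a grammar rule, element-only models behave much like regular expressions over the sequence of child names. The following Python sketch illustrates that correspondence; it is not how a validating parser is implemented, and the function names model_to_regex and children_match are hypothetical. It assumes simple names without colons and does not handle #PCDATA, EMPTY or ANY.

```python
import re


def model_to_regex(model):
    """Translate an element-only content model such as (a,b,(c|d)) into a regex."""
    compact = "".join(model.split())  # drop white space
    # Every element name becomes a group matching "name;" (';' terminates each child).
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    )
    # The sequence connector ',' is plain concatenation; '|', '?', '+', '*' and the
    # brackets already have the same meaning in regular expressions.
    return re.compile(pattern.replace(",", ""))


def children_match(model, children):
    """Check a list of child element names against a content model."""
    joined = "".join(name + ";" for name in children)
    return model_to_regex(model).fullmatch(joined) is not None


print(children_match("(a, b, c)", ["a", "b", "c"]))
print(children_match("(a, b, c)", ["a", "c", "b"]))
print(children_match("(a | b | c)", ["b"]))
print(children_match("(a,b,(c|d))", ["a", "b", "d"]))
print(children_match("(a?, b+, c*)", ["b", "b"]))
```

Order matters for the sequence connector, which is why the second call fails while the first succeeds.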

52 Element Declaration Content Model III
Cleanest is to use either #PCDATA on its own or a general specification of multiple child elements.
Mixed Content_Models cannot specify limits on occurrences. For instance:
    <!DOCTYPE fred [
    <!ELEMENT fred (people)+ >
    <!ELEMENT people (#PCDATA | name)+ >
    <!ELEMENT name (#PCDATA) >
    ]>
    <fred>
    <people>Me and You<name>Fox</name></people>
    <people><name>Bryan</name>Them</people>
    </fred>
is illegal. I must use
    <!ELEMENT people (#PCDATA | name)* >
but then I have no constraints on the number of occurrences of #PCDATA strings or of <name> children

53 Element Declaration Content Model IV
    <!DOCTYPE fred [
    <!ELEMENT fred (people)+ >
    <!ELEMENT people (comment | name)+ >
    <!ELEMENT comment (#PCDATA) >
    <!ELEMENT name (#PCDATA) >
    ]>
    <fred>
    <people><comment>Me and You</comment><name>Fox</name></people>
    <people><name>Bryan</name><comment>Them</comment> </people>
    </fred>
is valid, and more precisely you can replace the 3rd line by:
    <!ELEMENT people ((comment, name) | (name, comment)) >
and require one name and one comment in any order

54 Element Declaration Content Model V
Either DTD will give an error if I add <people></people> before </fred>, as people has + in content model (comment | name)+
In
    <!ELEMENT elementname contentmodel >
contentmodel can also be EMPTY (the element's presence is a flag, like <br> or <hr> in HTML)
Then use <elementname attributes />
Or contentmodel can be ANY to indicate no constraints

55 DTD’s and Namespaces I
DTD’s and Namespaces are a little confusing, as it is not clear how much about namespaces is understood by the parser.
Best is to use something like
    <university xmlns:bigboss="teacherdtd" xmlns:downtrodden="studentdtd" >
with teacherdtd, studentdtd as conventional DTD’s without any special prefixes
However this will lead to errors for parsers that do not understand Namespaces.
In this case you need to make the namespace prefixes explicit in the DTD and allow the xmlns attribute

56 DTD’s and Namespaces II
<!DOCTYPE jim:fred [
<!ELEMENT jim:fred (jim:people)* >
<!ATTLIST jim:fred xmlns:jim CDATA #FIXED “ >
<!ELEMENT jim:people (jim:comment | jim:name)* >
<!ELEMENT jim:comment (#PCDATA) >
<!ELEMENT jim:name (#PCDATA) >
]>
<jim:fred xmlns:jim=“ >
</jim:fred>
is an example of an always-valid use of Namespace prefixes.
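A complete version of this pattern might look like the sketch below. The namespace URI did not survive in the transcript, so http://example.org/jim is a placeholder chosen for illustration, and the child elements in the instance are likewise invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE jim:fred [
  <!ELEMENT jim:fred (jim:people)* >
  <!-- The URI below is a placeholder, not the one from the original slide -->
  <!ATTLIST jim:fred xmlns:jim CDATA #FIXED "http://example.org/jim" >
  <!ELEMENT jim:people (jim:comment | jim:name)* >
  <!ELEMENT jim:comment (#PCDATA) >
  <!ELEMENT jim:name (#PCDATA) >
]>
<jim:fred xmlns:jim="http://example.org/jim">
  <jim:people><jim:name>Fox</jim:name><jim:comment>Them</jim:comment></jim:people>
</jim:fred>
```

Because the DTD treats jim:fred as an ordinary element name and xmlns:jim as an ordinary attribute, the document validates whether or not the parser understands Namespaces.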

57 Entities I
The DTD of an XML document can contain entity declarations. These are like macro substitutions in other languages. Entities are defined in the DTD and come in several flavors:
General Entities are referenced as &EntName;
Parameter Entities are referenced as %EntName;
We have already seen the character entities
&amp; for &
&apos; for '
&gt; for >
&lt; for <
&quot; for "
These are built in, but you could add other such entities with <!ENTITY aitself "A" > and then &aitself; would be replaced by A.

58 General Entities II
As another example, I can use in the DTD
<!ENTITY TODAY "12 September 2001" >
and
<comment>&TODAY; was very quiet in Indiana</comment>
is parsed as
<comment>12 September 2001 was very quiet in Indiana</comment>
General Entity references can be nested inside a DTD, e.g. one can write
<!ENTITY YEAR "2001" >
<!ENTITY TODAY "12 September &YEAR;" >
However, one must use Parameter Entities and not General Entities for macro substitution in other DTD declarations like <!ATTLIST and <!ELEMENT. Parameter entities are defined as in
<!ENTITY % CUSTARDTAGS "(NAME,DATE,ORDERS)" >
and we give examples later.
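Put together as one self-contained document, the nested general entities look like this. It is a sketch that uses the comment element as the root purely for brevity.

```xml
<?xml version="1.0"?>
<!DOCTYPE comment [
  <!ELEMENT comment (#PCDATA) >
  <!ENTITY YEAR "2001" >
  <!-- &YEAR; is stored as a reference here and expanded when &TODAY; is used -->
  <!ENTITY TODAY "12 September &YEAR;" >
]>
<comment>&TODAY; was very quiet in Indiana</comment>
```

A parser reports the content of comment as "12 September 2001 was very quiet in Indiana".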

59 Nested Entity Example
An entity declaration specifies replacement text for the entity, including some macro-preprocessing capability.
<!ENTITY pub "Éditions Gallimard">
<!ENTITY rights "All rights reserved">
<!ENTITY book "La Peste: Albert Camus, &#xA9; 1947 &pub;. &rights;">
A reference to &book; in the document then expands to:
La Peste: Albert Camus, © 1947 Éditions Gallimard. All rights reserved

60 Parameter Entity Example
<!ENTITY % peopletags "(firstname,lastname,dateofbirth)" >
<!ELEMENT student %peopletags; >
<!ELEMENT teacher %peopletags; >
<!ELEMENT administrator %peopletags; >
defines several ELEMENTS that represent people, all with the same child elements.
Parameter entities are even more commonly used for attributes, because several ELEMENTS almost always share the same attributes (often with a basic set being augmented in different ways for different ELEMENTS). This basic set can be defined in a parameter entity.
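The shared-attribute idea can be sketched as below. The file name people.dtd, the entity name peopleatts and the attributes id, year and office are assumptions made for illustration. The sketch is written as an external DTD file because XML 1.0 only permits parameter entity references inside a declaration in the external subset; in the internal subset they may appear only between declarations.

```xml
<!-- people.dtd: a hypothetical external DTD file -->
<!ENTITY % peopletags "(firstname,lastname,dateofbirth)" >
<!-- Attribute definitions shared by every kind of person -->
<!ENTITY % peopleatts "id ID #REQUIRED
                       gender (male|female) #IMPLIED" >

<!ELEMENT student %peopletags; >
<!ATTLIST student %peopleatts;
                  year CDATA #IMPLIED >

<!ELEMENT teacher %peopletags; >
<!ATTLIST teacher %peopleatts;
                  office CDATA #IMPLIED >

<!ELEMENT firstname (#PCDATA) >
<!ELEMENT lastname (#PCDATA) >
<!ELEMENT dateofbirth (#PCDATA) >
```

Each ATTLIST starts from the basic set in peopleatts and then adds one attribute specific to that element.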

61 Attributes
The rules for attribute declarations follow a similar structure to those for elements, as in the following example.
<!ENTITY % peopletags "(firstname,lastname,dateofbirth)" >
<!ELEMENT person %peopletags; >
<!ATTLIST person gender (male|female) #IMPLIED >
ATTLIST is the declaration type
person is the element name
gender is the attribute name
(male|female) #IMPLIED is the attribute definition
In general the syntax is
<!ATTLIST ELEMENT_NAME
ATTRIBUTE_NAME1 TYPE1 DEFAULT_VALUE1
ATTRIBUTE_NAME2 TYPE2 DEFAULT_VALUE2
……………. >
We now describe the last two fields.

62 Attribute DEFAULT_VALUEs
The DEFAULT_VALUE keywords following an attribute type can be:
#IMPLIED: the attribute is optional and there is no default value
"…………": the string in quotes is the default value for the attribute
#REQUIRED: the attribute is required but there is no default value
#FIXED "….": the attribute is assigned a fixed value, which follows the #FIXED keyword. If the attribute is NOT set in the XML file, it is generated automatically with the fixed value. If the attribute is set in the XML file, the parser generates an error unless the value set is equal to the fixed value.
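The four kinds of default can be seen side by side in a small sketch. The forms root element and the attributes name, target and version are invented for illustration; method follows the form example used elsewhere in these slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE forms [
  <!ELEMENT forms (form)+ >
  <!ELEMENT form EMPTY >
  <!ATTLIST form
      name    CDATA      #REQUIRED
      target  CDATA      #IMPLIED
      method  (GET|POST) "POST"
      version CDATA      #FIXED "1.0" >
]>
<forms>
  <!-- method defaults to POST and version is supplied as 1.0 -->
  <form name="search"/>
  <!-- explicit values are fine as long as version matches the fixed value -->
  <form name="upload" target="results" method="GET" version="1.0"/>
</forms>
```

Omitting name, or writing version="2.0", would make the document invalid.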

63 Attribute Types I
CDATA type is character data: any text, with the markup characters < and & written as entity references
<!ATTLIST form method CDATA #FIXED 'POST'>
Enumerated type is a list of possible values, each of which must be a legal XML name token
<!ATTLIST form method (GET | POST) 'POST' >
Note that in an enumeration one does NOT need quotes surrounding the values, but quotes are needed when specifying the default value. Note also that the type is the list itself and not a keyword ENUMERATED.
Less important types are NMTOKEN and NMTOKENS, which restrict CDATA values to strings that contain only XML name characters (or a white space separated list of such strings for NMTOKENS)
<!ATTLIST form method NMTOKEN 'POST' possiblemethods NMTOKENS 'POST GET' >

64 Attribute Types II
Any ELEMENT is allowed at most one attribute of type ID, and within any document all values of such attributes must be distinct. An ID must be a valid XML name and so cannot begin with a number. However ID="X123456" is allowed.
<!ATTLIST CUSTOMER CUSTOMER_ID ID #REQUIRED>
……..
And in the XML file one uses
<CUSTOMER CUSTOMER_ID="X123456">Sucker</CUSTOMER>
An attribute of type IDREF is required to have a value that matches an ID attribute within the same document.
<!ATTLIST BUGS CUSTOMER_SOURCE IDREF #REQUIRED>
And in the XML file one uses
<BUGS CUSTOMER_SOURCE="X123456">PC Caught Fire</BUGS>
Often ID and IDREF values are mumbo jumbo generated by software.
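The two fragments can be combined into one validating document, as in this sketch. The SUPPORT root element is an assumption added only to give CUSTOMER and BUGS a common parent.

```xml
<?xml version="1.0"?>
<!DOCTYPE SUPPORT [
  <!ELEMENT SUPPORT (CUSTOMER | BUGS)* >
  <!ELEMENT CUSTOMER (#PCDATA) >
  <!ATTLIST CUSTOMER CUSTOMER_ID ID #REQUIRED >
  <!ELEMENT BUGS (#PCDATA) >
  <!ATTLIST BUGS CUSTOMER_SOURCE IDREF #REQUIRED >
]>
<SUPPORT>
  <CUSTOMER CUSTOMER_ID="X123456">Sucker</CUSTOMER>
  <!-- CUSTOMER_SOURCE must match some ID value in this document -->
  <BUGS CUSTOMER_SOURCE="X123456">PC Caught Fire</BUGS>
</SUPPORT>
```

A validating parser rejects the document if a second CUSTOMER reuses X123456, or if CUSTOMER_SOURCE names an ID that does not occur.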

65 XML Example - the DTD
Create a DTD file for an address book named "ab.dtd"
<!ELEMENT addressBook (person)+>
<!ELEMENT person (name, *, link?) >
<!ATTLIST person id ID #REQUIRED >
<!ATTLIST person gender (male|female) #IMPLIED >
<!ELEMENT name (#PCDATA | family | given)*>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT (#PCDATA)>
<!ELEMENT link EMPTY >
<!ATTLIST link manager IDREF #IMPLIED subordinates IDREF #IMPLIED >

66 XML Example - the XML document
<?xml version="1.0"?>
<!DOCTYPE addressBook SYSTEM "ab.dtd">
<addressBook>
<person id="B.WALLACE" gender="male">
<name> <family>Wallace</family> <given>Bob</given> </name>
<link manager="C.TUTTLE"/>
</person>
<person id="C.TUTTLE" gender="female">
<name> <family>Tuttle</family> <given>Claire</given> </name>
<link subordinates="B.WALLACE"/>
</person>
</addressBook>
Callouts: ID, IDREF, Empty Element

67 Homework 2 12 September 2001
Read Chapters 3 and 4 of Inside XML (or an equivalent discussion of DTD’s).
Go to and download the 30 day trial version of XML Spy.
Take the DTD for users and courses discussed in class and combine it with your first assignment. Namely, extend the Course ELEMENT with a new CourseMaterials ELEMENT and give structure to this so that it can be used to specify material (such as library materials) needed to support a course.
Make an example Course XML instance which has entries for all (required) ELEMENTS/ATTRIBUTES, including your new CourseMaterials ELEMENT.

68 Use of INCLUDE and IGNORE
One can write in a DTD
<![ INCLUDE [ Normal DTD Declarations ]]>
or
<![ IGNORE [ Normal DTD Declarations ]]>
or
<!ENTITY % ignorer "IGNORE" >
………………….
<![ %ignorer; [ Normal DTD Declarations ]]>
This technique allows one to divide a DTD into modules and select those to be included with a set of parameter entity statements.
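A sketch of switching between two modules with parameter entities follows. The file name course.dtd, the entity names draft and final, and the course, title and notes elements are all assumptions for illustration. Conditional sections are only allowed in the external DTD subset, so this is shown as a separate DTD file.

```xml
<!-- course.dtd: a hypothetical external DTD using conditional sections -->
<!-- Swap the two values to select the draft module instead -->
<!ENTITY % draft "IGNORE" >
<!ENTITY % final "INCLUDE" >

<![%draft;[
  <!ELEMENT course (title, notes*) >
]]>
<![%final;[
  <!ELEMENT course (title) >
]]>

<!ELEMENT title (#PCDATA) >
<!ELEMENT notes (#PCDATA) >
```

Only one of the two course declarations is read, so changing the two entity values is enough to change which content model applies.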

69 General External Entities I
These allow one to insert not just text but complete files. The simplest syntax is
<!ENTITY Entityname SYSTEM URL >
with for example
<?xml version="1.0" standalone="no" ?>
………
<!ENTITY TODAY SYSTEM "date.txt" >
and date.txt just contains
12 September 2001
You can even put an entire document in a file contents.xml and write
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE treename [
….
<!ENTITY REALSTUFF SYSTEM "contents.xml" >
….
]>
<treename> &REALSTUFF; </treename>
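Filled out as a complete document, the date.txt case might look like the sketch below. It assumes date.txt sits next to the document, and it uses a comment element as the root for brevity.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE comment [
  <!ELEMENT comment (#PCDATA) >
  <!-- date.txt is assumed to sit next to this document and hold: 12 September 2001 -->
  <!ENTITY TODAY SYSTEM "date.txt" >
]>
<comment>&TODAY; was very quiet in Indiana</comment>
```

Note that a non-validating parser is not required to read external entities, so the expansion of &TODAY; may depend on the parser and its settings.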

70 More on Entities
One can also use the syntax we introduced for DOCTYPE, namely
<!ENTITY Entityname PUBLIC FPI URL >
Here the Formal Public Identifier FPI has the four fields described earlier.
Finally we describe parameter entities, which can be used for the real meat of a DTD. These are just as before except that there is an extra % in the definition and they are referenced as %Entityname;
<!ENTITY % Entityname Definition >
<!ENTITY % Entityname SYSTEM URL >
<!ENTITY % Entityname PUBLIC FPI URL >
These are very useful if you have multiple ELEMENTS with related specifications.
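One common use of external parameter entities is to pull DTD modules into a document type, sketched below. The files teacher.dtd and student.dtd and the entity names are hypothetical; the modules are assumed to declare the teacher and student elements.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE university [
  <!-- teacher.dtd and student.dtd are hypothetical module files -->
  <!ENTITY % teachermodule SYSTEM "teacher.dtd" >
  <!ENTITY % studentmodule SYSTEM "student.dtd" >
  <!-- A reference placed between declarations pulls in the declarations of that module -->
  %teachermodule;
  %studentmodule;
  <!ELEMENT university (teacher | student)* >
]>
<university/>
```

Here the references stand between declarations, which is the one place the internal subset allows them.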

71 Attributes of Type ENTITY
One can define an attribute to have type ENTITY when its value must be the name of an ENTITY defined in the DTD
<!ENTITY image1 SYSTEM "beauty.gif" >
<!ENTITY image2 SYSTEM "beast.jpeg" >
<!ATTLIST views picture ENTITY #REQUIRED allowedpictures ENTITIES 'image1 image2' >
……..
<views picture='image2' allowedpictures='image2'>Out of Focus</views>
An attribute of type ENTITIES takes a white space separated list of ENTITY names.
Strictly, a validating parser requires such an entity to be unparsed, i.e. declared with NDATA as described under Unparsed Entity Declarations.
The final attribute type is NOTATION, but we must first define the NOTATION declaration.

72 Notations and use in Attributes
Notations specify the format of non-XML data and have syntax
<!NOTATION Name SYSTEM "External_ID" >
or
<!NOTATION Name PUBLIC FPI "External_ID" >
where External_ID is something like a MIME type, with an example DTD fragment:
<!NOTATION GIF SYSTEM "image/gif" >
<!NOTATION JPEG SYSTEM "image/jpeg" >
<!ATTLIST STUDENT imageurl CDATA #REQUIRED image_type NOTATION (GIF|JPEG) #IMPLIED>
And this is used in an XML file with syntax like
<STUDENT imageurl="postcard.gif" image_type="GIF" >

73 Unparsed Entity Declarations
These are specified as
<!ENTITY NAME SYSTEM VALUE NDATA TYPE>
or
<!ENTITY NAME PUBLIC FPI VALUE NDATA TYPE>
where
NAME is the name to be given to an unparsed external entity
SYSTEM or PUBLIC FPI have the roles described earlier for external entities
VALUE is the value of the entity, such as an external file URL
NDATA signifies unparsed
TYPE is any declared NOTATION
A typical example would be, in the DTD,
<!NOTATION GIF SYSTEM "image/gif" >
<!ENTITY IMAGE1 SYSTEM "image.gif" NDATA GIF >
<!ATTLIST STUDENT IMAGE ENTITY #IMPLIED>
]>
……
and used in the XML file as
<STUDENT IMAGE="IMAGE1" >
This is a (rather clumsy) way of including "binary" (non-XML) format data in a document.
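The pieces fit together as in the sketch below. It fills in a DOCTYPE line and an element declaration for STUDENT, both of which are assumptions, as is the text content of the element.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE STUDENT [
  <!NOTATION GIF SYSTEM "image/gif" >
  <!-- NDATA marks image.gif as unparsed; the parser only reports its name and notation -->
  <!ENTITY IMAGE1 SYSTEM "image.gif" NDATA GIF >
  <!ELEMENT STUDENT (#PCDATA) >
  <!ATTLIST STUDENT IMAGE ENTITY #IMPLIED >
]>
<STUDENT IMAGE="IMAGE1">Postcard from Bloomington</STUDENT>
```

The parser never reads image.gif itself. It hands the application the entity name, its system identifier and the GIF notation, and the application decides what to do with the file.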

