XML: An introduction. XML, like HTML, is derived from the Standard Generalized Markup Language (SGML).


1 XML An introduction

2 XML: XML, like HTML, is derived from the Standard Generalized Markup Language (SGML).

3 A brief introduction to XML: a simple XML document whose message reads “Welcome to XML!”

4 intro.xml in the validator (the file is examples\ch05\intro.xml)

5 XML documents and formatting: An XML document contains data, not formatting information. As we’ll see, there are ways to format XML (XSL and XSL-FO files, for example), much as CSS provides formatting for HTML.

6 XML documents: XML documents are typically stored in files with the suffix .xml, though this is not required. They can be created with any text editor (save as plain ASCII text), and many packages, such as MS Word, can save files as type .xml. An XML document contains a single root element, which contains all other elements. Anything appearing before the root is called the prolog. Elements directly under the root are its children, and the structure is recursive: each child may have children of its own. In the example, the root’s child message contains the text “Here is some message”.
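Here is a minimal sketch of how a parser exposes the root and its children. It uses Python's standard `xml.etree.ElementTree` module as a stand-in, not the validator or Java parsers used in the course. The root name `myMessage` is an assumption. The child `message` and its text follow the slide's description.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the root name "myMessage" is assumed; the child
# element "message" and its text follow the slide's description.
DOC = """<?xml version="1.0"?>
<!-- Everything before the root element, such as this comment, is the prolog -->
<myMessage>
   <message>Here is some message</message>
</myMessage>"""


def describe(xml_text):
    """Return the root's tag and a list of (tag, text) pairs for its children."""
    root = ET.fromstring(xml_text)  # the prolog is consumed, not returned
    return root.tag, [(child.tag, child.text) for child in root]


if __name__ == "__main__":
    tag, children = describe(DOC)
    print(tag)       # myMessage
    print(children)  # [('message', 'Here is some message')]
```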

7 The character set: XML characters are carriage return (CR), line feed (LF) and other Unicode characters. An XML document consists of markup and character data. Markup is enclosed in angle brackets (<>), as in HTML; character data appears between a start tag and its end tag. An XML parser passes whitespace characters through to the application. Insignificant whitespace can be collapsed in a process called normalization. It is a good idea to add whitespace to an XML document for readability. The characters <, >, &, ' and " are reserved. An “entity reference” makes it possible to use these as characters in the character data of an XML document. Entity references begin with & and end with ; (for example, &lt; stands for <). In this way, character data is not confused with markup. Single and double quotes are used to delimit attribute values.
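The sketch below shows entity references at work, using Python's standard library rather than anything from the course. `escape()` turns reserved characters into entity references. `quoteattr()` quotes an attribute value, and parsing turns the references back into characters. The element name `message` and the attribute name `author` are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def message_element(text, author):
    """Build a <message> element; "author" is a hypothetical attribute name."""
    # escape() replaces &, < and > with &amp;, &lt; and &gt;.
    # quoteattr() escapes the value and wraps it in suitable quotes.
    return "<message author=%s>%s</message>" % (quoteattr(author), escape(text))


raw = 'if (x < 5 && y > 3) say "hi"'
xml_text = message_element(raw, "O'Brien")
print(xml_text)
# <message author="O'Brien">if (x &lt; 5 &amp;&amp; y &gt; 3) say "hi"</message>

# Parsing turns the entity references back into the original characters.
elem = ET.fromstring(xml_text)
print(elem.text == raw, elem.get("author"))  # True O'Brien
```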

8 More on syntax: There must be exactly one root element. Elements must be properly nested, and every start tag requires a matching end tag. Unlike HTML, XML lets the author define her own tags. Tags are case sensitive. The parser needs to distinguish markup from character data. Typically, whitespace is normalized, that is, reduced to a single whitespace character. Entity references begin with an ampersand and let us use characters that are part of the language syntax (such as < and '). Entity references (for example, &lt;) let us represent the reserved characters < and & and distinguish them from markup; these characters may appear in character data only as entity references.
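The rules above can be checked with a small well-formedness test. This sketch uses Python's `xml.etree.ElementTree`, whose parser raises `ParseError` on malformed input. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "ok": "<car><doors>4</doors></car>",
    "two roots": "<a/><b/>",                    # more than one root
    "bad nesting": "<a><b></a></b>",            # elements overlap
    "case mismatch": "<Car></car>",             # tags are case sensitive
    "unclosed tag": "<a><b></a>",               # <b> is never closed
    "raw ampersand": "<a>Deitel & Deitel</a>",  # & must be written &amp;
}

for name, text in samples.items():
    print(f"{name}: {is_well_formed(text)}")
```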

9 XML intro continued: A DOM-based parser returns a tree structure. A DOM parser must process the entire document to build a (Java) object tree, which may be 3 or 4 times the size of the original document, so DOM parsing is not advisable when storage is constrained. A SAX-based parser (SAX is the Simple API for XML) reports parsing events instead, so SAX parsers have a smaller memory footprint. Many parsers can be downloaded for free, and several come with Java 1.4 and later.
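The two parsing styles can be contrasted with Python's standard `xml.dom.minidom` (DOM) and `xml.sax` (SAX) modules, used here in place of the Java parsers. Both functions count elements. The DOM version builds the whole tree first, while the SAX version only reacts to `startElement` events. The sample document is hypothetical.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document, not one of the course files.
DOC = b"<planner><year value='2004'><date month='7' day='4'/></year></planner>"


def count_elements_dom(data):
    """DOM: parse the whole document into a tree, then query it."""
    doc = xml.dom.minidom.parseString(data)
    return len(doc.getElementsByTagName("*"))


class ElementCounter(xml.sax.ContentHandler):
    """SAX: count startElement events; no tree is kept in memory."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements_sax(data):
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.count


print(count_elements_dom(DOC), count_elements_sax(DOC))  # 3 3
```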

10 A brief introduction to XML: An XML validator parses an XML document and indicates whether it is correct. A number of free validators are available, including one from Microsoft, which I downloaded and used in this presentation.

11 Validator: Microsoft provides a validating program, free for download (in JavaScript and VBScript versions), at
http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/samples/internet/xml/xml_validator/default.asp
or search MSDN for "validator". There are others out there:
http://validator.w3.org/
http://www.stg.brown.edu/service/xmlvalid/
http://www.w3schools.com/XML/xml_validator.asp

12 Links to the validator programs on my W: drive:
JavaScript validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm
VBScript validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_vbs.htm

13 MS Validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm

14 Parser continued: The parser indicates whether the document is well-formed. In the DOM-based tree display, a ‘+’ in the left margin indicates that a node has children that can be expanded, and a ‘–’ indicates that all child nodes have been expanded. The MS Validator uses color coding to indicate which child nodes can be expanded. An element that contains other elements is called a container element. If the document is well-formed, the parser makes its content available for further processing.
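To make the idea of container elements concrete, here is a rough sketch that prints a parsed tree as an outline. It marks container elements with '+', loosely imitating the validator's display, and does not reproduce the MS Validator's actual output. It uses Python's `xml.etree.ElementTree`. The sample document is a hypothetical fragment, not the course's letter.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment loosely modeled on a letter; not the course file.
DOC = """<letter>
  <contact type="sender">
    <name>Jane Doe</name>
    <city>Othertown</city>
  </contact>
  <salutation>Dear Sir:</salutation>
</letter>"""


def outline(elem, depth=0, lines=None):
    """Return an indented outline; '+' marks container elements."""
    if lines is None:
        lines = []
    marker = "+" if len(elem) else " "  # len(elem) counts child elements
    text = (elem.text or "").strip()
    label = f"{elem.tag}: {text}" if text else elem.tag
    lines.append(f"{'  ' * depth}{marker} {label}")
    for child in elem:
        outline(child, depth + 1, lines)
    return lines


print("\n".join(outline(ET.fromstring(DOC))))
```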

15 Validator example

16 Validator

17 Reserved characters: The entity references &lt;, &gt; and &amp; would enable a message’s character data to contain the characters <, > and &.

18 DTD: document type definition. A DTD file may contain the definition of an XML document’s structure, and XML documents may refer to a DTD. If an XML document has a DTD or schema, a validating parser can determine not merely whether it is well-formed XML, but whether it is valid. Valid means conforming to a DTD or schema.

19 Another example: Unicode. Lang.xml (next slide) uses Unicode entity references to represent Arabic words. Lang.dtd (shown in a later slide) is used to generate Unicode (Arabic) characters for some entity references in the XML file.

20 DTD: document type definition: a dtd file may contain the definition of an xml structure. دايتَل أند &assoc; أهلاً بكم فيِ عالم &text;

21 Lang.dtd

22 Lang.xml in validator

23 Lang.xml in IE

24 About the example: The document type declaration contains the keyword DOCTYPE, the name of the root element, the keyword SYSTEM (indicating that the DTD is in an external file) and the name of that file. Root element welcome contains two elements: from and subject. Some lines contain entity references for Unicode characters. The DTD also defines some other entity references.
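This sketch shows how a parser replaces DTD-declared entity references with their text. It assumes an *internal* DTD subset, because Python's standard `xml.etree.ElementTree` parser does not load external DTD files such as lang.dtd. The entity values are illustrative rather than copied from lang.dtd. The `greeting` entity uses character references to produce the Arabic word أهلاً.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD subset; the entity values are
# illustrative and are not taken from lang.dtd.
DOC = """<?xml version="1.0"?>
<!DOCTYPE welcome [
  <!ENTITY assoc "Associates">
  <!ENTITY greeting "&#x0623;&#x0647;&#x0644;&#x0627;&#x064B;">
]>
<welcome>
  <from>Deitel and &assoc;</from>
  <subject>&greeting;</subject>
</welcome>"""

root = ET.fromstring(DOC)
sender = root.find("from").text      # &assoc; has been replaced
subject = root.find("subject").text  # &greeting; became Unicode characters
print(sender)  # Deitel and Associates
print(subject == "\u0623\u0647\u0644\u0627\u064b")  # True
```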

25 More about markup: An empty element may be closed with /> at the end of its start tag; otherwise, an element must be closed with a complete end tag. Elements may or may not have content (child elements or character data). Elements may have zero or more attributes associated with them. Attributes appear in the element’s start tag, as in <car doors = "4">. Here, element car has attribute doors with value 4. Attribute values must appear in single or double quotes. Element and attribute names may be of any length, but they must start with a letter or underscore and may not contain blanks.
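The attribute and naming rules can be tried with Python's `xml.etree.ElementTree`. This is a sketch; only the `car`/`doors` example comes from the slide, and the malformed variants are made up.

```python
import xml.etree.ElementTree as ET

car = ET.fromstring('<car doors = "4"/>')  # empty element closed with />
print(car.tag, car.attrib, len(car), car.text)  # car {'doors': '4'} 0 None


def parses(text):
    """True if text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<car doors='4'></car>"))  # True: single quotes, full end tag
print(parses("<car doors=4/>"))         # False: value is not quoted
print(parses("<car 2doors='4'/>"))      # False: name starts with a digit
print(parses("<my car/>"))              # False: names cannot contain blanks
```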

26 Usage.xml uses a stylesheet Deitel&apos;s XML Primer Paul Deitel Welcome Easy XML XML Elements? Entities

27 Usage.xsl: In usage.xml, the markup delimited by <? and ?> represents a PI (processing instruction). A PI consists of a PI target (xml:stylesheet in this example; the standard form of this target is xml-stylesheet) and a PI value. Note the syntax. PIs let authors embed application-specific data in an XML document. If the application processing the XML doesn’t use the PI, the PI has no effect on the document’s content.
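A SAX parser reports processing instructions to the application as `(target, data)` pairs, and a parser that ignores them sees the same content. This sketch uses Python's `xml.sax` and `xml.etree.ElementTree`. It assumes the standard `xml-stylesheet` target. The `href` value and the `book`/`title` element names are hypothetical.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical prolog and body; the href value and element names are assumed.
DOC = b"""<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="usage.xsl"?>
<book><title>Deitel&apos;s XML Primer</title></book>"""


class PIHandler(xml.sax.ContentHandler):
    """Records every processing instruction reported by the SAX parser."""

    def __init__(self):
        super().__init__()
        self.pis = []

    def processingInstruction(self, target, data):
        self.pis.append((target, data))


handler = PIHandler()
xml.sax.parseString(DOC, handler)
print(handler.pis)  # [('xml-stylesheet', 'type="text/xsl" href="usage.xsl"')]

# An application that ignores PIs sees the same content:
# ElementTree's default parser simply drops them.
root = ET.fromstring(DOC)
print(root.tag, root.find("title").text)  # book Deitel's XML Primer
```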

28 Usage.xml in validator

29 Usage.xml loaded into IE: the browser uses the stylesheet to generate HTML

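30 CDATA: Text in a CDATA section is not parsed for markup; the XML parser passes it to the application unchanged, as character data. CDATA sections might be used for JavaScript or VBScript code. A CDATA section starts with <![CDATA[ and ends with ]]>. It may contain reserved characters such as < and &, but not the text “]]>”.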

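31 Text example 5.7: the same C++ code appears twice. In the XML file, the first copy must be written with entity references (such as &lt; and &amp;); the second copy is placed in a CDATA section, so it can be written as is:
// C++ comment
if ( this->getX() < 5 && value[ 0 ] != 3 )
   cerr << this->displayError();
<![CDATA[
// C++ comment
if ( this->getX() < 5 && value[ 0 ] != 3 )
   cerr << this->displayError();
]]>
C++ How to Program by Deitel & Deitel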
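Writing code with entity references and placing it in a CDATA section produce the same character data for the application. This short Python sketch using `xml.etree.ElementTree` checks that. The `book`/`sample` element names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The same C++ line written twice: once with entity references, once in CDATA.
DOC = """<book>
<sample>if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )</sample>
<sample><![CDATA[if ( this->getX() < 5 && value[ 0 ] != 3 )]]></sample>
</book>"""

escaped, cdata = (s.text for s in ET.fromstring(DOC).iter("sample"))
print(cdata)             # if ( this->getX() < 5 && value[ 0 ] != 3 )
print(escaped == cdata)  # True
```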

32 CData example from text 5.7

33 Cdata.xml in MS validator (file is in examples\ch05)

34 letter.xml - I removed blank lines to get it to fit here Jane Doe Box 12345 15 Any Ave. Othertown Otherstate 67890 555-4321 John Doe 123 Main St. Anytown Anystate 12345 555-1234 Dear Sir: It is our privilege to inform you about our new database managed with XML. This new system allows you to reduce the load on your inventory list server by having the client machine perform the work of sorting and filtering the data. The data in an XML element is normalized, so plain-text diagrams such as /---\ | | \---/ will become gibberish. Sincerely Ms. Doe

35 letter.xml in Validator

36 Namespaces: Naming collisions can occur when XML authors use the same tag names. Namespaces provide a mechanism for making tag references unambiguous. A namespace prefix appears in the start and end tags, followed by a colon and the element name. So a movie character such as Scrooge (prefix movie) can be differentiated from an ASCII character such as colon (prefix ascii), even when both elements use the same tag name. Each namespace prefix is bound to a unique URI in the XML document. Almost any name can be used as a namespace prefix. In this example, ascii and movie are namespace prefixes. Namespace prefixes can precede element and attribute names to avoid collisions. A URL may be used as the URI. The only requirement, though, is uniqueness, because the parser never visits the URL.

37 Namespace example 5.8 <text:directory xmlns:text = "urn:deitel:textInfo" xmlns:image = "urn:deitel:imageInfo"> A book list A funny picture
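This sketch shows how a namespace-aware parser keeps same-named elements apart. It uses Python's `xml.etree.ElementTree`, which expands each prefixed name to `{URI}localname`. The root element and the two URIs follow example 5.8. The `file`/`description` children and the filenames are a hypothetical reconstruction. The prefixes in the lookup map are chosen by the program, since only the URIs must match.

```python
import xml.etree.ElementTree as ET

# Root and URIs follow example 5.8; the child elements are hypothetical.
DOC = """<text:directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
  <text:file filename="book.xml">
    <text:description>A book list</text:description>
  </text:file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
  </image:file>
</text:directory>"""

# The program picks its own prefixes; only the URIs have to match the document.
NS = {"t": "urn:deitel:textInfo", "i": "urn:deitel:imageInfo"}

root = ET.fromstring(DOC)
print(root.tag)  # {urn:deitel:textInfo}directory
for prefix in ("t", "i"):
    file_elem = root.find(f"{prefix}:file", NS)
    desc = file_elem.find(f"{prefix}:description", NS).text
    print(file_elem.get("filename"), desc)
```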

38 Namespace.xml in validator: file is in examples\ch05

39 Namespace.xml example 5.8 in IE

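40 Namespaces continued: Providing a prefix on every element can be tedious. A default namespace can be declared (with xmlns = "URI"), and elements used in the XML document from that namespace then do not need prefixes. A default namespace does not apply to attributes, however: an unprefixed attribute is in no namespace.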

41 Default namespaces <directory xmlns = "urn:deitel:textInfo" xmlns:image = "urn:deitel:imageInfo"> A book list A funny picture
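Here is the default-namespace version parsed with Python's `xml.etree.ElementTree`. The unprefixed elements land in `urn:deitel:textInfo`, while the unprefixed `filename` attribute stays in no namespace. As before, the child elements and filenames are a hypothetical reconstruction of the slide's example.

```python
import xml.etree.ElementTree as ET

# Root and URIs follow the slide; the child elements are hypothetical.
DOC = """<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
  <file filename="book.xml">
    <description>A book list</description>
  </file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
  </image:file>
</directory>"""

root = ET.fromstring(DOC)
text_file, image_file = root  # the two child elements
print(root.tag)          # {urn:deitel:textInfo}directory
print(text_file.tag)     # {urn:deitel:textInfo}file
print(image_file.tag)    # {urn:deitel:imageInfo}file
# The default namespace does not apply to unprefixed attributes.
print(text_file.attrib)  # {'filename': 'book.xml'}
```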

42 Default namespaces: Now, element file is in the default namespace. Compare this example to the earlier namespace example, where text and image were both declared as distinct prefixed namespaces.

43 Defaultnamespace.xml in IE

44 Day planner case study…to be continued… Doctor&apos;s appointment Physics class at BH291C Independence Day General Meeting in room 32-A Party at Joe&apos;s Financial Meeting in room 14-C

45 Planner.xml in validator

46 A day planner using a Java GUI. A SAX parser is used to parse the document (see chapter 8 of the text).
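The case study's parser is Java SAX code from the text. As a stand-in, here is a rough sketch with Python's `xml.sax` that collects the planner's notes from SAX events without building a tree. The markup is a hypothetical reconstruction: the `planner`, `year`, `date` and `note` elements, their attributes and the year value are all assumptions. The note texts come from the slide.

```python
import xml.sax

# Hypothetical planner markup; element and attribute names are assumptions.
PLANNER = b"""<planner>
  <year value="2004">
    <date month="7" day="4">
      <note>Independence Day</note>
    </date>
    <date month="7" day="20">
      <note time="0900">General Meeting in room 32-A</note>
      <note>Party at Joe&apos;s</note>
    </date>
  </year>
</planner>"""


class PlannerHandler(xml.sax.ContentHandler):
    """Collects (month, day, time, text) tuples from SAX events."""

    def __init__(self):
        super().__init__()
        self.notes = []
        self._date = None
        self._time = None
        self._buf = None  # list of text chunks while inside a <note>

    def startElement(self, name, attrs):
        if name == "date":
            self._date = (attrs.get("month"), attrs.get("day"))
        elif name == "note":
            self._time = attrs.get("time")  # None when the note has no time
            self._buf = []

    def characters(self, content):
        # The parser may split one text node into several calls.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "note":
            month, day = self._date
            self.notes.append((month, day, self._time, "".join(self._buf)))
            self._buf = None


def read_planner(data):
    handler = PlannerHandler()
    xml.sax.parseString(data, handler)
    return handler.notes


for note in read_planner(PLANNER):
    print(note)
```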

47 Homework on this section: Install an XML validator. Create your own XML file and validate it. Post screenshots of your XML file and of the validator’s results.

