
Stein XML 1.1
XML: a first course, Part 1
Yaakov J. Stein
Chief Scientist, RAD Data Communications

1.2 Course Objectives
XML: what and why?
Well-formed XML
  – Displaying XML in IE
Valid XML and DTDs
Parsing XML using JavaScript
Processing XML using XSL

1.3 XML: What and Why?

1.4 What is a Markup Language?
Human-readable text PLUS markup elements
Markup elements clarify:
  – document structure
  – text classification
  – presentational preferences

1.5 An Example
Digital Signal Processing: a Computer Science Perspective
Jonathan (Y) Stein
John Wiley and Sons
False Alarm Reduction for ASR and OCR
Yaakov Stein
Tenth AICVNN Symposium
195-200...

1.6 Some markup element functions
Structural
  – Clarifies document structure
  – Delineates document parts
Descriptive (informative)
  – Indicates
  – Facilitates information retrieval
Presentational (display)
  – Presents information in a nice format
  – Helps human readability
Referential (links, applications)
  – Provides hypertext links
  – Launches applications

1.7 Structural Markup
September 1, 2000
Dear Prof. Stein,
I would like to tell you how much I enjoyed reading your new text “Digital Signal Processing, A Computer Science Perspective”. I hope we will be able to meet at the next conference.
Sincerely,
Dee Espy

1.8 Descriptive Markup
(The same letter as in 1.7, with descriptive markup.)

1.9 Presentational Markup
(The same letter as in 1.7, with presentational markup.)

1.10 Relational Markup
(The same letter as in 1.7, without the date, with relational markup.)

1.11 Generalized Markup Language
William Tunnicliffe and Stanley Rice [1960s] independently invent the idea of a structural markup language
  – Problem: a different ML is needed for each type of document (letter, report, article, book, etc.)
Charles Goldfarb, Edward Mosher and Raymond Lorie (IBM) [1973] invent the Generalized Markup Language (GML)
  – Solution: use a metalanguage
  – a Document Type Definition (DTD) defines the tags
IBM marked up 90% of its documents with GML

1.12 With GML, structure is evident
Library
  Novels
  Journals
  Textbooks
    Algebraic zoology
    Botanical history
    Computer poetry
    DSP
      DSP - CSP
      DSP just for fun
      Elementary QED
DSP - CSP
  Title
    Full: Digital Signal Processing: a Computer Science Perspective
    Short: DSPCSP
  Author
    Name: Jonathan (Y) Stein
    Association: RAD Data Comm.
  Publication
    Publisher: John Wiley
    Year: 2000
    Location: New York
    ISBN: 04712954

1.13 Standard Generalized Markup Language
Problems with GML:
  – No validating parser
  – Not portable (between computer systems)
Solution: SGML
  – ANSI [1978]
  – ISO/IEC 8879 [1986] (International Organization for Standardization / International Electrotechnical Commission)
  – JTC1/SC34/WG1 (Working Group 1 of Subcommittee 34 of Joint Technical Committee 1)
For presentation: Document Style Semantics and Specification Language (DSSSL)

1.14 SGML (cont.)
If SGML is so good, why doesn't anyone use it?
Complexity
  – base standard >500 pages
  – SGML is a metalanguage
  – writing a DTD is complex programming
  – marked-up text is hard to read
  – DSSSL adds to the complexity
Inflexibility: requires absolute conformity
  – assumes only one correct way to mark up
  – constrains the author to a dictated structure
  – not good at capturing the author's structure

1.15 HyperText Markup Language
CERN (particle physics institute in Switzerland) was an early Internet adopter
  – used extensively for collaboration (articles have long author lists)
  – major problems with format incompatibility: only straight ASCII worked reliably
Tim Berners-Lee (computer specialist) defined the requirements:
  – simplicity (physicists couldn't be expected to use SGML)
  – freedom (no validation needed; let the browser ignore bad markup)
  – hypertext links (including to documents over the Internet)
  – presentational markup (papers must look nice; authors were used to TeX)
Solution: HTML, a specific application of SGML (not a metalanguage)

1.16 HTML versions
HTML 1.0 (1989): Berners-Lee's original CERN version
  – hypertext, images, head+body structure, presentational markup
HTML 2.0 (1995): IETF standard, RFC 1866
  – added lists, forms, etc.
HTML 3.2 (1997): W3C recommendation (incorporates Netscape extensions)
  – added tables, applets, superscripts/subscripts
HTML 4.0 (1997): W3C recommendation (and the similar ISO/IEC 15445)
  – minimizes presentational markup
XHTML 1.0 (2000): present W3C recommendation
  – reformulates HTML in XML

1.17 HTML document structure
<html>
  <head>
    global definitions, such as the Web page title
  </head>
  <body>
    marked-up text
  </body>
</html>

1.18 Some HTML (body) elements
<h1>Level 1 Heading</h1>
<h2>Level 2 Heading</h2>
<h3>Level 3 Heading</h3>
<em>emphasized</em>
<p>Paragraph</p>
<a href="...">link</a>
<ul> <li>item 1</li> <li>item 2</li> </ul>   (bulleted list)
<ol> <li>item 1</li> <li>item 2</li> </ol>   (numbered list)

1.19 Problems with HTML
Presentational aspects have predominated
  – bold text, blinking text, red text
Practically no descriptive markup
  – search engines are reduced to flat-text search
  – search by topic only through keywords or portals
Not extensible
  – can't add new tags
  – unknown tags are ignored
Links are relatively simple
  – usually user action is required (except IMG)
  – only a full document (with offset) is linkable
  – link management is a logistical nightmare

1.20 eXtensible Markup Language
Simplified SGML (the best parts: a subset of its features)
Flexible content-management tool
W3C recommendation(s)
Extensible: can add new elements (even without a DTD)
Easy to create special-purpose languages (with a DTD/schema)
Includes HTML-like hypertext links
  – and extensions (XLink, XPointer)
The future of the Web!
XML is NOT HTML++, it is SGML--!

1.21 W3C (www.w3c.org)
Oct 1994  Tim Berners-Lee founds the W3C at MIT and CERN, with support from DARPA and the EU
1996      XML WG
Feb 1998  XML 1.0
Mar 1998  XLink, XPointer, namespaces drafts
May 1998  VML draft
Aug 1998  XSL draft
Oct 1998  DOM Level 1
Jan 1999  XML namespaces
Jun 1999  XML stylesheets (CSS)
Jul 1999  MathML 1.0
Nov 1999  XSLT, XPath
Nov 2000  DOM Level 2
Feb 2001  MathML 2.0
May 2001  XML Schema
Jun 2001  XML Base, XLink

1.22 XML WG 10 goals
1. Must be useful on the Internet
2. Must support a variety of applications
3. Must be SGML compatible
4. Must be easy to write programs that process XML documents
5. Keep optional features to a minimum
6. XML documents should be human-readable
7. Produce the spec quickly
8. Design must be formal and concise
9. XML documents should be easy to create
10. Terseness in markup is of minimal importance

1.23 Why use XML?
Rich text format
Force strict adherence to a format
Aid search
Flexible database construction
Content management
Structured data exchange / transactions (B2B)
Dynamic creation of HTML pages
Creation of new languages (XML is a metalanguage)

1.24 XML: 2 examples
hello world!
Hello world!

1.25 XML: an example we've seen before
Digital Signal Processing: a Computer Science Perspective
Jonathan (Y) Stein
John Wiley and Sons
False Alarm Reduction for ASR and OCR
Yaakov Stein
Tenth AICVNN Symposium
195-200...

1.26 Some XML-based languages
WML = Wireless (cellphone) Markup Language
VML = Vector (graphics) Markup Language
VoiceXML
SSML = Speech Synthesis Markup Language
CPML = Call Policy Markup Language
DSML = Directory Services Markup Language
MathML = Mathematical Markup Language
CML = Chemical Markup Language
AML = Astronomical Markup Language
LegalXML
BSML = Bioinformatic Sequence Markup Language
GedML = Genealogical Data Markup Language
FinXML = Financial market Markup Language
ChessML
SDML = Signed Document Markup Language
RELML = Real Estate Listing Markup Language
etc., etc., etc.

1.27 Well-formed XML

1.28 What can be in an XML file?
  – processing instructions and declarations
  – elements
  – attributes
  – text
  – entities (references)
  – comments
  – CDATA sections
Example content: DSP-CSP, J. Stein, "This is a great book!", &copyright-notice;
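
1.29 Processing instructions
XML files should start with an XML declaration: <?xml version="1.0" encoding="..." standalone="yes"?>
  – version: the present version of the W3C standard
  – encoding: the character encoding (e.g. for Hebrew characters)
  – standalone="yes": no external files needed
We can specify a DTD (here, an external DTD): <!DOCTYPE ...>
We can specify processing using XSL or CSS: <?xml-stylesheet type="text/xsl" href="URL"?>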
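The version, encoding and standalone settings in the XML declaration are simple name="value" pairs. Below is a minimal sketch in plain JavaScript that runs in Node and needs no IE or DOM. It reads them with a regular expression rather than a real XML parser. The function name parseXmlDeclaration is hypothetical.

```javascript
// Hypothetical helper: extract the pseudo-attributes of an XML declaration.
// Assumes the declaration is the very first thing in the text (as XML requires);
// a simplified regex, not a conforming parser.
function parseXmlDeclaration(text) {
  const decl = /^<\?xml\s+([^?]*)\?>/.exec(text);
  if (decl === null) return null; // no XML declaration present
  const attrs = {};
  // name="value" or name='value'; \2 makes the closing quote match the opening one
  const pair = /([a-z]+)\s*=\s*(["'])(.*?)\2/g;
  let m;
  while ((m = pair.exec(decl[1])) !== null) {
    attrs[m[1]] = m[3];
  }
  return attrs;
}
```

For example, `parseXmlDeclaration('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>')` gives an object with version "1.0", encoding "UTF-8" and standalone "yes".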

1.30 Elements
In XML (unlike HTML) you define your own tags
Elements can contain text: <tagname>text</tagname> (opening tag, text, closing tag)
Or can be empty: <tagname/>
Element names:
  – first character must be a letter or underscore
  – all others can be letters, numerals, _ - .
  – : is also used in "qualified" element names (see namespaces)
  – no white space allowed
  – names are case-sensitive
Elements induce a hierarchical tree structure
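The naming rules above can be written as a single regular expression. This is a sketch under one stated assumption: it is restricted to ASCII. The XML 1.0 specification also allows many non-ASCII letters, which this regex rejects. The names XML_NAME and isLegalName are hypothetical.

```javascript
// Hypothetical helper: ASCII-only approximation of the element-name rules.
// First character: letter or underscore; then letters, digits, _ - . or :
// No case folding, so "Book" and "book" are simply two different legal names.
const XML_NAME = /^[A-Za-z_][A-Za-z0-9_.:-]*$/;

function isLegalName(name) {
  return XML_NAME.test(name);
}
```

For example, `isLegalName("_id")` and `isLegalName("ns:title")` are true, while `isLegalName("2nd")` and `isLegalName("my book")` are false (the first starts with a digit, the second contains white space).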

1.31 Attributes
Elements can contain attributes (in opening tags!): <tagname attribute="value">text</tagname>
Attributes are used to qualify elements
Attributes have 2 parts: name and value
Attribute names follow the same rules as element names
There cannot be two attributes with the same name in a single element
Attribute values must be quoted
Multiple attributes are separated by spaces
Design decision: should we use child elements or attributes?

1.32 Entity references
Entity references are symbols that the parser replaces with data
Entities (other than the predefined ones) must be defined in the DTD
Entity reference notation: &entityname;
There are 5 predefined entity references:
  – &lt; for <
  – &gt; for >
  – &amp; for &
  – &quot; for "
  – &apos; for '
External entities can be text or binary files
  – if binary, the definition must provide a notation (data type)
Parameter entities are shortcuts used only inside the DTD
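To see what the five predefined references do, here is a small sketch of escaping text before writing it into an XML document, and expanding those references when reading it back. The function names escapeXml and unescapeXml are hypothetical. A real parser expands references itself and also handles entities declared in a DTD.

```javascript
// The five predefined entity references and the characters they stand for.
const PREDEFINED = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };
const ESCAPES = { "<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;", "'": "&apos;" };

// Replace each special character with its reference in a single pass,
// so the "&" of an inserted reference is never escaped a second time.
function escapeXml(text) {
  return text.replace(/[<>&"']/g, (ch) => ESCAPES[ch]);
}

// Expand only the predefined references; any other reference
// (e.g. &copyright-notice;) needs a DTD declaration and is left untouched.
function unescapeXml(text) {
  return text.replace(/&(lt|gt|amp|quot|apos);/g, (ref, name) => PREDEFINED[name]);
}
```

Escaping in one pass avoids the classic bug of replacing "<" with "&lt;" and then turning its "&" into "&amp;".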

1.33 Comments
Comments are used for clarity; they (usually) have no effect
Comment notation: <!-- comment text -->
Comments can span multiple lines
Comments may not appear before the XML declaration
Comments may not appear inside tags
Comment text may not contain -- (two hyphens)
<!-- This is a valid
     multi-line comment -->
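A program that generates XML has to respect the comment rules above. The sketch below checks comment text before wrapping it. Besides the "--" rule on the slide, the XML 1.0 grammar also forbids comment text that ends in "-", since that would produce "--->". The names isValidCommentText and makeComment are hypothetical.

```javascript
// Hypothetical helpers: validate and build an XML comment.
function isValidCommentText(text) {
  // "--" is not allowed anywhere, and a trailing "-" would create "--->"
  return !text.includes("--") && !text.endsWith("-");
}

function makeComment(text) {
  if (!isValidCommentText(text)) {
    throw new Error("comment text may not contain '--' or end with '-'");
  }
  return "<!--" + text + "-->";
}
```

For example, `makeComment(" hi ")` returns `"<!-- hi -->"`, while `makeComment("x--y")` throws.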

1.34 CDATA sections
XML text is "Parsed Character Data" (PCDATA)
Use CDATA when you don't want the text parsed
You can use markup characters such as < and & in CDATA sections
CDATA notation: <![CDATA[ ... ]]>
For example, use CDATA to include source code:
<![CDATA[
  if (i 0) then a := 'hello' ;
]]>
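The one thing a CDATA section cannot contain is its own terminator, "]]>". A common workaround, sketched here, is to split that sequence across two adjacent CDATA sections. The function name toCdata is hypothetical.

```javascript
// Hypothetical helper: wrap arbitrary text (e.g. source code) in CDATA.
// Each "]]>" in the text is split as "]]" + ">" across two adjacent sections,
// so the parser still reports the original characters.
function toCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}
```

For example, `toCdata("a]]>b")` produces `<![CDATA[a]]]]><![CDATA[>b]]>`. The first section holds "a]]" and the second holds ">b", which together give back "a]]>b".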

1.35 OK, so what can we do with an XML file?
Check if well-formed
Check if valid (against a DTD or schema)
Display "as-is" in a browser
Parse in a special-purpose program (SAX, DOM)
Process (XSL) to XML, HTML, etc.
Display after processing
In this course we will do all that using an XML-aware browser (IE) and JavaScript
In other applications, standalone programs are used

1.36 Well-formed XML
An XML declaration is recommended
Single root element
Element and attribute names must be legal
Elements must be properly closed
  – remember case-sensitivity
Elements are nested but must NOT overlap
Attribute values must be quoted
Each attribute name may appear only once in a tag
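Several of these rules can be checked with a stack of open element names. The sketch below is not a full XML parser. It checks for a single root element, for case-sensitive closing and nesting, for quoted attributes separated by white space, and for repeated attribute names. It assumes attribute values contain no ">" and that any DOCTYPE has no internal subset. Names, characters and entities are not checked. The function names are hypothetical.

```javascript
// Hypothetical helper: attributes must be name="value" or name='value',
// separated by white space, with no repeated names.
function checkAttributes(rest) {
  const body = rest.trim();
  const attr = /([A-Za-z_][\w.:-]*)\s*=\s*("[^"]*"|'[^']*')(?:\s+|$)/y;
  const seen = new Set();
  while (attr.lastIndex < body.length) {
    const m = attr.exec(body);
    if (m === null) return `bad or unquoted attribute in "${body}"`;
    if (seen.has(m[1])) return `attribute "${m[1]}" repeated`;
    seen.add(m[1]);
  }
  return null;
}

// Returns null if these checks pass, otherwise a description of the first problem.
function checkWellFormed(xml) {
  const stack = []; // names of currently open elements
  let roots = 0;    // number of top-level elements seen
  // PIs/declarations, comments, CDATA and a simple DOCTYPE are skipped;
  // groups 1-4 capture "/", name, attribute text and "/" of ordinary tags.
  const token =
    /<\?[\s\S]*?\?>|<!--[\s\S]*?-->|<!\[CDATA\[[\s\S]*?\]\]>|<!DOCTYPE[^>]*>|<(\/?)([A-Za-z_][\w.:-]*)([^>]*?)(\/?)>/g;
  let m;
  while ((m = token.exec(xml)) !== null) {
    const [, closing, name, rest, selfClosing] = m;
    if (name === undefined) continue; // PI, comment, CDATA or DOCTYPE
    if (closing) {
      const open = stack.pop(); // must close the innermost open element
      if (open !== name) {
        return open === undefined
          ? `unexpected </${name}>`
          : `</${name}> does not match <${open}>`;
      }
      continue;
    }
    if (stack.length === 0 && ++roots > 1) return "more than one root element";
    const attrError = checkAttributes(rest);
    if (attrError) return attrError;
    if (!selfClosing) stack.push(name);
  }
  if (roots === 0) return "no root element";
  if (stack.length > 0) return `<${stack[stack.length - 1]}> is not closed`;
  return null;
}
```

For example, overlapping tags such as `<b><i>text</b></i>` are reported as "</b> does not match <i>". The stack is what makes the nesting rule easy to check: every closing tag must match the most recently opened element.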

1.37 Legal HTML which is not well-formed
I use bold and italic text
item 1
item 2
XHTML to the rescue!

1.38 How can we try it out?
There are many special XML tools, e.g.
  – XMLSpy
  – XMLwriter
  – Microsoft XML Notepad
  – EZXML
Modern browsers (IE5+, NN6+) support XML to some degree
IE 5.5+ supports XML, DTD and XSL
  – with some deviations from the W3C standards
IE gives an error message if the XML is not well-formed
IE displays the tree structure and enables branch collapsing
IE has an XML DOM (explained later)
IE allows scripting languages to operate on the DOM

1.39 Loading XML into IE
There are many ways of loading XML into IE, for example:
Open the XML file directly (enter its name or browse to it)
Create an XML data island in an HTML document: <xml> ... </xml>
Use XML as a "scripting language" in an HTML document: <script language="xml"> ... </script>
Load XML as an ActiveX object in an HTML document:
  var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
  xmlDoc.async = false;  // wait until loading has finished
  xmlDoc.load("xmlfile.xml");
EXERCISE TIME!

1.40 Valid XML and DTDs

1.41 Valid XML
The W3C wanted XML documents to be easy to create
  – so XML is only required to be well-formed
But for many purposes we want XML to be valid as well
A valid XML document has an associated DTD (or schema)
  – and obeys its rules (as well as being well-formed)
DTD = Document Type Definition
A DTD is a text file which defines a markup language
Writing a complete DTD from scratch is a big job!
Reusing existing DTDs or writing small ones isn't hard

1.42 Why validate (why use a DTD)?
Enforce conformance to the desired structure and field names
Expose structure without data
Provide presentational features
Enable use of entity references
Define a new markup language or protocol
Allow others to use your language/protocol
Example: B2B
  – business 1 sends an order to business 2 as an XML file
  – validation ensures that the order is correctly structured
  – business 2 sends an acknowledgement back to business 1

1.43 Validating using IE [we'll learn what this means later]
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;            // load synchronously, so parseError is ready below
xmlDoc.validateOnParse = true;   // validate against the DTD while parsing
xmlDoc.load("filename.xml");
if (xmlDoc.parseError.errorCode == 0) {
  document.writeln("file validated correctly");
} else {
  document.write("Error (" + xmlDoc.parseError.errorCode + ") ");
  document.write(xmlDoc.parseError.reason + " ");
  document.write("On line " + xmlDoc.parseError.line + " ");
}

1.44 Simple DTD
<!DOCTYPE vehicles [ ]>
F-15, F-16, F-18
Mazda-Lantis, Ford-Focus, Renault-5
sports-bike, city-bike, tricycle

1.45 What can a DTD do?
A DTD specifies the XML document structure
All elements, attributes and entities must be declared in the DTD
Hierarchical relationships are specified in the DTD
The number and order of occurrences may be specified
Anything unspecified is forbidden
An XML document is valid if its structure matches the DTD
DTDs do NOT check text (no type checking)
  – for that there is (or soon will be) schema

1.46 Some formalities
The DTD (DOCTYPE) declaration is placed in the prolog, before the root element
The DOCTYPE declaration always specifies the name of the root element
The DTD can be internal
Or external (a local file, or a file on the Internet)
Or mixed, with internal declarations overriding external ones
<!DOCTYPE rootname SYSTEM "filename.dtd" [
  internal DTD statements
]>

1.47 DTD instructions
DTDs are NOT XML files; they are written in another language
DTD files are case-sensitive
DTD instruction notation: <! ... >
DTD reserved words are capitalized
Instructions are:
  – ELEMENT
  – ATTLIST
  – ENTITY
  – NOTATION
DTDs can have conditional sections
  – and processing instructions

1.48 DTD: ELEMENT
<!ELEMENT element-name element-specification>
element-name is the element being specified
element-specification is the content the element can have:
  – EMPTY for empty elements
  – ANY for elements with arbitrary content (e.g. while debugging)
  – (…) for a list of content specs (1 or more)
  – #PCDATA means text (Parsed Character DATA)
  – element-name for a child element
Example (DTD and XML): DSPCSP J(Y)Stein

1.49 DTD: ELEMENT lists
(a)            required and not repeatable (must appear exactly once)
(a?)           optional (zero or one time)
(a*)           optional and repeatable (zero or more times)
(a+)           required and repeatable (one or more times)
Multiple items:
(a, b, c)      a, b and c must all appear, in that order
(a | b | c)    exactly one of a, b or c must appear
Some combinations:
(a, (b | c))   a must appear, followed by either b or c (nested parentheses)
(a | b | c)*   a, b, c may appear any number of times, in any order
(a | b | c)+   at least one of a, b, c must appear
If #PCDATA is mixed with children, #PCDATA must come first and the group must end with *: (#PCDATA | a | b | c)*
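The occurrence operators above are the same ones used in regular expressions. That makes it possible to check a sequence of child elements by translating the content model into a RegExp. This is a sketch of that idea, not how any particular validating parser works. It handles element content only (no #PCDATA, EMPTY or ANY). It also does not enforce the DTD rule that content models be deterministic. The function names are hypothetical.

```javascript
// Hypothetical helper: translate a content model such as
// "(title, author+, publisher, (date|(edition,date)*), hardcover?)"
// into a RegExp over child names, each followed by one space.
function contentModelToRegExp(model) {
  const source = model
    .replace(/\s+/g, "") // white space is insignificant in the model
    .replace(/[A-Za-z_][\w.:-]*/g, (name) => "(?:" + name.replace(/\./g, "\\.") + " )")
    .replace(/,/g, ""); // a sequence is plain concatenation
  // "|", "?", "*", "+" and parentheses already mean the same thing in a RegExp
  return new RegExp("^" + source + "$");
}

// Check the child element names of one element, in document order.
function matchesContentModel(model, children) {
  const text = children.map((name) => name + " ").join("");
  return contentModelToRegExp(model).test(text);
}
```

For the book model of slide 1.50, `["title", "author", "author", "publisher", "edition", "date", "hardcover"]` matches. `["title", "publisher", "date"]` does not, because author+ needs at least one author.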

1.50 DTD: ELEMENT example
DTD:
<!ELEMENT book (title, author+, publisher, (date|(edition,date)*), hardcover?) >
XML:
DSPCSP J(Y) Stein Wiley August 2000

1.51 DTD: ATTLIST
<!ATTLIST element-name attribute-name attribute-type default-value>
element-name is the element containing the attribute
attribute-name is the attribute name
attribute-type is one of 10 attribute types:
  – CDATA: general text (must not include markup) [no relation to CDATA sections!]
  – an enumerated list of possible values (e.g. (Sunday|Monday|Tuesday))
  – ID, IDREF, IDREFS: used to link elements together (an ID must be unique)
  – NMTOKEN, NMTOKENS: name token(s), like XML names except that any name character may come first
  – ENTITY, ENTITIES, NOTATION: refer to (unparsed) entities and notations
default-value is the value used when the attribute is not specified:
  – value: a simple default value
  – #FIXED value: a constant value that can't be changed
  – #IMPLIED: no default needed; the application can decide what to do
  – #REQUIRED: the attribute value MUST be specified
WARNING: there are a few special ATTLIST forms as well (xlink, whitespace, language, etc.)
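The four default-value forms tell a validating parser how to fill in or reject attributes. The sketch below applies them to the attributes actually written in a start tag. Declarations are represented as objects of the form `{ name, defaultType, value }`, where defaultType is "#REQUIRED", "#IMPLIED", "#FIXED" or null for a plain default. That representation and the function name are hypothetical, chosen for illustration.

```javascript
// Hypothetical helper: apply ATTLIST default rules to one element's attributes.
function applyAttributeDefaults(decls, given) {
  const result = { ...given };
  for (const { name, defaultType, value } of decls) {
    const present = Object.prototype.hasOwnProperty.call(given, name);
    switch (defaultType) {
      case "#REQUIRED":
        if (!present) throw new Error(`attribute "${name}" is required`);
        break;
      case "#IMPLIED":
        break; // no default: the application decides what to do
      case "#FIXED":
        if (present && given[name] !== value) {
          throw new Error(`attribute "${name}" is fixed to "${value}"`);
        }
        result[name] = value; // always has the fixed value
        break;
      default: // plain default value, used only when the attribute is absent
        if (!present) result[name] = value;
    }
  }
  return result;
}
```

With declarations modelled on the author element of slide 1.52 (id #REQUIRED, nickname defaulting to "none", email defaulting to "allauthors@here.com"), a tag that gives only id and email comes back with nickname "none" added. A tag without id is rejected.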

1.52 DTD: ATTLIST example
DTD:
<!ATTLIST title
  isbn     CDATA                     #REQUIRED
  status   (in-print | out-of-print) #IMPLIED >
<!ATTLIST author
  id       ID                        #REQUIRED
  nickname NMTOKEN                   "none"
  email    CDATA                     "allauthors@here.com" >
XML:
DSPCSP
<author id="a1234" email="me@here.com">Stein J(Y)</author>

1.53 DTD: ENTITY
There are various kinds of entities:
Parameter entities (used internally in the DTD)
Internal entities (text abbreviations): &disclaimer;
External parsed entities (XML snippets): &chunk;
External unparsed (binary) entities [unfortunately not yet supported in IE]
ENTITYs and NOTATIONs in ATTLISTs
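For internal entities, the parser substitutes each reference with its declared replacement text. Because that text may itself contain references, the substitution is recursive. In this sketch a plain object stands in for `<!ENTITY name "text">` declarations. The entity names and texts in the test are hypothetical. The depth limit is an arbitrary guard against self-referencing entities, which XML forbids.

```javascript
// Hypothetical helper: expand internal entity references recursively.
function expandEntities(text, entities, depth = 0) {
  if (depth > 20) throw new Error("entity expansion too deep (recursive entity?)");
  return text.replace(/&([A-Za-z_][\w.:-]*);/g, (ref, name) => {
    if (!Object.prototype.hasOwnProperty.call(entities, name)) {
      throw new Error(`undefined entity &${name};`);
    }
    // replacement text may contain further references, so expand it too
    return expandEntities(entities[name], entities, depth + 1);
  });
}
```

An undeclared reference is reported as an error. This mirrors the rule that entities must be defined in the DTD.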

1.54 XML Namespaces
PUBLIC DTDs are great, but can I use more than one?
Namespaces are like "packages", "modules", "libraries"
Can borrow elements and attributes from namespaces
Define in an attribute:
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Use as a fully qualified name: …
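A qualified name such as xsl:template is resolved by finding the nearest xmlns:xsl declaration in scope. A name without a prefix uses the nearest default xmlns declaration, if there is one. In this sketch, each scope is a plain object holding one element's attributes, ordered from outermost to innermost. The representation and the function name are hypothetical. It covers element names only, since unprefixed attributes do not take the default namespace.

```javascript
// Hypothetical helper: resolve an element's qualified name to
// { namespaceURI, localName } using the xmlns declarations in scope.
function resolveQName(qname, scopes) {
  const colon = qname.indexOf(":");
  const prefix = colon === -1 ? "" : qname.slice(0, colon);
  const localName = colon === -1 ? qname : qname.slice(colon + 1);
  const attr = prefix === "" ? "xmlns" : "xmlns:" + prefix;
  // innermost declaration wins, so search from the last scope backwards
  for (let i = scopes.length - 1; i >= 0; i--) {
    if (Object.prototype.hasOwnProperty.call(scopes[i], attr)) {
      return { namespaceURI: scopes[i][attr], localName };
    }
  }
  if (prefix === "") return { namespaceURI: null, localName }; // no default namespace
  throw new Error(`undeclared namespace prefix "${prefix}"`);
}
```

Given the xsl:stylesheet attributes from this slide as the only scope, `resolveQName("xsl:template", scopes)` yields the XSLT namespace URI with local name "template".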

1.55 Exercise: VML
Vector Markup Language, supported in HTML by IE5+
vml\:* {behavior:url(#default#VML);}
<vml:polyline style='position:absolute;left:0px;top:10px' points='0px,100px,100px,0px,200px,100px' strokecolor='red' strokeweight='10px'/>
<vml:rect style='position:absolute;left:0px;top:110px;width:200px;height:200px' fillcolor='red'/>
<vml:roundrect style='position:absolute;left:40px;top:160px;width:20px;height:20px' arcsize='0.3' fillcolor='yellow'/>
<vml:oval style='position:absolute;left:105px;top:280px;width:5px;height:5px' fillcolor='yellow'/>
<vml:polyline style='position:absolute;left:500px;top:0px' points='4,32, 36,32, 46,5, 56,32, 86,32, 61,50, 71,77, 46,60, 21,77, 30,50' fillcolor='white'/>
...

1.56 Schema
XML DTDs are great, but somewhat limited
  – no type checking
W3C has defined Schema, an XML language
  – no need to learn another language
  – can parse with standard XML tools
  – is extensible
  – supports namespaces
Schema is an "object-oriented language"
  – supports inheritance
Schema has many built-in data types
  – string, normalizedString, token, byte, unsignedByte, integer, decimal,
  – positiveInteger, negativeInteger, nonPositiveInteger, nonNegativeInteger,
  – int, unsignedInt, long, unsignedLong, short, unsignedShort,
  – time, dateTime, date, duration,
  – boolean, float,
  – language, anyURI, QName, ID, IDREF, etc.
  – user-defined simpleType or complexType
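The key difference from DTDs is that text content can be type-checked. Below is a sketch of checking text against the lexical forms of a few built-in types from the list. It covers only five types and assumes white space has already been collapsed. It uses BigInt so that large integers are compared exactly. The names SCHEMA_TYPES and isValidAs are hypothetical, and this is not a schema processor.

```javascript
// Hypothetical helpers: lexical checks for a handful of XML Schema types.
const isIntegerText = (s) => /^[+-]?\d+$/.test(s);
const toBigInt = (s) => BigInt(s.replace(/^\+/, "")); // drop an optional "+" sign

const SCHEMA_TYPES = {
  boolean: (s) => /^(true|false|1|0)$/.test(s),
  integer: (s) => isIntegerText(s),
  positiveInteger: (s) => isIntegerText(s) && toBigInt(s) > 0n,
  nonNegativeInteger: (s) => isIntegerText(s) && toBigInt(s) >= 0n,
  unsignedByte: (s) => isIntegerText(s) && toBigInt(s) >= 0n && toBigInt(s) <= 255n,
};

// Look up a type by name and check one text value against it.
function isValidAs(typeName, text) {
  if (!Object.prototype.hasOwnProperty.call(SCHEMA_TYPES, typeName)) {
    throw new Error(`type "${typeName}" not covered by this sketch`);
  }
  return SCHEMA_TYPES[typeName](text);
}
```

For example, "256" is a valid integer but not a valid unsignedByte, and "yes" is not a valid boolean. A DTD would accept both as ordinary text.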

1.57 Simple Schema Example
<xsd:schema elementFormDefault="unqualified"
            attributeFormDefault="unqualified"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.rad.com/hr" />