1 1 XML
Slides courtesy of Prof. Ellis Horowitz @ USC

2 2 What is XML
XML stands for Extensible Markup Language.
– The World Wide Web Consortium (W3C) directs the effort.
XML isn't a markup language like HTML, but rather a system for defining other markup languages.
XML is a common syntax for expressing structure in data and, as a result, a way for others to define new tags.
– Whereas the <B> tag in HTML specifies text to be presented in a certain typeface and weight, an XML tag explicitly identifies the kind of information it surrounds: one tag might identify the author of a document, another could contain an item's cost in an inventory list.

3 3 SGML, XML and HTML
The parent of HTML and XML is Standard Generalized Markup Language (SGML), an ISO standard for electronic document exchange.
SGML competes with other standards, mainly de facto standards such as Adobe PDF (Acrobat), Microsoft RTF (Rich Text Format), and popular word processor file formats like Microsoft Word.
Both XML and HTML are document formats derived from SGML.
– Thus they all share certain characteristics, such as a similar syntax and the use of bracketed tags.
– But HTML is an application of SGML, whereas XML is a subset of SGML.
XML documents can be read by any SGML authoring or viewing tool.
– XML is less complex than SGML, and it is designed to work across a limited-bandwidth network such as the Internet.

4 4 Why Are Developers Excited about XML?
Domain-Specific Markup Languages
– A DTD precisely describes the format
– DTDs verify that documents adhere to the format
– Ensures interoperability of unrelated tools
Self-Describing Data
– DTDs explain the format, so reverse engineering isn't as necessary
– Comments in DTDs can go even further
Interchange of Data Among Applications
– E-commerce and syndication
– DTDs make sure that two independent applications speak the same language
– DTDs detect malformed data
– DTDs verify correct data
Structured and Integrated Data
– Can specify relationships between elements using element declarations
– Can assemble data from multiple sources using external entity references declared in the DTD

5 5 XML Applications
Chemical Markup Language (CML)
– Jumbo: the first general-purpose XML browser
– Assigns each XML element to a Java class that knows how to render that element
– http://www.xml-cml.org
Mathematical Markup Language (MathML)
– The Amaya browser
Synchronized Multimedia Integration Language (SMIL)
Scalable Vector Graphics (SVG)
MusicML
FoodWebML, GuiML

6 6 A Song Description in HTML
Hot Cop
by Jacques Morali, Henri Belolo, and Victor Willis
Producer: Jacques Morali
Publisher: PolyGram Records
Length: 6:20
Written: 1978
Artist: Village People

7 7 A Song Description in XML
Hot Cop
Jacques Morali
Henri Belolo
Victor Willis
Jacques Morali
PolyGram Records
1978
Village People
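
The element tags of this slide did not survive in the transcript, so the following is only a sketch of what a semantically tagged song record might look like and how a program could read it. The tag names (SONG, TITLE, COMPOSER, PRODUCER, PUBLISHER, YEAR, ARTIST) and the helper `summarize_song` are assumptions made for illustration; Python's standard-library ElementTree stands in for whatever parser the course used.

```python
import xml.etree.ElementTree as ET

# Tag names are assumed for illustration; the slide's own tags were not preserved.
SONG_XML = """<?xml version="1.0"?>
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""


def summarize_song(xml_text):
    """Pull fields out by element name instead of by position or typeface."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("TITLE"),
        "composers": [c.text for c in root.findall("COMPOSER")],
        "year": int(root.findtext("YEAR")),
        "artist": root.findtext("ARTIST"),
    }


print(summarize_song(SONG_XML))
```

Because each value is wrapped in a tag that names what it is, the program can ask for "the composers" directly, which the HTML version of the same record does not allow.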

8 8 Using XSLT
Attaching style sheets to documents

9 9 Using CSS – simpler, but limited
Hot Cop
Jacques Morali
Henri Belolo
Victor Willis
Jacques Morali
PolyGram Records
1978
Village People

10 10 Well-formedness
All XML documents must be well-formed.
Well-formedness rules:
– Open and close all tags
– Empty tags end with />
– There is a unique root element
– Elements may not overlap
– Attribute values are quoted
– < and & are only used to start tags and entities
Parsers are required to reject malformed documents. This improves compatibility and interoperability.

11 11 Well-formedness Rules
Open and close all tags
Empty tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entities
Only the five predefined entity references are used
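
The rules above can be tried against a real parser. This is a minimal sketch using Python's standard-library ElementTree, which rejects any document that is not well-formed; the sample documents and the helper name `is_well_formed` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule; all but the first break exactly one rule.
samples = {
    "well-formed": '<P CLASS="x">fish &amp; chips<BR/></P>',
    "tag never closed": "<P>The quick brown fox",
    "overlapping elements": "<B><I>text</B></I>",
    "two root elements": "<A/><B/>",
    "unquoted attribute value": "<IMG SRC=fox.png/>",
    "bare ampersand": "<P>O'Reilly & Associates</P>",
    "undeclared entity": "<P>&copy; 1999</P>",
}

for label, document in samples.items():
    print(f"{label}: {is_well_formed(document)}")
```

Only the first sample is accepted; the parser refuses the others outright rather than guessing at a repair, which is the behaviour the slide describes.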

12 12 What is a Document Type Definition
A Document Type Definition (DTD) is a set of syntax rules for tags. It tells you
– what tags you can use in a document,
– what order they should appear in,
– which tags can appear inside other ones,
– which tags have attributes, and so on.
Originally developed for use with SGML, a DTD can be part of an XML document, but it's usually a separate document or series of documents.
Because XML is not a language itself, but rather a system for defining languages, it doesn't have a universal DTD the way HTML does. Instead, each industry or organization that wants to use XML for data exchange can define its own DTDs.
If an organization uses XML to tag documents for internal use only, it can create its own private DTD.

13 13 Validity
To be valid, an XML document must
1. Be well-formed
2. Have a Document Type Definition (DTD)
3. Comply with the constraints specified in the DTD

14 14 Validity is not always sufficient
DTDs cannot constrain the text content of an element. They cannot say
– that an element must contain a number,
– that an element must contain a date,
– that a date must be between 1970 and 2001,
– etc.
Custom validation layers can sit on top of XML validation.
Schemas will add this.
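
A custom validation layer of the kind mentioned above could look roughly like this. It is a sketch under assumptions: the element name YEAR and the helper `year_problems` are hypothetical, and the 1970 to 2001 range is simply the one used as an example on the slide.

```python
import xml.etree.ElementTree as ET


def year_problems(xml_text, tag="YEAR", earliest=1970, latest=2001):
    """Check what a DTD cannot: that each <YEAR> holds a number in range."""
    problems = []
    for element in ET.fromstring(xml_text).iter(tag):
        text = (element.text or "").strip()
        if not (text.isascii() and text.isdigit()):
            problems.append(f"<{tag}> is not a number: {text!r}")
        elif not earliest <= int(text) <= latest:
            problems.append(f"<{tag}> {text} is outside {earliest}-{latest}")
    return problems


print(year_problems("<SONG><YEAR>1978</YEAR></SONG>"))
print(year_problems("<SONG><YEAR>1969</YEAR></SONG>"))
print(year_problems("<SONG><YEAR>last year</YEAR></SONG>"))
```

The parser still enforces well-formedness first; this layer only adds the data-type and range checks on top.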

15 15 XML Schemas
An XML-based syntax, or schema, for defining how an XML document is marked up
Recommended by Microsoft
An alternative to the Document Type Definition (DTD)
DTDs have many drawbacks, including the use of non-XML syntax, no support for data-typing, and non-extensibility.
XML Schema improves upon DTDs in several ways, including the use of XML syntax and support for data-typing and namespaces.
For example, an XML Schema allows you to specify an element as an integer, a float, a boolean, a URL, etc.
The XML parser in Internet Explorer 5 can validate an XML document against a DTD or an XML Schema.

16 16 How to process XML?
Java Parsers
DOM Parser – tree structure
SAX Parser – event-driven approach
A DOM parser can use a SAX parser to parse the document and then build the tree structure
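
The two approaches can be compared side by side. The slide refers to Java parsers; as a rough sketch, the same ideas are shown here with Python's standard-library `xml.sax` and `xml.dom.minidom`. The class `SaxTreeBuilder`, the dictionary-based tree, and the sample document are illustrative assumptions, not how any particular DOM implementation works internally.

```python
import xml.sax
from xml.dom import minidom

XML_TEXT = "<SONG><TITLE>Hot Cop</TITLE><YEAR>1978</YEAR></SONG>"


class SaxTreeBuilder(xml.sax.ContentHandler):
    """Turns the stream of SAX events into a small tree of dictionaries."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # elements that are currently open

    def startElement(self, name, attrs):
        node = {"name": name, "text": "", "children": []}
        if self.stack:
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        if self.stack:
            self.stack[-1]["text"] += content

    def endElement(self, name):
        self.stack.pop()


def build_tree_with_sax(text):
    """Event-driven parse that ends up with a tree, as a DOM builder would."""
    handler = SaxTreeBuilder()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.root


def titles_with_dom(text):
    """Tree-based parse: the whole document is in memory and can be queried."""
    document = minidom.parseString(text)
    return [node.firstChild.data for node in document.getElementsByTagName("TITLE")]


print(build_tree_with_sax(XML_TEXT))
print(titles_with_dom(XML_TEXT))
```

SAX hands over one event at a time and keeps nothing, so the handler decides what to remember; DOM keeps everything, which is convenient for small documents and costly for large ones.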

17 17 DTDs – Content Definitions
Content model definitions describe what may be contained in an instance of an element:
– names of allowed or forbidden elements
– DTD entities
– document text
The syntax for expressing content is a form of regular expression:
– (…)    delimits a group
– A | B  either A or B
– A, B   A followed by B
– A & B  A and B in any order (SGML only; XML DTDs do not support the & connector)
– A?     A occurs zero or one time
– A*     A occurs zero or more times
– A+     A occurs one or more times
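
Because content models are a form of regular expression, they can be translated almost mechanically into one. The sketch below does this for element-only content models (no #PCDATA, no SGML & connector); the function names are hypothetical and the approach is a teaching aid, not how a validating parser is actually built.

```python
import re

# Simplified element-name pattern (real XML names allow more characters).
NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def content_model_to_regex(model):
    """Translate a content model such as '(TO, FR, SB?)' into a compiled regex.

    The regex is matched against the child names written as 'TO;FR;SB;'.
    """
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that consumes "name;".
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + ";)", compact)
    # A DTD comma means "followed by", which is plain concatenation in a regex.
    # The operators ( ) | ? * + already mean the same thing in both notations.
    pattern = pattern.replace(",", "")
    return re.compile(pattern)


def children_match(model, child_names):
    """True if the ordered list of child element names satisfies the model."""
    sequence = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(A, B)", ["A", "B"]))          # A followed by B
print(children_match("(A, B)", ["B", "A"]))          # wrong order
print(children_match("(A | B)+", ["A", "B", "A"]))   # one or more of A or B
print(children_match("(A, B?)", ["A"]))              # B is optional
```

The ";" terminator after each name keeps an element called P from being confused with the start of one called PARA.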

18 18 Element Declarations
Each tag must be declared in an <!ELEMENT> declaration.
A declaration gives the name and content model of the element.
The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element.

19 19 Content Specifications
ANY
– A catalog can contain any child element and/or raw text (parsed character data)
#PCDATA
– Parsed Character Data; i.e. raw text, no markup. For example, 1984
Sequences
Choices
Mixed Content
Modifiers
EMPTY

20 20 #PCDATA
There are a number of elements in the example document that only contain PCDATA:

21 21 Comments in DTDs
DTDs seem fundamentally more obfuscated than C.
Comments can improve this by giving example elements.
Comments are the same as in HTML; e.g. <!-- this is a comment -->

22 22 Child Elements
<date><year>1994</year></date>
To declare that a date element must have a year child:
<!ELEMENT date (year)>

23 23 Child Elements
You only have to declare the immediate children
Elliotte Rusty Harold Julie Mandel
To declare that an element must have exactly one name child:

24 24 Sequences
Elliotte Rusty Harold
Separate multiple required child elements with commas; e.g.
A list of child elements separated by commas is called a sequence

25 25 More Sequences
To use a sequence in an ELEMENT declaration:
– The element being described must have only child elements, no mixed content
– You must know the order of the child elements
– You must know the type of each child element
– You must know the number of child elements
– The number can be relaxed with wildcards

26 26 One or More Children +
Compositions by the members of New York Women Composers
music publishing
scores
women composers
New York
The + suffix indicates that one or more of that element is required at that point

27 27 A DTD for Songs

28 28 Internal DTDs
<!DOCTYPE GREETING [ ]>
<GREETING>Hello XML!</GREETING>

29 29 Complete Example – Mail Message
Suppose we describe an email message as consisting of:
– a title;
– a header made of: the sender; the recipient; a subject;
– the body text made of: four paragraphs; quoted material.
The tags are
– <!-- is a comment
– (head,body) implies a group with body following head
– TO is followed by FR and both must appear
– ? means SB is optional
– P may occur zero or more times

32 32 Open and close all tags
Good:
– The quick brown fox jumped over the lazy dog
– A very important point
– Copyright 1999 Ellis Horowitz
Bad:
– The quick brown fox jumped over the lazy dog
– A very important point
– Copyright 1999 Ellis Horowitz

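33 33 Empty tags end with />
For example, <BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>
Web browsers deal inconsistently with these
Can use an explicit start tag and end tag, such as <BR></BR>, instead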

34 34 There is a unique root element
One element completely contains all other elements of the document
This is HTML in HTML files
The XML declaration and xml-stylesheet processing instruction are not elements

35 35 Elements may not overlap
If an element contains a start tag for an element, it must also contain the corresponding end tag
Empty elements may appear anywhere
Every non-root element has a parent element
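
Because elements nest and never overlap, every element other than the root has exactly one parent, so the document forms a tree. A small sketch with Python's ElementTree makes this visible; the tag names (including CREDITS) and the helper `path_to` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the tag names are assumptions.
root = ET.fromstring(
    "<SONG><TITLE>Hot Cop</TITLE>"
    "<CREDITS><COMPOSER>Jacques Morali</COMPOSER></CREDITS></SONG>"
)

# Map every child element to its single parent.
parent_of = {child: parent for parent in root.iter() for child in parent}


def path_to(element):
    """Walk up the parent links from an element to the root."""
    names = [element.tag]
    while element in parent_of:
        element = parent_of[element]
        names.append(element.tag)
    return "/".join(reversed(names))


composer = root.find(".//COMPOSER")
print(path_to(composer))
```

The root is the only element missing from `parent_of`, which is the "unique root element" rule seen from the other side.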

36 36 Attribute values are quoted Good: – Bad: –

37 37 < and & are only used to start tags and entities
Good: O'Reilly &amp; Associates
Bad: O'Reilly & Associates
Good: for (int i = 0; i &lt;= args.length; i++ ) {
Bad: for (int i = 0; i <= args.length; i++ ) {

38 38 Only the five predefined entity references are used
Good:
– &amp;
– &lt;
– &gt;
– &quot;
– &apos;
Bad:
– &copy;
– &reg;
– &tm;
– &alpha;
– &eacute;
– &nbsp;
– etc.
DTDs loosen this restriction by allowing you to define new entities, even in an invalid document.
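
The three situations on this slide can be checked with a parser: the predefined entities always resolve, text has to be escaped before it is placed in a document, and an entity declared in a DTD becomes usable. This is a sketch with Python's standard library; the NOTICE element and the `copy` entity declaration are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five predefined entity references need no declaration.
predefined = ET.fromstring("<T>&amp; &lt; &gt; &quot; &apos;</T>").text
print(predefined)

# Escaping raw text turns & (and < and >) into entity references.
escaped = escape("O'Reilly & Associates")
print(escaped)  # O'Reilly &amp; Associates

# An internal DTD subset can declare additional entities such as &copy;.
DOC_WITH_DTD = """<!DOCTYPE NOTICE [
  <!ENTITY copy "&#169;">
]>
<NOTICE>&copy; 1999</NOTICE>"""
notice = ET.fromstring(DOC_WITH_DTD).text
print(notice)
```

The document in the last step has no element declarations, so it is not valid, yet the declared entity is still expanded: declaring entities does not depend on validity.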

42 42 Compare DTD & Schema

43 43 http://www.w3schools.com/schema/schema_schema.asp

44 44 A DTD for Songs

45 45 A Valid Song Document
Hot Cop
Jacques Morali
Henri Belolo
Victor Willis
Jacques Morali
PolyGram Records
1978
Village People

46 46 XSLT - XSL Transformations
XSL (eXtensible Stylesheet Language) consists of two parts: XSL Transformations and XSL Formatting Objects.
An XSLT stylesheet is an XML document defining a transformation for a class of XML documents.
A stylesheet separates content and logical structure from presentation.
XSLT was not intended as a completely general-purpose XML transformation language; it was designed for use with XSL Formatting Objects. Nevertheless, XSLT is generally useful.
The basic design: XSLT is declarative and based on pattern-matching and templates.

47 47 Song.xml processed with song2HTML.xsl

48 48 song2HTML.xsl

49 49 song2HTML.xsl

50 50 Transformer.java

51 51 Processing model
template rule = pattern + template
Construction of the result tree fragment:
– the source tree is processed by processing the root
– a single node is processed by
  1. finding the template rule with the best matching pattern
  2. instantiating its template (creates a fragment and continues processing recursively)
– a node list is processed by processing each node in order
current node: the node currently being processed
current node list: the node list currently being processed (used for evaluation context later)
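
The processing model can be imitated in a few lines of ordinary code. This is a toy sketch, not XSLT: patterns are reduced to exact element names (so "best matching" is a dictionary lookup), output is built as a plain string without escaping, and all function and tag names are invented for the illustration.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, rules):
    """Process one node: find its rule (or the default) and instantiate it."""
    template = rules.get(node.tag, default_template)
    return template(node, rules)


def default_template(node, rules):
    """Fallback rule: copy the text and process the children in order."""
    parts = [node.text or ""]
    for child in node:
        parts.append(apply_templates(child, rules))
        parts.append(child.tail or "")
    return "".join(parts)


def song_template(node, rules):
    # Instantiating a template may continue processing recursively.
    body = "".join(apply_templates(child, rules) for child in node)
    return "<html><body>" + body + "</body></html>"


def title_template(node, rules):
    return "<h1>" + (node.text or "") + "</h1>"


def composer_template(node, rules):
    return "<p>" + (node.text or "") + "</p>"


# template rule = pattern (here: an element name) + template (here: a function)
RULES = {"SONG": song_template, "TITLE": title_template, "COMPOSER": composer_template}

source = ET.fromstring(
    "<SONG><TITLE>Hot Cop</TITLE><COMPOSER>Jacques Morali</COMPOSER><YEAR>1978</YEAR></SONG>"
)
result = apply_templates(source, RULES)
print(result)
```

YEAR has no rule of its own, so it falls through to the default and only its text is copied, loosely mirroring how XSLT's built-in rules behave.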

60 60 CSS Examples – self study

61 61 A Blank Style Sheet...

62 62 The Default Rule
Not every element needs a rule
The root element should be at least display: block
catalog { font-family: New York, Times New Roman, serif; font-size: 14pt; background-color: white; color: black; display: block }

63 63 A style rule for the category element
Make it look like an H1 heading
category { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 32pt; font-weight: bold; text-align: center }
catalog { font-family: New York, Times New Roman, serif; font-size: 14pt; background-color: white; color: black; display: block }

64 64 A style rule for the composer element
Make it look like a level 2 head
No need to style the first, middle, and last names separately
composer { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 24pt; font-weight: bold; text-align: left }

65 65 A style rule for the title element
composition title { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 18pt; font-weight: bold; text-align: left }

66 66 Style Rules for composition children
composition * { display: list-item }
description { display: block }

67 67 Finished Style Sheet
category { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 32pt; font-weight: bold; text-align: center }
catalog { font-family: New York, Times New Roman, serif; font-size: 14pt; background-color: white; color: black; display: block }
composer { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 24pt; font-weight: bold; text-align: left }
composition title { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 18pt; font-weight: bold; text-align: left }
composition * { display: list-item }
description { display: block }
/* cataloging_info is only for search engines */
cataloging_info { display: none; color: #FFFFFF }
last_updated, copyright, maintainer { display: block; font-size: small }
copyright:before { content: "Copyright " }
last_updated:before { content: "Last Modified " }
last_updated { margin-top: 2ex }

69 69 Day Planner – example DTD

70 70 Planner Application
