
1 eXtensible Markup Language (XML)

2 Definition: XML is a cross-platform, software- and hardware-independent tool for transmitting information. “XML is going to be everywhere” – W3C

3 XML BASICS

4 Introduction XML is a portable, widely supported, open technology for data storage and exchange

5 Introduction Developed from SGML (Standard Generalized Markup Language). Became a W3C Recommendation in 1998. A meta-markup language. Deficiencies of HTML and SGML: –Many complex features that are rarely used. HTML is a markup language; XML is used to define markup languages.

6 Introduction Markup languages defined in XML are known as applications. XML can be written by hand or generated by computer –Useful for data exchange. Foundation for several next-generation web technologies: –RSS, AJAX, Web Services, etc.

7 What is XML? A specification for creating markup languages to store data and exchange data Tags are not predefined (user generated)

8 Why XML? Sample Catalog Entry in HTML: Laptop Computer, IBM ThinkPad 600E, 400 MHz, 64 MB, 8 GB, 4.1 pounds, $3200. How can I parse the content, e.g. the price? Need a more flexible mechanism than HTML to interpret content.

9 Why XML? Sample Catalog Entry using XML: IBM ThinkPad 600E 400 64 8 4.1 3200
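
To make the question on slide 8 concrete, here is a minimal Python sketch of reading the price from an XML version of the catalog entry. The tag names (computer, brand, model, price and so on) and the unit and currency attributes are assumptions for illustration, because the markup of the original slide did not survive in this transcript; only the values come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag and attribute names; only the values come from the slide.
CATALOG_ENTRY = """
<computer type="laptop">
  <brand>IBM</brand>
  <model>ThinkPad 600E</model>
  <speed unit="MHz">400</speed>
  <memory unit="MB">64</memory>
  <disk unit="GB">8</disk>
  <weight unit="pounds">4.1</weight>
  <price currency="USD">3200</price>
</computer>
"""

entry = ET.fromstring(CATALOG_ENTRY)

# Because the price has its own element, no text scraping is needed.
price_element = entry.find("price")
price = float(price_element.text)
currency = price_element.get("currency")

print(f"{entry.findtext('brand')} {entry.findtext('model')}: {price:.2f} {currency}")
```

With HTML, the same lookup would depend on guessing which piece of displayed text is the price; here the element name identifies it.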

10 What does XML do? XML is used to structure and describe information. Intended to be used with the Internet Used to facilitate sharing data between different systems

11 Smart processing using XML and provide logical containers for extracting and manipulating product information as a unit – Sort by,,,, etc. Explicit identification of each part enables its automated processing – Convert from “USD” to Euro, Yen, etc.

12 Document exchange Use of XML allows companies to exchange information that can be processed automatically without human intervention, e.g. –Purchase orders –Invoices –Catalogues etc.
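
The two ideas on this slide, treating each product as a unit and automating work on explicitly labelled parts, can be sketched as follows. Everything specific here is an assumption made for illustration: the tag names, the second product, and the exchange rate are not from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: tag names, the second product, and the exchange rate
# are illustrative assumptions, not data from the slides.
CATALOG = """
<catalog>
  <product><brand>IBM</brand><price currency="USD">3200</price></product>
  <product><brand>Acme</brand><price currency="USD">1500</price></product>
</catalog>
"""
USD_TO_EUR = 0.5  # made-up rate, chosen only to keep the arithmetic obvious


def price_of(product):
    """Return the numeric price stored in a <product> element."""
    return float(product.findtext("price"))


catalog = ET.fromstring(CATALOG)

# Each <product> is a logical unit, so it can be sorted as a whole.
by_price = sorted(catalog.findall("product"), key=price_of)
brands_by_price = [p.findtext("brand") for p in by_price]

# The currency is labelled explicitly, so conversion can be automated.
prices_in_eur = {
    p.findtext("brand"): price_of(p) * USD_TO_EUR
    for p in catalog.findall("product")
    if p.find("price").get("currency") == "USD"
}

print(brands_by_price, prices_in_eur)
```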

13 Difference between HTML and XML? XML was designed to carry data. XML is not a replacement for HTML. XML and HTML were designed with different goals: XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks. HTML is about displaying information, while XML is about describing information.

14 How can XML be Used? XML can Separate Data from HTML With XML, your data is stored outside your HTML XML is used to Exchange Data With XML, data can be exchanged between incompatible systems With XML, financial information can be exchanged over the Internet XML can be used to Share Data XML can be used to Store Data XML can make your Data more Useful XML can be used to Create new Languages

15 XML Pros and Cons pro: –human-readable, self-documenting format –strict syntax allows standardized tools –international, platform-independent –can represent almost any general kind of data (record, list, tree) con: –bulky syntax/structure makes files large; can decrease performance

16 An example XML file

17 Components of an XML Document Processing Instructions Elements Elements with Attributes

18 XML Document Structure A header, then a single document tag that can contain other tags – Tag syntax: <tag>text or tags</tag> Attribute syntax: –name="value" Comments: <!-- comment text -->

19 XML Document Structure Well-formed XML documents –XML documents should begin with an XML declaration (optional in XML 1.0, but recommended) –All begin tags have a matching end tag; empty tags may use the self-closing form <tag/> –There is one root tag that contains all the other tags in a document –Attributes must have a value assigned, and the value must be quoted –The characters <, > and & can only appear with their special meaning –XML tags are case sensitive –XML elements must be properly nested
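
A quick way to see these rules in action is to hand small fragments to a parser and note which ones it rejects. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are made up for this illustration. Note that this checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative fragments, one per rule from the slide.
samples = {
    "properly nested": "<note><to>Tove</to></note>",
    "missing end tag": "<note><to>Tove</note>",
    "two root tags": "<note/><note/>",
    "unquoted attribute": "<note id=1/>",
    "case mismatch": "<Note></note>",
    "bare ampersand": "<note>Tom & Jerry</note>",
    "improper nesting": "<b><i>text</b></i>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```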

20 XML Document Structure Specified by either DTD or XML schemas. (specify the tags and their order) Valid documents are well-formed and also conform to a schema which defines details of the allowed content Validity is tested against a schema, discussed later

21 XML SCHEMA

22 XML Schema Schema: describes the structure of an XML document by setting rules that specify which tags and attributes are valid and how they can be used together. Used to check XML files to make sure they follow the rules set in the schema; the W3C validator uses it to validate, and the doctype at the top of an XHTML file specifies the schema. Two ways to define a schema: –Document Type Definition (DTD) –W3C XML Schema

23 Document Type Definitions A set of structural rules called declarations. Define tags, attributes, entities. Specify the order and nesting of tags. Specify which attributes can be used with which tags. DTD can be: –Embedded inside XML (internal DTD). –Stored in a separate file (external DTD saved as .dtd).

24 Document Type Definitions General syntax for declarations –<!keyword … > –Note, not XML! Four possible keywords: –ELEMENT, for defining tags –ATTLIST, for specifying attributes in your tags –ENTITY, for identifying sources of data –NOTATION, for defining data types for non-XML data

25 Declaring Elements General syntax –<!ELEMENT element-name (content-description)> –Content description specifies what tags may appear inside the named element and whether there may be any plain text in the content EX: An element can be either an internal or a leaf node. Multiplicity –+ (one or more) –* (zero or more) –? (zero or one) Leaf elements can be: –#PCDATA –EMPTY –ANY

26 Declaring Attributes General syntax –<!ATTLIST element-name (attribute-name attribute-type default-value?)+ > There are 10 attribute types; only CDATA will be used here. Default values –A value –#FIXED value –#REQUIRED (no default value, each instance must specify a value) –#IMPLIED (optional; no default value is supplied if the attribute is not specified)

27 Declaring Attributes Example: DTD XML Element

28 ENTITIES XML document may be distributed among a number of files –Each unit of information is called an entity –Each entity has a name to identify it –Defined using an entity declaration –Used by calling an entity reference

29 General Entities Declared in the DTD –<!ENTITY xml "eXtensible Markup Language"> Used in the document: “The &xml; includes entities” expands to “The eXtensible Markup Language includes entities”

30 Attributes vs. Elements There are no rules about when to use attributes and when to use elements. Avoid XML Attributes? Some of the problems with using attributes are: –attributes cannot contain multiple values (elements can) –attributes cannot contain tree structures (elements can) –attributes are not easily expandable (for future changes)

31 Internal and External DTDs A document type declaration can either contain declarations directly or can refer to another file Internal –<!DOCTYPE root-element [ declarations ]> External file –<!DOCTYPE root-element SYSTEM "filename">

32 DTD Example Tove Jani Reminder Don't forget me this weekend! Syntax: <!DOCTYPE root-element SYSTEM "filename"> You usually specify the DTD for your XML document by providing a reference to it near the top of the XML document.

33 DTD Example The DTD (note.DTD) for the XML document note is: <!DOCTYPE note [ ]> It lists the document type (note) and the valid elements, and the type of content they can accept
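
As a small check of how a general entity is replaced, the following Python sketch declares the entity in an internal DTD and lets the parser expand the reference. The entity name and its replacement text are taken from the slide; the root element name doc is an assumption, since the original markup was lost.

```python
import xml.etree.ElementTree as ET

# The entity name and replacement text come from the slide; the root
# element name "doc" is an assumption made for this sketch.
DOCUMENT = """<!DOCTYPE doc [
  <!ENTITY xml "eXtensible Markup Language">
]>
<doc>The &xml; includes entities</doc>"""

# The underlying parser replaces &xml; with the declared text while reading.
root = ET.fromstring(DOCUMENT)
expanded_text = root.text

print(expanded_text)
```

The application never sees the reference itself, only the replacement text, which matches the "Replace entities" purpose listed for XML processors.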

34 Why use a DTD? With DTD, each of your XML files can carry a description of its own format with it. With a DTD, independent groups of people can agree to use a common DTD for interchanging data. Your application can use a standard DTD to verify that the data you receive from the outside world is valid.

35 NAMESPACES

36 “XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references.” Multiple namespaces can be used in a single document. Two types of namespaces: –Default namespace –Explicit: use a prefix. The prefix is used for two reasons: –The URI is too long to be typed –The URI includes characters that are illegal in XML names

37 NAMESPACES Example: <states xmlns="http://www.states.org/states" xmlns:cap="http://www.states.org/capitals"> South Dakota South Dakota. DTDs do not support namespaces very well
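
The difference between the default namespace and a prefixed one becomes visible when the document is queried. In this Python sketch the two namespace URIs are the ones on the slide, while the child elements (state, name, cap:name) and the capital's value are assumptions added to complete the document.

```python
import xml.etree.ElementTree as ET

# The two namespace URIs come from the slide. The child elements and the
# capital's value are assumptions added to make the document complete.
DOCUMENT = """
<states xmlns="http://www.states.org/states"
        xmlns:cap="http://www.states.org/capitals">
  <state>
    <name>South Dakota</name>
    <cap:name>Pierre</cap:name>
  </state>
</states>
"""

# Prefixes used in queries are local to this script; only the URIs matter.
NS = {
    "st": "http://www.states.org/states",
    "cap": "http://www.states.org/capitals",
}

root = ET.fromstring(DOCUMENT)

# Both elements are called "name", but they live in different namespaces.
state_name = root.find("st:state/st:name", NS).text
capital_name = root.find("st:state/cap:name", NS).text

print(root.tag)  # the parser stores the tag as {namespace-URI}local-name
print(state_name, capital_name)
```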

38 NAMESPACES

39

40 W3C XML SCHEMA (XSD)

41 Remember “Valid” XML Adheres to specification in DTD or XSD

42 XML Schemas (XSD) XSD, like DTD, can specify elements, attributes, nesting, ordering, and number of occurrences. However, DTDs have several deficits: –They do not use XML syntax –They do not support namespaces –Data types cannot be strictly specified (example: date vs. string)

43 Schema Fundamentals Documents that conform to a schema’s rules are considered instances of that schema Schema purposes –Structure of instances. –Data types of elements and attributes. W3C XML Schemas support namespaces –The XML Schema language itself is a set of XML tags –The application being described is another set of tags

44 Defining a Schema The root of an XML Schema document is the schema tag Attributes –xmlns attributes for the schema namespace and for the namespace being defined –A targetNamespace attribute declaring the namespace being defined –An elementFormDefault attribute with the value qualified to indicate that all elements defined in the target namespace must be namespace qualified (either with a prefix or default) when used

45 Defining a Schema Example <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.sustech.edu" xmlns="http://www.sustech.edu" elementFormDefault="qualified">

46 An Overview of Data Types Data types are of two kinds –Simple data types with string content –Complex data types with elements, attributes and string content Predefined types –Primitive: string, boolean, float, decimal, ... –Derived: integer, long, positiveInteger, ... Restrictions (user defined types) –Facets (to limit the content, or require the data to match a specific pattern)

47 Simple Types Named types can be used to give the type of –an attribute (which must be simple) or –an element (which may be simple or complex) Elements or attributes with simple type may have default values specified Syntax –<xsd:element name="xxx" type="yyy"/> –where xxx is the name of the element and yyy is the data type of the element –E.g.:

48 Simple Types New simple types can be defined by restriction of base types Example:

49 Complex Types A complex element is an XML element that contains other elements and/or attributes. There are four kinds of complex elements: –empty elements –elements that contain only other elements –elements that contain only text –elements that contain both other elements and text –Check http://www.w3schools.com/schema/schema_complex.asp

50 Empty Elements

51 Elements Only An element-only element must be contained in an ordered group, an unordered group, a choice or a named group. The sequence element is used to contain an ordered group. The all element is used to contain an unordered group. An element definition can be associated with a type by –Referring to a named type directly in the type attribute

52 Elements Only

53 XSD (note.xsd)

54 Defining a Schema Instance The xmlns attribute declares a namespace for an element and its descendants – –The element itself may not be in the namespace –Multiple elements may be defined The http://www.w3.org/2001/XMLSchema-instance namespace includes the schemaLocation attribute

55 Reference to XSD (notes.xml)

56 Validating Instances of Schemas Various systems for validating instances against schemas –Online: http://www.w3.org/2001/03/webdata/xsv or http://validator.w3.org/ –Standalone automatic validation tools: Altova XMLSpy, oXygen XML Editor, XML Copy Editor, xmllint

57 Example

58

59 XPATH

60 What is XPath? XPath is a syntax used for selecting parts of an XML document. The way XPath describes paths to elements is similar to the way an operating system describes paths to files. XPath is almost a small programming language; it has functions, tests, and expressions. XPath is a W3C standard. XPath is not itself written as XML, but is used heavily in XSLT

61 XPath capabilities XPath provides a syntax for: –navigating around an XML document –selecting nodes and values –comparing node values –performing arithmetic on node values XPath = Path expressions + Conditions. XPath provides some functions (e.g., concat(), substring(), etc.) to facilitate the above. Today XPath expressions can also be used in JavaScript, Java, XML Schema, PHP, Python, C and C++, and lots of other languages.

62 XPath Basic Constructs
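
The "path expressions + conditions" idea can be tried from Python, whose standard xml.etree.ElementTree module supports a limited subset of XPath (paths and simple predicates, but not functions or arithmetic). The values below are taken from the breakfast menu example later in this deck; the tag names are assumed, because the transcript lost the original markup.

```python
import xml.etree.ElementTree as ET

# Values are taken from the menu example later in this deck; the tag names
# are assumed, because the transcript lost the original markup.
MENU = """
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <calories>900</calories>
  </food>
</breakfast_menu>
"""

menu = ET.fromstring(MENU)

# Path expression: every <name> that is a child of a <food> element.
all_names = [n.text for n in menu.findall("./food/name")]

# Path plus condition: the <food> whose <calories> child has the text 900.
rich_food = menu.find("./food[calories='900']")

# Descendant search with //, then a comparison done in Python because
# ElementTree does not implement XPath functions or arithmetic.
light_names = [
    f.findtext("name")
    for f in menu.findall(".//food")
    if int(f.findtext("calories")) < 700
]

print(all_names, rich_food.findtext("price"), light_names)
```

A full XPath processor would express the last selection directly in the path; doing the comparison in Python is a workaround specific to this library subset.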

63 Example

64

65 eXtensible Stylesheet Language Transformations (XSLT)

66 Displaying Raw XML Documents Plain XML documents are generally displayed literally by browsers Firefox notes that there is no style information

67 Displaying Raw XML Documents Raw XML files can be viewed in IE 5.0 (and higher) and in Netscape 6 –but to make it display like a web page, you have to add some display information XML documents do not carry information about how to display the data Different solutions to the display problem, using CSS, XSL, JavaScript, and XML Data Islands

68 Displaying XML Documents with CSS An xml-stylesheet processing instruction can be used to associate a general XML document with a style sheet –<?xml-stylesheet type="text/css" href="..."?> The style sheet selectors will specify tags that appear in a particular document

69 XSLT Style Sheets A family of specifications for transforming XML documents –XSLT: specifies how to transform documents –XPath: specifies how to select parts of a document and compute values –XSL-FO: specifies a target XML language describing the printed page XSLT describes how to transform XML documents into other XML documents such as XHTML –XSLT can be used to transform to non-XML documents as well

70 Overview of XSLT A functional style programming language Basic syntax is XML An XSLT processor takes an XML document as input and produces output based on the specifications of an XSLT document

71 XSLT Processing (diagram labels: XSLT Document, XML Document, XSLT Processor, XSL Document)

72 Creating XSLT Stylesheets Uses the http://www.w3.org/1999/XSL/Transform namespace, which is typically assigned the xsl prefix. XSLT stylesheets are defined using the root tag <xsl:stylesheet>. Stylesheets typically contain one or more <xsl:template> tags that define each template –Templates have name and/or match attributes. Templates contain other XSLT tags that control how the XML data is transformed

73 Common XSLT Elements

74 An Example XSLT Template

75 Example: the xml file Belgian Waffles $5.95 two of our famous Belgian Waffles with plenty of real maple syrup 650 Strawberry Belgian Waffles $7.95 light Belgian waffles covered with strawberries and whipped cream 900 …

76 Example: the xsl file - ( calories per serving)

77 Demo –XSLT EXAMPLE

78 XML PROCESSOR

79 Uses of XML To exchange data between incompatible systems (just send an XML document, with an agreed definition of the tags). For B2B e-commerce – exchange of business documents between businesses – XML is flexible enough to describe any logical text structure, e.g. purchase order, invoice. To store data – as plain text files, or in databases. To create new markup languages (i.e. languages that use tags) – XML can be used to agree what the tags mean. Many markup languages have already been created based on XML – e.g. JSTL, WML, VoiceXML, XHTML

80 Using an XML document Need an XML Parser to “use” or parse out the data held in the XML document

81 XML Processors XML processors provide tools in programming languages to read in XML documents, manipulate them and to write them out

82 Purposes of XML Processors Four purposes –Check the basic syntax of the input document –Replace entities –Insert default values specified by schemas or DTDs –If the parser is able and it is requested, validate the input document against the specified schemas or DTDs The basic structure of XML is simple and repetitive, so providing library support is reasonable Examples –Xerces-J from the Apache foundation provides library support for Java –Command line utilities are provided for checking well-formedness and validity Two different standards/models for processing –Tree based – Document Object Model (DOM) –Event based – Simple API for XML (SAX)

83 Parsing The process of reading in a document and analyzing its structure is called parsing The parser provides as output a structured view of the input document

84 XML Parsers An XML parser does the following: Retrieves and reads an XML document – i.e. “parses” the document to figure out what’s in it. Ensures the document adheres to specific standards (e.g. is it well formed? does it adhere to the DTD?). Makes the document contents available to your application

85 XML Parsers: Tree based DOM interface Uses Document Object Model (DOM). Tree based interface (navigates through the document). Developed by W3C. XML parsers that use DOM exist for Java, JavaScript, Perl, C++

86 Tree based DOM parser - example Object/Tree Interface (DOM) Definition: Parser reads the XML document, and creates an in-memory “tree” of data – an object model of the data For example: Given a sample XML document on the next slide, what kind of tree would be produced?

87 Tree based DOM parser - example Sample XML Document 87 78

88 Tree based DOM parser - example

89 XML Parsers: Event based SAX parser Simple API for XML. Event based. Developed by volunteers on the XML-dev mailing list http://www.megginson.com/SAX/

90 Event based SAX parser Event Based Parser Definition: Parser reads the XML document, and generates an event for each item it encounters while parsing. It doesn’t create an in-memory object model of the document – it’s up to the programmer to write the code to interpret the events For example: Given the same XML document, what kind of events would be produced?

91 Event based SAX parser: example 87 78

92 Event based SAX parser: example Events generated: 1. Start of Element 2. Start of Element 3. Start of Element 4. Character Event: 87 5. End of Element 6. Start of Element 7. Character Event: 78 8. End of Element 9. End of Element 10. End of Element
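
The event sequence above can be reproduced with Python's standard xml.sax module. This is only a sketch: the element names (grades, student, test1, test2) are hypothetical, since the slide preserves just the values 87 and 78, and the EventRecorder class is a name invented here.

```python
import xml.sax

# Hypothetical element names; the slide only preserves the values 87 and 78.
DOCUMENT = b"<grades><student><test1>87</test1><test2>78</test2></student></grades>"


class EventRecorder(xml.sax.ContentHandler):
    """Record each parsing event as a readable string."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(f"Start of Element: {name}")

    def characters(self, content):
        # Skip whitespace-only text so the list matches the slide.
        if content.strip():
            self.events.append(f"Character Event: {content}")

    def endElement(self, name):
        self.events.append(f"End of Element: {name}")


recorder = EventRecorder()
xml.sax.parseString(DOCUMENT, recorder)

for number, event in enumerate(recorder.events, start=1):
    print(f"{number}. {event}")
```

The handler methods are the "event handlers" of the next slide: the parser calls them as it reads, and the program keeps only what it chooses to store.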

93 Event based parsers For each of these events, your application implements “event handlers.” Each time an event occurs, a different event handler is called. Your application intercepts these events, and handles them in any way you want.

94 Comparing tree based DOM parser with event based SAX parser Questions: Which parser is faster? Which parser is more efficient? Which parser is suitable for which type of XML documents?

95 Comparing tree based DOM parser with event based SAX parser Tree based: slower; takes up more memory; simpler to use. More suitable for documents that are less structured, with less repetition of tags. More suitable where the program needs to move around the document a lot, so it needs to keep easy access to the full document at all times. Event based: faster; takes up much less memory; but more complex to implement. Good for large, machine-generated, structured documents, e.g. book contents (because the repetitive nature of the tags allows for re-use of event handling code and therefore less work for the programmer). Good where only parts of the document are needed at any one time (event based parsers cannot “skip around” from one part of the document to another)

96 Comparing tree based DOM parser with event based SAX parser Performance and Memory Therefore, when high performance and low memory use are the most important criteria, use an event-based parser. Examples: Java applets, Palm Pilot applications, parsing huge data files

97 The SAX Approach In the SAX approach, an XML document is read in serially As certain conditions, called events, are recognized, event handlers are called The program using this approach only sees part of the document at a time

98 The DOM Approach In the DOM approach, the parser produces an in-memory representation of the input document –Because of the well-formedness rules of XML, the structure is a tree Advantages over SAX –Parts of the document can be accessed more than once –The document can be restructured –Access can be made to any part of the document at any time –Processing is delayed until the entire document is checked for proper structure and, perhaps, validity One major disadvantage is that a very large document may not fit in memory entirely

99 DOM Parser example
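
As an illustration of the DOM approach described above, here is a minimal sketch using Python's standard xml.dom.minidom module. The element names are hypothetical (the transcript keeps only the values 87 and 78 from the sample document), and the outline helper is a name invented for this example.

```python
from xml.dom import minidom

# Hypothetical element names; the slide only preserves the values 87 and 78.
DOCUMENT = "<grades><student><test1>87</test1><test2>78</test2></student></grades>"


def outline(node, depth=0):
    """Return one indented line per node in the in-memory tree."""
    if node.nodeType == node.TEXT_NODE:
        return ["  " * depth + f"text: {node.data}"]
    lines = ["  " * depth + f"element: {node.tagName}"]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


dom = minidom.parseString(DOCUMENT)
root = dom.documentElement

# Unlike SAX, the whole tree stays in memory, so any part can be revisited
# in any order.
second_score = root.getElementsByTagName("test2")[0].firstChild.data
first_score = root.getElementsByTagName("test1")[0].firstChild.data

tree_lines = outline(root)
print("\n".join(tree_lines))
print(first_score, second_score)
```

Reading the second score before the first is trivial here, which is the kind of access an event based parser cannot offer without extra bookkeeping.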

100 XML IN PRACTICE Ajax; RSS; KML (Keyhole Markup Language): Google Earth; ODF (Open Document Format): OpenOffice; eBooks and EPUB; Web Services: SOAP & WSDL

101 Web resources http://www.deitel.com/ResourceCenters/Programming/XML/tabid/279/Default.aspx; W3Schools XML Tutorial http://www.w3schools.com/xml/default.asp; W3C XML page http://www.w3.org/XML/; XML Tutorials http://www.programmingtutorials.com/xml.aspx; Online resource for markup language technologies http://xml.coverpages.org/; Several Online Presentations

