1 XML – Extensible Markup Language

2 Objectives To understand various ways in which XML can be used History of XML Syntax of XML Difference between HTML, XML and XHTML XML Document Type Definitions (DTDs) XML Schemas To understand types of XML Parsers Validating vs. Non-Validating Parsers To understand different XML Parser Interfaces Tree Based Interface Standard : DOM Event Based Interface Standard : SAX Evaluating Parsers Which parser to use?

3 History of XML The World Wide Web Consortium (W3C) is an international consortium where member organizations, a full-time staff, and the public work together to develop Web standards. Tim Berners-Lee, who invented the World Wide Web in 1989, and others created the W3C in 1994. In the 1970s IBM developed GML, the precursor of SGML (Standard Generalized Markup Language), which became an ISO standard in 1986. SGML is a semantic and structural language for text documents, but it is complicated. The XML Working Group was formed under the W3C in 1996, and in 1998 the W3C published XML 1.0. Extensible Markup Language (XML) is a subset of SGML.

4 What is XML? XML stands for eXtensible Markup Language. XML is a universal method of representing data, used in applications, on the web and for data exchange. XML is a markup language much like HTML, but used for different purposes. XML is not a replacement for HTML.

5 What is XML? XML was designed to describe data XML is a cross-platform, software and hardware independent tool for transmitting or exchanging information. XML is an open-standards-based technology Extensible Both Human and machine readable XML Standard XML 1.0 (1998). XML 1.1 (Feb 2004)

6 What Exactly is XML used for? Storing data in a structured manner (tree structure). Storing configuration information, typically application data that is not stored in a database. Many server software packages keep their configuration files in XML format.

7 Contd… Transmitting data between applications. Overcomes problems in client-server applications that are cross-platform in nature, e.g. a Windows program talking to a mainframe. XML is a universal, standardized language for representing data so that it can be both processed independently and exchanged between programs and applications and between clients and servers. Disparate systems can exchange information in a common format.

8 XML Syntax The syntax rules of XML are very simple and very strict. XML tags are not predefined: you must define your own tags, e.g. an element wrapping the text "GCET". All XML elements must have a closing tag, e.g. a paragraph element wrapping the text "This is a paragraph".

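9 Contd… XML tags are case sensitive: an element whose opening and closing tags differ in case (around "This is incorrect") is incorrect; one whose tags match exactly (around "This is correct") is correct. All XML elements must be properly nested: around "Jill Jack", closing the outer element before the inner one is incorrect; closing the inner element first is correct. Attribute values must always be quoted: an element "reynolds" with an unquoted attribute value is incorrect; with the value in quotes it is correct.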

10 XML Syntax All XML documents must have a root element.....
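
The syntax rules above can be tried against a real parser. The following is a minimal sketch, assuming the JAXP DocumentBuilder that the later slides use; the class name WellFormedCheck and the sample element and attribute names (person, id, a, b, Message) are hypothetical and do not come from the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser quiet on stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                       // a syntax rule was violated
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<person id=\"42\">reynolds</person>"));   // quoted attribute
        System.out.println(isWellFormed("<person id=42>reynolds</person>"));       // unquoted attribute
        System.out.println(isWellFormed("<Message>This is incorrect</message>"));  // case mismatch
        System.out.println(isWellFormed("<a><b>Jill Jack</a></b>"));               // bad nesting
        System.out.println(isWellFormed("<a/><b/>"));                              // two root elements
    }
}
```

Only the first call should report true; each of the others breaks exactly one of the rules listed on the slides.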

11 XML Comments Comments in XML are written as in HTML: <!-- This is a comment -->. Example data: John John@jerry.com

12 XML Code John John@jerry.com Tom

13 Extensibility in XML A typical XML document is made up of tags enclosing the data; tag names describe the data Because the language is extensible, you can create tags that are specific to your need

14 Contd… For example, your document may contain tags that structure information about employees. Data stored in XML is self-descriptive: one can understand the data just by looking at the tag names.

15 XML – Exchanging Info Between Apps Convert information stored in the database (or any other format) to an XML format Once it is in XML format, other applications/programs can parse (read) the XML document, which is made up of the initial data XML parsers are freely available and are part of many new programming languages

16 Contd… [Figure: An Application, Spreadsheet Package, CAD Package, Statistical Processing, XML, Database]

17 Content: XML Doc. Structure: DTD/XSD. Presentation: XSL. XSD: XML Schema Definition. DTD: Document Type Definition. XSL: Extensible Stylesheet Language.

18 Document Type Declaration A DTD (Document Type Definition) is used to enforce structure requirements for an XML document. The document type declaration contains, or refers to, the Document Type Definition (DTD) and tells the parser which DTD to use for validation.

19 Contd… <!DOCTYPE customers [ ]> John Conlon John@jerry.com

20 XML Schema An XML based alternative to DTD Richer and more useful than DTDs Written in XML and Simpler than DTDs Support data type validation (DTD does not support data type validation)

21 Harrison Ford hford@famous.org Julie jr@pw.com

23 Simple XML Elements with Pre-defined Data Types Simple XML Element: An XML element that has no child elements and no attributes. Simple XML elements can be defined in XSD with the following statement (XSD syntax): <xsd:element name="element_name" type="type_name"/>

24 Contd… where "element_name" is the name of the XML element, and "type_name" is one of the data type names pre-defined in XSD. XSD pre-defined data types are divided into 7 groups, including: numeric data types, date and time data types, string data types, binary data types, and the Boolean data type.

25 XSD Syntax Simple XML Elements with Extended Data Types Simple XML Element: An XML element that has no child elements and attributes. Simple XML elements can be defined by using the pre-defined XSD data types.

26 They can also be defined by using extended data types, which are defined by "simpleType" statements: XSD facet statements where "element_name" is the name of the XML element, "xsd:type_name" is a pre-defined data type serving as the base data type, and "my_type_name" is the new data type extended from the base data type.

27 Complex XML Elements Complex XML Element: An XML element that has at least one child element or at least one attribute. Complex XML elements must be defined with complex data types, which are defined by "complexType" statements: XSD Syntax

28 ...... where "attribute" statement is used to define an attribute, and "sequence" statement is used to define the group of child elements, and the order the child elements should appear in the XML structure. Note that "attribute" statements must appear after the child element definition statements.

29 XSD Syntax Empty XML Elements Empty XML Element: A special complex XML element that has one attribute or more and no child text nodes. Empty XML elements must be defined with complex data types in the following format:...

30 XSD Syntax Anonymous Data Types If a data type is specific to a child element in a parent data type, and there is no need to share it with data types outside the parent data type, you can define it as an anonymous data type: a non-named data type defined inline. For example, the following code:

31 defines "my_data_type" which has a "setting" element, which has an anonymous data type defined inline.

32 Well-formed XML Documents A document is made of elements; There is exactly one element, called the root, or document element For all other elements, the elements, delimited by start- and end-tags, nest properly within each other Attributes if any, should have their values enclosed within quotes

33 Valid XML Documents An XML document is valid if it has an associated DTD or Schema and if the document complies with the constraints expressed in it. If an XML document is valid, it is also well-formed.

34 Document Type Definitions (DTDs) A DTD describes the syntax of an XML document: which elements may appear, and what their contents and attributes are. Need for DTD: a validating parser (a program) can be used to check whether XML data adheres to the rules in the DTD. The parser can do appropriate error handling if there are any violations. A validity error is not necessarily a fatal error, but some applications may treat it as one.

35 Document Type Declarations A valid XML document must include the reference to DTD which validates it Types of DTD Internal DTD: DTD can be embedded into XML document External DTD: DTD can be in a separate file

36 Internal DTD DTD embedded in the XML document The declarations appear between [ and ] E.g. AddressBook.xml

37 <!DOCTYPE AddressBook [ ]> Ram M G Road Bangalore
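
The declarations inside the square brackets did not survive in this transcript. The following is a minimal sketch of validating AddressBook.xml against an internal DTD, assuming the three-level structure described on slide 41 (AddressBook, Address, then Name, Street and City); the class name InternalDtdDemo and the exact declarations are a reconstruction, not the original slide content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class InternalDtdDemo {
    // Assumed internal DTD, rebuilt from the description on slide 41.
    static final String DOCTYPE =
          "<!DOCTYPE AddressBook ["
        + "<!ELEMENT AddressBook (Address+)>"
        + "<!ELEMENT Address (Name, Street, City)>"
        + "<!ELEMENT Name (#PCDATA)>"
        + "<!ELEMENT Street (#PCDATA)>"
        + "<!ELEMENT City (#PCDATA)>"
        + "]>";

    // Returns true when the document body is valid against the DTD above.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);          // turn DTD validation on
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;                      // treat validity errors as failures
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<AddressBook><Address><Name>Ram</Name>"
                  + "<Street>M G Road</Street><City>Bangalore</City></Address></AddressBook>";
        String swapped = "<AddressBook><Address><Name>Ram</Name>"
                  + "<City>Bangalore</City><Street>M G Road</Street></Address></AddressBook>";
        System.out.println(isValid(ok));      // children in the declared order
        System.out.println(isValid(swapped)); // City before Street violates the DTD
    }
}
```

The handler rethrows validity errors because a JAXP parser reports them through error() and otherwise carries on parsing.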

38 External DTD DTD is present in separate file Example The DTD for AddressBook.xml is contained in a file AddressBook.dtd AddressBook.xml contains only XML Data with a reference to the DTD file AddressBook.xml

39 Ram M G Road Bangalore

40 Anatomy of DTD – Defining new XML tags (Elements) Syntax: <!ELEMENT element_name content_specification>. element_name: specifies the name of the XML tag. content_specification: specifies the contents of the element, one of: #PCDATA (parsed character data, i.e. text in which markup and entity references are still recognized); nested elements; EMPTY; ANY (generally avoided). CDATA (character data, taken as is) is an attribute type rather than an element content specification.

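41 Example: <!ELEMENT Street (#PCDATA)> : element Street contains parsed character data. <!ELEMENT Address (Name, Street, City)> : element Address contains three nested tags, Name, Street and City, in that order. <!ELEMENT AddressBook (Address+)> : element AddressBook contains one or more occurrences of element Address.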

42 Anatomy of DTD – Dealing with multiple children To declare the children of an element we use syntax similar to regular expression in Perl. To define the children of an element we use the following syntax: (Assume a and b are child elements of the element being declared)

43 a+ : one or more occurrences of a. a* : zero or more occurrences of a. a? : a or nothing. a, b : a followed by b. a|b : a or b, but not both. (expression) : surrounding an expression with parentheses means that it is treated as a unit and may have the suffix operator ?, * or +.

44 Some examples

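45 Anatomy of DTD – Attribute Declarations Specifies the allowable attributes of each element. Syntax: <!ATTLIST Tag-Name Attr-Name Attr-Type Restriction>. Tag-Name: element name. Attr-Name: name of the attribute; the attribute is defined for element Tag-Name. Attr-Type: type of the attribute, e.g. CDATA.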

46 Restriction: Value : Shows a simple text value enclosed in quotes #IMPLIED:Indicates that there is no default value for this attribute, and this attribute need not be used #REQUIRED:Indicates that there is no default value for this attribute, but that a value must be assigned to this attribute #FIXED Value: In this case, Value is the attribute’s value, and the attribute must always have this value

47 Anatomy of DTD – Attribute Declarations Example: <!ATTLIST Name salutation CDATA #REQUIRED> The element Name has an attribute salutation, which is of type CDATA. The attribute salutation must be specified in the Name tag.

48 Anatomy of DTD – Entity Declarations (1 of 2) A way to escape special characters. Some special characters, such as < and &, cannot be written literally in #PCDATA. The escaped form of such a character is called an “entity reference”.

49 The following entity references are used in XML documents. Built-in entities: &amp;, &lt;, &gt;, &apos;, &quot;. Character references: &#243; representing ó. Example: Jammu &amp; Kashmir

50 Anatomy of DTD – Entity Declarations (2 of 2) Data that is frequently used can be declared as a general entity: <!ENTITY entity_name "entity_contents">. entity_name: name of the new entity. entity_contents: contents of the new entity.

51 Example: <!ENTITY MyCountry "India"> defines the entity called MyCountry; “India” is the contents of entity MyCountry. Usage in the XML document: &MyCountry;
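
Entity expansion can be observed by parsing a small document and reading the text back. This is a minimal sketch using the JAXP DOM parser; the class name EntityDemo and the State element are hypothetical, while the MyCountry entity and the "Jammu & Kashmir" text follow the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityDemo {
    // Parses the document and returns the root element's text after entity expansion.
    static String expandedText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Internal DTD subset declaring the general entity MyCountry.
        String xml = "<!DOCTYPE State [<!ENTITY MyCountry 'India'>]>"
                   + "<State>Jammu &amp; Kashmir, &MyCountry;</State>";
        System.out.println(expandedText(xml)); // prints "Jammu & Kashmir, India"
    }
}
```

The application never sees &amp; or &MyCountry; themselves: the parser replaces both references before handing the text over.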

52 XML Schema What is XML Schema? An XML vocabulary for expressing your data's structure and business rules Validating parsers can use Schema to check whether XML data adheres to rules in schema More robust and extensive than DTD, can do even data type validations

53 E.g. : Consider following XML Document 45609 Kiran IWT 80 A

54 Is this data valid? To be valid, it must meet the following business rules (constraints): The Result must comprise a Subject, Marks and Grade, in the order shown. The Subject must be a valid subject from the list (DC, IWT, Cryptography). The Marks must be between 0 and 100, and the Grade can be A, B or C.

55 How can XML schema help to accomplish this? It creates an XML vocabulary: it defines the set of elements <Result>, <Subject>, <Marks>, <Grade>. It specifies the contents of each element and the restrictions on each element: the <Result> element must contain <Subject>, <Marks>, <Grade> in that order; <Subject> must be one of the valid subjects (IWT, Cryptography, DC); the Marks must be between 0 and 100; the Grade can be A, B or C.

56 XML Schema specifies in which namespace the created vocabulary must be in It is not an actual URL, but uses URL syntax and should be a unique string Example: http://www.Results.com Namespace defines the following vocabulary

57 Example of referring to Schema <res:Result xmlns:res="http://www.Results.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.Results.com Result.xsd"> Kiran 45609 IWT 80.70 A PF 78.30 B+

58 Schema example : Result.xsd <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.Results.com" xmlns="http://www.Results.com" elementFormDefault="qualified">

59 Schema example : Result.xsd

60 DTD vs Schema An XML document and its DTD use different syntax (an inconsistency); a Schema uses XML syntax. Limited data type capability: DTDs support a very limited capability for specifying data types and do not support field-level validations or complex types. E.g. you can't express "I want the element to hold an integer with a range of 0 to 100" in a DTD. Schema describes a set of data types compatible with those found in databases. E.g. a database supports integer, string, etc. data types; Schema supports integer, string, etc., while the DTD does not.

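61 Element Declarations: Simple Element Syntax: <xsd:element name="Element_name" type="Element_type" minOccurs="..." maxOccurs="..."/>. Element_name: any valid XML name. Element_type: built-in simple type. Occurrence (minOccurs, maxOccurs): number of occurrences of that element, optional.

62 Example: <xsd:element name="Name" type="xsd:string"/> defines the element Name of type string. <xsd:element name="Marks" type="xsd:float" maxOccurs="5"/> defines the element Marks of simple type float; Marks may appear at most 5 times and, by default, at least once.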

63 Element Declarations Syntax :

64 Example Defines a non-reusable complex element called ‘Subject’. Each child element appears in the declared sequence because the <xsd:sequence> tag is used.

65 Element Declarations: Reusable Simple Type Element_type_name : Name of the data type Base_data_type : Any of the built in simple data type (integer, float etc) Restriction_specification : Specifies restriction on the element if any

66 Example : Defines the reusable element type MarksType Element defined as MarksType may take minimum value of 0.0 and maximum value 100.0
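
A restricted type like MarksType can be checked with the javax.xml.validation API. The following is a minimal sketch, assuming a schema that restricts xsd:float with minInclusive and maxInclusive facets as the slide describes; the schema text is a reconstruction (the original XSD was lost from the transcript) and the class name MarksTypeDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class MarksTypeDemo {
    // Assumed schema: MarksType is a float restricted to the range 0.0 to 100.0.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='MarksType'>"
        + "<xsd:restriction base='xsd:float'>"
        + "<xsd:minInclusive value='0.0'/>"
        + "<xsd:maxInclusive value='100.0'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:element name='Marks' type='MarksType'/>"
        + "</xsd:schema>";

    // Returns true when the instance document satisfies the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                       // validation failed
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Marks>80.70</Marks>")); // inside the range
        System.out.println(isValid("<Marks>150</Marks>"));   // above maxInclusive
        System.out.println(isValid("<Marks>abc</Marks>"));   // not a float at all
    }
}
```

The last two cases are the kind of data type check that a DTD cannot express.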

67 Element Declarations: Reusable Complex Type Syntax Defines the reusable type Type_name Example

68 Defines the reusable complex element type SubjectType, comprising the following elements in the sequence specified (<xsd:sequence> tag): Name, Marks, Grade. This type can be used to define elements in your XML.

69 Defining the Attributes Syntax : Example All attributes are declared as simple types. Only complex elements can have attributes

70 Anatomy of XML Schema : Constraints specification Controls the occurrence of an individual element or a group of elements. Types of constraints: <xsd:choice> allows only one of the listed elements to appear; <xsd:sequence> elements must appear in the same order as they are declared; <xsd:all> elements can occur in any order, each at most once.

71 <xsd:choice> constraint E.g.: Allows either first or last name to be used in the instance XML Document

72 <xsd:sequence> constraints E.g.: All elements must appear in the defined order only

73 Anatomy of XML Schema : Constraints specification <xsd:all> constraints E.g.: Any of the elements can either appear or not appear. Elements may appear in any order.
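
The choice and sequence constraints can be exercised the same way. This is a minimal sketch, assuming a hypothetical schema with a Person element (a choice between FirstName and LastName) and a Subject element (a sequence of Name, Marks, Grade); the element names echo the slides, but the schema text and the class name ConstraintDemo are illustrations only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ConstraintDemo {
    // Assumed schema illustrating <xsd:choice> and <xsd:sequence>.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='Person'><xsd:complexType><xsd:choice>"
        + "<xsd:element name='FirstName' type='xsd:string'/>"
        + "<xsd:element name='LastName' type='xsd:string'/>"
        + "</xsd:choice></xsd:complexType></xsd:element>"
        + "<xsd:element name='Subject'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='Name' type='xsd:string'/>"
        + "<xsd:element name='Marks' type='xsd:float'/>"
        + "<xsd:element name='Grade' type='xsd:string'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Returns true when the instance document satisfies the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // choice: exactly one of the alternatives may appear
        System.out.println(isValid("<Person><FirstName>Kiran</FirstName></Person>"));
        System.out.println(isValid(
            "<Person><FirstName>Kiran</FirstName><LastName>Rao</LastName></Person>"));
        // sequence: children must follow the declared order
        System.out.println(isValid(
            "<Subject><Name>IWT</Name><Marks>80</Marks><Grade>A</Grade></Subject>"));
        System.out.println(isValid(
            "<Subject><Marks>80</Marks><Name>IWT</Name><Grade>A</Grade></Subject>"));
    }
}
```

The second and fourth documents should be rejected: one supplies both alternatives of the choice, the other breaks the declared order.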

74 XML Parsers

75 XML Parser : The Big Picture Usage of the XML Parser [Figure: XML Document, XML DTD / Schema, XML Parser, Parsed Data, API’s, Client Application]

76 Why to use Parser? Typically use a pre-built XML parser (e.g. JAXP, Apache Xerces etc) This enables you to build your application much more quickly

77 Need for Parser Defining the Parser’s Responsibilities Ensure that the document adheres to specific standards Does the document match the DTD or Schema? Is the document well-formed? Make the document contents available to your application The parser will parse the XML document, and make this data available to your application An application using parser can access data in XML by going through the hierarchy or using tag names

78 Types of XML Parsers Validating parser: a parser that verifies that the XML document adheres to the DTD or Schema. Non-validating parser: a parser that does not verify the XML document against the DTD or Schema. Most parsers provide an option to turn validation on or off. All parsers check the well-formedness of the XML document at all times.
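
The difference between the two modes shows up when one document is parsed with validation switched off and then on. This is a minimal sketch using the JAXP factory switch setValidating; the class name ValidationToggle and the sample documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationToggle {
    // Returns true when the parser accepts the document in the given mode.
    static boolean accepts(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity errors are only reported when validating
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE Result [<!ELEMENT Result (Name)><!ELEMENT Name (#PCDATA)>]>";
        String invalid = dtd + "<Result><Grade>A</Grade></Result>"; // well-formed, not valid
        String broken  = dtd + "<Result><Name></Result>";           // not even well-formed

        System.out.println(accepts(invalid, false)); // non-validating: accepted
        System.out.println(accepts(invalid, true));  // validating: rejected
        System.out.println(accepts(broken, false));  // rejected in either mode
    }
}
```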

79 XML Parser Interfaces Two types of Interfaces provided by XML Parsers SAX An Event Based Interface DOM a Tree Based Interface JAXP “Java API for XML Processing” JAXP is part of JDK Provides parsers which can be used in any Java application It supports both Tree Based Parser : DOM Event Based Parser : SAX

80 DOM Parser Tree Based Parser Definition: Parser reads the XML document, and creates an in-memory “tree” representation of XML Document For example: Given a sample XML document below What kind of tree would be produced?

81 Kiran 45609 CHSSC 80 A

82 In memory tree created by Tree Based Parser Tree represents the hierarchy of XML document

83 DOM Parser [Figure: DOM tree with element nodes Result, Name, EmpNo and text nodes Kiran, 45609]

84 DOM Parser Tree based APIs presents a memory model of entire document to an application once parsing has concluded No need to use extra data-structures to maintain the information during parsing An application can navigate through the tree to find the desired pieces of document Document Object Model (DOM) is the standard for Tree Based parsing of XML document

85 Document Object Model (DOM) The Document Object Model (DOM) is a set of interfaces defined by the W3C DOM Working Group. DOM is the tree-based interface used by programmers to manipulate the XML document. A DOM parser can be validating or non-validating. A DOM parser represents the logical model of the XML document in memory. All entity references are expanded before the DOM tree is constructed.

86 DOM Structure representing XML Document [Figure: generic XML document structure (Document, Element, Attribute, Text, Comment) beside the document structure representing Result.xml: document root; element nodes Result, Name, EmpNo, Subject, Name, Marks, Grade; text nodes Kiran, 45609, IWT, 80.0, A]

87 Document Object Model (DOM) : Overview The root of the DOM hierarchy is the Document node; its single element child is the document (root) element. Example: Result. The child nodes below it are element nodes, comment nodes, etc. Example: Name, Subject, EmpNo, etc. are all child nodes. All the node types in the DOM are derived from the interface org.w3c.dom.Node.

88 The Big picture : Parsing the XML Document Document builder factory creates an instance of parser with required characteristics Whether the parser should be validating parser or not Whether namespace support required or not, Whether to ignore the white spaces between the elements or not Factory hides the implementation details of the parser and gives a standard DOM interface for parsing XML (Analogous to JDBC driver)

90, 91 DomApp.java : Parsing XML Document using DOM Parser
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomApp {
    public static void main(String[] argv) {
        MyErrorHandler hErr;   // application class implementing org.xml.sax.ErrorHandler
        Document hDocument;
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        factory.setNamespaceAware(true);
        try {
            hErr = new MyErrorHandler();
            DocumentBuilder hBuilder = factory.newDocumentBuilder();
            // Set the error handler
            hBuilder.setErrorHandler(hErr);
            hDocument = hBuilder.parse(new File("Result.xml"));
        } catch (Exception e) {
            // Handle exception if generated during parsing
        }
    } // End of function main
}

92 Parsing the XML Document using DOM Parser Step 1: Get the instance of document-builder factory. This will be used to produce the DOM-parser (called DocumentBuilder) DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); Step 2: Set the properties of the DOM parser to be produced a. It should validate the XML Document against the Schema / DTD b. It should be namespace aware factory.setValidating(true); factory.setNamespaceAware(true); Step 3 : Obtain the instance of the MyErrorHandler class This instance handles the error generated during parsing, in application specific way hErr = new MyErrorHandler();

93 Step 4: Obtain the instance of the DOM parser and register the error handler. This will be used to parse the XML document and create the in-memory tree representation of the XML document: DocumentBuilder hBuilder = factory.newDocumentBuilder(); hBuilder.setErrorHandler(hErr); Step 5: Parse the XML document (Result.xml) using the parser created above: hDocument = hBuilder.parse(new File("Result.xml"));

94 The Node interface is the root of DOM Core class hierarchy This interface can be used to extract information from any DOM object without knowing its actual type (e.g. Element node, Text node, Attr Node etc ) of underlying node i.e. It is possible to access a document's complete structure and content using only the methods and properties exposed by the Node interface The Class Hierarchy rooted at org.w3c.dom.Node

95 DOM : Exploring the org.w3c.dom.Node Interface [Figure: Node at the root of the hierarchy, with sub-interfaces Element, Document, Attr, Text, Comment, Entity]

96 DOM : Important Methods of Node interface Methods to retrieve the various information from the XML DOM Tree Node getFirstChild(): Returns the first child of the current node Node getLastChild(): Returns the last child of the current node String getNodeName(): The name of this node String getNodeValue(): The value of this node, depending on its type short getNodeType(): A code representing the type of the underlying object

97 Methods to alter the elements of XML DOM Tree: Node insertBefore(Node newChild, Node refChild); Node appendChild(Node newChild); Node removeChild(Node oldChild); Node replaceChild(Node newChild, Node oldChild)
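
The four editing methods can be seen working on a small tree built in memory. This is a minimal sketch; the class name DomEditDemo is hypothetical and the element names are borrowed from the Result example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomEditDemo {
    // Builds a Result element, edits its children, and returns the remaining child names.
    static String editedChildNames() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element result = doc.createElement("Result");
            doc.appendChild(result);

            Element name = doc.createElement("Name");
            Element empNo = doc.createElement("EmpNo");
            Element subject = doc.createElement("Subject");
            Element grade = doc.createElement("Grade");

            result.appendChild(name);            // [Name]
            result.insertBefore(empNo, name);    // [EmpNo, Name]
            result.appendChild(subject);         // [EmpNo, Name, Subject]
            result.replaceChild(grade, subject); // [EmpNo, Name, Grade]
            result.removeChild(empNo);           // [Name, Grade]

            // Walk the children left to right and collect their names.
            StringBuilder names = new StringBuilder();
            for (Node n = result.getFirstChild(); n != null; n = n.getNextSibling()) {
                names.append(n.getNodeName()).append(' ');
            }
            return names.toString().trim();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(editedChildNames()); // prints "Name Grade"
    }
}
```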

98 Using Node Interface [Figure: tree with Result at the root, children Name, EmpNo, Subject, and text nodes Kiran, 45609, each annotated with one of the calls below]
Node hNode = hDocument.getDocumentElement();
Node hFirstChild = hNode.getFirstChild();
Node hLastChild = hNode.getLastChild();
hFirstChild = hFirstChild.getFirstChild();
String sName = hFirstChild.getNodeName();
String sVal = hFirstChild.getNodeValue();
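
The calls above can be run end to end on an in-memory document. This is a minimal sketch, assuming a compact Result document without whitespace between tags (otherwise getFirstChild() would return a whitespace text node); the class name DomWalkDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomWalkDemo {
    // Returns { root name, first child name, text node name, text value, last child name }.
    static String[] walk(String xml) {
        try {
            Document hDocument = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node hNode = hDocument.getDocumentElement();  // <Result>
            Node hFirstChild = hNode.getFirstChild();     // <Name>
            Node hLastChild = hNode.getLastChild();       // <EmpNo>
            Node hText = hFirstChild.getFirstChild();     // text node inside <Name>
            return new String[] {
                hNode.getNodeName(),
                hFirstChild.getNodeName(),
                hText.getNodeName(),   // text nodes are always named "#text"
                hText.getNodeValue(),
                hLastChild.getNodeName()
            };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>";
        System.out.println(String.join(", ", walk(xml)));
        // prints "Result, Name, #text, Kiran, EmpNo"
    }
}
```

An element node has no node value of its own; the text lives in a child text node, which is why the slide descends one more level before calling getNodeValue().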

99 XML Parser Interfaces : Event Based Interface Definition: the parser reads the XML document and generates events for each parsing step. Some common parsing events: element start-tag read; element content read; element end-tag read.

100 Example Kiran 45609 CHSSC 80 A

101 XML Parser Interfaces : Event Generated startElement : Result startElement : Name contents: Kiran endElement : Name startElement : EmpNo contents: 45609 endElement : EmpNo endElement : Result
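
The event trace above can be reproduced with a small handler that logs each callback. This is a minimal sketch using the JAXP SAX parser; the class name SaxEventLog is hypothetical, and text is buffered because a SAX parser is allowed to deliver one run of characters in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLog {
    // Parses the document and returns one log line per SAX event.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            // Emits buffered character data, if any, as a single "contents" line.
            private void flushText() {
                String s = text.toString().trim();
                if (!s.isEmpty()) {
                    log.add("contents: " + s);
                }
                text.setLength(0);
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                flushText();
                log.add("startElement : " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                log.add("endElement : " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>";
        events(xml).forEach(System.out::println);
    }
}
```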

102 XML Parser Interfaces : Event Based Interface For each of these events, your application implements “event handlers” Each time an event occurs, a different event handler is called Your application intercepts these events, and handles them in any way you want Application does not wait till the entire document gets parsed Application has to maintain the information from XML document within local data-structures till it is processed completely Simple API for XML (SAX) is the standard for Event Based parsing of XML document

103, 104 SAXApp.java : Parsing XML Document using SAX Parser
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;        // used by the Handler class
import org.xml.sax.SAXParseException;   // used by the Handler class
import org.xml.sax.helpers.DefaultHandler;

public class SAXApp {
    public static void main(String[] argv) {
        // Get the instance of the parser event handling class
        DefaultHandler handler = new Handler();
        // Get the instance of SAXParserFactory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Set the properties of the parser to be obtained
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            // Get the new SAX parser
            SAXParser saxParser = factory.newSAXParser();
            // Parse the file; handler processes the events generated during parsing
            saxParser.parse(new File("Result.xml"), handler);
        } catch (Throwable t) {
            // Handle any exceptions generated during parsing
            t.printStackTrace();
        }
    } // End of function main
}

105 SAXApp.java : Parsing XML Document using SAX Parser
class Handler extends DefaultHandler {
    // Process any recoverable (e.g. validity) errors in the XML document
    public void error(SAXParseException e) throws SAXException {
        System.out.println("Error At Line: " + e.getLineNumber());
        System.out.print("Column: " + e.getColumnNumber());
        // Print the error message
        System.out.print(e.getMessage());
    }
    // Process any fatal errors in the XML document
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println("Fatal Error At Line: " + e.getLineNumber());
        System.out.print("Column: " + e.getColumnNumber());
        // Print the error message
        System.out.print(e.getMessage());
    }
} // End class Handler

106 Understanding The Simple API for XML (SAX) Step 1: Get the instance of SAXParserFactory. This instance is used to obtain the SAX parser: SAXParserFactory factory = SAXParserFactory.newInstance(); Step 2: Get the instance of the event handler class. This class handles all the events generated by the parser: DefaultHandler handler = new Handler(); Step 3: Set the properties of the parser to be obtained: (a) it should validate the XML document against the Schema / DTD; (b) it should be namespace aware: factory.setValidating(true); factory.setNamespaceAware(true); Step 4: Obtain the instance of the SAX parser using the factory just obtained: SAXParser saxParser = factory.newSAXParser(); Step 5: Parse the Result.xml file using the SAX parser obtained above. Events generated during parsing will be handled by the object handler: saxParser.parse(new File("Result.xml"), handler);

107 The Big picture : Parsing the XML Document using SAX [Figure: XML Document, SAX Parser Factory, Parser, Events, DefaultHandler / MyHandler, which implements org.xml.sax.ContentHandler, org.xml.sax.ErrorHandler and org.xml.sax.EntityResolver from the org.xml.sax class hierarchy]

108 org.xml.sax Interfaces org.xml.sax.helpers.DefaultHandler class: provides a default implementation of all the event callbacks. DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with do-nothing methods). Only the methods that are required need to be overridden.

109 org.xml.sax.ContentHandler Interface Receives notification of the logical content of a document. Defines methods like startDocument(), endDocument(), startElement(), and endElement(), which are invoked when the corresponding parts of the document are recognized. Also defines the method characters(), which is invoked when the parser encounters text in an XML element.

110 org.xml.sax Interfaces org.xml.sax.ErrorHandler Interface Allows SAX application to do customized error handling The parser will then report all errors and warnings through this interface

111 Important Methods void error(): receives the notification of a recoverable error. void fatalError(): receives the notification of a non-recoverable error. void warning(): receives the notification of a warning.

112 Evaluating Parsers : SAX vs. DOM SAX Advantage: good when serial processing of the document is required and the document is very large, i.e. when the size of the XML document runs to gigabytes. Disadvantage: the application must keep the parts of the XML document it needs in its own data structures until processing is complete, which makes SAX less convenient for small XML documents.

113 DOM Advantage Supports DOM Tree Traversing methods Allows modification of XML Document Good when the random access of a document is required Disadvantage For large XML documents (size in GBs) requires more memory as compared to memory required to parse XML document using SAX Parser.

