
XML – Extensible Markup Language

Objectives: To understand various ways in which XML can be used; History of XML; Syntax of XML; Difference between HTML, XML and XHTML; XML Document Type Definitions (DTDs); XML Schemas; To understand types of XML parsers: validating vs. non-validating parsers; To understand different XML parser interfaces: tree-based interface standard (DOM) and event-based interface standard (SAX); Evaluating parsers: which parser to use?

History of XML: The World Wide Web Consortium (W3C) is an international consortium in which member organizations, a full-time staff, and the public work together to develop Web standards. Tim Berners-Lee, who invented the World Wide Web in 1989, and others created the W3C in 1994. Around 1970, IBM introduced GML, the forerunner of SGML (Standard Generalized Markup Language), which became an ISO standard in 1986. SGML is a semantic and structural language for text documents, but it is complicated. An XML Working Group was formed under the W3C in 1996, and in 1998 the W3C published XML 1.0. Extensible Markup Language (XML) is a subset of SGML.

What is XML? XML stands for eXtensible Markup Language. XML is a universal method of representing data, used in applications, on the Web and for data exchange. XML is a markup language much like HTML, but it is used for different purposes; XML is not a replacement for HTML.

What is XML? XML was designed to describe data. XML is a cross-platform, software- and hardware-independent tool for transmitting or exchanging information. XML is an open-standards-based technology. It is extensible, and it is both human- and machine-readable. XML standards: XML 1.0 (1998), XML 1.1 (February 2004).

What exactly is XML used for? Storing data in a structured manner (a tree structure). Storing configuration information, typically application data that is not stored in a database; much server software keeps its configuration files in XML format.

Contd… Transmitting data between applications. XML overcomes problems in client-server applications that are cross-platform in nature, e.g. a Windows program talking to a mainframe. XML is a universal, standardized language used to represent data so that it can be processed independently and exchanged between programs and applications, and between clients and servers. Disparate systems can exchange information in a common format.

XML Syntax: The syntax rules of XML are very simple and very strict. XML tags are not predefined; you must define your own tags. All XML elements must have a closing tag: <p>This is a paragraph</p>

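Contd… XML tags are case sensitive: <Message>This is incorrect</message> is incorrect; <message>This is correct</message> is correct. All XML elements must be properly nested: <b><i>Jill Jack</b></i> is incorrect; <b><i>Jill Jack</i></b> is correct. Attribute values must always be quoted: <student name=reynolds> is incorrect; <student name="reynolds"> is correct.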

XML Syntax: All XML documents must have exactly one root element, which encloses all the other elements: <root> ..... </root>
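The syntax rules above can be checked mechanically, because a parser rejects any text that breaks them. The sketch below assumes only the JAXP classes bundled with the JDK and reports whether a string is well-formed (no DTD or schema is consulted). The class name WellFormedCheck is hypothetical, and the sample tags are illustrative rather than taken from the original slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: is a string well-formed XML?
final class WellFormedCheck {
    // Turns every reported problem into an exception and keeps the JDK parser
    // from printing "[Fatal Error]" lines to stderr.
    static final ErrorHandler STRICT = new ErrorHandler() {
        public void warning(SAXParseException e) { }
        public void error(SAXParseException e) throws SAXParseException { throw e; }
        public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    };

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(STRICT);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXException, IOException or ParserConfigurationException
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>This is a paragraph</p>",           // closed tag: well-formed
            "<p>This is a paragraph",               // missing closing tag
            "<Message>This is incorrect</message>", // case mismatch
            "<b><i>Jill Jack</b></i>",              // improper nesting
            "<student name=reynolds/>",             // unquoted attribute value
            "<student name=\"reynolds\"/>",         // quoted attribute value: well-formed
            "<a/><b/>"                              // two root elements
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```

Because the parser stops at the first fatal error, a single boolean is enough here; a real tool would usually report the SAXParseException's line and column instead.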

XML Comments: Comments in XML are written the same way as in HTML: <!-- This is a comment -->

XML Code John Tom

Extensibility in XML: A typical XML document is made up of tags enclosing the data, and the tag names describe the data. Because the language is extensible, you can create tags that are specific to your needs.

Contd… For example, your document may contain tags that structure information about employees, with one tag for each piece of employee information. Data stored in XML is self-descriptive: one can understand the data just by looking at the tag names.

XML – Exchanging Info Between Apps: Convert information stored in a database (or any other format) to XML format. Once it is in XML format, other applications and programs can parse (read) the XML document, which carries the original data. XML parsers are freely available and are included in the standard libraries of many programming languages.

Contd… (Figure: an application, a spreadsheet package, a CAD package and a statistical processing package exchanging data with a database through XML.)

Content, Structure, Presentation: the XML document holds the content, the DTD/XSD defines the structure, and XSL defines the presentation. XSD: XML Schema Definition. DTD: Document Type Definition. XSL: Extensible Stylesheet Language.

Document Type Declaration (DOCTYPE): A DTD (Document Type Definition) is used to enforce structure requirements for an XML document. The document type declaration contains a reference to the Document Type Definition (DTD) and tells the parser which DTD to use for validation.

Contd… <!DOCTYPE customers [ ]> John Conlon

XML Schema: An XML-based alternative to DTDs. Richer and more expressive than DTDs. Written in XML itself, so no separate syntax is needed. Supports data type validation (DTDs do not support data type validation).


Simple XML Elements with Predefined Data Types. Simple XML Element: an XML element that has no child elements and no attributes. Simple XML elements can be defined in XSD with the following statement. XSD Syntax: <xsd:element name="element_name" type="xsd:type_name"/>

Contd… where "element_name" is the name of the XML element, and "type_name" is one of the data type names predefined in XSD. XSD predefined data types are divided into 7 groups, including: numeric data types, date and time data types, string data types, binary data types, and the Boolean data type.

XSD Syntax: Simple XML Elements with Extended Data Types. Simple XML Element: an XML element that has no child elements and no attributes. Simple XML elements can be defined by using the predefined XSD data types.

They can also be defined by using extended data types, which are defined by "simpleType" statements containing XSD facet statements: <xsd:simpleType name="my_type_name"> <xsd:restriction base="xsd:type_name"> facet statements </xsd:restriction> </xsd:simpleType> <xsd:element name="element_name" type="my_type_name"/>, where "element_name" is the name of the XML element, "xsd:type_name" is a predefined data type serving as the base data type, and "my_type_name" is the new data type extended from the base data type.

Complex XML Elements. Complex XML Element: an XML element that has at least one child element or at least one attribute. Complex XML elements must be defined with complex data types, which are defined by "complexType" statements. XSD Syntax: <xsd:complexType name="my_type_name"> <xsd:sequence> <xsd:element name="child_name" type="..."/> ... </xsd:sequence> <xsd:attribute name="attribute_name" type="..."/> ... </xsd:complexType>

Here the "attribute" statement defines an attribute, and the "sequence" statement defines the group of child elements and the order in which they must appear in the XML structure. Note that "attribute" statements must appear after the child element definition statements.

XSD Syntax: Empty XML Elements. Empty XML Element: a special complex XML element that has no child elements and no text content; it typically carries its information in one or more attributes. Empty XML elements must be defined with complex data types in the following format:...

XSD Syntax: Anonymous Data Types. If a data type is specific to a child element in a parent data type, and there is no need to share it with data types outside the parent data type, you can define it as an anonymous data type: a non-named data type defined inline. For example, the following code:

defines "my_data_type" which has a "setting" element, which has an anonymous data type defined inline.

Well-formed XML Documents: A document is made of elements. There is exactly one element, called the root or document element, that contains all the others. All other elements, delimited by start- and end-tags, nest properly within each other. Attribute values, if any, must be enclosed in quotes.

Valid XML Documents: An XML document is valid if it has an associated DTD or Schema and the document complies with the constraints expressed in it. If an XML document is valid, it is also well-formed.

Document Type Definitions (DTDs): A DTD describes the syntax, i.e. which elements may appear in the XML document and what the element contents and attributes are. Need for DTD: a validating parser (a program) can be used to check whether XML data adheres to the rules in the DTD, and the parser can do appropriate error handling if there are any violations. A validity error is not necessarily a fatal error, but some applications may treat it as fatal.

Document Type Declarations A valid XML document must include the reference to DTD which validates it Types of DTD Internal DTD: DTD can be embedded into XML document External DTD: DTD can be in a separate file

Internal DTD DTD embedded in the XML document The declarations appear between [ and ] E.g. AddressBook.xml

<!DOCTYPE AddressBook [ ]> followed by an address entry: Ram, M G Road, Bangalore

External DTD: The DTD is kept in a separate file. Example: the DTD for AddressBook.xml is contained in the file AddressBook.dtd, and AddressBook.xml contains only XML data with a reference to the DTD file, such as <!DOCTYPE AddressBook SYSTEM "AddressBook.dtd">. AddressBook.xml:

Ram M G Road Bangalore

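Anatomy of DTD – Defining new XML tags (Elements): <!ELEMENT element_name content_specification>. element_name: specifies the name of the XML tag. content_specification: specifies what the element may contain: (#PCDATA), parsed character data, i.e. text in which the parser recognizes markup and entity references; nested child elements; EMPTY, for an element with no content; ANY, for any content (generally avoided). Mixed content, i.e. text together with child elements, is declared as (#PCDATA | child1 | child2)*. CDATA is not an element content type: it is an attribute type for character data, and CDATA sections (<![CDATA[ ... ]]>) hold text that the parser does not interpret as markup.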

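Example: <!ELEMENT Street (#PCDATA)> means element Street contains parsed character data; <!ELEMENT Address (Name, Street, City)> means element Address contains three nested tags, Name, Street and City, in that order; <!ELEMENT AddressBook (Address+)> means element AddressBook contains one or more occurrences of element Address.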
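A minimal sketch of the AddressBook example as an internal DTD, validated with the JDK's DOM parser. The declarations are reconstructed from the description above (an assumption, since the slide's own markup is not available), and AddressBookDtd is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: validates AddressBook documents against an internal DTD.
final class AddressBookDtd {
    // Internal DTD reconstructed from the slide's description (assumption).
    static final String DTD =
          "<!DOCTYPE AddressBook [\n"
        + "  <!ELEMENT AddressBook (Address+)>\n"
        + "  <!ELEMENT Address (Name, Street, City)>\n"
        + "  <!ELEMENT Name (#PCDATA)>\n"
        + "  <!ELEMENT Street (#PCDATA)>\n"
        + "  <!ELEMENT City (#PCDATA)>\n"
        + "]>\n";

    // Validating parse: validity errors arrive through error(), which rethrows them.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ok = "<AddressBook><Address><Name>Ram</Name>"
                  + "<Street>M G Road</Street><City>Bangalore</City></Address></AddressBook>";
        String noCity = "<AddressBook><Address><Name>Ram</Name>"
                      + "<Street>M G Road</Street></Address></AddressBook>";
        System.out.println(isValid(ok));     // true
        System.out.println(isValid(noCity)); // false: (Name, Street, City) requires City
    }
}
```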

Anatomy of DTD – Dealing with multiple children To declare the children of an element we use syntax similar to regular expression in Perl. To define the children of an element we use the following syntax: (Assume a and b are child elements of the element being declared)

a+ : one or more occurrences of a; a* : zero or more occurrences of a; a? : a or nothing; a, b : a followed by b; a | b : a or b, but not both; (expression) : surrounding an expression with parentheses means that it is treated as a unit and may have the suffix operator ?, * or +.
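The operators can be tried out with a validating parser. In this sketch the order, customer, item, gift, note, cash and card elements are hypothetical, chosen only so that a single content model uses every operator: +, *, ?, a comma sequence and a | choice.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class for DTD content-model operators.
final class ContentModelDemo {
    // customer first, then one or more item, any number of gift,
    // an optional note, and finally exactly one of cash or card.
    static final String DTD =
          "<!DOCTYPE order [\n"
        + "  <!ELEMENT order (customer, item+, gift*, note?, (cash | card))>\n"
        + "  <!ELEMENT customer EMPTY>\n"
        + "  <!ELEMENT item EMPTY>\n"
        + "  <!ELEMENT gift EMPTY>\n"
        + "  <!ELEMENT note EMPTY>\n"
        + "  <!ELEMENT cash EMPTY>\n"
        + "  <!ELEMENT card EMPTY>\n"
        + "]>\n";

    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the body against the content model
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><customer/><item/><cash/></order>"));        // true
        System.out.println(isValid("<order><customer/><item/><item/><gift/><note/><card/></order>")); // true
        System.out.println(isValid("<order><customer/><cash/></order>"));               // false: item+ needs one
        System.out.println(isValid("<order><customer/><item/><cash/><card/></order>")); // false: cash | card, not both
    }
}
```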

Some examples

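Anatomy of DTD – Attribute Declarations: <!ATTLIST Tag-Name Attr-Name Type Restriction> specifies the allowable attributes of each element. Tag-Name: element name. Attr-Name: name of the attribute; the attribute is defined for element Tag-Name. Type: the attribute type, e.g. CDATA (character data).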

Restriction: "Value": a default value, given as simple text enclosed in quotes. #IMPLIED: there is no default value for this attribute, and the attribute need not be used. #REQUIRED: there is no default value for this attribute, but a value must be assigned to it. #FIXED "Value": Value is the attribute's value, and the attribute must always have this value.

Anatomy of DTD – Attribute Declarations Example: <!ATTLIST Name salutation CDATA #REQUIRED>. The element Name has the attribute salutation, which is of type CDATA. The attribute salutation must be specified in the Name tag.
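A sketch of the salutation example using the JDK's validating DOM parser. The nickname (#IMPLIED) and country (#FIXED "India") attributes are hypothetical additions so that each kind of restriction appears; they are not part of the slide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class for ATTLIST restrictions.
final class AttributeDemo {
    static final String DTD =
          "<!DOCTYPE Name [\n"
        + "  <!ELEMENT Name (#PCDATA)>\n"
        + "  <!ATTLIST Name salutation CDATA #REQUIRED>\n"     // from the slide
        + "  <!ATTLIST Name nickname CDATA #IMPLIED>\n"        // hypothetical
        + "  <!ATTLIST Name country CDATA #FIXED \"India\">\n" // hypothetical
        + "]>\n";

    // Returns the DOM document, or null if the body violates the DTD.
    static Document parse(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            return null;
        }
    }

    // Attribute value as DOM reports it ("" if absent with no default); null if invalid.
    static String attribute(String body, String attr) {
        Document doc = parse(body);
        return doc == null ? null : doc.getDocumentElement().getAttribute(attr);
    }

    public static void main(String[] args) {
        // The #FIXED value is supplied from the DTD even though the tag omits it.
        System.out.println(attribute("<Name salutation=\"Mr\">Ram</Name>", "country")); // India
        System.out.println(parse("<Name>Ram</Name>") == null); // true: salutation is #REQUIRED
    }
}
```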

Anatomy of DTD – Entity Declarations (1 of 2): Entities are a way to escape special characters. Some special characters, such as < and &, cannot be used literally in #PCDATA, so they are written as references instead. This escaping of characters is called an "entity reference".

The following entity references are used in XML documents. Built-in entities: &amp; (&), &lt; (<), &gt; (>), &apos; ('), &quot; ("). Character entities: &#243; representing ó. Example: Jammu &amp; Kashmir

Anatomy of DTD – Entity Declarations (2 of 2): Data that is used frequently can be declared as a general entity: <!ENTITY entity_name "entity_contents">. entity_name: name of the new entity. entity_contents: contents of the new entity.

Example: <!ENTITY MyCountry "India"> defines the entity called MyCountry; "India" is the content of the entity MyCountry. Usage in the XML document: &MyCountry;
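A sketch showing how a (non-validating) DOM parser expands the built-in, character and general entities, including the MyCountry entity above. The Address root element and the class name EntityDemo are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class for entity expansion.
final class EntityDemo {
    static final String DTD =
          "<!DOCTYPE Address [\n"
        + "  <!ELEMENT Address (#PCDATA)>\n"
        + "  <!ENTITY MyCountry \"India\">\n" // general entity from the slide
        + "]>\n";

    // Returns the root element's text with every reference expanded.
    static String text(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(true); // already the default, stated for clarity
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(DTD + body)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text("<Address>Jammu &amp; Kashmir, &MyCountry;</Address>"));
        System.out.println(text("<Address>&lt;tag&gt; &apos;x&apos; &quot;y&quot;</Address>"));
        System.out.println(text("<Address>Le&#243;n</Address>"));
    }
}
```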

XML Schema What is XML Schema? An XML vocabulary for expressing your data's structure and business rules Validating parsers can use Schema to check whether XML data adheres to rules in schema More robust and extensive than DTD, can do even data type validations

E.g. : Consider following XML Document Kiran IWT 80 A

Is this data valid? To be valid, it must meet the following business rules (constraints): the Result must consist of a Subject, Marks and Grade, in the order shown; the Subject must be a valid subject from the list (DC, IWT, Cryptography); the Marks must be between 0 and 100; and the Grade can be A, B or C.

How can XML Schema help to accomplish this? Answer: It creates an XML vocabulary, defining the elements Result, Subject, Marks and Grade. It specifies the contents of each element and the restrictions on it: the Result element must contain Subject, Marks and Grade, in that order; Subject must be one of the valid subjects (IWT, Cryptography, DC); Marks must be between 0 and 100; Grade can be A, B or C.
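One way to express these rules as an XSD and check instances with the JDK's javax.xml.validation API. This is a sketch under stated assumptions: the schema has no target namespace (unlike the res: namespace on the later slides), a Name element is assumed to precede Subject because the sample data starts with Kiran, and Marks is a float with inclusive bounds. It also contrasts anonymous types (Result, Subject, Grade) with a reusable named type (MarksType). ResultSchemaCheck is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical demo class: validates Result documents against an inline XSD.
final class ResultSchemaCheck {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + " <xsd:element name='Result'>"
        + "  <xsd:complexType>"                // anonymous complex type
        + "   <xsd:sequence>"                  // children must appear in this order
        + "    <xsd:element name='Name' type='xsd:string'/>"
        + "    <xsd:element name='Subject'>"
        + "     <xsd:simpleType>"
        + "      <xsd:restriction base='xsd:string'>"
        + "       <xsd:enumeration value='DC'/>"
        + "       <xsd:enumeration value='IWT'/>"
        + "       <xsd:enumeration value='Cryptography'/>"
        + "      </xsd:restriction>"
        + "     </xsd:simpleType>"
        + "    </xsd:element>"
        + "    <xsd:element name='Marks' type='MarksType'/>"
        + "    <xsd:element name='Grade'>"
        + "     <xsd:simpleType>"
        + "      <xsd:restriction base='xsd:string'><xsd:pattern value='[ABC]'/></xsd:restriction>"
        + "     </xsd:simpleType>"
        + "    </xsd:element>"
        + "   </xsd:sequence>"
        + "  </xsd:complexType>"
        + " </xsd:element>"
        + " <xsd:simpleType name='MarksType'>" // reusable named simple type
        + "  <xsd:restriction base='xsd:float'>"
        + "   <xsd:minInclusive value='0'/>"
        + "   <xsd:maxInclusive value='100'/>"
        + "  </xsd:restriction>"
        + " </xsd:simpleType>"
        + "</xsd:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator(); // with no handler set, errors are thrown
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Result><Name>Kiran</Name><Subject>IWT</Subject>"
                + "<Marks>80</Marks><Grade>A</Grade></Result>"));  // true
        System.out.println(isValid("<Result><Name>Kiran</Name><Subject>IWT</Subject>"
                + "<Marks>120</Marks><Grade>A</Grade></Result>")); // false: Marks out of range
    }
}
```

Building the Schema on every call keeps the sketch short; a Schema object is thread-safe and would normally be created once and reused.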

XML Schema specifies the namespace that the created vocabulary belongs to. The namespace name is not an actual URL; it uses URL syntax and should be a unique string. Example: Namespace defines the following vocabulary

Example of referring to Schema <res:Result xmlns:res=" xmlns:xsi=" instance" xsi:schemaLocation=" Result.xsd"> Kiran IWT A PF B+

Schema example : Result.xsd <xsd:schema xmlns:xsd=" targetNamespace=" xmlns=" elementFormDefault="qualified">

Schema example : Result.xsd

DTD vs Schema: An XML document and its DTD use different syntaxes, which is inconsistent; a schema uses XML syntax. Limited data type capability: DTDs support only a very limited capability for specifying data types, and they do not support field-level validation or complex types. E.g., you can't express "I want this element to hold an integer in the range 0 to 100" in a DTD. A schema describes a set of data types compatible with those found in databases: e.g., databases support integer, string, etc. data types, and so does a schema, while a DTD does not.

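Element Declarations: Simple Element. Syntax: <xsd:element name="Element_name" type="Element_type" minOccurs="..." maxOccurs="..."/>. Element_name: any valid XML name. Element_type: a built-in simple type. Occurrence (minOccurs, maxOccurs): number of occurrences of that element, optional.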

Example: <xsd:element name="Name" type="xsd:string"/> defines the element Name of type string. <xsd:element name="Marks" type="xsd:float" maxOccurs="5"/> defines the element Marks of simple type float; Marks may appear at most 5 times and, by default, at least once.

Element Declarations Syntax :

Example: Defines a non-reusable complex element called 'Subject'. Each child element must appear in the declared order because the <xsd:sequence> tag is used.

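Element Declarations: Reusable Simple Type. Syntax: <xsd:simpleType name="Element_type_name"> <xsd:restriction base="Base_data_type"> Restriction_specification </xsd:restriction> </xsd:simpleType>. Element_type_name: name of the data type. Base_data_type: any of the built-in simple data types (integer, float, etc.). Restriction_specification: specifies the restrictions on the element, if any.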

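Example: <xsd:simpleType name="MarksType"> <xsd:restriction base="xsd:float"> <xsd:minInclusive value="0.0"/> <xsd:maxInclusive value="100.0"/> </xsd:restriction> </xsd:simpleType> defines the reusable element type MarksType. An element defined as MarksType may take a minimum value of 0.0 and a maximum value of 100.0.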

Element Declarations: Reusable Complex Type Syntax Defines the reusable type Type_name Example

Defines the reusable complex element type SubjectType, which consists of the following elements in the specified sequence (<xsd:sequence> tag): Name, Marks, Grade. This type can be used to define elements in your XML.

Defining the Attributes. Syntax: <xsd:attribute name="Attribute_name" type="Attribute_type"/>. All attributes are declared as simple types. Only complex elements can have attributes.

Anatomy of XML Schema: Constraints specification. Controls the occurrence of individual elements or groups of elements. Types of constraints: <xsd:choice> allows only one of its elements to appear; <xsd:sequence>: elements must appear in the same order as they are declared; <xsd:all>: elements can occur in any order, each at most once.

<xsd:choice> constraint. E.g.: allows either the first name or the last name to be used in the instance XML document.

<xsd:sequence> constraint. E.g.: all elements must appear in the defined order only.

Anatomy of XML Schema: Constraints specification. <xsd:all> constraint. E.g.: each of the elements may either appear or not appear, and the elements may appear in any order.
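The choice and all compositors can be compared with a small schema. The Person and Contact elements and their children are hypothetical; as in the example, every child of the all group is given minOccurs="0" so that each may be left out. ChoiceAllDemo is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;

// Hypothetical demo class for xsd:choice and xsd:all.
final class ChoiceAllDemo {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + " <xsd:element name='Person'>"
        + "  <xsd:complexType>"
        + "   <xsd:choice>"                  // exactly one of the alternatives
        + "    <xsd:element name='FirstName' type='xsd:string'/>"
        + "    <xsd:element name='LastName' type='xsd:string'/>"
        + "   </xsd:choice>"
        + "  </xsd:complexType>"
        + " </xsd:element>"
        + " <xsd:element name='Contact'>"
        + "  <xsd:complexType>"
        + "   <xsd:all>"                     // any order, each at most once
        + "    <xsd:element name='Phone' type='xsd:string' minOccurs='0'/>"
        + "    <xsd:element name='Email' type='xsd:string' minOccurs='0'/>"
        + "   </xsd:all>"
        + "  </xsd:complexType>"
        + " </xsd:element>"
        + "</xsd:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Person><FirstName>Kiran</FirstName></Person>"));             // true
        System.out.println(isValid("<Person><FirstName>A</FirstName><LastName>B</LastName></Person>")); // false
        System.out.println(isValid("<Contact><Email>x@y</Email><Phone>123</Phone></Contact>"));  // true
    }
}
```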

XML Parsers

XML Parser: The Big Picture (Figure: usage of the XML parser. The parser reads the XML document together with its DTD / Schema and passes the parsed data to the client application through its APIs.)

Why use a parser? Typically you use a pre-built XML parser (e.g. the parsers available through JAXP, or Apache Xerces). This enables you to build your application much more quickly.

Need for Parser Defining the Parser’s Responsibilities Ensure that the document adheres to specific standards Does the document match the DTD or Schema? Is the document well-formed? Make the document contents available to your application The parser will parse the XML document, and make this data available to your application An application using parser can access data in XML by going through the hierarchy or using tag names

Types of XML Parsers: Validating parser: a parser that verifies that the XML document adheres to the DTD or Schema. Non-validating parser: a parser that does not verify the XML document against the DTD or Schema. Most parsers provide an option to turn validation on or off. All parsers check the well-formedness of the XML document at all times.

XML Parser Interfaces: XML parsers provide two types of interfaces: SAX, an event-based interface, and DOM, a tree-based interface. JAXP ("Java API for XML Processing") is part of the JDK. It provides parsers that can be used in any Java application and supports both the tree-based parser (DOM) and the event-based parser (SAX).

DOM Parser Tree Based Parser Definition: Parser reads the XML document, and creates an in-memory “tree” representation of XML Document For example: Given a sample XML document below What kind of tree would be produced?

Kiran CHSSC 80 A

In memory tree created by Tree Based Parser Tree represents the hierarchy of XML document

DOM Parser (Figure: the tree for Result, with the element nodes Name and EmpNo and the text node Kiran; legend: text nodes, element nodes.)

DOM Parser: Tree-based APIs present an in-memory model of the entire document to the application once parsing has concluded. There is no need to use extra data structures to maintain the information during parsing. An application can navigate through the tree to find the desired pieces of the document. The Document Object Model (DOM) is the standard for tree-based parsing of XML documents.

Document Object Model (DOM): The Document Object Model (DOM) is a set of interfaces defined by the W3C DOM Working Group. DOM is the tree-based interface that programmers use to manipulate the XML document. A DOM parser can be validating or non-validating. The DOM parser represents the logical model of the XML document in memory. By default, entity references are expanded before the DOM tree is constructed (in JAXP, DocumentBuilderFactory.setExpandEntityReferences(false) keeps them as nodes instead).

DOM Structure representing the XML Document (Figure: the document structure representing Result.xml. The Document root contains the element node Result, whose children are element nodes such as Name, EmpNo, Subject, Marks and Grade, holding text nodes such as Kiran, IWT, 80.0 and A; legend: element, attribute, text and comment nodes.)

Document Object Model (DOM): Overview. The root of the DOM hierarchy is the Document node. Its children are the document element (here, Result) and any top-level comment or processing-instruction nodes. Name, Subject, EmpNo, etc. are child nodes of the Result element. All nodes in the DOM tree implement the interface org.w3c.dom.Node.
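To see the tree a DOM parser builds, the sketch below walks it recursively with getFirstChild() and getNextSibling() and prints each node's type and name. The document is a trimmed, hypothetical version of Result.xml written without whitespace between tags, so no whitespace-only text nodes appear. DomTreeDump is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: prints the DOM tree one node per line.
final class DomTreeDump {
    static String kind(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE: return "DOCUMENT";
            case Node.ELEMENT_NODE:  return "ELEMENT";
            case Node.TEXT_NODE:     return "TEXT";
            case Node.COMMENT_NODE:  return "COMMENT";
            default:                 return "OTHER(" + type + ")";
        }
    }

    // Depth-first walk; each level is indented by two spaces.
    static void walk(Node node, String indent, StringBuilder out) {
        out.append(indent).append(kind(node.getNodeType())).append(' ').append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE || node.getNodeType() == Node.COMMENT_NODE) {
            out.append(" \"").append(node.getNodeValue()).append('"');
        }
        out.append('\n');
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, indent + "  ", out);
        }
    }

    static String dump(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc, "", out); // start at the Document node, not the root element
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(dump("<Result><!--exam--><Name>Kiran</Name><Marks>80.0</Marks></Result>"));
        // DOCUMENT #document
        //   ELEMENT Result
        //     COMMENT #comment "exam"
        //     ELEMENT Name
        //       TEXT #text "Kiran"
        //     ELEMENT Marks
        //       TEXT #text "80.0"
    }
}
```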

The Big picture : Parsing the XML Document Document builder factory creates an instance of parser with required characteristics Whether the parser should be validating parser or not Whether namespace support required or not, Whether to ignore the white spaces between the elements or not Factory hides the implementation details of the parser and gives a standard DOM interface for parsing XML (Analogous to JDBC driver)

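DomApp.java : Parsing XML Document using DOM Parser

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;

    public class DomApp {
        public static void main(String[] argv) {
            MyErrorHandler hErr;
            Document hDocument;
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            try {
                hErr = new MyErrorHandler();
                DocumentBuilder hBuilder = factory.newDocumentBuilder();
                // Set the error handler
                hBuilder.setErrorHandler(hErr);
                hDocument = hBuilder.parse(new File("Result.xml"));
            } catch (Exception e) {
                // Handle exception if generated during parsing
                e.printStackTrace();
            }
        } // End of function main
    }

    // The slides do not show MyErrorHandler; this minimal version is an assumption:
    // it ignores warnings and rethrows errors so that parsing stops.
    class MyErrorHandler implements ErrorHandler {
        public void warning(SAXParseException e) { }
        public void error(SAXParseException e) throws SAXException { throw e; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }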

Parsing the XML Document using DOM Parser
Step 1: Get an instance of the document-builder factory, which will be used to produce the DOM parser (called a DocumentBuilder).
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Step 2: Set the properties of the DOM parser to be produced: (a) it should validate the XML document; (b) it should be namespace aware. Note that setValidating(true) on its own turns on DTD validation; validation against an XML Schema is configured separately, for example with factory.setSchema(...).
    factory.setValidating(true);
    factory.setNamespaceAware(true);
Step 3: Create an instance of the MyErrorHandler class, which handles errors generated during parsing in an application-specific way.
    hErr = new MyErrorHandler();

Step 4: Obtain an instance of the DOM parser and register the error handler. The parser is used to parse the XML document and create the in-memory tree representation of it.
    DocumentBuilder hBuilder = factory.newDocumentBuilder();
    hBuilder.setErrorHandler(hErr);
Step 5: Parse the XML document (Result.xml) using the parser created above.
    hDocument = hBuilder.parse(new File("Result.xml"));

The Node interface is the root of DOM Core class hierarchy This interface can be used to extract information from any DOM object without knowing its actual type (e.g. Element node, Text node, Attr Node etc ) of underlying node i.e. It is possible to access a document's complete structure and content using only the methods and properties exposed by the Node interface The Class Hierarchy rooted at org.w3c.dom.Node

DOM: Exploring the org.w3c.dom.Node Interface (Figure: Node at the top of the hierarchy, with the subinterfaces Element, Document, Attr, Text, Comment and Entity.)

DOM : Important Methods of Node interface Methods to retrieve the various information from the XML DOM Tree Node getFirstChild(): Returns the first child of the current node Node getLastChild(): Returns the last child of the current node String getNodeName(): The name of this node String getNodeValue(): The value of this node, depending on its type short getNodeType(): A code representing the type of the underlying object

Methods to alter the elements of the XML DOM tree: Node insertBefore(Node newChild, Node refChild); Node appendChild(Node newChild); Node removeChild(Node oldChild); Node replaceChild(Node newChild, Node oldChild)

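Using Node Interface (Figure: the Result element with the child nodes Name, Subject and EmpNo; Name contains the text Kiran.)
    hNode = hDocument.getDocumentElement();    // the Result element
    Node hFirstChild = hNode.getFirstChild();  // the Name element
    Node hLastChild = hNode.getLastChild();
    hFirstChild = hFirstChild.getFirstChild(); // the text node inside Name
    String sName = hFirstChild.getNodeName();  // "#text"
    String sVal = hFirstChild.getNodeValue();  // "Kiran"
If the document contains whitespace (such as indentation) between tags, that whitespace is also stored as text nodes, so getFirstChild() on Result may return a whitespace text node rather than Name.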
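A runnable version of the navigation snippet, plus the four modification methods listed earlier. It assumes a hypothetical Result document with Name and Subject children and no whitespace between tags; NodeNavigation, and the EmpNo, Grade and Student elements it creates, are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class for reading and altering a DOM tree via Node methods.
final class NodeNavigation {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Comma-separated names of the children of 'parent'.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }

    // Returns "firstChild/lastChild/textNodeName/textValue" for the root element.
    static String navigate(String xml) {
        Element hNode = parse(xml).getDocumentElement();
        Node hFirstChild = hNode.getFirstChild();
        Node hLastChild = hNode.getLastChild();
        Node hText = hFirstChild.getFirstChild();
        return hFirstChild.getNodeName() + "/" + hLastChild.getNodeName() + "/"
                + hText.getNodeName() + "/" + hText.getNodeValue();
    }

    // Applies appendChild, insertBefore, removeChild and replaceChild in turn.
    static String modify(String xml) {
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();
        Node name = root.getFirstChild();
        Node subject = root.getLastChild();
        root.appendChild(doc.createElement("Grade"));           // Name,Subject,Grade
        root.insertBefore(doc.createElement("EmpNo"), subject); // Name,EmpNo,Subject,Grade
        root.removeChild(subject);                              // Name,EmpNo,Grade
        root.replaceChild(doc.createElement("Student"), name);  // Student,EmpNo,Grade
        return childNames(root);
    }

    public static void main(String[] args) {
        String xml = "<Result><Name>Kiran</Name><Subject>IWT</Subject></Result>";
        System.out.println(navigate(xml)); // Name/Subject/#text/Kiran
        System.out.println(modify(xml));   // Student,EmpNo,Grade
    }
}
```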

XML Parser Interfaces : Event Based Interface Event Based Interface Definition : Parser reads the XML document and generates events for each parsing step Some common parsing events Element start-tag read Element content read Element end- tag read

Example Kiran CHSSC 80 A

XML Parser Interfaces : Event Generated startElement : Result startElement : Name contents: Kiran endElement : Name startElement : EmpNo contents: endElement : EmpNo endElement : Result

XML Parser Interfaces : Event Based Interface For each of these events, your application implements “event handlers” Each time an event occurs, a different event handler is called Your application intercepts these events, and handles them in any way you want Application does not wait till the entire document gets parsed Application has to maintain the information from XML document within local data-structures till it is processed completely Simple API for XML (SAX) is the standard for Event Based parsing of XML document
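The event sequence listed earlier can be reproduced with a DefaultHandler subclass that records each callback. This is a sketch: SaxEventLog is a hypothetical class name, the EmpNo value CHSSC is taken from the sample document, and text is buffered because characters() may deliver an element's content in several chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: logs SAX events in the slide's format.
final class SaxEventLog {
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                flush();
                log.add("startElement : " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per text run
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flush();
                log.add("endElement : " + qName);
            }

            // Emits buffered, non-blank text as one "contents" event.
            private void flush() {
                String s = text.toString().trim();
                if (!s.isEmpty()) log.add("contents: " + s);
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<Result><Name>Kiran</Name><EmpNo>CHSSC</EmpNo></Result>")) {
            System.out.println(event);
        }
    }
}
```

Because the events are handled as they arrive, the application never holds the whole document; it keeps only what it chooses to store (here, the log list).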

SAXApp.java : Parsing XML Document using SAX Parser

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.DefaultHandler;

    public class SAXApp {
        public static void main(String[] argv) {
            // Get the instance of the parser event-handling class
            DefaultHandler handler = new Handler();
            // Get the instance of SAXParserFactory
            SAXParserFactory factory = SAXParserFactory.newInstance();
            try {
                // Set the properties of the parser to be obtained
                factory.setValidating(true);
                factory.setNamespaceAware(true);
                // Get the new SAX parser
                SAXParser saxParser = factory.newSAXParser();
                // Parse the file; handler processes the events generated during parsing
                saxParser.parse(new File("Result.xml"), handler);
            } catch (Throwable t) {
                // Handle any exceptions generated during parsing
                t.printStackTrace();
            }
        } // End of function main
    }

SAXApp.java : Parsing XML Document using SAX Parser (continued)

    class Handler extends DefaultHandler {
        // Process recoverable errors (e.g. validity errors) in the XML document
        @Override
        public void error(SAXParseException e) throws SAXException {
            System.out.println("Error at line: " + e.getLineNumber()
                    + ", column: " + e.getColumnNumber());
            // Print the error message
            System.out.println(e.getMessage());
        }

        // Process any fatal errors in the XML document
        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            System.out.println("Fatal error at line: " + e.getLineNumber()
                    + ", column: " + e.getColumnNumber());
            // Print the error message
            System.out.println(e.getMessage());
        }
    } // End class Handler

Understanding the Simple API for XML (SAX)
Step 1: Get an instance of SAXParserFactory, which is used to obtain the SAX parser.
    SAXParserFactory factory = SAXParserFactory.newInstance();
Step 2: Get an instance of the event handler class, which handles all the events generated by the parser.
    DefaultHandler handler = new Handler();
Step 3: Set the properties of the parser to be obtained: (a) it should validate the XML document; (b) it should be namespace aware. As with DOM, setValidating(true) on its own turns on DTD validation; validation against an XML Schema is configured separately, for example with factory.setSchema(...).
    factory.setValidating(true);
    factory.setNamespaceAware(true);
Step 4: Obtain an instance of the SAX parser from the factory.
    SAXParser saxParser = factory.newSAXParser();
Step 5: Parse the Result.xml file using the SAX parser obtained above. Events generated during parsing are handled by the object handler.
    saxParser.parse(new File("Result.xml"), handler);

The Big Picture: Parsing the XML Document using SAX (Figure: the SAX parser factory creates a SAX parser, which reads the XML document and sends parser events to a DefaultHandler / MyHandler class; in the class hierarchy, DefaultHandler implements the org.xml.sax interfaces ContentHandler, ErrorHandler and EntityResolver.)

org.xml.sax Interfaces: the org.xml.sax.helpers.DefaultHandler class provides a default implementation of all the event callbacks. DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler and EntityResolver interfaces with methods that do nothing (except fatalError(), which throws the exception it receives), so an application overrides only the methods it requires.

org.xml.sax.ContentHandler interface: receives notification of the logical content of a document. It defines methods such as startDocument(), endDocument(), startElement() and endElement(), which are invoked when the corresponding parts of the document are recognized. It also defines the method characters(), which is invoked when the parser encounters text in an XML element (possibly in several chunks).

org.xml.sax Interfaces: org.xml.sax.ErrorHandler interface. Allows a SAX application to do customized error handling. Once an ErrorHandler is registered, the parser reports all errors and warnings through this interface.

Important methods: void error(SAXParseException e): receives notification of a recoverable error. void fatalError(SAXParseException e): receives notification of a non-recoverable error. void warning(SAXParseException e): receives notification of a warning.
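A sketch of how the three callbacks behave with the JDK's validating SAX parser: returning normally from error() lets parsing continue to endDocument(), while a fatal error always stops the parse. The small Result DTD, the two sample documents and the CountingHandler class are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class contrasting recoverable and fatal errors.
final class SaxErrorDemo {
    // INVALID breaks the DTD (Grade undeclared, Name missing); MALFORMED breaks XML syntax.
    static final String DTD =
        "<!DOCTYPE Result [<!ELEMENT Result (Name)><!ELEMENT Name (#PCDATA)>]>";
    static final String INVALID = DTD + "<Result><Grade>A</Grade></Result>";
    static final String MALFORMED = DTD + "<Result><Name>Kiran</Result>";

    static final class CountingHandler extends DefaultHandler {
        int warnings, errors, fatals;
        boolean reachedEnd;

        @Override public void warning(SAXParseException e) { warnings++; }

        // Recoverable (e.g. validity) error: record it and return, so parsing continues.
        @Override public void error(SAXParseException e) { errors++; }

        // Non-recoverable error: the document cannot be parsed further.
        @Override public void fatalError(SAXParseException e) throws SAXException {
            fatals++;
            throw e;
        }

        @Override public void endDocument() { reachedEnd = true; }
    }

    static CountingHandler run(String xml) {
        CountingHandler handler = new CountingHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // a fatal error (rethrown above) ends the parse here
        }
        return handler;
    }

    public static void main(String[] args) {
        CountingHandler a = run(INVALID);
        System.out.println("invalid:   errors=" + a.errors + " fatal=" + a.fatals + " end=" + a.reachedEnd);
        CountingHandler b = run(MALFORMED);
        System.out.println("malformed: errors=" + b.errors + " fatal=" + b.fatals + " end=" + b.reachedEnd);
    }
}
```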

Evaluating Parsers: SAX vs. DOM. SAX advantage: it is good when serial processing of the document is required and the document is very large, i.e. when the XML document is gigabytes in size. Disadvantage: the application must keep its own data structures for the parts of the XML document it needs until processing is finished, so for small XML documents, where memory is not a concern, DOM is usually more convenient.

DOM advantage: supports DOM tree traversal methods, allows modification of the XML document, and is good when random access to the document is required. Disadvantage: for large XML documents (gigabytes in size), it requires much more memory than the SAX parser needs to parse the same document.