7-Mar-16 Simple API XML

• SAX and DOM are standards for XML parsers: program APIs to read and interpret XML files
  • DOM is a W3C standard
  • SAX is an ad-hoc (but very popular) standard
  • SAX was developed by David Megginson and is open source
• There are various implementations available
• Java implementations are provided as part of JAXP (Java API for XML Processing)
  • JAXP is included as a package in Java 1.4
  • JAXP is available separately for Java 1.3
• Unlike many XML technologies, SAX and DOM are relatively easy

• DOM reads the entire XML document into memory and stores it as a tree data structure
• SAX reads the XML document and calls one of your methods for each element or block of text that it encounters
• Consequences:
  • DOM provides “random access” into the XML document
  • SAX provides only sequential access to the XML document
  • DOM is comparatively slow and memory-hungry because the whole tree must fit in memory, so it is a poor fit for very large XML documents
  • SAX is fast and requires very little memory, so it can be used for huge documents (or large numbers of documents)
    • This makes SAX much more popular for web sites
  • Some DOM implementations have methods for changing the XML document in memory; SAX implementations do not
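To make the contrast concrete, here is a minimal sketch that counts elements both ways. The class name DomVersusSax and its two helper methods are invented for illustration; the sketch assumes the JAXP implementation bundled with a current JDK and parses from a string rather than a file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original slides.
class DomVersusSax {

    // DOM: build the whole tree first, then query it in any order.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: no tree is kept; count elements as they stream past.
    static int countWithSax(String xml, String tag) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace-aware, so qName holds the tag name.
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item>a</item><item>b</item><note/></list>";
        System.out.println("DOM count: " + countWithDom(xml, "item"));
        System.out.println("SAX count: " + countWithSax(xml, "item"));
    }
}
```

Both methods give the same answer; the difference is that the DOM version holds the entire document in memory while the SAX version keeps only a counter.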

• SAX works through callbacks: you call the parser, it calls methods that you supply

[Diagram: in your program, main(...) calls parse(...) on the SAX parser; the parser calls back into your program's startDocument(...), startElement(...), characters(...), endElement(...), and endDocument(...)]
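The callback flow can be observed directly with a small sketch that records each callback as it arrives. CallbackTrace and trace are names made up for this example, and the input is parsed from a string for convenience.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original slides.
class CallbackTrace {

    // Parses the XML text and returns the callbacks in the order they were received.
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void endDocument() { events.add("endDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                events.add("startElement " + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                // A parser may deliver one run of text in several chunks.
                events.add("characters");
            }
        };
        // We call the parser; the parser calls the handler methods above.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        trace("<display>Hello World!</display>").forEach(System.out::println);
    }
}
```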

• The program consists of two classes:
  • Sample: this class contains the main method; it
    • Gets a factory to make parsers
    • Gets a parser from the factory
    • Creates a Handler object to handle callbacks from the parser
    • Tells the parser which handler to send its callbacks to
    • Reads and parses the input XML file
  • Handler: this class contains handlers for three kinds of callbacks:
    • startElement callbacks, generated when a start tag is seen
    • endElement callbacks, generated when an end tag is seen
    • characters callbacks, generated for the contents of an element

    import javax.xml.parsers.*; // for both SAX and DOM
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    // For simplicity, we let the operating system handle exceptions
    // In "real life" this is poor programming practice
    public class Sample {
        public static void main(String args[]) throws Exception {

            // Create a parser factory
            SAXParserFactory factory = SAXParserFactory.newInstance();

            // Tell factory that the parser must understand namespaces
            factory.setNamespaceAware(true);

            // Make the parser
            SAXParser saxParser = factory.newSAXParser();
            XMLReader parser = saxParser.getXMLReader();

• In the previous slide we made a parser, of type XMLReader

            // Create a handler
            Handler handler = new Handler();

            // Tell the parser to use this handler
            parser.setContentHandler(handler);

            // Finally, read and parse the document
            parser.parse("hello.xml");
        } // end of main
    } // end of Sample class

• You will need to put the file hello.xml:
  • In the same directory, if you run the program from the command line
  • Or where it can be found by the particular IDE you are using
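The Handler class itself does not appear in this transcript, so the following is only a guess at its shape: a subclass of DefaultHandler that prints the three callbacks described earlier. The label text it prints, the name() helper, and the HandlerDemo driver (which parses a string instead of hello.xml) are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reconstruction; the original slides' Handler may differ.
class Handler extends DefaultHandler {

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes atts) {
        System.out.println("startElement: " + name(localName, qualifiedName));
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qualifiedName) {
        System.out.println("endElement: /" + name(localName, qualifiedName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) of the buffer belongs to this callback.
        System.out.println("characters: \"" + new String(ch, start, length) + "\"");
    }

    // Prefer the local name; fall back to the qualified name when it is empty.
    private static String name(String localName, String qualifiedName) {
        return localName.isEmpty() ? qualifiedName : localName;
    }
}

// Hypothetical driver that mirrors Sample but reads from a string.
class HandlerDemo {
    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        parser.setContentHandler(new Handler());
        parser.parse(new InputSource(new StringReader("<display>Hello World!</display>")));
    }
}
```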

    public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) throws SAXException

• This method is called at the beginning of every element
• If the parser is namespace-aware,
  • namespaceURI will hold the namespace URI that the element's prefix is bound to (not the prefix itself), or the empty string if the element has no namespace
  • localName will hold the element name (without a prefix)
  • qualifiedName is optional and may be the empty string, although many parsers still supply the prefixed name
• If the parser is not using namespaces,
  • namespaceURI and localName will be empty strings
  • qualifiedName will hold the element name (possibly with prefix)
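A quick way to see what a given parser actually passes is to print the three name arguments with namespace awareness switched on and off. This is a sketch only: NamespaceArgs, describe, and the URI http://example.com/ns are invented for the example, and the optional arguments may differ between parser implementations.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original slides.
class NamespaceArgs {

    // Returns the name arguments seen by startElement for a single-element document.
    static String describe(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String namespaceURI, String localName,
                                     String qualifiedName, Attributes atts) {
                out.append("uri=[").append(namespaceURI)
                   .append("] local=[").append(localName)
                   .append("] qName=[").append(qualifiedName).append("]");
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up namespace URI bound to the prefix "p".
        String xml = "<p:display xmlns:p=\"http://example.com/ns\"/>";
        System.out.println("aware:     " + describe(xml, true));
        System.out.println("not aware: " + describe(xml, false));
    }
}
```

In the namespace-aware run, the uri field shows the bound URI rather than the prefix p.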

• When SAX calls startElement, it passes in a parameter of type Attributes
• Attributes is an interface that defines a number of useful methods; here are a few of them:
  • getLength() returns the number of attributes
  • getLocalName(index) returns the attribute’s local name
  • getQName(index) returns the attribute’s qualified name
  • getValue(index) returns the attribute’s value
  • getType(index) returns the attribute’s type, which will be one of the Strings "CDATA", "ID", "IDREF", "IDREFS", "NMTOKEN", "NMTOKENS", "ENTITY", "ENTITIES", or "NOTATION"
• As with elements, if the local name is the empty string, then the attribute’s name is in the qualified name
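As an illustration of these methods, the sketch below walks the Attributes object by index inside startElement. AttributeDump and attributesOf are hypothetical names, and the sample element with lang and size attributes is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original slides.
class AttributeDump {

    // Collects "name=value (type)" for every attribute on every element.
    static List<String> attributesOf(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String namespaceURI, String localName,
                                     String qualifiedName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    // The parser here is not namespace-aware, so use the qualified name.
                    result.add(atts.getQName(i) + "=" + atts.getValue(i)
                            + " (" + atts.getType(i) + ")");
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        attributesOf("<display lang=\"en\" size=\"12\">Hello</display>")
                .forEach(System.out::println);
    }
}
```

Without a DTD declaring the attributes, getType reports "CDATA" for each of them.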

• If the file hello.xml contains:

    <display>Hello World!</display>

• Then the output from running java Sample will be:

    startElement: display
    characters: "Hello World!"
    Element: /display

• Now suppose the file hello.xml contains:

    <display>
        <i>Hello</i> World!
    </display>

• Notice that the root element, <display>, now contains a nested element <i> and some whitespace (including newlines)
• The result will be:

    startElement: display
    characters: ""        // empty string
    characters: " "       // newline
    characters: " "       // spaces
    startElement: i
    characters: "Hello"
    endElement: /i
    characters: "World!"
    characters: " "       // another newline
    endElement: /display

• SAX defines four callback interfaces; a handler that receives every kind of callback implements all four:
  • interface ContentHandler
    • This is the most important interface: it handles basic parsing callbacks, such as element starts and ends
  • interface DTDHandler
    • Handles only notation and unparsed entity declarations
  • interface EntityResolver
    • Does customized handling for external entities
  • interface ErrorHandler
    • Must be implemented or parsing errors will be ignored!
• You could implement all these interfaces yourself, but that’s a lot of work; it’s easier to use an adapter class such as org.xml.sax.helpers.DefaultHandler, which supplies do-nothing versions of every method

• If hello.xml contains:

    <display>
        Hello World!
    </display>

• Then the sample program we started with gives:

    startElement: display
    characters:                   <-- zero length string
    characters:                   <-- LF character (ASCII 10)
    characters:     Hello World!  <-- spaces are preserved
    characters:                   <-- LF character (ASCII 10)
    Element: /display

• Whitespace is a major nuisance
  • Whitespace is characters; characters are PCDATA
  • If you are validating, the parser will ignore whitespace where PCDATA is not allowed by the DTD (it is reported through ignorableWhitespace() rather than characters())
  • If you are not validating, the parser cannot ignore whitespace
  • If you ignore whitespace, you lose your indentation
• To ignore whitespace when validating:
  • Happens automatically
• To ignore whitespace when not validating:
  • Use the String function trim() to remove whitespace
  • Check the result to see if it is the empty string
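The trim-and-check approach for a non-validating parser might look like the sketch below. TrimWhitespace and textChunks are invented names. Because it trims each chunk separately, spaces at chunk boundaries are dropped too, which is fine for indentation but may not suit mixed content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original slides.
class TrimWhitespace {

    // Returns the text chunks of the document, skipping whitespace-only chunks.
    static List<String> textChunks(String xml) throws Exception {
        List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                // After trimming, a whitespace-only chunk becomes the empty string.
                if (!text.isEmpty()) {
                    chunks.add(text);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<display>\n    <i>Hello</i> World!\n</display>";
        System.out.println(textChunks(xml));
    }
}
```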

• SAX error handling is unusual
• Most errors are ignored unless you register an error handler (org.xml.sax.ErrorHandler)
  • Ignored errors can cause bizarre behavior
  • Failing to provide an error handler is unwise
• The ErrorHandler interface declares:
  • public void fatalError(SAXParseException exception) throws SAXException // XML not well-formed
  • public void error(SAXParseException exception) throws SAXException // XML validation error
  • public void warning(SAXParseException exception) throws SAXException // minor problem
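One possible error handler is sketched here: it logs warnings and rethrows errors and fatal errors so they cannot pass unnoticed. ReportingErrorHandler, ErrorHandlerDemo, and check are hypothetical names, and the choice to rethrow is a policy decision of this sketch, not something the interface requires.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler: logs warnings, rethrows everything more serious.
class ReportingErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException e) {
        System.err.println("warning, line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e; // validation error: stop parsing
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        throw e; // document is not well-formed: stop parsing
    }
}

// Hypothetical driver, not part of the original slides.
class ErrorHandlerDemo {

    // Returns "ok" for a parseable document, otherwise a short problem report.
    static String check(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setErrorHandler(new ReportingErrorHandler());
        try {
            parser.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXParseException e) {
            return "problem at line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<display>Hello World!</display>"));
        System.out.println(check("<display><i>Hello</display>")); // mismatched tags
    }
}
```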