1 The Simple API for XML (SAX) Part I ©Copyright 2003-2004. These slides are based on material from the upcoming book, “XML and Bioinformatics” (Springer-Verlag) by Ethan Cerami. Please … for permission to …

2 Road Map
SAX Overview
–What is SAX?
–Advantages/Disadvantages
Basic SAX Examples
–About Xerces 2 Parser
–XMLReader Interface
–ContentHandler Interface
–Extending the SAX Default Handler
Checking for Well-Formedness

3 SAX Overview

4 Introduction to SAX
The Simple API for XML (SAX) is a standard, event-based interface for parsing XML documents.
Versions:
–SAX 1.0: original standard
–SAX 2.0: current standard
SAX is a de facto standard, supported by most XML parsers today. Unlike DOM, it is not an official W3C standard.
SAX was originally designed specifically for Java, but SAX now exists for other languages, including Perl, Python, etc.

5 SAX Interface At its core, SAX is simply a series of interfaces that are implemented by an XML parser. Because different parsers implement the same SAX interface, you can easily swap in/out different parsers.

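6 SAX Interface
[Figure: a Java App calls the SAX Interface, which is implemented by the Xerces Parser, the Crimson Parser, or the Ælfred Parser; the chosen parser reads the XML Document.]
Implementation details are hidden behind the SAX interface, so you can swap parsers in and out. This is the same idea as JDBC: application code is written against a standard interface, and the implementation behind it can be replaced.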
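To make the JDBC analogy concrete, here is a minimal sketch in modern Java.

- It assumes the parser comes from the JAXP SAXParserFactory bundled with the JDK, not from Xerces 2 as in these slides.
- The class ParserAgnostic and its methods are hypothetical names.
- The handler touches only org.xml.sax types. Switching parsers would therefore change only the newReader() method.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: application code written only against SAX interfaces.
public class ParserAgnostic {

    // Counts start-element events; depends on org.xml.sax types only.
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    // The only place that decides which parser implementation is used.
    // Assumption: the JDK's default JAXP parser; replace this method to swap parsers.
    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static int countElements(String xml) throws Exception {
        XMLReader reader = newReader();
        ElementCounter counter = new ElementCounter();
        reader.setContentHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><c/></a>")); // prints 3
    }
}
```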

7 Advantages/Disadvantages
Advantages
–Very widely implemented by just about every XML parser
–Fast performance
–Low memory overhead
Disadvantages
–Does not provide an easy-to-navigate XML tree like DOM or JDOM.
–Does not provide an easy mechanism for creating/modifying XML documents.
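The flip side of low memory overhead is that the application must remember whatever context it needs. This minimal sketch tracks element nesting depth with two integers instead of building a tree.

It assumes the JDK's built-in JAXP parser. The class MaxDepth is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class MaxDepth {

    // SAX builds no tree, so the handler keeps only the state it needs:
    // the current nesting depth and the deepest level seen so far.
    static class DepthTracker extends DefaultHandler {
        private int depth = 0;
        int maxDepth = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            depth++;
            maxDepth = Math.max(maxDepth, depth);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }
    }

    public static int maxDepth(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        DepthTracker tracker = new DepthTracker();
        reader.setContentHandler(tracker);
        reader.parse(new InputSource(new StringReader(xml)));
        return tracker.maxDepth;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DASDNA><SEQUENCE><DNA>taat</DNA></SEQUENCE></DASDNA>";
        System.out.println(maxDepth(xml)); // prints 3
    }
}
```

Memory use stays constant no matter how long the document is. The cost is that any question about structure has to be answered by state you maintain yourself.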

8 Basic SAX Example

9 Xerces 2 Parser
All of our examples will use the Xerces 2 Parser. Xerces 2 is the latest open source XML parser from the Apache XML Group. The distribution is available at: …
The distribution includes two JAR files:
–xmlParserAPIs.jar: includes the relevant XML APIs, including DOM Level 2, SAX 2.0, and JAXP 1.2.
–xercesImpl.jar: includes the Xerces implementation of the XML APIs.

10 SAXBasic.java
The first example illustrates the simplest SAX functionality:
–Creates an XML parser object
–Parses a document specified on the command line
–Receives SAX events and prints them to the console
First, let’s examine a sample XML document. Then we will view the output when this document is parsed.

11 Sample XML Document
<!DOCTYPE DASDNA SYSTEM '…'>
<DASDNA>
  <SEQUENCE>
    <DNA>
taatttctcccattttgtaggttatcacttcactctgttgactttcttttg
    </DNA>
  </SEQUENCE>
  <SEQUENCE>
    <DNA>
taatgcaactaaatccaggcgaagcatttcagcttaaccccgagacttttg
    </DNA>
  </SEQUENCE>
</DASDNA>
The document contains two sequences of DNA.

12 Sample Output
Start Document
Start Element: DASDNA
Start Element: SEQUENCE
Start Element: DNA
Characters: taatttctcccattttgtaggttatcacttcactctgttgactttcttttg
Characters: 
End Element: DNA
End Element: SEQUENCE
Start Element: SEQUENCE
Start Element: DNA
Characters: taatgcaactaaatccaggcgaagcatttcagcttaaccccgagacttttg
Characters: 
End Element: DNA
End Element: SEQUENCE
End Element: DASDNA
End Document

13
package com.oreilly.bioxml.sax;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

import java.io.IOException;

/**
 * Basic SAX Example.
 * Illustrates basic implementation of the SAX Content Handler.
 */
public class SAXBasic implements ContentHandler {

    public void startDocument() throws SAXException {
        System.out.println("Start Document");
    }

14
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String str = new String(ch, start, length);
        System.out.println("Characters: " + str);
    }

    public void endDocument() throws SAXException {
        System.out.println("End Document");
    }

    public void endElement(String namespaceURI, String localName,
            String qName) throws SAXException {
        System.out.println("End Element: " + localName);
    }

    public void endPrefixMapping(String prefix) throws SAXException {
        // No-op
    }

15
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        // No-op
    }

    public void processingInstruction(String target, String data)
            throws SAXException {
        // No-op
    }

    public void setDocumentLocator(Locator locator) {
        // No-op
    }

    public void skippedEntity(String name) throws SAXException {
        // No-op
    }

    public void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) throws SAXException {
        System.out.println("Start Element: " + localName);
    }

16
    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
        // No-op
    }

    /**
     * Prints Command Line Usage
     */
    private static void printUsage() {
        System.out.println("usage: SAXBasic xml-file");
        System.exit(0);
    }

    /**
     * Main Method
     * Options for instantiating XMLReader Implementation:
     * 1) XMLReader parser = XMLReaderFactory.createXMLReader();
     * 2) XMLReader parser = XMLReaderFactory.createXMLReader
     *    ("org.apache.xerces.parsers.SAXParser");
     * 3) XMLReader parser = new org.apache.xerces.parsers.SAXParser();
     */

17
    public static void main(String[] args) {
        if (args.length != 1) {
            printUsage();
        }
        try {
            SAXBasic saxHandler = new SAXBasic();
            XMLReader parser = XMLReaderFactory.createXMLReader
                ("org.apache.xerces.parsers.SAXParser");
            // Register the handler, then start parsing the file named on the command line
            parser.setContentHandler(saxHandler);
            parser.parse(args[0]);
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

18 Main SAX Interfaces SAX provides two main interfaces: –XMLReader: implemented by the XML parser. –ContentHandler: implemented by your application in order to receive SAX events. Each time an event occurs, e.g. start element, end element, the XML parser calls the ContentHandler and informs you of the specific event.
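The division of labor between XMLReader (the parser side) and ContentHandler (the application side) can be seen in this minimal sketch. It records each callback in order.

- It assumes the JDK's built-in JAXP parser rather than Xerces 2.
- It extends SAX's DefaultHandler so that only four ContentHandler callbacks need to be written.
- EventLog and Recorder are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EventLog {

    // Application side: receives callbacks and records them in order.
    static class Recorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + localName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + localName);
        }
    }

    public static List<String> events(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is populated
        XMLReader reader = factory.newSAXParser().getXMLReader(); // parser side
        Recorder recorder = new Recorder();
        reader.setContentHandler(recorder); // register the application's handler
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        // prints [startDocument, start:SEQUENCE, start:DNA, end:DNA, end:SEQUENCE, endDocument]
        System.out.println(events("<SEQUENCE><DNA>acgt</DNA></SEQUENCE>"));
    }
}
```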

19 XMLReader Interface
You have three main options for obtaining an XMLReader instance.
Option 1: Use the SAX XMLReaderFactory class with no arguments:
XMLReader parser = XMLReaderFactory.createXMLReader();
The factory will attempt to instantiate an XMLReader based on system defaults.

20 Option 1: Continued
You can specify a system property on the java command line via the -D option. For example, the following command runs the SAXBasic class and specifies the Xerces 2 XML Parser:
java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser com.oreilly.bioxml.sax.SAXBasic sample.xml
The advantage of using system properties is that you can change parsers at any time without recompiling any code.
If the factory is unable to determine any valid system defaults, it will throw a SAXException with a specific message: "System property org.xml.sax.driver not specified."
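The following sketch checks that the factory really consults org.xml.sax.driver at run time. It sets the property from code, which is equivalent to passing -D, and points it at a class that does not exist. The factory should then throw a SAXException instead of returning a reader.

- It assumes a JDK where XMLReaderFactory still exists; it has been deprecated since Java 9.
- DriverProperty, tryDriver, and com.example.NoSuchSAXParser are hypothetical names.
- The exact wording of the error message depends on the SAX version.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLReaderFactory;

public class DriverProperty {

    static final String DRIVER = "org.xml.sax.driver";

    // Sets org.xml.sax.driver programmatically (the property -D sets on the
    // command line), asks the factory for a reader, then restores the old value.
    // Returns the factory's error message, or null if a reader was created.
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
    public static String tryDriver(String className) {
        String previous = System.getProperty(DRIVER);
        System.setProperty(DRIVER, className);
        try {
            XMLReaderFactory.createXMLReader();
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } finally {
            if (previous == null) {
                System.clearProperty(DRIVER);
            } else {
                System.setProperty(DRIVER, previous);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical class name that is not on the classpath.
        System.out.println(tryDriver("com.example.NoSuchSAXParser"));
    }
}
```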

21 Using Different Parsers
The specific class that implements the XMLReader interface varies from parser to parser. For example:
–For the Xerces XML Parser, it's org.apache.xerces.parsers.SAXParser.
–For the Crimson XML Parser, it's org.apache.crimson.parser.XMLReaderImpl.

22 Option 2 Call the XMLReaderFactory with a String argument indicating the class name that implements the XMLReader interface: For example: XMLReader parser = XMLReaderFactory.createXMLReader ("org.apache.xerces.parsers.SAXParser");

23 Option 3 Instantiate the XMLReader implementation directly: For example: XMLReader parser = new org.apache.xerces.parsers.SAXParser(); This option works fine. However, note that if you switch parsers, you will need to recompile.

24 Using an XMLReader
Once you have an XMLReader, you can call its parse() method to start parsing:
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
parser.parse("simple.xml");
You can pass a local file name or an absolute URL to the parse() method.
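This sketch exercises the parse(String) form with a real file. It writes a document to a temporary file and passes that file's file: URL to parse(). The absolute-URL form is the most portable kind of system identifier.

- It assumes Java 11 or later for Files.writeString.
- It assumes the JDK's JAXP parser rather than Xerces.
- ParseFromFile and its methods are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ParseFromFile {

    // Counts <DNA> elements in the document identified by systemId.
    public static int countDna(String systemId) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        int[] count = {0}; // array so the anonymous handler can update it
        parser.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("DNA".equals(localName)) {
                    count[0]++;
                }
            }
        });
        parser.parse(systemId); // a file name or an absolute URL
        return count[0];
    }

    // Writes xml to a temporary file, parses it via its file: URL, then deletes it.
    public static int countDnaInTempFile(String xml) throws Exception {
        Path file = Files.createTempFile("simple", ".xml");
        try {
            Files.writeString(file, xml);
            return countDna(file.toUri().toString());
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DASDNA><SEQUENCE><DNA>taat</DNA></SEQUENCE>"
                + "<SEQUENCE><DNA>gcaa</DNA></SEQUENCE></DASDNA>";
        System.out.println(countDnaInTempFile(xml)); // prints 2
    }
}
```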

25 ContentHandler Interface
The ContentHandler receives all SAX events. In total, there are 11 defined events. The most important events/methods are described below:
characters: Receive notification of character data.
endDocument: Receive notification of the end of a document.
endElement: Receive notification of the end of an element.
Continued…

26 ContentHandler API (cont.)
ignorableWhitespace: Receive notification of ignorable whitespace in element content.
setDocumentLocator: Receive an object for locating the origin of SAX document events.
startDocument: Receive notification of the beginning of a document.
startElement: Receive notification of the beginning of an element.
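setDocumentLocator is easy to overlook. If the parser supplies a Locator, it hands it over before any other ContentHandler callback, and the handler can then ask where each event came from.

This minimal sketch reports the line number of each start tag. It assumes the JDK's JAXP parser. LocatorDemo, LineReporter, and elementLines are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorDemo {

    static class LineReporter extends DefaultHandler {
        private Locator locator;
        final List<String> seen = new ArrayList<>();

        // If the parser supplies a locator, it calls this before any other
        // ContentHandler method; keep the reference for later callbacks.
        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        // The locator only describes the position of the event being reported now.
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            int line = (locator == null) ? -1 : locator.getLineNumber();
            seen.add(localName + "@" + line);
        }
    }

    public static List<String> elementLines(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        LineReporter reporter = new LineReporter();
        parser.setContentHandler(reporter);
        parser.parse(new InputSource(new StringReader(xml)));
        return reporter.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DASDNA>\n<SEQUENCE>\n<DNA>taat</DNA>\n</SEQUENCE>\n</DASDNA>";
        System.out.println(elementLines(xml)); // prints [DASDNA@1, SEQUENCE@2, DNA@3]
    }
}
```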

27 Character “Chunking”
Suppose you have the following piece of XML:
<DNA>taatgcaactaaatccaggcgaagcatttcagcttaaccccg</DNA>
You will receive a start element event, followed by one or more character events. Parsers are free to split character data across calls to characters() any way they want. For example, one parser might do the following:
–characters("t");
–characters("a");
Another parser might do this:
–characters("taatgcaactaaatccagg");
–characters("cgaagcatttcagcttaaccccg");
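Both chunking strategies can be imitated without a real parser by calling characters() directly with different piece sizes. This sketch checks that a handler which appends every chunk to a buffer ends up with the same text either way.

The names ChunkingDemo, BufferingHandler, and deliverInChunks are hypothetical. StringBuilder is used here as the buffer.

```java
import org.xml.sax.helpers.DefaultHandler;

public class ChunkingDemo {

    // Accumulates every characters() call into one buffer.
    static class BufferingHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        String text() {
            return text.toString();
        }
    }

    // Imitates a parser that delivers data in pieces of at most chunkSize chars.
    public static String deliverInChunks(String data, int chunkSize) {
        BufferingHandler handler = new BufferingHandler();
        char[] ch = data.toCharArray();
        for (int start = 0; start < ch.length; start += chunkSize) {
            handler.characters(ch, start, Math.min(chunkSize, ch.length - start));
        }
        return handler.text();
    }

    public static void main(String[] args) {
        String dna = "taatgcaactaaatccaggcgaagcatttcagcttaaccccg";
        System.out.println(deliverInChunks(dna, 1).equals(dna));  // prints true
        System.out.println(deliverInChunks(dna, 19).equals(dna)); // prints true
    }
}
```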

28 Character Chunking
Your application needs to be able to handle either of these strategies. To do this, it is best to accumulate character data in a buffer, such as a StringBuffer, and use the text only once the enclosing element has ended. For example:
    // Buffer for the text of the current element
    private StringBuffer currentText = new StringBuffer();

    /**
     * Processes Character Events via Buffer
     */
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        currentText.append(ch, start, length);
    }
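A complete handler built on the buffering pattern might look like the minimal sketch below. It clears the buffer when an element starts, appends every chunk, and reads the text only at the DNA end tag.

- StringBuilder is swapped in for StringBuffer; it is the unsynchronized equivalent.
- The trimming of whitespace is an illustrative assumption.
- The parser is the JDK's JAXP default.
- DnaCollector and collect are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DnaCollector extends DefaultHandler {
    private final StringBuilder currentText = new StringBuilder();
    private final List<String> sequences = new ArrayList<>();

    // Start each element with an empty buffer.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        currentText.setLength(0);
    }

    // Append every chunk, however the parser splits the text.
    @Override
    public void characters(char[] ch, int start, int length) {
        currentText.append(ch, start, length);
    }

    // The element's text is complete only at its end tag.
    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("DNA".equals(localName)) {
            sequences.add(currentText.toString().trim());
        }
        currentText.setLength(0);
    }

    public static List<String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        DnaCollector handler = new DnaCollector();
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.sequences;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DASDNA><SEQUENCE><DNA>\ntaatttctcc\n</DNA></SEQUENCE>"
                + "<SEQUENCE><DNA>taatgcaact</DNA></SEQUENCE></DASDNA>";
        System.out.println(collect(xml)); // prints [taatttctcc, taatgcaact]
    }
}
```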

29 Using ContentHandlers
To receive events, you must:
–Implement the ContentHandler interface
–Register your content handler with the XML parser:
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
parser.setContentHandler(saxHandler);
parser.parse(args[0]);

30 ContentHandler Implementation
Here’s a sample implementation that just outputs information about each event:
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String str = new String(ch, start, length);
        System.out.println("Characters: " + str);
    }

    public void endElement(String namespaceURI, String localName,
            String qName) throws SAXException {
        System.out.println("End Element: " + localName);
    }

31 Using the SAX Default Handler

32 SAX Default Handler
In total, an implementation of ContentHandler must implement 11 methods. You usually don’t need to intercept all 11 of these events, so it is much easier to extend the SAX DefaultHandler.
The DefaultHandler provides no-op implementations of all the ContentHandler methods, so you can simply override those that you want.
The next few slides provide an example.
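Here is a minimal sketch of how little code is needed once DefaultHandler supplies the other callbacks. It overrides only startElement and endDocument in order to count SEQUENCE elements.

It assumes the JDK's JAXP parser. SequenceCounter and count are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SequenceCounter extends DefaultHandler {
    private int sequences = 0;

    // Only the two callbacks this program needs are overridden; DefaultHandler's
    // no-op versions take care of the remaining ContentHandler events.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("SEQUENCE".equals(localName)) {
            sequences++;
        }
    }

    @Override
    public void endDocument() {
        System.out.println("Sequences: " + sequences);
    }

    public static int count(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        SequenceCounter handler = new SequenceCounter();
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.sequences;
    }

    public static void main(String[] args) throws Exception {
        count("<DASDNA><SEQUENCE><DNA>taat</DNA></SEQUENCE>"
                + "<SEQUENCE><DNA>gcaa</DNA></SEQUENCE></DASDNA>"); // prints Sequences: 2
    }
}
```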

33
package com.oreilly.bioxml.sax;

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;

import java.io.IOException;

/**
 * Basic SAX Example.
 * Illustrates extending of DefaultHandler
 */
public class SAXDefaultHandler extends DefaultHandler {

    public void startDocument() throws SAXException {
        System.out.println("Start Document");
    }

34
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String str = new String(ch, start, length);
        System.out.println("Characters: " + str);
    }

    public void endDocument() throws SAXException {
        System.out.println("End Document");
    }

    public void endElement(String namespaceURI, String localName,
            String qName) throws SAXException {
        System.out.println("End Element: " + localName);
    }

    public void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) throws SAXException {
        System.out.println("Start Element: " + localName);
    }
Only override those methods that you need.

35
    /**
     * Prints Command Line Usage
     */
    private static void printUsage() {
        System.out.println("usage: SAXDefaultHandler xml-file");
        System.exit(0);
    }

    /**
     * Main Method
     */
    public static void main(String[] args) {
        if (args.length != 1) {
            printUsage();
        }
        try {
            SAXDefaultHandler saxHandler = new SAXDefaultHandler();
            XMLReader parser = XMLReaderFactory.createXMLReader
                ("org.apache.xerces.parsers.SAXParser");

36
            parser.setContentHandler(saxHandler);
            parser.parse(args[0]);
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
By extending the DefaultHandler, your code is much more compact and concise. The output of this program is identical to that of the first example.

37 Checking for Well-Formedness

38 Defaults
By default, the Xerces XML parser (and most other parsers) will check for well-formedness, but will not automatically check for validity. Suppose we have the document shown on the next slide.
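The well-formed versus valid distinction can be seen by parsing one document twice. The document below is well-formed, but it breaks its own DTD. In this sketch the parse without validation reports nothing. With validation switched on, the violations are reported to an ErrorHandler as recoverable errors.

- The document uses an internal DTD subset so that no external file is needed.
- Validation is switched on with the JAXP factory.setValidating call.
- ValidationDemo and validityErrors are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationDemo {

    // Parses xml and returns the validity errors reported to the ErrorHandler.
    public static List<String> validityErrors(String xml, boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(validate); // DTD validation stays off unless requested
        XMLReader parser = factory.newSAXParser().getXMLReader();
        List<String> errors = new ArrayList<>();
        parser.setErrorHandler(new DefaultHandler() {
            // Validity errors are recoverable: they arrive here and parsing continues.
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        });
        parser.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Well-formed, but <c> appears where the DTD requires <b>.
        String invalid = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY><!ELEMENT c EMPTY>]>"
                + "<a><c/></a>";
        System.out.println(validityErrors(invalid, false).isEmpty()); // prints true
        System.out.println(validityErrors(invalid, true).isEmpty());  // prints false
    }
}
```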

39 Sample Document: Not Well-formed
<!DOCTYPE DASDNA SYSTEM '…'>
<DASDNA>
  <SEQUENCE>
    <DNA>
taatttctcccattttgtaggttatcacttcactctgttgactttcttttg
  </SEQUENCE>
  <SEQUENCE>
    <DNA>
taatgcaactaaatccaggcgaagcatttcagcttaaccccgagacttttg
    </DNA>
  </SEQUENCE>
</DASDNA>
This document is not well-formed, because I deleted one of the end tags.

40 Sample Output
Start Document
Start Element: DASDNA
Start Element: SEQUENCE
Start Element: DNA
Characters: taatttctcccattttgtaggttatcacttcactctgttgactttcttttg
Characters: 
[Fatal Error] ensemble_dna_error.xml:8:5: The element type "DNA" must be terminated by the matching end-tag "</DNA>".
org.xml.sax.SAXParseException: The element type "DNA" must be terminated by the matching end-tag "</DNA>".
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.oreilly.bioxml.sax.SAXBasic.main(SAXBasic.java:101)
This is a fatal error, so the parser throws a SAXParseException.
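A SAXParseException carries the position of the problem, just as in the "[Fatal Error] ensemble_dna_error.xml:8:5" line above. This sketch catches the exception and reports its line and column.

- The line and column come from getLineNumber() and getColumnNumber().
- A DefaultHandler is registered as ErrorHandler. Its fatalError rethrows the exception, so parse() ends with it.
- It assumes the JDK's JAXP parser. FatalErrorDemo and fatalErrorPosition are hypothetical names.
- The exact column a parser reports may vary.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FatalErrorDemo {

    // Returns "line:column" of the first fatal error, or null if xml is well-formed.
    public static String fatalErrorPosition(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        parser.setContentHandler(new DefaultHandler());
        // DefaultHandler.fatalError rethrows the SAXParseException it receives.
        parser.setErrorHandler(new DefaultHandler());
        try {
            parser.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fatalErrorPosition("<DNA>taat</DNA>")); // prints null
        // Missing </DNA>: reported as a fatal error on line 1
        System.out.println(fatalErrorPosition("<SEQUENCE><DNA>taat</SEQUENCE>"));
    }
}
```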

41 Try / Catch Clause
try {
    SAXDefaultHandler saxHandler = new SAXDefaultHandler();
    XMLReader parser = XMLReaderFactory.createXMLReader
        ("org.apache.xerces.parsers.SAXParser");
    parser.setContentHandler(saxHandler);
    parser.parse(args[0]);
} catch (SAXException e) {
    // Indicates a fatal parsing error, such as errors in well-formedness.
    e.printStackTrace();
} catch (IOException e) {
    // Indicates an I/O error, such as a failed network connection.
    e.printStackTrace();
}
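To see which catch clause fires in each situation, this sketch parses three inputs and reports the outcome:

- a well-formed document;
- a truncated document, which produces a SAXException;
- a Reader whose every read() throws, standing in for a failed network connection, which produces an IOException.

It assumes the JDK's JAXP parser and that the parser propagates the reader's IOException unchanged. FailingReader and the other names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ParseErrorKinds {

    // Hypothetical stand-in for a dropped network connection: every read fails.
    static class FailingReader extends Reader {
        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            throw new IOException("simulated connection failure");
        }

        @Override
        public void close() {
        }
    }

    // Parses the input and reports which of the two catch clauses fired.
    public static String classify(InputSource input) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            parser.setErrorHandler(new DefaultHandler()); // rethrows fatal errors
            parser.parse(input);
            return "ok";
        } catch (SAXException e) {   // fatal parsing error, e.g. not well-formed
            return "sax";
        } catch (IOException e) {    // I/O failure while reading the input
            return "io";
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static String classifyText(String xml) {
        return classify(new InputSource(new StringReader(xml)));
    }

    public static String classifyFailingInput() {
        return classify(new InputSource(new FailingReader()));
    }

    public static void main(String[] args) {
        System.out.println(classifyText("<DNA>taat</DNA>")); // prints ok
        System.out.println(classifyText("<DNA>taat"));       // prints sax
        System.out.println(classifyFailingInput());          // prints io
    }
}
```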

42 Summary
–SAX is a standard, event-based interface for parsing XML documents.
–It is a de facto standard, not an official W3C standard.
–XML parsers must implement the XMLReader interface.
–Applications must implement the ContentHandler interface.
–For more concise programs, extend the SAX DefaultHandler.
–Make sure to surround calls to parse() with a try/catch clause.