SDPL 20113: XML APIs and SAX1 3. XML Processor APIs n How can (Java) applications manipulate structured (XML) documents? –An overview of XML processor.

Slides:



Advertisements
Similar presentations
J0 1 Marco Ronchetti - Web architectures – Laurea Specialistica in Informatica – Università di Trento Java XML parsing.
Advertisements

Technische universität dortmund Service Computing Service Computing Prof. Dr. Ramin Yahyapour IT & Medien Centrum 22. Oktober 2009.
XML Parsing Using Java APIs AIP Independence project Fall 2010.
SDPL 2002Notes 3: XML Processor Interfaces1 3.3 JAXP: Java API for XML Processing n How can applications use XML processors? –A Java-based answer: through.
SDPL 2003Notes 3: XML Processor Interfaces1 3.3 JAXP: Java API for XML Processing n How can applications use XML processors? –A Java-based answer: through.
XML Parsers By Chongbing Liu. XML Parsers  What is a XML parser?  DOM and SAX parser API  Xerces-J parsers overview  Work with XML parsers (example)
1 SAX and more… CS , Spring 2008/9. 2 SAX Parser SAX = Simple API for XML XML is read sequentially When a parsing event happens, the parser invokes.
SAX A parser for XML Documents. XML Parsers What is an XML parser? –Software that reads and parses XML –Passes data to the invoking application –The application.
1 The Simple API for XML (SAX) Part I ©Copyright These slides are based on material from the upcoming book, “XML and Bioinformatics” (Springer-
Xerces The Apache XML Project Yvonne Yao. Introduction Set of libraries that provides functionalities to parse XML documents Set of libraries that provides.
XML DOM and SAX Parsers By Omar RABI. Introduction to parsers  The word parser comes from compilers  In a compiler, a parser is the module that reads.
21-Jun-15 SAX (Abbreviated). 2 XML Parsers SAX and DOM are standards for XML parsers-- program APIs to read and interpret XML files DOM is a W3C standard.
26-Jun-15 SAX. SAX and DOM SAX and DOM are standards for XML parsers--program APIs to read and interpret XML files DOM is a W3C standard SAX is an ad-hoc.
17 Apr 2002 XML Programming: SAX Andy Clark. SAX Design Premise Generic method of creating XML parser, parsing documents, and receiving document information.
SDPL : (XML APIs) JAXP1 3.3 JAXP: Java API for XML Processing n How can applications use XML processors? –In Java: through JAXP –An overview of.
Processing of structured documents Spring 2003, Part 5 Helena Ahonen-Myka.
CSE 6331 © Leonidas Fegaras XML Tools1 XML Tools Leonidas Fegaras.
CSE 6331 © Leonidas Fegaras XML Tools1 XML Tools Leonidas Fegaras.
17 Apr 2002 XML Programming: JAXP Andy Clark. Java API for XML Processing Standard Java API for loading, creating, accessing, and transforming XML documents.
The Joy of SAX (and DOM, and JDOM…) Bill MacCartney 11 October 2004.
SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser.
XML for E-commerce II Helena Ahonen-Myka. XML processing model n XML processor is used to read XML documents and provide access to their content and structure.
1 XML at a neighborhood university near you Innovation 2005 September 16, 2005 Kwok-Bun Yue University of Houston-Clear Lake.
Structured-Document Processing Languages Spring 2011 Course Review Repetitio mater studiorum est!
XML Data Processing and Transformation ดร. มารุต บูรณรัช : หัวข้อพิเศษด้านเทคโนโลยีสารสนเทศขั้นสูง.
Representing Web Data: XML CSI 3140 WWW Structures, Techniques and Standards.
SDPL 2004Notes 3: XML Processor Interfaces1 3.3 JAXP: Java API for XML Processing n How can applications use XML processors? –A Java-based answer: through.
SAX Parsing Presented by Clifford Lemoine CSC 436 Compiler Design.
Advanced Java Session 9 New York University School of Continuing and Professional Studies.
CSE 6331 © Leonidas Fegaras XML Tools1 XML Tools.
3/29/2001 O'Reilly Java Java API for XML Processing 1.1 What’s New Edwin Goei Engineer, Sun Microsystems.
1 Java and XML Modified from presentation by: Barry Burd Drew University Portions © 2002 Hungry Minds, Inc.
SDPL 2002Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser.
Intro. to XML & XML DB Bun Yue Professor, CS/CIS UHCL.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
XML Parsers Overview  Types of parsers  Using XML parsers  SAX  DOM  DOM versus SAX  Products  Conclusion.
SAX. What is SAX SAX 1.0 was released on May 11, SAX is a common, event-based API for parsing XML documents Primarily a Java API but there implementations.
Beginning XML 4th Edition. Chapter 12: Simple API for XML (SAX)
XML Processing in Java. Required tools Sun JDK 1.4, e.g.: JAXP (part of Java Web Services Developer Pack, already in Sun.
Java API for XML Processing (JAXP) Dr. Rebhi S. Baraka Advanced Topics in Information Technology (SICT 4310) Department of Computer.
Sheet 1XML Technology in E-Commerce 2001Lecture 3 XML Technology in E-Commerce Lecture 3 DOM and SAX.
SDPL Streaming API for XML1 3.4 Streaming API for XML (StAX) n Could we process XML documents more conveniently than with SAX, and yet more efficiently?
CSE 6331 © Leonidas Fegaras XML Tools1 XML Tools.
XML Study-Session: Part III
© Marty Hall, Larry Brown Web core programming 1 Simple API for XML SAX.
SAX2 and DOM2 Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
XML and SAX (A quick overview) ● What is XML? ● What are SAX and DOM? ● Using SAX.
Introduction to Server-Side Web Development Introduction to Server-Side Web Development using JSP and XML Further JSP and integration with XML 17 th March.
When we create.rtf document apart from saving the actual info the tool saves additional info like start of a paragraph, bold, size of the font.. Etc. This.
1 Introduction JAXP. Objectives  XML Parser  Parsing and Parsers  JAXP interfaces  Workshops 2.
Structured-Document Processing Languages Spring 2004 Course Review Repetitio mater studiorum est!
SDPL 20063: XML Processor Interfaces1 3. XML Processor APIs n How can (Java) applications manipulate structured (XML) documents? –An overview of XML processor.
Simple API for XML (SAX) Aug’10 – Dec ’10. Introduction to SAX Simple API for XML or SAX was developed as a standardized way to parse an XML document.
7-Mar-16 Simple API XML.  SAX and DOM are standards for XML parsers-- program APIs to read and interpret XML files  DOM is a W3C standard  SAX is an.
SDPL 2001Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How applications can manipulate structured documents? –An overview of document parser.
1 Introduction SAX. Objectives 2  Simple API for XML  Parsing an XML Document  Parsing Contents  Parsing Attributes  Processing Instructions  Skipped.
Java API for XML Processing
Simple API for XML SAX. Agenda l Introduction to SAX l Installation and setup l Steps for SAX parsing l Defining a content handler l Examples Printing.
Simple API for XML
Parsing with SAX using Java Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
Java XML IS
XML Parsers By Chongbing Liu.
Jagdish Gangolly State University of New York at Albany
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
Java API for XML Processing
A parser for XML Documents
XML Programming in Java
SAX2 29-Jul-19.
Presentation transcript:

SDPL 20113: XML APIs and SAX1 3. XML Processor APIs n How can (Java) applications manipulate structured (XML) documents? –An overview of XML processor interfaces 3.1 SAX: an event-based interface 3.2 DOM: an object-based interface 3.3 JAXP: Java API for XML Processing 3.4 StAX: Streaming API for XML

SDPL 20113: XML APIs and SAX2 Document Parser Interfaces n Each XML application contains a parser –editors, browsers –transformation/style engines, DB loaders,... n XML parsers have become standard tools of application development frameworks –Standard Java (since JDK 1.4) provides default parsers (built on Apache Xerces) via JAXP (See, e.g., Leventhal, Lewis & Fuchs: Designing XML Internet Applications, Chapter 10, and D. Megginson: Events vs. Trees) Events vs. TreesEvents vs. Trees

SDPL 20113: XML APIs and SAX3 Tasks of a Parser n Document instance decomposition –into "XML Information Set" of elements, attributes, text, processing instructions, entities,... n Verification –well-formedness checking »syntactical correctness of XML markup –validation (against a DTD or Schema) n Access to contents of the DTD (if supported) –SAX 2.0 Extensions provide info of declarations: element type names and their content model expressions

SDPL 20113: XML APIs and SAX4 Document Parser Interfaces I: Event-based (streaming) interfaces –Command line and ESIS interfaces »Element Structure Information Set, traditional interface to stand-alone SGML parsers –Event call-back (or Push) interfaces: SAX –Pull interfaces: StAX II: Tree-based (object model) interfaces –W3C DOM Recommendation –Java-specific object models: JAXB, JDOM, dom4J

SDPL 20113: XML APIs and SAX5 Command-line ESIS interface Application SGML/XML Parser Command line call <E</E>Hi!i="1"> ESISStream (E Ai CDATA 1 -Hi! )E "Loosecoupling"

SDPL 20113: XML APIs and SAX6 Event Call-Back (Push) Interfaces n Application implements a set of call-back methods for acting on parse events –"Observer design pattern" –parameters qualify events further, with »element type name »names and values of attributes »values of content strings, … n Idea behind ‘‘SAX’’ (Simple API for XML) –an industry standard API for XML parsers –kind-of “Serial Access XML”

SDPL 20113: XML APIs and SAX7 Event Call-Back Application Application Main Routine startDocument() startElement(... ) characters(... ) XMLReader.parse(... ) Callback Routines endElement(... ) </A>Hi! "A",[i="1"] "Hi!" "A"

SDPL 20113: XML APIs and SAX8 Pull Parsing Interfaces n The parser provides document contents as a stream of events, which the application can iterate over –the application actively "pulls" the events »in push processing, the application reacts on them –leads often to simpler code –allows to cancel the parsing more easily, or parse multiple documents simultaneously n Idea behind ‘‘StAX’’ (Streaming API for XML) –a standard Java API for XML parsers

SDPL 20113: XML APIs and SAX A Pull-Parsing Application ApplicationEventReader.nextEvent() Parser API StartDocument Hi! Characters"Hi!" </A> EndElement"A" StartElement "A",[i="1"] 9

SDPL 20113: XML APIs and SAX10 Tree-based / Object Model Interfaces n Application interacts with –a parser object, which builds... –a document object consisting of document, elements, attributes, text, … n Abstraction level higher than in event based interfaces; more powerful access –to descendants, following siblings, … n Drawback: Higher memory consumption –> used mainly in client applications (to implement document manipulation by user)

SDPL 20113: XML APIs and SAX11 DOM-Based Application Conceptually Application ParserObject In-Memory Document Representation Parse Access/Modify Build Document i=1A "Hi!" </A>Hi!

SDPL 20113: XML APIs and SAX The SAX Event Callback API n A de-facto industry standard –Developed by members of the xml-dev mailing list –Version 1.0 in May 1998, Vers. 2.0 in May 2000 –Not a parser, but a common interface for different parsers (like, say, JDBC is a common interface to various RDBs) n Supported directly by major XML parsers –many Java based, and free; Examples: Apache Xerces, Oracle's XML Parser for Java; MSXML (in IE 5+), James Clark's XP

SDPL 20113: XML APIs and SAX13 SAX 2.0 Interfaces n Co-operation of an application and a parser specified in terms of interfaces (i.e., collections of methods) n My classification of SAX interfaces: –Application-to-parser interfaces »to use the parser –Parser-to-application (or call-back) interfaces »to act on various parsing events –Auxiliary interfaces »to manipulate parser-provided information

SDPL 20113: XML APIs and SAX14 Application-to-Parser Interfaces n Implemented by parser (or a SAX driver): –XMLReader »methods to invoke the parser, and to register objects that implement call-back interfaces –XMLFilter (extends XMLReader) »interface to connect XMLReader s in a row as a sequence of filters »obtains events from an XMLReader and passes them further (possibly modified)

SDPL 20113: XML APIs and SAX15 Call-Back Interfaces Implemented by application to act on parse events (A DefaultHandler quietly ignores most of them) Implemented by application to act on parse events (A DefaultHandler quietly ignores most of them) –ContentHandler »methods to process document parsing events –DTDHandler »methods to receive notification of unparsed external entities and notations declared in the DTD –ErrorHandler »methods for handling parsing errors and warnings –EntityResolver »methods for customised processing of external entity references

SDPL 20113: XML APIs and SAX16 SAX 2.0: Auxiliary Interfaces Attributes Attributes –methods to access a list of attributes, e.g: int getLength() String getValue(String attrName) Locator Locator –methods for locating the origin of parse events: getSystemID(), getPublicID(), getLineNumber(), getColumnNumber() –for example, to report application-specific semantic errors

SDPL 20113: XML APIs and SAX17 The ContentHandler Interface Information of general document events. (See API documentation for a complete list): Information of general document events. (See API documentation for a complete list): setDocumentLocator(Locator locator) setDocumentLocator(Locator locator) –to receive a locator for the origin of SAX document events startDocument();endDocument() startDocument(); endDocument() –notify the beginning/end of a document. startElement(String nsURI, String localName, String rawName, Attributes atts); endElement( … ); as above but without attributes startElement(String nsURI, String localName, String rawName, Attributes atts); endElement( … ); as above but without attributes for namespace support

SDPL 20113: XML APIs and SAX18 Namespaces in SAX: Example </xsl:stylesheet> startElement for this would pass following parameters: startElement for this would pass following parameters: –nsURI= –localname = template, rawName = xsl:template

SDPL 20113: XML APIs and SAX19 Namespaces: Example (2) </xsl:stylesheet> endElement for html would give endElement for html would give –nsURI = (as default namespace for element names without a prefix), localname = html, rawName = html

SDPL 20113: XML APIs and SAX20 ]> ]><A><B></B></A> ContentHandler interface (cont.) characters(char ch[], int start, int length) characters(char ch[], int start, int length) –notification of character data ignorableWhitespace(char ch[], int start, int length) ignorableWhitespace(char ch[], int start, int length) –notification of ignorable whitespace in element content Ignorable whitespace Text content

SDPL 20113: XML APIs and SAX21 SAX Processing Example (1/9) n Input: XML representation of a personnel database: Kilpeläinen Pekka Kilpeläinen Pekka <last>Möttönen</last><first>Matti</first></person> Möttönen Maija Möttönen Maija Römppänen Maija Römppänen Maija </db>

SDPL 20113: XML APIs and SAX22 SAX Processing Example (2/9) n Task: Format the document as a list like this: Pekka Kilpeläinen (1234) Matti Möttönen (5678) Maija Möttönen (9012) Maija Römppänen (3456) n Event-based processing strategy: –at the start of person, record the idnum (e.g., 1234 ) –record starts and ends of last and first to store their contents (e.g., " Kilpeläinen " and " Pekka ") –at the end of a person, output the collected data

SDPL 20113: XML APIs and SAX23 SAX Processing Example (3/9) n Application: First import relevant interfaces & classes: import org.xml.sax.XMLReader; import org.xml.sax.Attributes; import org.xml.sax.ContentHandler; //Default (no-op) implementation of //interface ContentHandler: import org.xml.sax.helpers.DefaultHandler; // JAXP to instantiate a parser: import javax.xml.parsers.*;

SDPL 20113: XML APIs and SAX24 SAX Processing Example (4/9) n Implement relevant call-back methods: public class SAXDBApp extends DefaultHandler{ // Flags to remember element context: private boolean InFirst = false, InLast = false; private boolean InFirst = false, InLast = false; // Storage for element contents and // attribute values: private String FirstName, LastName, IdNum; private String FirstName, LastName, IdNum;

SDPL 20113: XML APIs and SAX25 SAX Processing Example (5/9) n Call-back methods: –record the start of first and last elements, and the idnum attribute of a person : public void startElement ( String nsURI, String localName, String rawName, Attributes atts) { String rawName, Attributes atts) { if (rawName.equals("person")) if (rawName.equals("person")) IdNum = atts.getValue("idnum"); if (rawName.equals("first")) InFirst = true; if (rawName.equals("last")) if (rawName.equals("last")) InLast = true; } // startElement } // startElement

SDPL 20113: XML APIs and SAX26 SAX Processing Example (6/9) n Call-back methods continue: –Record the text content of elements first and last : public void characters ( char buf[], int start, int length) { public void characters ( char buf[], int start, int length) { if (InFirst) FirstName = new String(buf, start, length); if (InFirst) FirstName = new String(buf, start, length); if (InLast) LastName = new String(buf, start, length); if (InLast) LastName = new String(buf, start, length); } // characters } // characters

SDPL 20113: XML APIs and SAX27 SAX Processing Example (7/9) At the end of person, output the collected data: At the end of person, output the collected data: public void endElement(String nsURI, String localName, String qName) { if (qName.equals("person")) { if (qName.equals("person")) { System.out.println(FirstName + " " + LastName + " (" + IdNum + ")" ); InFirst = false; } System.out.println(FirstName + " " + LastName + " (" + IdNum + ")" ); InFirst = false; } //Update context flag: if (qName.equals("last")) InLast = false; if (qName.equals("last")) InLast = false; } // endElement()

SDPL 20113: XML APIs and SAX28 SAX Processing Example (8/9) Application main method: Application main method: public static void main (String args[]) { // Instantiate an XMLReader (from JAXP // SAXParserFactory): SAXParserFactory spf = SAXParserFactory.newInstance(); SAXParserFactory spf = SAXParserFactory.newInstance(); try { try { SAXParser saxParser = spf.newSAXParser(); SAXParser saxParser = spf.newSAXParser(); XMLReader xmlReader = saxParser.getXMLReader(); XMLReader xmlReader = saxParser.getXMLReader();

SDPL 20113: XML APIs and SAX29 SAX Processing Example (9/9) Main method continues: Main method continues: // Instantiate and pass a new // ContentHandler to xmlReader: ContentHandler handler = new SAXDBApp(); ContentHandler handler = new SAXDBApp(); xmlReader.setContentHandler(handler); xmlReader.setContentHandler(handler); for (int i = 0; i < args.length; i++) { for (int i = 0; i < args.length; i++) { xmlReader.parse(args[i]); xmlReader.parse(args[i]); } } catch (Exception e) { System.err.println(e.getMessage()); System.exit(1); } System.exit(1); } } // main

SDPL 20113: XML APIs and SAX30 SAX: Summary n A low-level parser-interface for XML documents n Reports document parsing events through method call-backs –> efficient: does not create in-memory representation of the document –> used often on servers and on resource-limited devices (palm-tops), and to process LARGE documents –No limit on document size!