
1 SAX

2 What is SAX? SAX is a common, event-based API for parsing XML documents. SAX 1.0 was released on May 11, 1998. It is primarily a Java API, but there are implementations for most other programming language environments. The current version is SAX 2.0.1.

3 How does SAX work? An XML document is seen as a series of “events”. Unlike DOM, SAX does not store information in an internal tree structure, so it can parse huge documents (think gigabytes) without allocating large amounts of system resources. If processing is built as a pipeline, a stage does not have to wait for the data to be converted to an object; it can hand data to the next stage as soon as the preceding callback method returns. SAX does not allow random access to the file; it proceeds in a single pass, firing events as it goes.
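The single-pass model above can be sketched as follows. This is a minimal illustration, assuming the JAXP parser bundled with the JDK; the class name ElementCounter and the sample document are hypothetical. The handler keeps only a running counter, so its memory use does not grow with the size of the document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements without building a tree.
class ElementCounter extends DefaultHandler {
    private int count = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++; // one event per start tag; nothing else is retained
    }

    // Parses the given XML text in one pass and returns the number of elements.
    static int countElements(String xml) {
        try {
            ElementCounter handler = new ElementCounter();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/><c>text</c></a>"));
    }
}
```

For a real gigabyte-scale file, the StringReader would be replaced by a file or network stream; the handler itself would not change.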

4 SAX Structure (1/4)

5 SAX Structure (2/4)
SAXParserFactory: A SAXParserFactory object creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory.
SAXParser: The SAXParser class defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.
XMLReader: The SAXParser wraps an XMLReader. Typically, you don't care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader() so that you can configure it. It is the XMLReader that carries on the conversation with the SAX event handlers you define.
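The factory, parser, and reader chain described above might be wired together as in the sketch below. It assumes the JDK's default JAXP implementation; the class name ParserSetup and the helper isWellFormed are hypothetical, introduced only for illustration. The wrapped reader is obtained through getXMLReader() and is of type org.xml.sax.XMLReader.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class showing factory -> parser -> reader.
class ParserSetup {
    // Builds a namespace-aware reader through the JAXP factory chain.
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);      // configuration happens on the factory
            SAXParser parser = factory.newSAXParser();
            return parser.getXMLReader();         // the reader wrapped by the parser
        } catch (Exception e) {
            throw new IllegalStateException("cannot create parser", e);
        }
    }

    // Returns true if the text parses without a fatal error.
    static boolean isWellFormed(String xml) {
        try {
            XMLReader reader = newReader();
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);    // handlers are registered on the reader
            reader.setErrorHandler(handler);      // DefaultHandler rethrows fatal errors
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<root/>"));
        System.out.println(isWellFormed("<root>"));
    }
}
```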

6 SAX Structure (3/4)
DefaultHandler: Not shown in the diagram, a DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with null methods), so you can override only the ones you are interested in.
ContentHandler: Methods such as startDocument, endDocument, startElement, and endElement are invoked when the parser recognizes the start or end of the document or of an element. This interface also defines the methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.
EntityResolver: The resolveEntity method is invoked when the parser must locate data identified by a URI.
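As an illustration of overriding only the ContentHandler methods of interest, the sketch below collects the text of every element with a given name. The class name TextCollector and the sample element names are hypothetical. Text is appended to a buffer because a parser is allowed to deliver the content of one element through several characters calls. Nested occurrences of the target element are not handled in this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: gathers the text content of one element name.
class TextCollector extends DefaultHandler {
    private final String target;
    private final List<String> values = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a target element

    TextCollector(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(target)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length); // may be called more than once per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && qName.equals(target)) {
            values.add(current.toString());
            current = null;
        }
    }

    // Parses the XML text and returns the text of each target element in document order.
    static List<String> collect(String xml, String target) {
        try {
            TextCollector handler = new TextCollector(target);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect("<lib><title>SAX</title><title>DOM &amp; StAX</title></lib>", "title"));
    }
}
```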

7 SAX Structure (4/4)
ErrorHandler: Methods error, fatalError, and warning are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). That's one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes the application may be able to recover from a validation error; other times it may need to generate an exception. To ensure correct handling, you'll need to supply your own error handler to the parser.
DTDHandler: Defines methods you will generally never be called upon to use. It is used when processing a DTD to recognize and act on declarations for an unparsed entity.
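One way to supply your own error handler is sketched below, assuming a validating parser from the JDK's default JAXP implementation and documents that carry an inline DTD. The class name StrictHandler and the helper isValid are hypothetical. The handler turns validation errors, which the default handler ignores, into exceptions.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: treats validation errors as failures.
class StrictHandler extends DefaultHandler {
    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e; // recoverable errors (such as validation errors) now stop the parse
    }

    @Override
    public void warning(SAXParseException e) {
        System.err.println("warning: " + e.getMessage()); // report and continue
    }

    // Returns true only if the document parses and validates against its DTD.
    // Any failure, including I/O problems, is reported as "not valid" in this sketch.
    static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            factory.newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new StrictHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";
        System.out.println(isValid(dtd + "<a><b/></a>"));
        System.out.println(isValid(dtd + "<a/>"));
    }
}
```

An application that can recover from validation errors could instead record them in error and let the parse continue.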

8 SAX Events: startDocument, endDocument, startElement, endElement, characters.
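To see the order in which these events fire, a handler can simply record each callback. The sketch below assumes the JDK's default parser; the class name EventTrace and the sample document are hypothetical. The exact number of characters events is parser-dependent, because text may be delivered in more than one chunk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: records every event in arrival order.
class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters:" + new String(ch, start, length));
    }

    // Parses the XML text and returns the recorded event names.
    static List<String> trace(String xml) {
        try {
            EventTrace handler = new EventTrace();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        for (String event : trace("<a><b>hi</b></a>")) {
            System.out.println(event);
        }
    }
}
```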

9 Pull Parsing Versus Push Parsing
Streaming pull parsing refers to a programming model in which a client application calls methods on an XML parsing library when it needs to interact with an XML infoset; that is, the client only gets (pulls) XML data when it explicitly asks for it.
Streaming push parsing refers to a programming model in which an XML parser sends (pushes) XML data to the client as the parser encounters elements in an XML infoset; that is, the parser sends the data whether or not the client is ready to use it at that time.
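For contrast with SAX's push model, the sketch below shows the pull style using the StAX cursor API (javax.xml.stream) that ships with the JDK. The class name PullDemo is hypothetical. Here the client drives the loop and asks for the next event only when it is ready, instead of receiving callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: pulls events from a StAX reader.
class PullDemo {
    // Returns the local names of all elements in document order.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The client asks for the next event; nothing is pushed to it.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b/><c/></a>"));
    }
}
```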

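10 XML Parser API Feature Summary

| Feature | StAX | SAX | DOM |
|---|---|---|---|
| API Type | Pull, streaming | Push, streaming | In-memory tree |
| Ease of Use | High | Medium | High |
| XPath Capability | No | No | Yes |
| CPU and Memory Efficiency | Good | Good | Varies |
| Forward Only | Yes | Yes | No |
| Read XML | Yes | Yes | Yes |
| Write XML | Yes | No | Yes |
| Create, Read, Update, Delete | No | No | Yes |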

11 XML Parsers and APIs Supporting SAX
- Xerces: a family of software packages for parsing and manipulating XML, part of the Apache XML project.
- MSXML: Microsoft XML Core Services (MSXML) is a set of services that lets developers using JScript, VBScript, and Microsoft Visual Studio 6.0 build XML-based applications.
- Crimson XML
- JAXP: the Java API for XML Processing is one of the Java XML programming APIs. It provides the capability of validating and parsing XML documents.

12 SAX Example

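13
import java.io.FileReader;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class MySAXApp extends DefaultHandler {

    public static void main(String[] args) throws Exception {
        // XMLReaderFactory is deprecated in recent JDKs; SAXParserFactory is the alternative.
        XMLReader xr = XMLReaderFactory.createXMLReader();
        MySAXApp handler = new MySAXApp();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);

        // Parse the file named on the command line.
        try (FileReader r = new FileReader(args[0])) {
            xr.parse(new InputSource(r));
        }
    }

    ////////////////////////////////////////////////////////////////////
    // Event handlers.
    ////////////////////////////////////////////////////////////////////
}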

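14
// Event handlers: these go inside MySAXApp and override DefaultHandler's empty methods.
// startElement also needs: import org.xml.sax.Attributes;

@Override
public void startDocument() {
    // TODO: add customized code here
}

@Override
public void endDocument() {
    // TODO: add customized code here
}

@Override
public void startElement(String uri, String name, String qName, Attributes atts) {
    // TODO: add customized code here
}

@Override
public void endElement(String uri, String name, String qName) {
    // TODO: add customized code here
}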

15 Applications of XML Stream Processing
- content-based XML routing
- selective dissemination of information
- continuous queries
- processing of scientific data stored in large XML files

16 Selective Dissemination of Information
The use of selective approaches to dissemination in order to avoid overloading users with unnecessary information.
Applications:
- stock and sports tickers
- traffic information systems
- electronic personalized newspapers
- entertainment delivery

17 Typical SDI Systems
Representation of user profiles:
- simple keyword matching
- “bag of words” Information Retrieval (IR) techniques
Limited ability
Inefficiency of filtering
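A keyword-based profile match of the kind listed above could look like the following sketch. It is an illustration only, not the method of any particular SDI system: the class name KeywordFilter, the rule that a profile matches when all of its keywords occur in the document, and the sample data are all assumptions. The handler reduces the document to a bag of lowercase words as the SAX events arrive, which also shows why this representation cannot express conditions on document structure.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: matches a document against keyword profiles.
class KeywordFilter extends DefaultHandler {
    private final Set<String> words = new HashSet<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Tokenizes the buffered text at each element boundary, then discards it.
    private void flush() {
        for (String w : text.toString().toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        text.setLength(0);
    }

    // Returns, in sorted order, the users whose keywords all occur in the document.
    static List<String> matchingUsers(String xml, Map<String, Set<String>> profiles) {
        KeywordFilter handler = new KeywordFilter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Set<String>> profile : profiles.entrySet()) {
            if (handler.words.containsAll(profile.getValue())) {
                matched.add(profile.getKey());
            }
        }
        Collections.sort(matched);
        return matched;
    }
}
```

Note that this sketch re-parses and re-checks every profile for each document, so its cost grows with the number of profiles.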

18 Selective Dissemination of Information

19 References
M. Altinel and M. J. Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. In VLDB Conf., Sep. 2000.
Y. Diao, P. Fischer, M. Franklin, and R. To. YFilter: Efficient and Scalable Filtering of XML Documents. In Proceedings of the International Conference on Data Engineering, San Jose, California, February 2002.

