SAX Parsing
Presented by Clifford Lemoine
CSC 436 Compiler Design

SAX Parsing Introduction
- Review of XML
- What is SAX parsing?
- Simple example program
- Compiler design issues, demonstrated by a more complex example
- Wrap-up
- References

Quick XML Review
- XML: wave of the future
- A method of representing data
- Differs from HTML by storing and representing data instead of displaying or formatting it
- Tags look like HTML tags, only they are user-defined
- Follows a small set of basic rules
- Stored as plain text (Unicode, typically UTF-8), so portability is easy

Quick XML Review: Syntax
- An XML document normally begins with a preamble (the prolog), typically an XML declaration such as <?xml version="1.0"?>
- An XML document may or may not have a DTD (Document Type Definition) or Schema

Quick XML Review: Syntax (cont.)
- Every element has a start tag and an end tag; the start tag may carry optional attributes:
  …
- If an element contains no data and no nested elements, the end tag can be merged into the start tag, giving a single empty-element tag that ends in "/>"

Quick XML Review: Syntax (cont.)
- Elements must be properly nested
- The outermost element is called the root element
- An XML document that follows the basic syntax rules is called well-formed
- An XML document that is well-formed and conforms to a DTD or Schema is called valid
- Once again, XML documents do not always require a DTD or Schema, but they must always be well-formed
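To make the well-formed versus valid distinction concrete, here is a minimal sketch using the SAX parser that ships with the JDK (JAXP). The class XmlChecker and its check() method are hypothetical, not part of the original slides, and validation assumes the document carries its own DTD. One detail worth knowing: DefaultHandler silently ignores validity errors unless error() is overridden.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlChecker {
    /** Returns null if the document passes, otherwise a description of the first error. */
    static String check(String xml, boolean validate) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate); // true: also check the document against its DTD
        DefaultHandler handler = new DefaultHandler() {
            // DefaultHandler ignores validity errors by default; rethrow them instead.
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
            // Well-formedness errors go to fatalError(), which already throws by default.
        };
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return String.valueOf(e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With an inline DTD such as <!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>, the document <a><c/></a> passes check(xml, false) because it is well-formed, but fails check(xml, true) because it is not valid; <a><b></a> fails both.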

Quick XML Review: Sample XML Files
- Catalog.xml
- authorSimple.xml
- authorSimpleError.xml

What is SAX Parsing?
- Simple API for XML = SAX
- SAX is an event-based parsing method
- We are all familiar with event-driven software, whether we know it or not
  - Pop-up windows, pull-down menus, etc.
  - If a certain "event" (or action) happens, do something
- A SAX parser reads an XML document and fires (calls) callback methods as it encounters events such as the start and end of the document, start and end tags, and character data; an element's attributes are delivered together with its start-tag event
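As a minimal sketch of the "if an event happens, do something" idea, assuming the JAXP parser bundled with the JDK: the handler below reacts to exactly one event (an element starting) and records the element's name. ElementNames and collect() are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ElementNames {
    /** Collects element names in document order by reacting to a single event type. */
    static List<String> collect(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Event: "an element started"; action: remember its name.
            @Override
            public void startElement(String uri, String lname, String qname, Attributes atts) {
                names.add(qname);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return names;
    }
}
```

The application never asks the parser for anything; it only supplies code that runs when the parser reaches a start tag.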

What is SAX Parsing? Benefits of SAX Parsing
- Unlike DOM (Document Object Model), SAX does not store the document in an in-memory tree structure
- Because of this, SAX can parse huge documents (think gigabytes) without allocating large amounts of memory
- Especially useful when the amount of data you want to keep is small relative to the document (no memory wasted on a tree)
- If processing is built as a pipeline, you don't have to wait for the whole document to be converted into objects; each piece can move on to the next stage as soon as its callback method has handled it
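A rough illustration of why memory stays flat, assuming JAXP: the hypothetical StreamingStats class below counts elements and tracks the maximum nesting depth while holding only three integers, however large the input is. A DOM-based version would first build the whole tree.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class StreamingStats {
    int elements;      // total number of start tags seen
    int maxDepth;      // deepest nesting level reached
    private int depth; // current nesting level

    /** For a file on disk, pass new InputSource(file.toURI().toString()). */
    static StreamingStats of(InputSource source) {
        StreamingStats stats = new StreamingStats();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String lname, String qname, Attributes atts) {
                stats.elements++;
                stats.depth++;
                stats.maxDepth = Math.max(stats.maxDepth, stats.depth);
            }

            @Override
            public void endElement(String uri, String lname, String qname) {
                stats.depth--;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return stats;
    }

    static StreamingStats of(String xml) {
        return of(new InputSource(new StringReader(xml)));
    }
}
```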

What is SAX Parsing? Downside
- Most limitations are the programmer's problem, not the API's
- SAX does not allow random access to the document; it proceeds in a single pass, firing events as it goes
- This makes it hard to implement cross-referencing in XML (ID and IDREF), because a reference may point to an element that has not been read yet, as well as complex search routines
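One common workaround for the single-pass limitation is to record what you need during the pass and resolve it in endDocument(). The sketch below assumes, hypothetically, that IDs live in an attribute named id and references in an attribute named ref; real ID/IDREF typing would come from a DTD. RefChecker is an illustrative name, and JAXP is assumed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class RefChecker {
    /** Returns every ref value that never matched an id anywhere in the document. */
    static Set<String> danglingRefs(String xml) {
        Set<String> ids = new HashSet<>();
        List<String> refs = new ArrayList<>();
        Set<String> dangling = new TreeSet<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String lname, String qname, Attributes atts) {
                String id = atts.getValue("id");
                if (id != null) ids.add(id);
                String ref = atts.getValue("ref");
                if (ref != null) refs.add(ref); // may point forward, so it cannot be resolved yet
            }

            @Override
            public void endDocument() {
                // Only now has every id been seen, so resolution happens here.
                for (String r : refs) {
                    if (!ids.contains(r)) dangling.add(r);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return dangling;
    }
}
```

The cost of this approach is that everything needed for resolution must be buffered, which is exactly the memory SAX was supposed to save.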

What is SAX Parsing? Callback Methods
- The SAX API includes a default handler class, org.xml.sax.helpers.DefaultHandler, that implements the handler interfaces with empty methods, so you don't have to re-implement the interfaces every time
- The five most common methods to override are:
  - startElement(String uri, String lname, String qname, Attributes atts)
  - endElement(String uri, String lname, String qname)
  - characters(char text[], int start, int length)
  - startDocument()
  - endDocument()
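The following sketch, assuming the JDK's JAXP parser, overrides exactly these five methods and records a label for each callback, which makes the firing order visible. EventTrace is a hypothetical name and the label format is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EventTrace {
    /** Parses xml and returns one label per callback, in the order the parser fired them. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String lname, String qname, Attributes atts) {
                // Attributes arrive with the start tag, not as separate events.
                StringBuilder label = new StringBuilder("start:").append(qname).append('[');
                for (int i = 0; i < atts.getLength(); i++) {
                    if (i > 0) label.append(',');
                    label.append(atts.getQName(i)).append('=').append(atts.getValue(i));
                }
                events.add(label.append(']').toString());
            }

            @Override
            public void characters(char[] text, int start, int length) {
                events.add("text:" + new String(text, start, length));
            }

            @Override
            public void endElement(String uri, String lname, String qname) {
                events.add("end:" + qname);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return events;
    }
}
```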

Simple Example Program: Sax.java
- Instantiates a SAX parser and creates a default handler for the parser
- Reads in an XML document and echoes its structure to standard output
- Two sample XML documents:
  - authorSimple.xml
  - authorSimpleError.xml
- Demonstration here
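The original Sax.java is not reproduced here; the following is only a hypothetical sketch of a program in the same spirit, assuming JAXP. It echoes each element as an indented tag, prints non-blank text, and turns a parse failure (presumably what authorSimpleError.xml demonstrates) into an exception carrying the parser's message.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEcho {
    /** Echoes element structure and trimmed text, indenting two spaces per level. */
    static String echo(InputSource source) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private int depth = 0;

            private void line(String s) {
                out.append("  ".repeat(depth)).append(s).append('\n');
            }

            // Text may arrive in several characters() calls, so print it when a tag ends the run.
            private void flushText() {
                String t = text.toString().trim();
                if (!t.isEmpty()) line(t);
                text.setLength(0);
            }

            @Override
            public void startElement(String uri, String lname, String qname, Attributes atts) {
                flushText();
                line("<" + qname + ">");
                depth++;
            }

            @Override
            public void endElement(String uri, String lname, String qname) {
                flushText();
                depth--;
                line("</" + qname + ">");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (SAXException e) {
            // Malformed input: the parser stops at the first well-formedness error.
            throw new IllegalArgumentException("Not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    static String echo(String xml) {
        return echo(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        // Hypothetical usage: java SaxEcho authorSimple.xml
        System.out.print(echo(new InputSource(new File(args[0]).toURI().toString())));
    }
}
```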

Compiler Design Issues
- What is actually happening when a SAX parser parses an XML document?
- What type of internal data structures does it use?
- How do the callback methods fit in?
- Can it solve the problems of world peace, hunger, and death? (Or at least, can it help me pass Compiler Design?)
- Demonstrated with the SaxCatalogUnmarshaller example

Compiler Design Issues: Heart of the Beast
- Underneath it all, SAX processing is stack-based: a handler that builds objects (such as SaxCatalogUnmarshaller) keeps a stack of data objects
- Whenever an element is started, a new data object is pushed onto the stack
- Later, when the element is closed, the topmost object on the stack is finished and can be popped
- Unless it is the root element, the popped element was a child of the object that now occupies the top of the stack
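A minimal sketch of this stack discipline, assuming JAXP and a generic tree of hypothetical Node objects rather than the catalog classes from the original example: push on every start tag, pop on every end tag, and attach the popped node to whatever is now on top.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class TreeBuilder {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Node(String name) {
            this.name = name;
        }
    }

    static Node build(String xml) {
        Deque<Node> stack = new ArrayDeque<>();
        Node[] root = new Node[1]; // holder so the anonymous handler can assign the result
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String lname, String qname, Attributes atts) {
                stack.push(new Node(qname)); // a new element begins: push it
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!stack.isEmpty()) stack.peek().text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String lname, String qname) {
                Node done = stack.pop();                      // topmost element is now complete
                if (stack.isEmpty()) root[0] = done;          // root element: nothing to attach to
                else stack.peek().children.add(done);         // attach to the parent now on top
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return root[0];
    }
}
```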

Compiler Design Issues: Heart of the Beast (cont.)
- This process corresponds to the shift-reduce cycle of bottom-up parsers: pushing an object at a start tag is like a shift, and popping a finished object into its parent at the end tag is like a reduce
- It is crucial that XML elements be well-formed and properly nested for this to work
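To illustrate the shift-reduce analogy independently of any SAX parser, here is a tiny, hypothetical stack check over a pre-split list of tag tokens: an opening tag is shifted onto the stack and a closing tag reduces it, failing if the names do not match. It checks nesting only, not other well-formedness rules such as having a single root element.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class NestingCheck {
    /** tokens: "<name>" opens an element, "</name>" closes one. */
    static boolean properlyNested(List<String> tokens) {
        Deque<String> stack = new ArrayDeque<>();
        for (String tok : tokens) {
            if (tok.startsWith("</")) {
                String name = tok.substring(2, tok.length() - 1);
                // Reduce: the close tag must match the most recently opened element.
                if (stack.isEmpty() || !stack.pop().equals(name)) return false;
            } else {
                // Shift: remember the newly opened element.
                stack.push(tok.substring(1, tok.length() - 1));
            }
        }
        return stack.isEmpty(); // every opened element was closed
    }
}
```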

Compiler Design Issues: startElement()
- Four parameters:
  - String uri = the namespace URI (Uniform Resource Identifier)
  - String lname = the local name of the element
  - String qname = the qualified name of the element
  - Attributes atts = list of attributes for this element
  - (uri and lname may be empty strings unless the parser is namespace-aware)
- If the current element is a complex element, an object of the appropriate type is created and pushed onto the stack
- If the element is simple, a StringBuffer is pushed onto the stack, ready to accept character data
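To see what the four parameters actually contain, here is a short sketch assuming JAXP. Namespace awareness is switched on because, with the factory's default settings, uri and lname are typically empty and only qname is filled in. StartElementParams and the urn:example:shop namespace are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class StartElementParams {
    /** Returns one line per start tag showing uri, lname, qname and the attributes. */
    static List<String> describe(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String lname, String qname, Attributes atts) {
                StringBuilder sb = new StringBuilder()
                        .append("uri=").append(uri)
                        .append(" lname=").append(lname)
                        .append(" qname=").append(qname);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
                }
                out.add(sb.toString());
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // fill in uri and lname as well as qname
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return out;
    }
}
```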

Compiler Design Issues: endElement()
- Three parameters:
  - String uri = the namespace URI (Uniform Resource Identifier)
  - String lname = the local name of the element
  - String qname = the qualified name of the element
- The topmost element on the stack is popped, converted to the proper type, and inserted into its parent, which now occupies the top of the stack (unless this is the root element, which requires special handling)
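Putting startElement(), characters() and endElement() together, here is a hypothetical sketch of a catalog unmarshaller in the style these slides describe, assuming JAXP. It is not the original SaxCatalogUnmarshaller: the element names (catalog, book, title, price) and the Catalog/Book classes are assumptions, and it uses StringBuilder, the unsynchronized counterpart of StringBuffer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class CatalogUnmarshallerSketch {
    static final class Book { String title; double price; }
    static final class Catalog { final List<Book> books = new ArrayList<>(); }

    /** Assumes the root element is <catalog>. */
    static Catalog unmarshal(String xml) {
        Deque<Object> stack = new ArrayDeque<>();
        Catalog[] result = new Catalog[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String lname, String qname, Attributes atts) {
                switch (qname) {
                    case "catalog": stack.push(new Catalog()); break; // complex element
                    case "book":    stack.push(new Book()); break;    // complex element
                    default:        stack.push(new StringBuilder());  // simple element: collects text
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text only matters inside simple elements; whitespace between complex ones is ignored.
                if (stack.peek() instanceof StringBuilder) {
                    ((StringBuilder) stack.peek()).append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String lname, String qname) {
                Object done = stack.pop();
                if (stack.isEmpty()) {            // root element: special handling
                    result[0] = (Catalog) done;
                    return;
                }
                Object parent = stack.peek();
                if (parent instanceof Catalog && done instanceof Book) {
                    ((Catalog) parent).books.add((Book) done);
                } else if (parent instanceof Book) {
                    Book book = (Book) parent;
                    String text = done.toString().trim(); // convert to the proper type below
                    if (qname.equals("title")) book.title = text;
                    else if (qname.equals("price")) book.price = Double.parseDouble(text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return result[0];
    }
}
```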

Compiler Design Issues: characters()
- Three parameters:
  - char text[] = a character array (a buffer supplied by the parser) holding the current character data; it is not the entire XML document
  - int start = starting index of the current data in text[]
  - int length = number of characters of current data in text[] (a count, not an ending index)
- When the parser encounters raw text, it passes a char array containing the actual data, the starting position, and the number of characters to read from the array; only text[start] through text[start + length - 1] belong to the current event
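A small sketch, assuming JAXP, of how the three parameters are meant to be read: the loop touches only text[start] through text[start + length - 1] and never the rest of the buffer. TextCounter is a hypothetical name; it counts letters and digits in all character data.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class TextCounter {
    static int countLettersAndDigits(String xml) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] text, int start, int length) {
                // length is a count: the valid slice ends at start + length - 1.
                for (int i = start; i < start + length; i++) {
                    if (Character.isLetterOrDigit(text[i])) count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }
}
```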

Compiler Design Issues: characters() (cont.)
- The implementation of the callback method appends the data to the StringBuffer on top of the stack
- This can lead to confusion because:
  - There is no guarantee that a single stretch of characters results in only one call to characters(); the parser may split it across several calls
  - The parser reports all characters it encounters, including whitespace used only for indentation between elements
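A common way to cope with both issues is to append every chunk to a per-element buffer and only trim and use the text when the element ends. The sketch below, assuming JAXP, does this with a stack of StringBuilders; TextAccumulator is a hypothetical name. Whether the parser actually splits "Tom &amp; Jerry" into several chunks depends on the implementation, which is why the code never assumes a single call.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class TextAccumulator {
    /** Returns "name=text" for each element whose own text is non-blank, in end-tag order. */
    static List<String> elementTexts(String xml) {
        List<String> out = new ArrayList<>();
        Deque<StringBuilder> buffers = new ArrayDeque<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String lname, String qname, Attributes atts) {
                buffers.push(new StringBuilder()); // fresh buffer for this element's own text
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one run of text: append, never overwrite.
                buffers.peek().append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String lname, String qname) {
                String text = buffers.pop().toString().trim(); // drop indentation whitespace
                if (!text.isEmpty()) out.add(qname + "=" + text);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return out;
    }
}
```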

Wrap-up
- SAX is an event-based API: the parser calls callback methods to handle the events it finds
- Applications are written by extending the DefaultHandler class and overriding the event handler methods
- Handlers that build data structures from the events usually use a stack to do their work
- And no, SAX will not save the world.

References
- Gittleman, Art. Advanced Java: Internet Applications (Second Edition). Scott Jones Publishers, El Granada, California.
- Janert, Phillip K. "Simple XML Parsing with SAX and DOM." Published June 26. Accessed February 10.
- Wati, Anjini. "E-Catalog for a Small to Medium Enterprise." Accessed February 10, 2003.