20 November 2002ApacheCon US - Las Vegas, Nevada 1 Xerces2: The Sequel With No Equal Andy Clark.

Slides:



Advertisements
Similar presentations
Open Office.Org What is the Open Office.org Source Project? Open source project through which Sun Microsystems is releasing the technology for the popular.
Advertisements

1 XML: Advanced Guide Holly A. Hyland, FSA Andrew Smalera, XML Framework Session 14.
JAXB Java Architecture for XML Binding Andy Fanton Khalid AlBayat.
XML Parsing Using Java APIs AIP Independence project Fall 2010.
SAX A parser for XML Documents. XML Parsers What is an XML parser? –Software that reads and parses XML –Passes data to the invoking application –The application.
XML Robert Grimm New York University. The Whirlwind So Far  HTTP  Persistent connections  (Style sheets)  Fast servers  Event driven architectures.
Xerces The Apache XML Project Yvonne Yao. Introduction Set of libraries that provides functionalities to parse XML documents Set of libraries that provides.
XML Parser. Why Need a XML Parser ? Check XML syntax. ( is well-formed ? ) Validation. ( DTD and XML Schema ) Allow programmatic access to the document’s.
CS 898N – Advanced World Wide Web Technologies Lecture 22: Applying XML Chin-Chih Chang
Introduction to XML Extensible Markup Language
Web Services with Apache CXF Part 2: JAXB and WSDL to Java Robert Thornton.
PHP and XML TP2653 Advance Web Programming. PHP and XML PHP5 – XML-based extensions, library and functionalities (current XAMPP PHP version is )
Struts 2.0 an Overview ( )
CVSQL 2 The Design. System Overview System Components CVSQL Server –Three network interfaces –Modular data source provider framework –Decoupled SQL parsing.
17 Apr 2002 XML Programming: JAXP Andy Clark. Java API for XML Processing Standard Java API for loading, creating, accessing, and transforming XML documents.
Introduction to the Enterprise Library. Sounds familiar? Writing a component to encapsulate data access Building a component that allows you to log errors.
XML Anisha K J Jerrin Thomas. Outline  Introduction  Structure of an XML Page  Well-formed & Valid XML Documents  DTD – Elements, Attributes, Entities.
Scientific Markup Languages Birds of a Feather A 10-Minute Introduction to XML Timothy W. Cole Mathematics Librarian & Professor of.
XML and its applications: 4. Processing XML using PHP.
ASP.NET and XML Presented By: Shravan S. Mylavarapu 1.
XML What is XML? XML v.s. HTML XML Components Well-formed and Valid Document Type Definition (DTD) Extensible Style Language (XSL) SAX and DOM.
SAX Parsing Presented by Clifford Lemoine CSC 436 Compiler Design.
What is XML?  XML stands for EXtensible Markup Language  XML is a markup language much like HTML  XML was designed to carry data, not to display data.
Washington Area SGML/XML Users Group – 21 June 2000 BeOpen.com 1 Python, XML, and PythonLabs Fred L. Drake, Jr.
Advanced Java Session 9 New York University School of Continuing and Professional Studies.
Introduction to XML Extensible Markup Language. What is XML XML stands for eXtensible Markup Language. A markup language is used to provide information.
Intro. to XML & XML DB Bun Yue Professor, CS/CIS UHCL.
XML Parsers Overview  Types of parsers  Using XML parsers  SAX  DOM  DOM versus SAX  Products  Conclusion.
SAX. What is SAX SAX 1.0 was released on May 11, SAX is a common, event-based API for parsing XML documents Primarily a Java API but there implementations.
Electronic Commerce COMP3210 Session 4: Designing, Building and Evaluating e-Commerce Initiatives – Part II Dr. Paul Walcott Department of Computer Science,
Intro to XML Originally Presented by Clifford Lemoine Modified by Box.
Softsmith Infotech XML. Softsmith Infotech XML EXtensible Markup Language XML is a markup language much like HTML Designed to carry data, not to display.
© 2005 by IBM; made available under the EPL v1.0 | June 9, 2005 David Williams WTP Source Editing Open House.
WEB BASED DATA TRANSFORMATION USING XML, JAVA Group members: Darius Balarashti & Matt Smith.
17 Apr 2002 XML Syntax: Documents Andy Clark. Basic Document Structure Element tags – Elements have associated attributes Text content Miscellaneous –
METS Dissemination METS Opening Day Corey Keith
Web Technologies COMP6115 Session 4: Adding a Database to a Web Site Dr. Paul Walcott Department of Computer Science, Mathematics and Physics University.
XML Refresher Course Bálint Joó School of Physics University of Edinburgh May 02, 2003.
XML Grammar and Parser for WSOL Kruti Patel, Vladimir Tosic, Bernard Pagurek Network Management & Artificial Intelligence Lab Department of Systems & Computer.
XML Study-Session: Part III
WIRED Detector Description in XML Mark Dönszelmann, Applications for Physics and Infrastructure, IT, CERN XML Detector Description Workshop CERN, 14 April,
SAX2 and DOM2 Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
XML and SAX (A quick overview) ● What is XML? ● What are SAX and DOM? ● Using SAX.
COSC617 Project XML Tools Mark Liu Sanjay Srivastava Junping Zhang.
1 “Universal Data-Speak”: The eXtensible Markup Language Zack Ives CSE 590DB, Winter 2000 University of Washington 3 January 2000.
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 1COMP9321, 15s2, Week.
When we create.rtf document apart from saving the actual info the tool saves additional info like start of a paragraph, bold, size of the font.. Etc. This.
What is XML? eXtensible Markup Language eXtensible Markup Language A subset of SGML (Standard Generalized Markup Language) A subset of SGML (Standard Generalized.
1 Introduction JAXP. Objectives  XML Parser  Parsing and Parsers  JAXP interfaces  Workshops 2.
© FPT SOFTWARE – TRAINING MATERIAL – Internal use 04e-BM/NS/HDCV/FSOFT v2/3 JSP Application Models.
9/25/08IEEE ICWS 2008 High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers Wei Zhang & Robert van Engelen Department of.
Martin Kruliš by Martin Kruliš (v1.1)1.
XML for Scientific Applications Marlon Pierce ERDC Tutorial August
7-Mar-16 Simple API XML.  SAX and DOM are standards for XML parsers-- program APIs to read and interpret XML files  DOM is a W3C standard  SAX is an.
XML Extensible Markup Language
Java API for XML Processing
Lecture Transforming Data: Using Apache Xalan to apply XSLT transformations Marc Dumontier Blueprint Initiative Samuel Lunenfeld Research Institute.
I Copyright © 2004, Oracle. All rights reserved. Introduction.
Simple API for XML
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
XML in Web Technologies
Data Modeling II XML Schema & JAXB Marc Dumontier May 4, 2004
Database Processing with XML
XML Technologies and Related Applications
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
XML Problems and Solutions
A parser for XML Documents
XML Parsers.
XML Programming in Java
XML and Web Services (II/2546)
Presentation transcript:

20 November 2002ApacheCon US - Las Vegas, Nevada 1 Xerces2: The Sequel With No Equal Andy Clark

ApacheCon US - Las Vegas, Nevada2 20 November 2002 Introduction Speaker  Worked for IBM  Currently unemployed Parser  First developed in IBM’s Tokyo research lab  Maintained and expanded in California  Donated to Apache  Work continues in Toronto

ApacheCon US - Las Vegas, Nevada3 20 November 2002 Agenda Xerces1 Overview  Design and problems Xerces2 Overview  Challenges and design Q & A

20 November 2002ApacheCon US - Las Vegas, Nevada 4 Xerces1 Overview: Design and Problems Andy Clark

ApacheCon US - Las Vegas, Nevada5 20 November 2002 Design XML4J/Xerces1 designed for performance Parser Implementation  Parsing pipeline  Custom reader implementations  StringPool Defers transcoding of byte buffers until needed Symbol table for common document strings

ApacheCon US - Las Vegas, Nevada6 20 November 2002 ScannerValidatorParser Intended to be generic XML API Pipeline Configuration

ApacheCon US - Las Vegas, Nevada7 20 November 2002 ScannerValidatorParser Pipeline Configuration Problems Hard-coded dependencies on implementation Inconsistent Interfaces XML API

ApacheCon US - Las Vegas, Nevada8 20 November 2002 Custom Readers Scanner Entity Handler Reader Stack UTF-8 Reader UCS Reader EBCDIC Reader Generic Reader scanName scanAttValue scanContent …

ApacheCon US - Las Vegas, Nevada9 20 November 2002 Custom Readers Problems Duplicated code  Allows more bugs to appear  Bugs are different based on encoding because code is not shared More complicated

ApacheCon US - Las Vegas, Nevada10 20 November 2002 Deferred Transcoding XML StringPool Parser Component String Producer Reader Data Buffer Data Buffer … addString(String):int toString(int):String addString(StringProducer,int,int):int

ApacheCon US - Las Vegas, Nevada11 20 November 2002 Deferred Transcoding Problems All components need reference to StringPool  Strings not immediately available to methods  Must make call to StringPool to query String Memory management is complicated  Responsibility of callee to free resources  Uses more memory

20 November 2002ApacheCon US - Las Vegas, Nevada 12 Xerces2 Overview: Challenges and Design Andy Clark

ApacheCon US - Las Vegas, Nevada13 20 November 2002 Challenges Requirements  Simple design and implementation  Easy to maintain  More modularity and configurability  Support current and future features Design Decisions  Always transcode bytes into Unicode characters Removes StringPool and dependencies  Clean architecture

ApacheCon US - Las Vegas, Nevada14 20 November 2002 Xerces Native Interface (XNI) “Streaming” Information Set  Similar to SAX  No loss of document information* Parser configuration and layering Future extensions  Native pull-parser, tree model, etc. * Does not preserve all document information but communicates more information to the application than DOM or SAX.

ApacheCon US - Las Vegas, Nevada15 20 November 2002 org.apache.xerces.xniorg.apache.xerces.xni.parser XMLDTDHandler XMLDTDContentModelHandler XMLDocumentFragmentHandler XMLLocator XMLDocumentHandler NamespaceContext XMLAttributes Augmentations QNameXMLString XNIException RuntimeException XMLPullParserConfiguration XMLErrorHandlerXMLEntityResolver XMLDTDScanner XMLDocumentScanner XMLDTDContentModelSourceXMLDTDContentModelFilter XMLDTDSourceXMLDTDFilter XMLDocumentSourceXMLDocumentFilter XMLComponentManagerXMLComponent XMLConfigurationException XMLParseException XMLInputSource XMLParserConfiguration java.lang Interface Class Package Extends XMLResourceIdentifier

ApacheCon US - Las Vegas, Nevada16 20 November 2002 Parsing Pipeline Handlers communicate information between parser components ScannerValidatorParser XML API

ApacheCon US - Las Vegas, Nevada17 20 November 2002 Handler Overview XML API Document Scanner ValidatorParser DTD Scanner XMLDocumentHandler XMLDTDHandler XMLDTDContentModelHandler

ApacheCon US - Las Vegas, Nevada18 20 November 2002 Parser Layout Components and Manager Component Manager Symbol Table Grammar Pool Datatype Factory Regular Components ScannerValidator Entity Manager Error Reporter Configurable Components

ApacheCon US - Las Vegas, Nevada19 20 November 2002 Reader Management Entity Scanner Entity Manager Reader Stack scanName scanAttValue scanContent … UTF-8 Reader UCS Reader EBCDIC Reader Generic Reader

ApacheCon US - Las Vegas, Nevada20 20 November 2002 Parser Configuration Before * Parser pipeline is part of the document parser base class. * Required duplication to re-configure parser and still take advantage of API generator code. XML SAX ParserDOM Parser Document Parser ScannerValidator

ApacheCon US - Las Vegas, Nevada21 20 November 2002 Parser Configuration After * Parser pipeline and settings are specified in a separate parser configuration object. * Allows re-use of framework without rewriting existing code. SAX ParserDOM Parser Document Parser Parser Configuration ScannerValidator XML

ApacheCon US - Las Vegas, Nevada22 20 November 2002 API Generators Different APIs can be generated from same document parser XNI SAX ParserDOM Parser … Document Parser JavaBean Parser

ApacheCon US - Las Vegas, Nevada23 20 November 2002 Sample Parser Configuration #1 HTML parser  Available as NekoHTML download SAX ParserDOM Parser Document Parser HTML Parser Configuration HTML Scanner HTML Tag Balancer

ApacheCon US - Las Vegas, Nevada24 20 November 2002 Non-validating parser (for performance)  Available with Xerces download SAX ParserDOM Parser Document Parser Non-Validating Parser Configuration Scanner / Namespace Binder XML Sample Parser Configuration #2

ApacheCon US - Las Vegas, Nevada25 20 November 2002 Sample Parser Configuration #3 XInclude processing  Not yet implemented SAX ParserDOM Parser Document Parser XInclude Parser Configuration Scanner XML XInclude Validator

ApacheCon US - Las Vegas, Nevada26 20 November 2002 Sample Parser Configuration #4 Database result set converted to XML  Not yet implemented SAX ParserDOM Parser Document Parser Database Parser Configuration Database Query Validator DB

ApacheCon US - Las Vegas, Nevada27 20 November 2002 That’s All, Folks! Question and Answers  Any questions? Links   