Processing of structured documents, Part 4. XML processing model: an XML processor is used to read XML documents and provide access to their content and structure.

Presentation transcript:

Processing of structured documents Part 4

XML processing model
- An XML processor is used to read XML documents and provide access to their content and structure
- The XML processor works on behalf of an application
- The XML specification defines which information the processor should provide to the application

Parsing
- input: an XML document
- basic task: is the document well-formed?
- validating parsers additionally: is the document valid?
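A minimal sketch of the basic parsing task, checking well-formedness. It assumes the JAXP parser bundled with the JDK rather than any particular parser distribution; the class name WellFormedCheck and the sample documents are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler ignores all content events and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // A well-formedness violation is reported as a fatal error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<doc><para>Hello, world!</para></doc>")); // true
        System.out.println(isWellFormed("<doc><para>Hello, world!</doc>"));        // false
    }
}
```

This only answers the well-formedness question; validity needs a validating parser and a DTD or schema.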

Parsing
- parsers produce data structures that other tools and applications can use
- two kinds of APIs: tree-based and event-based

Tree-based API
- compiles an XML document into an internal tree structure
- allows an application to navigate the tree
- Document Object Model (DOM) is a tree-based API for XML and HTML documents

Event-based API
- reports parsing events (such as start and end of elements) directly to the application
- the application implements handlers to deal with the different events
- Simple API for XML (SAX)

Example

    <doc>
      <para>Hello, world!</para>
    </doc>

- Events:
  - start document
  - start element: doc
  - start element: para
  - characters: Hello, world!
  - end element: para
  - end element: doc
  - end document
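A sketch of how such an event trace could be produced, assuming the JDK's JAXP SAX parser and an input document of the form <doc><para>Hello, world!</para></doc>. The class name EventTrace is hypothetical. A parser may deliver character data in more than one characters call, so the exact number of events can vary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace {
    // Parses the text and returns one line per SAX event, in document order.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("start document"); }
            @Override public void endDocument() { events.add("end document"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                events.add("start element: " + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("end element: " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                events.add("characters: " + new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        trace("<doc><para>Hello, world!</para></doc>").forEach(System.out::println);
    }
}
```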

Example (cont.)
- an application handles these events as they occur, just as it would handle events from a graphical user interface (mouse clicks, etc.)
- no need to cache the entire document in memory or secondary storage

Tree-based vs. event-based
- tree-based APIs are useful for a wide range of applications, but they may need a lot of resources (if the document is large)
- some applications may need to build their own tree structures, and it is very inefficient to build a parse tree only to map it to another tree

Tree-based vs. event-based
- an event-based API provides simpler, lower-level access to an XML document
- as the document is processed sequentially, one can parse documents much larger than the available system memory
- applications can construct their own data structures using their own callback event handlers
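As an illustration of building an application-specific data structure from callbacks, the following sketch tallies element names while the document streams past, without keeping a tree. It assumes the JDK's JAXP SAX parser; the class name ElementCounter and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ElementCounter extends DefaultHandler {
    // The application's own data structure: element name -> number of occurrences.
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum);
    }

    static Map<String, Integer> count(String xml) {
        ElementCounter handler = new ElementCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.counts;
    }

    public static void main(String[] args) {
        System.out.println(count("<doc><para/><para/><note/></doc>"));
        // prints {doc=1, note=1, para=2}
    }
}
```

Only the map grows with the input, so memory use depends on the number of distinct element names rather than on document size.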

SAX
- A parser is needed
  - e.g. Apache Xerces
- and SAX classes
  - www.saxproject.org
  - often the SAX classes come bundled with the parser distribution

Starting a SAX parser

    import org.xml.sax.XMLReader;
    import org.apache.xerces.parsers.SAXParser;

    XMLReader parser = new SAXParser();
    parser.parse(uri);

Content handlers
- In order to let the application do something useful with XML data as it is being parsed, we must register handlers with the SAX parser
- a handler is a set of callbacks: application code can be run at important events during a document's parsing

Core handler interfaces in SAX
- org.xml.sax.ContentHandler
- org.xml.sax.ErrorHandler
- org.xml.sax.DTDHandler
- org.xml.sax.EntityResolver

Custom application classes
- custom application classes that perform specific actions within the parsing process can implement each of the core interfaces
- implementation classes are registered with the parser using the methods setContentHandler(), setErrorHandler(), setDTDHandler() and setEntityResolver()
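A sketch of registering handler implementations on an XMLReader. It assumes the reader is obtained through the JDK's JAXP factory instead of a parser-specific class; the class name HandlerRegistration and the message format are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class HandlerRegistration {
    // Parses the text and returns the problems reported to the registered ErrorHandler.
    static List<String> problems(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler()); // ignores content events
            reader.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add("warning at line " + e.getLineNumber());
                }
                @Override public void error(SAXParseException e) {
                    problems.add("error at line " + e.getLineNumber());
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    problems.add("fatal error at line " + e.getLineNumber());
                    throw e; // stop parsing after a well-formedness violation
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Already recorded by the error handler above.
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(problems("<doc><para>ok</para></doc>"));
        System.out.println(problems("<doc>\n<para></doc>"));
    }
}
```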

Example: content handlers

    class MyContentHandler implements ContentHandler {
        public void startDocument() {
            System.out.println("Parsing begins…");
        }
        public void endDocument() {
            System.out.println("...Parsing ends.");
        }

Element handlers

    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes atts) {
        System.out.print("startElement: " + localName);
        if (!namespaceURI.equals("")) {
            System.out.println(" in namespace " + namespaceURI +
                               " (" + rawName + ")");
        } else {
            System.out.println(" has no associated namespace");
        }
        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println(" Attribute: " + atts.getLocalName(i) +
                               "=" + atts.getValue(i));
        }
    }

endElement

    public void endElement(String namespaceURI, String localName,
                           String rawName) {
        System.out.println("endElement: " + localName + "\n");
    }

Character data

    // The third argument is the number of characters to read, not an end index.
    public void characters(char[] ch, int start, int length) {
        String s = new String(ch, start, length);
        System.out.println("characters: " + s);
    }

- the parser may return all contiguous character data at once, or split the data up into multiple method invocations
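Because character data may arrive in several pieces, a handler that needs the complete text of an element usually accumulates it in a buffer. A minimal sketch, assuming the JDK's JAXP SAX parser and a hypothetical element name para; the class name TextCollector is also an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> paragraphs = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("para".equals(qName)) {
            buffer.setLength(0); // start collecting text for a new paragraph
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("para".equals(qName)) {
            paragraphs.add(buffer.toString()); // the text is complete only here
        }
    }

    static List<String> collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.paragraphs;
    }

    public static void main(String[] args) {
        System.out.println(collect("<doc><para>Hello, &amp; world!</para><para>Bye</para></doc>"));
        // prints [Hello, & world!, Bye]
    }
}
```

The result is the same whether the parser reports the text in one call or splits it, for example around the entity reference.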

Processing instructions
- XML documents may contain processing instructions (PIs)
- a processing instruction tells an application to perform some specific task
- form: <?target data?>

Handlers for PIs

    public void processingInstruction(String target, String data) {
        System.out.println("PI: Target:" + target + " and Data:" + data);
    }

- The application could receive instructions and set variables or execute methods to perform application-specific processing
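A sketch of a handler that collects processing instructions, assuming the JDK's JAXP SAX parser. The PI target report and its data page-break are made up for illustration, as is the class name PiCollector. The XML declaration is not a processing instruction and is not reported through this callback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PiCollector extends DefaultHandler {
    private final List<String> instructions = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        // target is the PI name; data is everything after the first whitespace.
        instructions.add(target + " -> " + data);
    }

    static List<String> collect(String xml) {
        PiCollector handler = new PiCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.instructions;
    }

    public static void main(String[] args) {
        System.out.println(collect("<?xml version=\"1.0\"?><?report page-break?><doc/>"));
        // prints [report -> page-break]
    }
}
```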

Validation
- some parsers are validating, some non-validating
- some parsers can do both
- SAX method to turn validation on:

    parser.setFeature("…", true);
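A sketch of turning validation on through setFeature. It assumes the standard SAX2 feature name http://xml.org/sax/features/validation and the JDK's JAXP parser; the DTD, the sample documents and the class name ValidationDemo are hypothetical. Validity errors are recoverable, so they are reported to the error handler's error method while parsing continues.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidationDemo {
    // Hypothetical DTD in the internal subset: doc contains one or more para elements.
    static final String DTD =
            "<!DOCTYPE doc [<!ELEMENT doc (para+)><!ELEMENT para (#PCDATA)>]>";

    // Returns the number of validity errors the parser reports for the text.
    static int countValidityErrors(String xml) {
        int[] errors = {0};
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setErrorHandler(new DefaultHandler() {
                @Override public void error(SAXParseException e) {
                    errors[0]++; // validity violations arrive here
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        System.out.println("valid document: "
                + countValidityErrors(DTD + "<doc><para>ok</para></doc>") + " errors");
        System.out.println("invalid document: "
                + countValidityErrors(DTD + "<doc><note/></doc>") + " errors");
    }
}
```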

Ignorable whitespace
- a validating parser can decide which whitespace can be ignored
- for a non-validating parser, all whitespace is just characters
- content handler:

    public void ignorableWhitespace(char[] ch, int start, int length) { … }

Traversing XML: DOM
- In transforming documents, random access to a document is needed
- SAX cannot look backward or forward
- with SAX it is difficult to locate siblings and children
- DOM: access to any part of the tree
  - www.w3.org/DOM/

DOM
- Level 1: navigation of content within a document
- Level 2: modules and options for specific content models, such as XML, HTML, and CSS; events
- Level 3: document loading and saving; access of schemas

Some requirements
- All document content, including elements and attributes, will be programmatically accessible and manipulable
- Navigation from any element to any other element will be possible
- There will be a way to add, remove, and change elements/attributes in the document structure

DOM
- XML documents are treated as a tree of nodes
- every item is a node
- child elements and enclosed text are subnodes

XML DOM objects
- Element
- Attr
- Text
- CDATASection
- EntityReference
- Entity
- Document
- ...

Node-related objects
- Node
  - a single node in the document tree
- NodeList
  - a list of node objects (e.g. children)
- NamedNodeMap
  - allows access by name to the collection of attributes

DOM Java bindings
- DOM is language-neutral
- Java bindings
  - interfaces and classes that define and implement the DOM
  - bindings are often included in the parser implementations (the parser generates a DOM tree)

Parsing using a DOM parser

    import org.w3c.dom.*;
    import org.apache.xerces.parsers.DOMParser;

    DOMParser parser = new DOMParser();
    parser.parse(uri);

Output tree
- the entire document is parsed and added into the output tree before any processing takes place
- handle: an org.w3c.dom.Document object, one level above the root element in the document

    parser.parse(uri);
    Document doc = parser.getDocument();
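A sketch of the same step using the JAXP DocumentBuilder shipped with the JDK, as an assumed alternative to a parser-specific DOM parser class. The class name DomParse and the sample document are hypothetical. It also shows that the Document node sits one level above the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomParse {
    // Parses the text into a complete in-memory DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><para>Hello, world!</para></doc>");
        Element root = doc.getDocumentElement();
        System.out.println(doc.getNodeType() == Node.DOCUMENT_NODE); // true
        System.out.println(root.getTagName());                       // doc
        System.out.println(root.getParentNode() == doc);             // true
    }
}
```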

Printing a document

    private static void printTree(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                // Print the contents of the Document object
                break;
            case Node.ELEMENT_NODE:
                // Print the element and its attributes
                break;
            case Node.TEXT_NODE:
                ...

…the Document node

    case Node.DOCUMENT_NODE:
        System.out.println(" \n");
        Document doc = (Document) node;
        printTree(doc.getDocumentElement());
        break;

… elements

    case Node.ELEMENT_NODE:
        String name = node.getNodeName();
        System.out.print("<" + name);
        // Print out attributes… (see next slide…)
        System.out.println(">");
        // recurse on each child
        NodeList children = node.getChildNodes();
        if (children != null) {
            for (int i = 0; i < children.getLength(); i++) {
                printTree(children.item(i));
            }
        }
        // close the element after its children have been printed
        System.out.println("</" + name + ">");
        break;

… and their attributes

    case Node.ELEMENT_NODE:
        String name = node.getNodeName();
        System.out.print("<" + name);
        NamedNodeMap attributes = node.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node current = attributes.item(i);
            System.out.print(" " + current.getNodeName() +
                             "=\"" + current.getNodeValue() + "\"");
        }
        System.out.println(">");
        ...

…textual nodes

    case Node.TEXT_NODE:
    case Node.CDATA_SECTION_NODE:
        System.out.print(node.getNodeValue());
        break;
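The printing fragments can be assembled into one recursive method. The following is a sketch under stated assumptions, not the original program: it writes into a StringBuilder instead of System.out, omits the document prolog, does not escape special characters, and parses with the JDK's JAXP DocumentBuilder. The class name TreePrinter and the helper render are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TreePrinter {
    // Appends a textual form of the node and its descendants to out.
    static void printTree(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                printTree(((Document) node).getDocumentElement(), out);
                break;
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName());
                NamedNodeMap attributes = node.getAttributes();
                for (int i = 0; i < attributes.getLength(); i++) {
                    Node attr = attributes.item(i);
                    out.append(' ').append(attr.getNodeName())
                       .append("=\"").append(attr.getNodeValue()).append('"');
                }
                out.append('>');
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    printTree(children.item(i), out); // recurse on each child
                }
                out.append("</").append(node.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.append(node.getNodeValue());
                break;
            default:
                break; // comments, PIs and other node types are skipped
        }
    }

    // Hypothetical helper: parse the text and print the resulting tree.
    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            printTree(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("<doc id=\"d1\"><para>Hello, world!</para></doc>"));
        // prints <doc id="d1"><para>Hello, world!</para></doc>
    }
}
```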

Document interface methods
- Attr createAttribute(String name)
- Element createElement(String tagName)
- Text createTextNode(String data)
- Element getDocumentElement()
- Element getElementById(String elementID)
- NodeList getElementsByTagName(String tagName)
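A short sketch of several of these methods in use, building a small tree from scratch and then querying it. The empty Document is assumed to come from the JDK's JAXP DocumentBuilder; the class name DomBuild and the element names and text are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomBuild {
    // Builds <doc><para>Hello, world!</para><para>Goodbye</para></doc> in memory.
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("doc");
        doc.appendChild(root); // the root element becomes the document element
        for (String text : new String[] {"Hello, world!", "Goodbye"}) {
            Element para = doc.createElement("para");
            para.appendChild(doc.createTextNode(text));
            root.appendChild(para);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println(doc.getDocumentElement().getTagName()); // doc
        NodeList paras = doc.getElementsByTagName("para");
        for (int i = 0; i < paras.getLength(); i++) {
            System.out.println(paras.item(i).getFirstChild().getNodeValue());
        }
    }
}
```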

NodeList interface methods
- int getLength()
  - gets the number of nodes in this list
- Node item(int index)
  - gets the item at the specified index value in the collection

Node interface methods
- NamedNodeMap getAttributes()
- NodeList getChildNodes()
- String getLocalName()
- String getNodeName()
- String getNodeValue()
- Node getParentNode()
- short getNodeType()
- appendChild()

Node types
- static short ATTRIBUTE_NODE
- static short ELEMENT_NODE
- static short TEXT_NODE
- static short DOCUMENT_NODE
- static short COMMENT_NODE
- ...

Element interface methods
- String getAttribute()
  - returns an attribute's value
- String getTagName()
  - returns an element's name
- removeAttribute()
  - removes an element's attribute
- setAttribute()
  - sets an attribute's value

Attr interface methods
- String getName()
  - gets the name of this attribute
- Element getOwnerElement()
  - gets the Element node to which this attribute is attached
- String getValue()
  - gets the value of the attribute as a string

NamedNodeMap interface methods
- int getLength()
  - returns the number of nodes in this map
- Node getNamedItem(String name)
  - gets a node indicated by name
- Node item(int index)
  - gets an item in the map by index
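A sketch that walks an element's attributes through NamedNodeMap, by index and by name. The element and attribute names and the class name AttributeListing are hypothetical, and the Document is assumed to come from the JDK's JAXP DocumentBuilder. A NamedNodeMap does not promise any particular order, so the sketch copies the attributes into a sorted map before printing.

```java
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

class AttributeListing {
    // Copies an element's attributes into a name-sorted map.
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attr = (Attr) attributes.item(i); // access by index
            result.put(attr.getName(), attr.getValue());
        }
        return result;
    }

    // Hypothetical sample element: <para lang="en" id="p1"/>
    static Element sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element para = doc.createElement("para");
            para.setAttribute("lang", "en");
            para.setAttribute("id", "p1");
            return para;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element para = sample();
        System.out.println(attributesOf(para)); // {id=p1, lang=en}
        // Access by name through the map, then remove one attribute.
        System.out.println(para.getAttributes().getNamedItem("id").getNodeValue()); // p1
        para.removeAttribute("lang");
        System.out.println(attributesOf(para)); // {id=p1}
    }
}
```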