21-Jun-16 Document Object Model DOM. SAX and DOM SAX and DOM are standards for XML parsers-- program APIs to read and interpret XML files DOM is a W3C.

Presentation transcript:

21-Jun-16 Document Object Model DOM

SAX and DOM
- SAX and DOM are standards for XML parsers: program APIs to read and interpret XML files
  - DOM is a W3C standard
  - SAX is an ad-hoc (but very popular) standard
- There are various implementations available
- Java implementations are provided in JAXP (Java API for XML Processing)
- Unlike many XML technologies, SAX and DOM are relatively easy

Difference between SAX and DOM
- DOM reads the entire XML document into memory and stores it as a tree data structure
- SAX reads the XML document and sends an event for each element that it encounters
- Consequences:
  - DOM provides "random access" into the XML document
  - SAX provides only sequential access to the XML document
  - DOM is slower and needs memory proportional to the document size, so it is a poor fit for very large XML documents
  - SAX is fast and requires very little memory, so it can be used for huge documents (or large numbers of documents)
    - This makes SAX attractive for web sites that process many documents
  - DOM provides methods for changing the XML document in memory; SAX does not
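The contrast above can be sketched with the JAXP classes that ship with the JDK. This is a minimal illustration, not code from the slides: the class name SaxVersusDomSketch, its method names, and the sample XML are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxVersusDomSketch {
    // Hypothetical sample document used only for this illustration.
    static final String XML =
            "<novel><chapter num=\"1\">The Beginning</chapter>"
            + "<chapter num=\"2\">The Middle</chapter></novel>";

    // SAX: the parser calls back once per start tag; no tree is kept in memory.
    static int countChaptersWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals("chapter")) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    // DOM: the whole tree is in memory, so we can jump straight to the second chapter.
    static String secondChapterWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element second = (Element) doc.getElementsByTagName("chapter").item(1);
            return second.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countChaptersWithSax(XML));
        System.out.println(secondChapterWithDom(XML));
    }
}
```

The SAX version has to keep its own running state (the counter), while the DOM version simply asks the finished tree for the node it wants.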

Reading in the tree
- The parse method reads in the entire XML document and represents it as a tree in memory
  - For a large document, parsing could take a while
  - If you want to interact with your program while it is parsing, you need to parse in a separate thread
    - Once parsing starts, you cannot interrupt or stop it
    - Do not try to access the parse tree until parsing is done
- An XML parse tree may require up to ten times as much memory as the original XML document
  - If you have a lot of tree manipulation to do, DOM is much more convenient than SAX
  - If you don't have a lot of tree manipulation to do, consider using SAX instead
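As a rough sketch of the points above, the parse can be handed to a background thread and the tree touched only after the parse has finished. The class DomParseSketch and its helper names are hypothetical; CompletableFuture is just one of several ways to run the parse off the calling thread.

```java
import java.io.StringReader;
import java.util.concurrent.CompletableFuture;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomParseSketch {
    // Blocking parse: returns only when the whole tree has been built.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Runs the blocking parse on a background thread.
    static CompletableFuture<Document> parseInBackground(String xml) {
        return CompletableFuture.supplyAsync(() -> parse(xml));
    }

    public static void main(String[] args) {
        CompletableFuture<Document> pending =
                parseInBackground("<novel><chapter num=\"1\"/></novel>");
        System.out.println("parsing started; the caller is free to do other work");
        // join() waits for the parse to finish, so the tree is complete when we read it.
        Document doc = pending.join();
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```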

Structure of the DOM tree
- The DOM tree is composed of Node objects
- Node is an interface
  - Some of the more important subinterfaces are Element, Attr, and Text
    - An Element node may have children
    - Text nodes are leaves
    - Attr nodes are not children of their Element; they are reached through getAttributes()
  - Additional types are Document, ProcessingInstruction, Comment, Entity, CDATASection and several others
- Hence, the DOM tree is composed entirely of Node objects, but the Node objects can be downcast into more specific types as needed

Operations on Nodes, I
The results returned by getNodeName(), getNodeValue(), getNodeType() and getAttributes() depend on the subtype of the node, as follows:

                     Element         Text             Attr
    getNodeName()    tag name        "#text"          name of attribute
    getNodeValue()   null            text contents    value of attribute
    getNodeType()    ELEMENT_NODE    TEXT_NODE        ATTRIBUTE_NODE
    getAttributes()  NamedNodeMap    null             null
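The table can be checked against a real parser with a small sketch like the following. The class NodeFactsSketch, the describe helper, and the "map"/"no-map" labels are made up for this illustration; the numeric type codes printed are the values of Node.ELEMENT_NODE (1), Node.ATTRIBUTE_NODE (2) and Node.TEXT_NODE (3).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeFactsSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // One line per node: type code, name, value, and whether getAttributes() is non-null.
    static String describe(Node node) {
        String attributes = (node.getAttributes() == null) ? "no-map" : "map";
        return node.getNodeType() + " " + node.getNodeName() + " "
                + node.getNodeValue() + " " + attributes;
    }

    public static void main(String[] args) {
        Element chapter =
                parse("<chapter num=\"1\">The Beginning</chapter>").getDocumentElement();
        System.out.println(describe(chapter));                        // Element column
        System.out.println(describe(chapter.getFirstChild()));        // Text column
        System.out.println(describe(chapter.getAttributeNode("num"))); // Attr column
    }
}
```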

Distinguishing Node types
Here's an easy way to tell what kind of a node you are dealing with:

    switch (node.getNodeType()) {
        case Node.ELEMENT_NODE:
            Element element = (Element) node;
            ...
            break;
        case Node.TEXT_NODE:
            Text text = (Text) node;
            ...
            break;
        case Node.ATTRIBUTE_NODE:
            Attr attr = (Attr) node;
            ...
            break;
        default:
            ...
    }

Operations on Nodes, II
- Tree-walking operations that return a Node:
  - getParentNode()
  - getFirstChild()
  - getNextSibling()
  - getPreviousSibling()
  - getLastChild()
- Tests that return a boolean:
  - hasAttributes()
  - hasChildNodes()
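A brief sketch of how these tree-walking calls combine in practice. The class TreeWalkSketch and its two helpers are hypothetical names chosen for the example; the sample XML is written without whitespace between tags so that no whitespace-only Text nodes appear among the children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeWalkSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Walk sideways: first child, then each next sibling until there are none left.
    static int countChildren(Node parent) {
        int count = 0;
        if (parent.hasChildNodes()) {
            for (Node child = parent.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                count++;
            }
        }
        return count;
    }

    // Walk upward: the Document node itself has no parent, so it has depth 0.
    static int depth(Node node) {
        int depth = 0;
        for (Node parent = node.getParentNode(); parent != null;
                parent = parent.getParentNode()) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        Node novel = parse("<novel><chapter/><chapter/><chapter/></novel>")
                .getDocumentElement();
        System.out.println(countChildren(novel));
        System.out.println(depth(novel.getLastChild()));
    }
}
```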

Operations for Elements
- String getTagName()
  - Returns the name of the tag
- boolean hasAttribute(String name)
  - Returns true if this Element has the named attribute
- String getAttribute(String name)
  - Returns the (String) value of the named attribute
- boolean hasAttributes()
  - Returns true if this Element has any attributes
  - This method is actually inherited from Node
    - Returns false if it is applied to a Node that isn't an Element
- NamedNodeMap getAttributes()
  - Returns a NamedNodeMap of all the Element's attributes
  - This method is actually inherited from Node
    - Returns null if it is applied to a Node that isn't an Element
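A small sketch of the Element operations listed above. The class ElementOpsSketch, the summarize helper, and the "(absent)" marker are assumptions for illustration. Checking hasAttribute first matters because getAttribute returns an empty string, not null, when the attribute is missing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ElementOpsSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Reports the tag name and one attribute, distinguishing "missing" from "empty".
    static String summarize(Element element, String attributeName) {
        String value = element.hasAttribute(attributeName)
                ? element.getAttribute(attributeName)
                : "(absent)";
        return element.getTagName() + " " + attributeName + "=" + value;
    }

    public static void main(String[] args) {
        Element chapter = parse("<chapter num=\"1\"/>").getDocumentElement();
        System.out.println(summarize(chapter, "num"));
        System.out.println(summarize(chapter, "title"));
        System.out.println(chapter.hasAttributes());
    }
}
```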

NamedNodeMap
- The node.getAttributes() operation returns a NamedNodeMap
  - Because NamedNodeMaps are also used for other kinds of nodes (elsewhere in the DOM, for example the entities of a document type), the contents are treated as general Nodes, not specifically as Attrs
- Some operations on a NamedNodeMap are:
  - getNamedItem(String name) returns (as a Node) the attribute with the given name
  - getLength() returns (as an int) the number of Nodes in this NamedNodeMap
  - item(int index) returns (as a Node) the index-th item
    - This operation lets you conveniently step through all the nodes in the NamedNodeMap
    - The DOM does not guarantee the order in which nodes are returned
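Stepping through a NamedNodeMap with getLength() and item(int) might look like the sketch below. The class AttributeMapSketch and the attributesOf helper are hypothetical; copying into a TreeMap is one way to get a predictable order, since the map's own order is unspecified.

```java
import java.io.StringReader;
import java.util.SortedMap;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeMapSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Copies every attribute into a map sorted by attribute name.
    static SortedMap<String, String> attributesOf(Element element) {
        SortedMap<String, String> result = new TreeMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i); // a general Node, not an Attr
            result.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Element chapter =
                parse("<chapter title=\"Start\" num=\"1\"/>").getDocumentElement();
        System.out.println(attributesOf(chapter));
        System.out.println(chapter.getAttributes().getNamedItem("num").getNodeValue());
    }
}
```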

Operations on Texts
- Text is a subinterface of CharacterData and inherits the following operations (among others):
  - public String getData() throws DOMException
    - Returns the text contents of this Text node
  - public int getLength()
    - Returns the length of the text in 16-bit units (UTF-16 code units, which is what a Java char is)
  - public String substringData(int offset, int count) throws DOMException
    - Returns a substring of the text contents
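A short sketch of the three CharacterData operations on a Text node. The class TextOpsSketch and the firstText helper are invented for the example, and the cast assumes the element's first child really is a Text node, which holds for the sample input used here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class TextOpsSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Assumes the first child is a Text node; a ClassCastException follows otherwise.
    static Text firstText(Element element) {
        return (Text) element.getFirstChild();
    }

    public static void main(String[] args) {
        Text text = firstText(
                parse("<chapter>The Beginning</chapter>").getDocumentElement());
        System.out.println(text.getData());
        System.out.println(text.getLength());
        // offset 4 skips "The ", count 9 covers "Beginning"
        System.out.println(text.substringData(4, 9));
    }
}
```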

Operations on Attrs
- String getName()
  - Returns the name of this attribute
- Element getOwnerElement()
  - Returns the Element node this attribute is attached to, or null if this attribute is not in use
- boolean getSpecified()
  - Returns true if this attribute was explicitly given a value in the original document
- String getValue()
  - Returns the value of the attribute as a String
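The Attr operations can be tried out with a sketch along these lines. The class AttrOpsSketch, the describe helper, the "(detached)" marker, and the attribute name "draft" are all assumptions for illustration. An attribute made with createAttribute and never attached to an element shows the null owner case.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttrOpsSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Formats owner, name, value, and the "specified" flag of one attribute.
    static String describe(Attr attr) {
        Element owner = attr.getOwnerElement();
        String ownerName = (owner == null) ? "(detached)" : owner.getTagName();
        return ownerName + "/@" + attr.getName() + "=" + attr.getValue()
                + " specified=" + attr.getSpecified();
    }

    public static void main(String[] args) {
        Document doc = parse("<chapter num=\"1\"/>");
        System.out.println(describe(doc.getDocumentElement().getAttributeNode("num")));
        // Created but not attached to any element, so getOwnerElement() is null.
        System.out.println(describe(doc.createAttribute("draft")));
    }
}
```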

Preorder traversal
- The DOM is stored in memory as a tree
- An easy way to traverse a tree is in preorder
- You should remember how to do this from your course in Data Structures
- The general form of a preorder traversal is:
  - Visit the root
  - Traverse each subtree, in order

Preorder traversal in Java

    import org.w3c.dom.Node;

    // Visit this node first, then traverse each child subtree one level deeper
    static void simplePreorderPrint(String indent, Node node) {
        printNode(indent, node);
        if (node.hasChildNodes()) {
            Node child = node.getFirstChild();
            while (child != null) {
                simplePreorderPrint(indent + " ", child);
                child = child.getNextSibling();
            }
        }
    }

    // Print the node's type code, name, value, and attributes on one line
    static void printNode(String indent, Node node) {
        System.out.print(indent);
        System.out.print(node.getNodeType() + " ");
        System.out.print(node.getNodeName() + " ");
        System.out.print(node.getNodeValue() + " ");
        System.out.println(node.getAttributes());
    }

Trying out the program
Input (the element tags, inferred here from the output, are a novel element containing three chapter elements):

    <novel>
     <chapter num="1">The Beginning</chapter>
     <chapter num="2">The Middle</chapter>
     <chapter num="3">The End</chapter>
    </novel>

Output:

    1 novel null
     3 #text null
     1 chapter null num="1"
      3 #text The Beginning null
     3 #text null
     1 chapter null num="2"
      3 #text The Middle null
     3 #text null
     1 chapter null num="3"
      3 #text The End null
     3 #text null

Things to think about:
- What are the numbers?
- Are the nulls in the right places?
- Is the indentation as expected?
- How could this program be improved?

Additional DOM operations
- I've left out all the operations that allow you to modify the DOM tree, for example:
  - setNodeValue(String nodeValue)
  - insertBefore(Node newChild, Node refChild)
- The Node, Element, and Document interfaces provide a large number of these operations
- These operations are part of the W3C DOM Core specification
- DOM Levels 1 and 2 define no standardized way to write out a DOM as an XML document; DOM Level 3 Load and Save adds LSSerializer, and in JAXP an identity Transformer can do the job
- It isn't that hard to write out the XML yourself
  - The previous program is a good start on outputting XML
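A minimal sketch of modifying a tree and then writing it back out, assuming the JAXP identity Transformer is used for serialization (one option among several, not necessarily what the slides had in mind). The class DomEditSketch, its method names, and the sample content are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomEditSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Serializes a DOM node with an identity transform (no stylesheet).
    static String toXml(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize DOM", e);
        }
    }

    // Inserts a new first chapter, rewrites the existing chapter's text, and serializes.
    static String demo() {
        Document doc = parse("<novel><chapter num=\"2\">The Middle</chapter></novel>");
        Element novel = doc.getDocumentElement();
        Node second = novel.getFirstChild();

        Element first = doc.createElement("chapter");
        first.setAttribute("num", "1");
        first.appendChild(doc.createTextNode("The Beginning"));
        novel.insertBefore(first, second);              // new node goes before refChild

        second.getFirstChild().setNodeValue("The Muddle"); // replaces the Text contents
        return toXml(doc);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

New nodes are created through the owning Document (createElement, createTextNode) before they are inserted, because a node belongs to the document that created it.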

The End