1 Processing XML with Java Representation and Management of Data on the Internet.

2 XML XML is the eXtensible Markup Language. It is a metalanguage: –A language used to describe other languages, using “markup” tags that describe properties of the data. Designed to be structured: –Strict rules about how data can be formatted. Designed to be extensible: –Can define your own terms and markup.

3 XML Family XML is an official recommendation of the W3C. It aims to accomplish what HTML cannot, and to be simpler to use and implement than SGML. (Figure: HTML, XML, SGML, XHTML)

4 The Essence of XML Syntax: the permitted arrangement or structure of letters and words in a language, as defined by a grammar (XML). Semantics: the meaning of letters or words in a language. XML uses syntax to add semantics to documents.

5 Using XML In XML there is a separation of the content from the display XML can be used for: –Data representation –Data exchange

6 Databases and XML Database content can be presented in XML –XML processor can access DBMS or file system and convert data to XML –Web server can serve content as either XML or HTML

7 HTML vs. XML
–HTML: improper nesting; XML: proper nesting
–HTML: allows start tags without end tags; XML: empty tags must have a trailing slash
–HTML: unquoted attribute values; XML: quoted attribute values
–HTML is case insensitive; XML is case sensitive
–HTML: whitespace is ignored; XML: whitespace is important
–Begins with

8 HTML vs. XML
–HTML: well-defined set of tags; XML: can use any tag you like
–HTML: tags have a known meaning; XML: tags have no known meaning

9 Some Things in Common Comments are allowed (<!-- ... -->). Special characters must be escaped (e.g., &gt; for >).

10 Processing XML – The Idea

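11 Sample Document
<transaction>
  <account>89-344</account>
  <buy shares="100"><ticker exch="NASDAQ">WEBM</ticker></buy>
  <sell shares="30"><ticker exch="NYSE">GE</ticker></sell>
</transaction>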

12 DOM Parser DOM = Document Object Model Parser creates a tree object out of the document User accesses data by traversing the tree The API allows for constructing, accessing and manipulating the structure and content of XML documents

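13 Document as Tree (Figure: the root transaction has three children: account with text 89-344; buy with attribute shares=100 and child ticker (attribute exch=NASDAQ, text WEBM); sell with attribute shares=30 and child ticker (attribute exch=NYSE, text GE).) Methods like: getRoot, getChildren, getAttributes, etc.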

14 Advantages and Disadvantages Advantages: –Natural and relatively easy to use –Can repeatedly traverse tree Disadvantages: –High memory requirements – the whole document is kept in memory –Must parse the whole document before use

15 SAX Parser SAX = Simple API for XML Parser creates “events” while traversing tree Parser calls methods (that you write) to deal with the events Similar to an IOStream, goes in one direction

16 Document as Events
–Start tag: transaction
–Start tag: account
–Text: 89-344
–End tag: account
–Start tag: buy
–Attribute: shares, Value: 100

17 Advantages and Disadvantages Advantages: –Requires little memory –Fast Disadvantages: –Cannot reread –Less natural for object oriented programmers (perhaps)

18 Which should we use? DOM vs. SAX If your document is very large and you only need a few elements - use SAX If you need to manipulate (i.e., change) the XML - use DOM If you need to access the XML many times - use DOM

19 XML Parsers

20 XML Parsers There are several different ways to categorise parsers: –Validating versus non-validating parsers –DOM parsers versus SAX parsers –Parsers written in a particular language (Java, C++, Perl, etc.)

21 Validating Parsers A validating parser makes sure that the document conforms to the specified DTD This is time consuming, so a non-validating parser is faster

22 Using an XML Parser Three basic steps –Create a parser object –Pass the XML document to the parser –Process the results Generally, writing out XML is not in the scope of parsers (though some may implement proprietary mechanisms)

23 SAX – Simple API for XML

24 The SAX Parser The SAX parser is an event-driven API –An XML document is sent to the SAX parser –The XML file is read sequentially –The parser notifies your class when events happen, including errors –The events are handled by API methods that the programmer implements

25 SAX components (figure) –Used to create a SAX Parser –Handles document events: start tag, end tag, etc. –Handles Parser Errors –Handles DTDs and Entities

26 Problem The SAX interface is an accepted standard, and there are many implementations. We would like to be able to change the implementation used without changing any code in the program. How is this done?

27 Factory Design Pattern Have a “Factory” class that creates the actual Parsers. The Factory checks the value of a system property that states which implementation should be used In order to change the implementation, simply change the system property

28 Creating a SAX Parser Import the following packages: –org.xml.sax.*; –org.xml.sax.helpers.*; Set the following system property: –System.setProperty("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser"); Create the instance from the Factory: –XMLReader reader = XMLReaderFactory.createXMLReader();
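The same factory idea is also available through JAXP, which ships with the JDK. The sketch below is an alternative to the XMLReaderFactory call on the slide, not the slide's own method; the class name CreateSaxReader is hypothetical. It assumes only that some JAXP implementation is on the classpath, which is the case for a standard JDK.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class CreateSaxReader {
    /** Obtains an XMLReader from whichever JAXP implementation is configured. */
    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        // The concrete class depends on the installed implementation
        System.out.println("Parser implementation: " + reader.getClass().getName());
    }
}
```

As with the system property on the slide, the calling code never names a concrete parser class, so the implementation can be swapped without touching the program.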

29 Receiving Parsing Information A SAX Parser calls methods such as “startDocument”, “startElement”, etc., as it runs In order to react to such events we must: –implement the ContentHandler interface –set the parser’s content handler with an instance of our class

30 ContentHandler // Methods (partial list) public void startDocument(); public void endDocument(); public void characters(char[] ch, int start, int length); public void startElement(String namespaceURI, String localName, String qName, Attributes atts); public void endElement(String namespaceURI, String localName, String qName);

31 Namespaces and Element Names <forsale date="12/2/03" xmlns:xhtml = "urn:http://www.w3.org/1999/xhtml"> DBI: The Course I Wish I never Took My favorite book!

32 Namespaces and Element Names <forsale date="12/2/03" xmlns:xhtml = "urn:http://www.w3.org/1999/xhtml"> DBI: The Course I Wish I never Took My favorite book!
–For the element xhtml:em: namespaceURI = urn:http://www.w3.org/1999/xhtml, localName = em, qName = xhtml:em
–For the element book: namespaceURI = "", localName = book, qName = book
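To see the three names side by side, here is a minimal sketch that records what startElement receives for each element. The class name ElementNames, the sample document and the namespace URI urn:example are made up for illustration; the parser is obtained through JAXP with namespace awareness switched on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ElementNames {
    /** Returns one line per start tag with the names reported by the parser. */
    static List<String> describe(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to get namespaceURI and localName
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add("uri=" + uri + " local=" + localName + " qName=" + qName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document: book has no namespace, x:em is in urn:example
        String xml = "<book xmlns:x='urn:example'><x:em>never</x:em></book>";
        for (String line : describe(xml)) System.out.println(line);
    }
}
```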

33 Receiving Parsing Information (cont.) An easy way to implement the ContentHandler interface is to extend DefaultHandler, which implements this interface (and a few others) with empty methods. To actually parse a document, create an InputSource from the document and supply the input source to the parse method of the XMLReader.

34
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public class InfoWithSax extends DefaultHandler {
    public static void main(String[] args) {
        // Select the SAX implementation through the system property
        System.setProperty("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser");
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            reader.setContentHandler(new InfoWithSax());
            reader.parse(new InputSource(new FileReader(args[0])));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

35
    public void startDocument() throws SAXException {
        System.out.println("START DOCUMENT");
    }
    public void endDocument() throws SAXException {
        System.out.println("END DOCUMENT");
    }
    int depth;
    String indent = " ";
    // Prints one line, indented according to the current element depth
    private void println(String header, String value) {
        for (int i = 0; i < depth; i++)
            System.out.print(indent);
        System.out.println(header + ": " + value);
    }

36
    public void characters(char buf[], int offset, int len) throws SAXException {
        // Skip text that consists only of whitespace
        String s = (new String(buf, offset, len)).trim();
        if (!"".equals(s))
            println("CHARACTERS", s);
    }
    public void endElement(String namespaceURI, String localName, String name) throws SAXException {
        depth--;
        String elementName = name;
        if (!"".equals(namespaceURI) && !"".equals(localName))
            elementName = namespaceURI + ":" + localName;
        println("END ELEMENT", elementName);
    }

37
    public void startElement(String namespaceURI, String localName, String name, Attributes attrs) throws SAXException {
        String elementName = name;
        if (!"".equals(namespaceURI) && !"".equals(localName))
            elementName = namespaceURI + ":" + localName;
        println("START ELEMENT", elementName);
        if (attrs != null && attrs.getLength() > 0) {
            for (int i = 0; i < attrs.getLength(); i++)
                println("ATTRIBUTE", attrs.getLocalName(i) + "=" + attrs.getValue(i));
        }
        depth++;
    }
} // end of class InfoWithSax

38 Bachelor Tags What do you think happens when the parser parses a bachelor tag?
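One way to explore the question is to record the events a parser fires for an empty-element tag. The sketch below is illustrative only: the class name EmptyTagEvents and the tiny input document are assumptions, and the parser is the JAXP default rather than the Xerces setup used on the earlier slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EmptyTagEvents {
    /** Returns the start/end element events in the order the parser reports them. */
    static List<String> eventsFor(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start:" + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        // <ticker/> is a bachelor (empty-element) tag
        System.out.println(eventsFor("<buy><ticker/></buy>"));
        // prints [start:buy, start:ticker, end:ticker, end:buy]
    }
}
```

The empty tag is reported as a start event immediately followed by an end event, so a handler cannot tell `<ticker/>` apart from `<ticker></ticker>`.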

39 Attributes Interface Elements may have attributes. There is no distinction between attributes that are written explicitly in the document and those that are supplied by the DTD (with a default value).

40 Attributes Interface (cont.) int getLength(); String getQName(int i); String getType(int i); String getValue(int i); String getType(String qname); String getValue(String qname); etc.
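A minimal sketch of how these methods might be used inside startElement follows. The class name AttributeDump and the sample element are hypothetical. The sketch walks the attributes by index and stores each value together with its reported type.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributeDump {
    /** Maps each attribute qName to "value [type]" for every element in the document. */
    static Map<String, String> attributesIn(String xml) throws Exception {
        Map<String, String> result = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    result.put(atts.getQName(i),
                               atts.getValue(i) + " [" + atts.getType(i) + "]");
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input; without a DTD the type is reported as CDATA
        System.out.println(attributesIn("<buy shares=\"100\"/>"));
    }
}
```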

41 Attributes Types The following are possible types for attributes: –"CDATA", –"ID", –"IDREF", "IDREFS", –"NMTOKEN", "NMTOKENS", –"ENTITY", "ENTITIES", –"NOTATION"

42 Setting Features It is possible to set the features of a parser using the setFeature method. Examples: –reader.setFeature("http://xml.org/sax/features/namespaces", true) –reader.setFeature("http://xml.org/sax/features/validation", false) For a full list, see: http://www.saxproject.org/?selected=get-set
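The two calls can be tried out as follows. This is a sketch under the assumption that the JAXP default parser recognizes both features (setFeature throws SAXNotRecognizedException otherwise); the class name FeatureDemo is hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class FeatureDemo {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Creates a reader with namespaces on and validation off. */
    static XMLReader configuredReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);
        reader.setFeature(VALIDATION, false);
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = configuredReader();
        // getFeature reads back the current setting
        System.out.println("namespaces: " + reader.getFeature(NAMESPACES));
        System.out.println("validation: " + reader.getFeature(VALIDATION));
    }
}
```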

43 ErrorHandler Interface We implement ErrorHandler to receive error events (similar to implementing ContentHandler). DefaultHandler implements ErrorHandler with empty methods, so we can extend it (as before). An ErrorHandler is registered with –reader.setErrorHandler(handler); Three methods: –void error(SAXParseException ex); –void fatalError(SAXParseException ex); –void warning(SAXParseException ex);

44 Extending the InfoWithSax Program
    public void warning(SAXParseException err) throws SAXException {
        System.out.println("Warning in line " + err.getLineNumber() + " and column " + err.getColumnNumber());
    }
    public void error(SAXParseException err) throws SAXException {
        System.out.println("Oy va'avoi, an error!");
    }
    public void fatalError(SAXParseException err) throws SAXException {
        System.out.println("OY VA'AVOI, a fatal error!");
    }
Will these methods be called in the case of a problem?
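A parser only reports problems to a handler that has been registered with setErrorHandler. The sketch below registers one and counts fatal errors for a document that is not well formed. The class name ErrorCounting and the sample inputs are assumptions, and the parser is the JAXP default.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ErrorCounting {
    /** Returns how many times fatalError was called while parsing xml. */
    static int fatalErrorsFor(String xml) throws Exception {
        int[] fatal = {0};
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Without this registration the handler methods are never invoked
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void fatalError(SAXParseException e) {
                fatal[0]++;
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The parser may still abort after reporting a fatal error
        }
        return fatal[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fatalErrorsFor("<a><b></a>")); // mismatched tags
        System.out.println(fatalErrorsFor("<a/>"));       // well formed
    }
}
```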

45 Lexical Events Lexical events have to do with the way that a document was written and not with its content. Examples: –A comment is a lexical event (<!-- ... -->) –The use of an entity is a lexical event (&gt;) These can be dealt with by implementing the LexicalHandler interface, which is set on a parser by –reader.setProperty("http://xml.org/sax/properties/lexical-handler", mylexicalhandler);

46 LexicalHandler // Methods (partial list) public void startEntity(String name); public void endEntity(String name); public void comment(char[] ch, int start, int length); public void startCDATA(); public void endCDATA();
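As a rough illustration, the sketch below collects the comments in a document. It extends DefaultHandler2 from org.xml.sax.ext, which provides empty implementations of the LexicalHandler methods. The class name CommentCollector and the sample input are hypothetical, and the sketch assumes the parser supports the lexical-handler property.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class CommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        // Called once per comment; the delimiters are not included
        comments.add(new String(ch, start, length).trim());
    }

    /** Parses xml and returns the text of every comment found. */
    static List<String> commentsIn(String xml) throws Exception {
        CommentCollector handler = new CommentCollector();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(commentsIn("<a><!-- note --><b/></a>"));
    }
}
```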

47 DOM – Document Object Model

48 Creating a DOM Tree How can we create a DOM Tree independently of the implementation chosen? Creating a DOM Tree using the Apache Xerces package: –Import: org.apache.xerces.parsers.DOMParser –Import: org.w3c.dom.*; –Use the following lines of code: DOMParser dom = new DOMParser(); dom.parse(fileName); Document doc = dom.getDocument();
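One possible answer to the opening question is JAXP's DocumentBuilderFactory, which applies the same factory pattern to DOM parsers. The sketch below is an alternative to the Xerces-specific code on the slide, not a replacement the slides themselves propose; the class name DomFromString is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomFromString {
    /** Builds a DOM tree with whichever JAXP implementation is configured. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<transaction><account>89-344</account></transaction>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints transaction
    }
}
```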

49 Using a DOM Tree (Figure: an XML file is read by the DOM parser, which builds the DOM tree; the application accesses the tree through the API.)

50 Nodes in a DOM Tree Interfaces shown in the figure: Node, NodeList, NamedNodeMap, DocumentFragment, Document, CharacterData, Text, Comment, CDATASection, Attr, Element, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction. Figure as appears in “The XML Companion” - Neil Bradley

51 DOM Tree (Figure: an example tree rooted at a Document node, containing Document Type, Element, Attribute, Text, Entity Reference and Comment nodes.)

52 Normalizing a Tree Normalizing a DOM Tree has two effects: –Combine adjacent textual nodes –Eliminate empty textual nodes To normalize, apply the normalize() method to the document element
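Both effects can be observed on a small hand-built tree. In this sketch the class name NormalizeDemo and the element content are made up for illustration; the document is created with the JAXP DocumentBuilder rather than parsed from a file.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    /** Returns {children before normalize(), children after normalize()}. */
    static int[] childCounts() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element account = doc.createElement("account");
        doc.appendChild(account);
        // Two adjacent text nodes with an empty text node between them
        account.appendChild(doc.createTextNode("89-"));
        account.appendChild(doc.createTextNode(""));
        account.appendChild(doc.createTextNode("344"));
        int before = account.getChildNodes().getLength();
        account.normalize();
        int after = account.getChildNodes().getLength();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws Exception {
        int[] counts = childCounts();
        System.out.println(counts[0] + " -> " + counts[1]); // prints 3 -> 1
    }
}
```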

53 Node Methods Three categories of methods –Node characteristics: name, type, value –Contextual location and access to relatives: parents, siblings, children, ancestors, descendants –Node modification: Edit, delete, re-arrange child nodes

54 Node Methods (2) short getNodeType(); String getNodeName(); String getNodeValue() throws DOMException; void setNodeValue(String value) throws DOMException; boolean hasChildNodes(); NamedNodeMap getAttributes(); Document getOwnerDocument();

55 Node Types - getNodeType() ELEMENT_NODE = 1 ATTRIBUTE_NODE = 2 TEXT_NODE = 3 CDATA_SECTION_NODE = 4 ENTITY_REFERENCE_NODE = 5 ENTITY_NODE = 6 PROCESSING_INSTRUCTION_NODE = 7 COMMENT_NODE = 8 DOCUMENT_NODE = 9 DOCUMENT_TYPE_NODE = 10 DOCUMENT_FRAGMENT_NODE = 11 NOTATION_NODE = 12 if (myNode.getNodeType() == Node.ELEMENT_NODE) { //process node … }

57 Node Navigation Every node has a specific location in tree Node interface specifies methods to find surrounding nodes –Node getFirstChild(); –Node getLastChild(); –Node getNextSibling(); –Node getPreviousSibling(); –Node getParentNode(); –NodeList getChildNodes();

58 Node Navigation (2) getFirstChild() getPreviousSibling() getChildNodes() getNextSibling() getLastChild() getParentNode() Figure as from “The XML Companion” - Neil Bradley
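The navigation methods combine naturally into a loop over a node's children. The sketch below lists the names of the element children of the root, skipping whitespace text nodes; the class name ChildNames and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildNames {
    /** Collects the names of the direct element children of parent. */
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                names.add(c.getNodeName());
            }
        }
        return names;
    }

    /** Parses xml and lists the element children of its root element. */
    static List<String> childrenOfRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return childElementNames(doc.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<transaction> <account>89-344</account> <buy/> <sell/> </transaction>";
        System.out.println(childrenOfRoot(xml)); // prints [account, buy, sell]
    }
}
```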

59
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
public class InfoWithDom {
    public static void main(String[] args) {
        try {
            DOMParser dom = new DOMParser();
            dom.parse(args[0]);
            Document doc = dom.getDocument();
            new InfoWithDom().echo(doc);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

60
    private int depth = 0;
    private final String indent = " ";
    // Indexed by the value returned from getNodeType()
    private String[] NODE_TYPES = {"", "ELEMENT", "ATTRIBUTE", "TEXT", "CDATA",
        "ENTITY_REF", "ENTITY", "PROCESSING_INST", "COMMENT", "DOCUMENT",
        "DOCUMENT_TYPE", "DOCUMENT_FRAG", "NOTATION"};
    private void outputIndentation() {
        for (int i = 0; i < depth; i++)
            System.out.print(indent);
    }

61
    // Prints the type, names and value of a single node on one line
    private void printlnCommon(Node n) {
        System.out.print(NODE_TYPES[n.getNodeType()] + ":");
        System.out.print(" nodeName=" + n.getNodeName());
        String val;
        if ((val = n.getNamespaceURI()) != null)
            System.out.print(" uri=" + val);
        if ((val = n.getPrefix()) != null)
            System.out.print(" pre=" + val);
        if ((val = n.getLocalName()) != null)
            System.out.print(" local=" + val);
        if ((val = n.getNodeValue()) != null && !val.trim().equals(""))
            System.out.print(" nodeValue=" + val);
        System.out.println();
    }

62
    // Recursively prints a node, its attributes and its children
    private void echo(Node n) {
        outputIndentation();
        printlnCommon(n);
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap atts = n.getAttributes();
            depth += 2; // attributes are printed two levels deeper
            for (int i = 0; i < atts.getLength(); i++)
                echo(atts.item(i));
            depth -= 2;
        }
        depth++;
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling())
            echo(child);
        depth--;
    }
} // end of class InfoWithDom

63 Node Manipulation Children of a node in a DOM tree can be manipulated: added, edited, deleted, moved, copied, etc. –Node removeChild(Node oldChild) throws DOMException; –Node insertBefore(Node newChild, Node refChild) throws DOMException; –Node appendChild(Node newChild) throws DOMException; –Node replaceChild(Node newChild, Node oldChild) throws DOMException; –Node cloneNode(boolean deep);

64 Node Manipulation (2) Ref New insertBefore Old New replaceChild cloneNode Shallow 'false' Deep 'true' Figure as appears in “The XML Companion” - Neil Bradley
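The manipulation methods can be tried out on a tiny tree. In this sketch the class name ManipulationDemo and the element names a, b, c, d are hypothetical; each step is annotated with the resulting child order.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ManipulationDemo {
    /** Edits the children of <list> and returns their names joined by commas. */
    static String run() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<list><a/><c/></list>")));
        Element root = doc.getDocumentElement();
        Node a = root.getFirstChild();
        Node c = root.getLastChild();

        root.insertBefore(doc.createElement("b"), c);           // a, b, c
        root.replaceChild(doc.createElement("d"), c);           // a, b, d
        root.removeChild(a);                                    // b, d
        root.appendChild(root.getFirstChild().cloneNode(true)); // b, d, b

        StringBuilder names = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (names.length() > 0) names.append(",");
            names.append(n.getNodeName());
        }
        return names.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints b,d,b
    }
}
```

The clone has no parent until it is appended, which is why the original b stays in place and a second b shows up at the end.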

65 Other Interfaces We have discussed methods of the Node interface. Each of the "specific types of nodes" has additional methods. See the API for details.

66 Note about DOM Objects A DOM object is, in effect, compiled XML. We can save time and effort if we send and receive DOM objects instead of XML source –Saves having to parse XML files into DOM at sender and receiver –But a DOM object may be larger than the XML source
