XML Tools
Leonidas Fegaras
CSE 6331 © Leonidas Fegaras



XML Processing

[Figure: XML processing pipeline. XML document → document parser (well-formedness checks & reference expansion) → XML infoset → document validator (DTD or XML schema) → XML infoset (annotated). Remaining labels in the figure: application, storage system.]

Tools for XML Processing

DOM: a language-neutral interface for manipulating XML data
  – requires that the entire document be in memory
SAX: push-based stream processing
  – hard to write non-trivial applications
XPath: a declarative tree-navigation language
  – beautiful and easy to use
  – is part of many other languages
XSLT: a language for transforming XML based on templates
  – very ugly!
XQuery: full-fledged query language
  – influenced by OQL
XmlPull: pull-based stream processing
  – far better than SAX, but not a standard yet

DOM

The Document Object Model (DOM) is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content and structure of XML documents.

The following is part of the DOM interface:

    public interface Node {
        public String getNodeName ();
        public String getNodeValue ();
        public NodeList getChildNodes ();
        public NamedNodeMap getAttributes ();
    }

    public interface Element extends Node {
        // returns all descendant elements with the given tag name
        public NodeList getElementsByTagName ( String name );
    }

    public interface Document extends Node {
        public Element getDocumentElement ();
    }

    public interface NodeList {
        public int getLength ();
        public Node item ( int index );
    }
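The interface above can be exercised with a few lines of standard JDK code. The following is a minimal sketch, not part of the original slides: the class name DomSketch, the helper names, and the sample XML are assumptions made for illustration. It parses an XML string into a DOM tree and collects the text of every name element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class DomSketch {
    // Parse an XML string into a DOM tree; the whole document is held in memory.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Return the text of every <name> descendant of the root, comma-separated.
    static String names(String xml) throws Exception {
        Element root = parse(xml).getDocumentElement();
        NodeList list = root.getElementsByTagName("name");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.getLength(); i++) {
            if (i > 0) sb.append(",");
            sb.append(list.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample data invented for this sketch.
        String xml = "<department>"
                   + "<gradstudent><name>Smith</name></gradstudent>"
                   + "<gradstudent><name>Jones</name></gradstudent>"
                   + "</department>";
        System.out.println(names(xml)); // prints "Smith,Jones"
    }
}
```

Note that getElementsByTagName searches all descendants in document order, not only the direct children.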

DOM Example

XPath query: /*[dept/text()="cse"]/tel/text()

    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    class Test {
        public static void main ( String args[] ) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File("depts.xml"));
            NodeList nodes = doc.getDocumentElement().getChildNodes();
            for (int i=0; i<nodes.getLength(); i++) {
                Node n = nodes.item(i);
                NodeList ndl = n.getChildNodes();
                for (int k=0; k<ndl.getLength(); k++) {
                    Node m = ndl.item(k);
                    // strings must be compared with equals(); == tests object identity
                    if ( m.getNodeName().equals("dept")
                         && m.getFirstChild() != null
                         && "cse".equals(m.getFirstChild().getNodeValue()) ) {
                        NodeList ncl = ((Element) m).getElementsByTagName("tel");
                        for (int j=0; j<ncl.getLength(); j++) {
                            Node nc = ncl.item(j);
                            System.out.print(nc.getFirstChild().getNodeValue());
                        }
                    }
                }
            }
        }
    }
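For comparison, the XPath query shown on this slide can be evaluated directly with the javax.xml.xpath package that ships with the JDK. This is a sketch under stated assumptions: the class name XPathSketch, the method name, and the sample document (a root element with dept and tel children) are invented here, since the slides do not show the contents of depts.xml.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class XPathSketch {
    // Evaluate the slide's XPath query and join the matching text nodes with spaces.
    static String telephones(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
            "/*[dept/text()=\"cse\"]/tel/text()",
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hits.getLength(); i++) {
            if (i > 0) sb.append(" ");
            sb.append(hits.item(i).getNodeValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample data invented for this sketch; the real depts.xml is not shown in the slides.
        String xml = "<entry><dept>cse</dept><tel>555-0100</tel><tel>555-0101</tel></entry>";
        System.out.println(telephones(xml)); // prints "555-0100 555-0101"
    }
}
```

The declarative query replaces the nested loops and explicit tests of the DOM program with a single expression.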

Better Programming

    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import java.util.Vector;

    // A sequence of DOM nodes that supports path-like navigation
    class Sequence extends Vector<Node> {
        Sequence () { super(); }

        // Start from the root element of the given XML file
        Sequence ( String filename ) throws Exception {
            super();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File(filename));
            add(doc.getDocumentElement());
        }

        // Collect the children with the given tag name of every node in the sequence
        Sequence child ( String tagname ) {
            Sequence result = new Sequence();
            for (int i = 0; i<size(); i++) {
                Node n = elementAt(i);
                NodeList c = n.getChildNodes();
                for (int k = 0; k<c.getLength(); k++)
                    if (c.item(k).getNodeName().equals(tagname))
                        result.add(c.item(k));
            }
            return result;
        }

        void print () {
            for (int i = 0; i<size(); i++)
                System.out.println(elementAt(i).toString());
        }
    }

    class DOM {
        public static void main ( String args[] ) throws Exception {
            (new Sequence("cs.xml")).child("gradstudent").child("name").print();
        }
    }

SAX

SAX is a Simple API for XML that allows you to process a document as it is being read
  – in contrast to DOM, which requires the entire document to be read before it takes any action
The SAX API is event based
  – the XML parser sends events, such as the start or the end of an element, to an event handler, which processes the information

Parser Events

Receive notification of the beginning of a document:
    void startDocument ()
Receive notification of the end of a document:
    void endDocument ()
Receive notification of the beginning of an element:
    void startElement ( String namespace, String localName, String qName, Attributes atts )
Receive notification of the end of an element:
    void endElement ( String namespace, String localName, String qName )
Receive notification of character data:
    void characters ( char[] ch, int start, int length )

SAX Example: a Printer

    import java.io.FileReader;
    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    // Prints the events it receives back as XML text
    class Printer extends DefaultHandler {
        public Printer () { super(); }
        public void startDocument () {}
        public void endDocument () { System.out.println(); }
        public void startElement ( String uri, String name, String tag, Attributes atts ) {
            System.out.print("<" + tag + ">");
        }
        public void endElement ( String uri, String name, String tag ) {
            System.out.print("</" + tag + ">");
        }
        public void characters ( char text[], int start, int length ) {
            System.out.print(new String(text,start,length));
        }
    }

The Child Handler

    class Child extends DefaultHandler {
        DefaultHandler next;  // the next handler in the pipeline
        String ptag;          // the tagname of the child
        boolean keep;         // are we keeping or skipping events?
        short level;          // the depth level of the current element

        public Child ( String s, DefaultHandler n ) {
            super();
            next = n;
            ptag = s;
            keep = false;
            level = 0;
        }

        public void startDocument () throws SAXException {
            next.startDocument();
        }

        public void endDocument () throws SAXException {
            next.endDocument();
        }

        public void startElement ( String nm, String ln, String qn, Attributes a ) throws SAXException {
            // level 1 is a child of the outermost element seen by this handler
            if (level++ == 1)
                keep = ptag.equals(qn);
            if (keep)
                next.startElement(nm,ln,qn,a);
        }

        public void endElement ( String nm, String ln, String qn ) throws SAXException {
            if (keep)
                next.endElement(nm,ln,qn);
            if (--level == 1)
                keep = false;
        }

        public void characters ( char[] text, int start, int length ) throws SAXException {
            if (keep)
                next.characters(text,start,length);
        }
    }

Forming the Pipeline

    class SAX {
        public static void main ( String args[] ) throws Exception {
            SAXParserFactory pf = SAXParserFactory.newInstance();
            SAXParser parser = pf.newSAXParser();
            DefaultHandler handler
                = new Child("gradstudent", new Child("name", new Printer()));
            parser.parse(new InputSource(new FileReader("cs.xml")), handler);
        }
    }

Pipeline: SAX parser → Child:gradstudent → Child:name → Printer

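Example

Input stream:

    <department>
      <deptname>Computer Science</deptname>
      <gradstudent>
        <name>
          <lastname>Smith</lastname>
          <firstname>John</firstname>
        </name>
      </gradstudent>
      ...
    </department>

SAX events (SD/ED = start/end of document, SE/EE = start/end of element, C = character data):

    SD:
    SE: department
    SE: deptname
    C: Computer Science
    EE: deptname
    SE: gradstudent
    SE: name
    SE: lastname
    C: Smith
    EE: lastname
    SE: firstname
    C: John
    EE: firstname
    EE: name
    EE: gradstudent
    ...
    EE: department
    ED:

The events flow through the pipeline stages Child: gradstudent → Child: name → Printer.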
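An event trace like the one above can be reproduced with a small handler that logs each callback. This is a sketch, not the author's code: the class name TraceSketch and the trace method are assumptions, and whitespace-only character data is dropped to keep the trace readable. A SAX parser may also deliver one run of text in several characters() calls, so long text could appear as more than one C: line.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only: records one line per SAX event.
class TraceSketch extends DefaultHandler {
    final StringBuilder log = new StringBuilder();

    @Override public void startDocument() { log.append("SD:\n"); }
    @Override public void endDocument()   { log.append("ED:\n"); }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        log.append("SE: ").append(qName).append("\n");
    }

    @Override public void endElement(String uri, String local, String qName) {
        log.append("EE: ").append(qName).append("\n");
    }

    @Override public void characters(char[] ch, int start, int length) {
        // Skip whitespace-only text, such as indentation between elements.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) log.append("C: ").append(text).append("\n");
    }

    // Parse the given XML string and return the recorded event trace.
    static String trace(String xml) throws Exception {
        TraceSketch handler = new TraceSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.log.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(trace("<department><deptname>Computer Science</deptname></department>"));
    }
}
```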

XmlPull

Unlike SAX, you pull events from the document.
Create a pull parser:
    XmlPullParser xpp;
    xpp = factory.newPullParser();
Get the type of the current event with xpp.getEventType(); advance to the next event with xpp.next().
Types of events:
  – START_TAG
  – END_TAG
  – TEXT
  – START_DOCUMENT
  – END_DOCUMENT
More information at:
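The pull style can be tried without extra libraries by using StAX (javax.xml.stream), the pull parser bundled with the JDK. Note that this is a substitute for XmlPull, not the XmlPull API itself: the class name PullSketch is an assumption, and the StAX event names START_ELEMENT, END_ELEMENT, and CHARACTERS are printed here under the XmlPull names START_TAG, END_TAG, and TEXT.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only: a pull loop written with StAX.
class PullSketch {
    static String pull(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to deliver adjacent character data as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder sb = new StringBuilder();
        // The application, not the parser, decides when to ask for the next event.
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT)
                sb.append("START_TAG ").append(reader.getLocalName()).append("\n");
            else if (event == XMLStreamConstants.END_ELEMENT)
                sb.append("END_TAG ").append(reader.getLocalName()).append("\n");
            else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace())
                sb.append("TEXT ").append(reader.getText()).append("\n");
        }
        reader.close();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(pull("<name><lastname>Smith</lastname></name>"));
    }
}
```

Because the loop is ordinary control flow, state can live in local variables instead of handler fields, which is the main convenience of pull parsing over SAX.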

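Better XmlPull Events

    class Attributes {
        public String[] names;
        public String[] values;
        Attributes ( String[] names, String[] values ) { this.names = names; this.values = values; }
    }

    abstract class Event {}

    class StartTag extends Event {
        public String tag;
        public Attributes attributes;
        StartTag ( String tag, Attributes attributes ) { this.tag = tag; this.attributes = attributes; }
    }

    class EndTag extends Event {
        public String tag;
        EndTag ( String tag ) { this.tag = tag; }
    }

    class CData extends Event {
        public String text;
        CData ( String text ) { this.text = text; }
    }

    // end of stream
    class EOS extends Event {}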

Iterators

    import org.xmlpull.v1.XmlPullParser;
    import org.xmlpull.v1.XmlPullParserFactory;

    abstract class Iterator {
        abstract public void open ();   // open the stream iterator
        abstract public void close ();  // close the stream iterator
        abstract public Event next ();  // get the next event from the stream
    }

    abstract class Filter extends Iterator {
        Iterator input;
    }

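Document Reader

    import java.io.FileReader;
    import org.xmlpull.v1.*;

    class Document extends Iterator {
        String path;
        FileReader reader;
        XmlPullParser xpp;
        static XmlPullParserFactory factory;

        Document ( String path ) { this.path = path; }

        // Translate the parser's current event into an Event object;
        // returns null for events with no counterpart, such as START_DOCUMENT
        Event getEvent () throws XmlPullParserException {
            int eventType = xpp.getEventType();
            if (eventType == XmlPullParser.START_TAG) {
                int len = xpp.getAttributeCount();
                String[] names = new String[len];
                String[] values = new String[len];
                for (int i = 0; i<len; i++) {
                    names[i] = xpp.getAttributeName(i);
                    values[i] = xpp.getAttributeValue(i);
                }
                return new StartTag(xpp.getName(),new Attributes(names,values));
            } else if (eventType == XmlPullParser.END_TAG)
                return new EndTag(xpp.getName());
            else if (eventType == XmlPullParser.TEXT) {
                int[] v = new int[2];  // receives the start offset and the length
                char[] ch = xpp.getTextCharacters(v);
                return new CData(new String(ch,v[0],v[1]));
            } else if (eventType == XmlPullParser.END_DOCUMENT)
                return new EOS();
            return null;
        }

        public void open () {
            try {
                if (factory == null)
                    factory = XmlPullParserFactory.newInstance();
                reader = new FileReader(path);
                xpp = factory.newPullParser();
                xpp.setInput(reader);
            } catch (Exception ex) { throw new RuntimeException(ex); }
        }

        public void close () {
            try { reader.close(); }
            catch (Exception ex) { throw new RuntimeException(ex); }
        }

        // Return the current event and advance the parser; EOS is returned
        // on every call once the end of the document has been reached
        public Event next () {
            try {
                Event e = getEvent();
                while (e == null) { xpp.next(); e = getEvent(); }
                if (!(e instanceof EOS))
                    xpp.next();
                return e;
            } catch (Exception ex) { throw new RuntimeException(ex); }
        }
    }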

The Child Iterator

    class Child extends Filter {
        String tag;
        short nest;    // the nesting level of the event
        boolean keep;  // are we in keeping mode?

        Child ( String tag, Iterator input ) { this.tag = tag; this.input = input; }

        public void open () { keep = false; nest = 0; input.open(); }

        public void close () { input.close(); }

        public Event next () {
            while (true) {
                Event t = input.next();
                if (t instanceof EOS)
                    return t;
                else if (t instanceof StartTag) {
                    // nesting level 1 is a child of the outermost element
                    if (nest++ == 1) {
                        keep = tag.equals(((StartTag) t).tag);
                        if (!keep)
                            continue;
                    }
                } else if (t instanceof EndTag)
                    if (--nest == 1 && keep) {
                        keep = false;
                        return t;
                    }
                if (keep)
                    return t;
            }
        }
    }

XSL Transformation

A stylesheet specification language for converting XML documents into various forms (XML, HTML, plain text, etc.).
Can transform each XML element into another element, add new elements into the output file, or remove elements.
Can rearrange and sort elements, test and make decisions about which elements to display, and much more.
Based on XPath:

    <xsl:stylesheet version='1.0' xmlns:xsl='http//

XSLT Templates

XSLT uses XPath to define the parts of the source document that match one or more predefined templates.
When a match is found, XSLT transforms the matching part of the source document into the result document.
The parts of the source document that do not match any explicit template are processed by the default templates.
Form:

    <xsl:template match="XPath"> … </xsl:template>

The default (implicit) templates visit all nodes and strip out all tags, so only the text content reaches the output:

    <xsl:template match="*|/"> <xsl:apply-templates/> </xsl:template>
    <xsl:template match="text()|@*"> <xsl:value-of select="."/> </xsl:template>
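The effect of the default templates can be observed from Java with the JDK's built-in XSLT processor. The sketch below is illustrative only: the class name TemplateSketch, the run helper, and the sample document are assumptions, and the stylesheet wrapper uses the standard XSLT 1.0 namespace URI. It first applies a stylesheet with no templates, then one with a single template for lastname elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class TemplateSketch {
    // Wrap the given template rules in a stylesheet that produces plain text,
    // apply it to the XML string, and return the output.
    static String run(String templates, String xml) throws Exception {
        String xsl = "<xsl:stylesheet version='1.0'"
                   + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                   + "<xsl:output method='text'/>"
                   + templates
                   + "</xsl:stylesheet>";
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample data invented for this sketch.
        String xml = "<name><lastname>Smith</lastname><firstname>John</firstname></name>";
        // No explicit templates: the default templates emit only the text.
        System.out.println(run("", xml)); // prints "SmithJohn"
        // One explicit template: lastname is handled by it, the rest by the defaults.
        String rule = "<xsl:template match='lastname'>[<xsl:value-of select='.'/>]</xsl:template>";
        System.out.println(run(rule, xml)); // prints "[Smith]John"
    }
}
```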

Other XSLT Elements

<xsl:value-of select="XPath"/>
  – select the value of an XML element and add it to the output stream of the transformation
<xsl:copy-of select="XPath"/>
  – copy the entire XML element to the output stream of the transformation
<xsl:apply-templates select="XPath"/>
  – apply the template rules to the elements that match the XPath expression
<xsl:element name="{XPath}"> … </xsl:element>
  – add an element to the output with a tag-name derived from the XPath
Example:

    <xsl:stylesheet version='1.0' xmlns:xsl='

Copy the Entire Document

    <xsl:stylesheet version='1.0' xmlns:xsl='

More on XSLT

Conflict resolution: more specific templates override more general templates. Templates are assigned default priorities, but these can be overridden using priority="n" in a template.
Modes can be used to group together templates. No mode is an empty mode.
Conditional and loop statements:

    <xsl:if test="XPath"> body </xsl:if>
    <xsl:for-each select="XPath"> body </xsl:for-each>

Variables can be used to name data:

    <xsl:variable name="x"> value </xsl:variable>

Variables are used as $x in XPath expressions, and as {$x} inside attribute value templates.
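The conditional, loop, and variable elements can be combined in one small stylesheet and run from Java. This is a sketch under assumptions: the class name ControlSketch, the variable name limit, and the sample document are invented for illustration. The stylesheet visits every n element and outputs the values greater than the variable.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class ControlSketch {
    // The stylesheet binds $limit, loops over all n elements,
    // and emits each value greater than $limit followed by ';'.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        +   "<xsl:variable name='limit' select='2'/>"
        +   "<xsl:for-each select='//n'>"
        +     "<xsl:if test='. &gt; $limit'><xsl:value-of select='.'/>;</xsl:if>"
        +   "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String run(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample data invented for this sketch.
        System.out.println(run("<l><n>1</n><n>3</n><n>5</n></l>")); // prints "3;5;"
    }
}
```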

Using XSLT

    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.w3c.dom.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.*;
    import javax.xml.transform.stream.*;
    import java.io.*;

    class XSLT {
        public static void main ( String argv[] ) throws Exception {
            File stylesheet = new File("x.xsl");
            File xmlfile = new File("a.xml");
            // parse the source document into a DOM tree
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document document = db.parse(xmlfile);
            // compile the stylesheet into a transformer
            StreamSource stylesource = new StreamSource(stylesheet);
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer(stylesource);
            // apply the transformation and write the result to standard output
            DOMSource source = new DOMSource(document);
            StreamResult result = new StreamResult(System.out);
            transformer.transform(source,result);
        }
    }