Structured-Document Processing Languages Spring 2004 Course Review Repetitio mater studiorum est!


1 Structured-Document Processing Languages Spring 2004 Course Review Repetitio mater studiorum est!

2 SDPL 2004 Course Review: Goals of the Course
• To become familiar with the most important models and languages for
  – manipulating
  – representing
  – transforming and
  – querying
  structured documents (or XML)
• Generic XML processing technology
  – very little about specific XML applications or commercial systems

3 Methodological Goals
• Some central professional skills
  – consulting technical specifications
  – experimenting with software implementations
• Ability to think…?
  – to find out relationships
  – to apply knowledge in new situations
• ("Pidgin English" for scientific communication)

4 XML?
• Extensible Markup Language is not a markup language!
  – it does not fix a tag set or its semantics (as markup languages such as HTML do)
• XML is
  – a way to use markup to represent information
  – a metalanguage
    » supports the definition of specific markup languages through XML DTDs or Schemas
    » e.g. XHTML, a reformulation of HTML using XML

5 XML Encoding of Structure: Example
[Figure: an XML fragment with root element S, containing W elements with the texts "Hello" and "world!" and an element E with attribute A=1, shown next to the corresponding tree.]

6 Basics of XML DTDs
• A Document Type Declaration provides a grammar (document type definition, DTD) for a class of documents
  Syntax (in the prolog of a document instance):
    <!DOCTYPE root-element-type SYSTEM "external-subset-URI" [ internal subset declarations ]>
• The DTD is the union of the external and internal subsets

7 What Do Declarations Look Like?
<!ATTLIST item price NMTOKEN #REQUIRED
               unit (FIM | EUR) "EUR">

8 Element type declarations
The general form is <!ELEMENT name E>, where E is a content model (roughly, a regular expression over element names)
• Content model operators:
  E | F : alternation      E, F : concatenation
  E?    : optional         E*   : zero or more
  E+    : one or more      (E)  : grouping
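The content model operators above can be tried out with a validating JAXP parser. The following is a minimal sketch, not part of the original slides: the element names list, item, name and note and the class name DtdValidationSketch are assumptions made for illustration, and only the ATTLIST declaration is taken from slide 7. It counts the validity errors reported for a document body checked against an internal DTD subset.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {
    // Hypothetical internal DTD subset: a list holds one or more items,
    // each item is a name followed by an optional note.
    static final String PROLOG =
        "<!DOCTYPE list [\n"
      + "  <!ELEMENT list (item+)>\n"
      + "  <!ELEMENT item (name, note?)>\n"
      + "  <!ELEMENT name (#PCDATA)>\n"
      + "  <!ELEMENT note (#PCDATA)>\n"
      + "  <!ATTLIST item price NMTOKEN #REQUIRED\n"
      + "                 unit (FIM | EUR) \"EUR\">\n"
      + "]>\n";

    // Parses PROLOG + body with validation on and returns the number of
    // validity errors the parser reported.
    static int countValidityErrors(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(PROLOG + body)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return errors.size();
    }

    public static void main(String[] args) {
        // One item with the required price attribute: expected to be valid.
        System.out.println(countValidityErrors(
            "<list><item price=\"10\"><name>pen</name></item></list>"));
        // An empty list violates the content model (item+).
        System.out.println(countValidityErrors("<list/>"));
    }
}
```

Validity errors are recoverable in SAX terms, which is why the sketch collects them in an error handler rather than relying on an exception.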

9 XML Schema Definition Language
• XML syntax
  – schema documents are easier for programs to manipulate (than the special DTD syntax)
• Compatibility with namespaces
  – can validate documents using declarations from multiple sources
• Content datatypes
  – 44 built-in datatypes (including primitive Java datatypes, datatypes of SQL, and XML attribute types)
  – mechanisms to derive user-defined datatypes
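Deriving a user-defined datatype can be illustrated with a small schema. This is a hedged sketch only: it uses the javax.xml.validation package, which was added to Java after the JAXP 1.1 version covered in this course, and the price element and the class name SchemaValidationSketch are assumptions for illustration. The schema restricts the built-in type xs:decimal to non-negative values.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaValidationSketch {
    // Hypothetical schema: a price element whose type is derived from the
    // built-in xs:decimal by restriction (values must be >= 0).
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='price'>"
      + "    <xs:simpleType>"
      + "      <xs:restriction base='xs:decimal'>"
      + "        <xs:minInclusive value='0'/>"
      + "      </xs:restriction>"
      + "    </xs:simpleType>"
      + "  </xs:element>"
      + "</xs:schema>";

    // Returns true if the document is valid against XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // Without a custom error handler, validity errors are thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<price>12.50</price>"));
        System.out.println(isValid("<price>-1</price>"));
    }
}
```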

10 XML Namespaces
[Example: an XSLT stylesheet using the xsl: prefix, ending in </xsl:template></xsl:stylesheet>]
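What a namespace prefix such as xsl: means to a processor can be sketched with a namespace-aware DOM parser. The class name NamespaceSketch and the two sample documents are assumptions for illustration; the point is that an element is identified by its namespace URI and local name, not by the prefix chosen in the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceSketch {
    static final String XSLT_NS = "http://www.w3.org/1999/XSL/Transform";

    // Returns the root element's expanded name as "{namespace-URI}local-name".
    static String expandedRootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Same namespace, different prefixes: both yield the same expanded name.
        System.out.println(expandedRootName(
            "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSLT_NS + "'/>"));
        System.out.println(expandedRootName(
            "<t:stylesheet version='1.0' xmlns:t='" + XSLT_NS + "'/>"));
    }
}
```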

11 3. XML Processor APIs
• How can applications manipulate structured documents?
  – An overview of document parser interfaces
3.1 SAX: an event-based interface
3.2 DOM: an object-based interface
3.3 JAXP: Java API for XML Processing

12 A SAX-based application
[Figure: the application's main routine calls parse(); for the input <A i="1">Hi!</A> the parser invokes the callback routines startDocument(), startElement("A", [i="1"]), characters("Hi!"), endElement("A").]
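The callback flow of a SAX-based application can be sketched as follows. The class name SaxEventsSketch and the logging format are assumptions for illustration; the handler simply records each event it receives for the input from the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {
    // Parses the document and returns a log of the SAX events received.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                StringBuilder sb = new StringBuilder("startElement " + qName);
                for (int k = 0; k < atts.getLength(); k++) {
                    sb.append(' ').append(atts.getQName(k))
                      .append('=').append(atts.getValue(k));
                }
                log.add(sb.toString());
            }
            // Note: a parser may deliver one text run in several chunks.
            @Override public void characters(char[] ch, int start, int length) {
                log.add("characters " + new String(ch, start, length));
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("endElement " + qName);
            }
            @Override public void endDocument() { log.add("endDocument"); }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<A i=\"1\">Hi!</A>")) {
            System.out.println(event);
        }
    }
}
```

The application never sees a tree; it only reacts to events in document order, which is what makes SAX suitable for large inputs.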

13 DOM: What is it?
• An object-based, language-neutral API for XML and HTML documents
  – allows programs and scripts to build, navigate, and modify documents
  – a foundation for developing querying, filtering, transformation, rendering etc. applications on top of DOM implementations
• In contrast to "Serial Access XML", one could think of it as "Directly Obtainable in Memory"

14 DOM structure model
[Figure: the document <invoice form="00" type="estimated"> ... with the texts "John Doe", "Pyynpolku 1" and "70460 KUOPIO", shown as a DOM tree: a Document node, Element nodes (invoice, addressdata, name, address, streetaddress, postoffice), a NamedNodeMap holding the attributes form="00" and type="estimated", and Text nodes.]
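Navigating such a DOM tree might look like the sketch below. The exact nesting of the invoice elements is an assumption reconstructed from the element names on the slide, and the class name DomNavigationSketch is hypothetical. Attributes are reached through the element's NamedNodeMap, and character data through Text child nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomNavigationSketch {
    // Assumed nesting; only the names and values come from the slide.
    static final String INVOICE =
        "<invoice form=\"00\" type=\"estimated\">"
      + "<addressdata>"
      + "<name>John Doe</name>"
      + "<address>"
      + "<streetaddress>Pyynpolku 1</streetaddress>"
      + "<postoffice>70460 KUOPIO</postoffice>"
      + "</address>"
      + "</addressdata>"
      + "</invoice>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Reads an attribute of the document element via its NamedNodeMap.
    static String rootAttribute(Document doc, String name) {
        return doc.getDocumentElement().getAttributes()
                  .getNamedItem(name).getNodeValue();
    }

    // Returns the text of the first element with the given tag name,
    // assuming its first child is a Text node.
    static String firstText(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0)
                  .getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse(INVOICE);
        System.out.println(rootAttribute(doc, "type"));
        System.out.println(firstText(doc, "postoffice"));
    }
}
```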

15 Overview of XSLT Transformation

16 JAXP 1.1
• An interface for "plugging in" and using XML processors in Java applications
  – includes packages
    » org.xml.sax: SAX 2.0 interface
    » org.w3c.dom: DOM Level 2 interface
    » javax.xml.parsers: initialization and use of parsers
    » javax.xml.transform: initialization and use of transformers (XSLT processors)
• Included in the JDK starting from version 1.4

17 JAXP: Using a SAX parser (1)
[Figure: call chain for parsing the file f.xml: .newSAXParser(), .getXMLReader(), .parse("f.xml")]

18 JAXP: Using a DOM parser (1)
[Figure: call chain for the file f.xml: .newDocumentBuilder(), followed by either .parse("f.xml") or .newDocument()]
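The newDocument() branch of this call chain can be sketched as below: instead of parsing a file, the application obtains an empty Document from the builder and constructs the tree itself. The element name greeting, its lang attribute and the class name DomBuildSketch are assumptions for illustration.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomBuildSketch {
    // Builds <greeting lang="en">Hello world!</greeting> in memory.
    static Document build() {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument(); // empty document, no parsing
            Element root = doc.createElement("greeting");
            root.setAttribute("lang", "en");
            root.appendChild(doc.createTextNode("Hello world!"));
            doc.appendChild(root); // the root becomes the document element
            return doc;
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Element root = build().getDocumentElement();
        System.out.println(root.getTagName() + " lang=" + root.getAttribute("lang"));
        System.out.println(root.getFirstChild().getNodeValue());
    }
}
```

All nodes are created through the owning Document, which acts as the node factory in the DOM.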

19 JAXP: Using Transformers (1)
[Figure: call chain for an XSLT script: .newTransformer(…), .transform(., .)]
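The transformer call chain can be sketched with the javax.xml.transform package. The stylesheet, the input document and the class name TransformSketch are assumptions for illustration: the stylesheet wraps the content of every W element in square brackets and emits plain text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformSketch {
    // Hypothetical XSLT script: one template rule for W elements;
    // everything else is handled by the built-in rules.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='W'>[<xsl:value-of select='.'/>]</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies XSLT to the given document and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<S><W>Hello</W><W>world!</W></S>"));
    }
}
```

Source and Result are interfaces, so the same two calls also work with DOM or SAX inputs and outputs in place of the stream classes used here.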

20 Transformation & Formatting
[Figure: an XSLT script; stages I and II]

21 Page regions
A simple page can contain 1-5 regions, specified by child elements of the simple-page-master

22 Top-level formatting objects
• Slightly simplified:
  fo:root
    fo:layout-master-set: (fo:simple-page-master | fo:page-sequence-master)+
      fo:simple-page-master: fo:region-body, fo:region-before?, fo:region-after?, fo:region-start?, fo:region-end?
      fo:page-sequence-master: specifies masters for page sequences by referring to simple-page-masters
    fo:page-sequence+: contents of pages

23 XML-wrapping
• Need "XML-wrappers" (aka extractors)
  – interface/conversion programs that produce an XML representation of source data
[Figure: source1, source2 and source3 pass through wrapper1, wrapper2 and wrapper3, producing XML-form-1 and XML-form-2]

24 XW-architecture (3)
[Figure: the XW-engine reads the source data (AA x1 x2 BB y1 y2 z1 z2 …) and a wrapper description, and delivers the result document (with <part-a> and <part-b> elements) to the application through SAX]

25 Rearranging result structures
[Figure: source data with parts A (x y .. z) and B (1 one, 2 two, 3 three) rearranged into a <whole> element containing one, two, three]

26 XQuery in a Nutshell
• Functional expression language
• Strongly typed: (XML Schema) types may be assigned to expressions statically
• Includes XPath 2.0 (says the Draft, but not all XPath axes are included!)
  – XQuery 1.0 and XPath 2.0 share extensive functionality:
    » XQuery 1.0 and XPath 2.0 Functions and Operators, WD 15/11/2002
• Roughly: XQuery ≈ XPath' + XSLT' + SQL'

27 Course Main Message
• XML is a universal way to represent information as tree-like data structures
• There are specialized and powerful technologies for processing it
• The development is ongoing

