Structured-Document Processing Languages Spring 2007 Course Review Repetitio mater studiorum est!

1 Structured-Document Processing Languages Spring 2007 Course Review Repetitio mater studiorum est!

2 SDPL 2007 Course Review: Goals of the Course
- Learn about central models and languages for
  – manipulating
  – representing
  – transforming and
  – querying
  structured documents (or XML)
- "Generic XML processing technology"

3 Methodological Goals
- Central professional skills
  – consulting technical specifications
  – experimenting with SW implementations
- Ability to think…?
  – to find out relationships
  – to apply knowledge in new situations
- ("Pidgin English" for scientific communication)

4 XML?
- Extensible Markup Language is not a markup language!
  – it fixes neither a tag set nor its semantics (unlike markup languages such as HTML)
- XML is
  – a way to use markup to represent information
  – a metalanguage
    » supports definition of specific markup languages through XML DTDs or Schemas
    » e.g. XHTML, a reformulation of HTML using XML

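5 XML Encoding of Structure: Example
[Figure: a tree with root element S, whose children include two W elements (with text "Hello" and "world!") and an E element carrying the attribute A=1, shown next to its XML encoding with <S>, <W>, and <E> tags.]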

6 Basics of XML DTDs
- A Document Type Declaration provides a grammar (document type definition, DTD) for a class of documents
- Syntax (in the prolog of the document instance): <!DOCTYPE … [ … ]>
- DTD = union of the external and internal subset

7 What Do Declarations Look Like?
<!ATTLIST item
    price NMTOKEN #REQUIRED
    unit (FIM | EUR) "EUR">

8 Element type declarations
- The general form is <!ELEMENT name (E)>, where E is a content model
  – a content model is roughly a regular expression over element names
- Content model operators:
  E | F : alternation      E, F : concatenation
  E?    : optional         E*   : zero or more
  E+    : one or more      (E)  : grouping
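The declarations can be tried out with a validating parser. The following is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name and the `order`, `item`, and `note` element types are hypothetical, chosen only to exercise the `+` and `?` operators together with the `item` attribute list from the previous slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationSketch {

    // Internal DTD subset; the element type names are illustrative only.
    static final String DOCTYPE =
          "<!DOCTYPE order [\n"
        + "  <!ELEMENT order (item+, note?)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "  <!ELEMENT note (#PCDATA)>\n"
        + "  <!ATTLIST item price NMTOKEN #REQUIRED\n"
        + "                 unit (FIM | EUR) \"EUR\">\n"
        + "]>\n";

    /** Parses DOCTYPE + instance and returns the validity errors reported. */
    static List<String> validate(String instance) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the instance against its DTD

        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable, so collect them and go on.
                errors.add(e.getMessage());
            }
        });
        builder.parse(new InputSource(new StringReader(DOCTYPE + instance)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Matches (item+, note?) and supplies the required price attribute.
        System.out.println(validate(
            "<order><item price=\"10\">pen</item><note>rush</note></order>"));
        // Violates the content model: item+ requires at least one item.
        System.out.println(validate("<order><note>no items</note></order>"));
    }
}
```

The error messages themselves depend on the parser implementation, so the sketch only collects them rather than relying on their wording.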

9 XML Schema Definition Language
- XML syntax
  – schema documents are easier for programs to manipulate (than the DTD syntax)
- Compatibility with namespaces
  – can validate documents using declarations from multiple sources
- Content datatypes
  – 44 built-in datatypes (including primitive Java datatypes, datatypes of SQL, and XML attribute types)
  – plus user-defined datatypes

10 XML Namespaces
[Example: an XSLT stylesheet using the xsl: namespace prefix; only its closing tags </xsl:template></xsl:stylesheet> survive.]

11 3. XML Processor APIs
- How can applications manipulate structured documents?
  – Overview of document parser interfaces
    3.1 SAX: an event-based interface
    3.2 DOM: an object-based interface
    3.3 JAXP: Java API for XML Processing

12 A SAX-based application
[Figure: the application's main routine calls parse(); while reading the input <A i="1">Hi!</A>, the parser invokes the application's callback routines startDocument(), startElement("A", [i="1"]), characters("Hi!"), and endElement("A").]
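The callback flow can be sketched in Java as follows, assuming the JAXP parser bundled with the JDK. The class name `SaxTraceSketch` and the event strings it records are hypothetical; the handler methods are the standard SAX `DefaultHandler` callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTraceSketch {

    /** Parses the XML text and returns the callbacks the parser made, in order. */
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        // The application's callback routines: the parser calls these,
        // the application never calls them itself.
        DefaultHandler callbacks = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument()");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                List<String> pairs = new ArrayList<>();
                for (int i = 0; i < atts.getLength(); i++) {
                    pairs.add(atts.getQName(i) + "=" + atts.getValue(i));
                }
                events.add("startElement(" + qName + ", ["
                        + String.join(", ", pairs) + "])");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several chunks.
                events.add("characters(" + new String(ch, start, length) + ")");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement(" + qName + ")");
            }

            @Override
            public void endDocument() {
                events.add("endDocument()");
            }
        };

        // The main routine: obtain a parser and hand it the callbacks.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), callbacks);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<A i=\"1\">Hi!</A>")) {
            System.out.println(event);
        }
    }
}
```

Because no tree is built, the application sees each piece of the document exactly once, in document order; anything it needs later it must store itself.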

13 DOM: What is it?
- Object-based, language-neutral API for XML and HTML documents
  – allows programs/scripts to
    » build
    » navigate and
    » modify documents
- "Directly Obtainable in Memory" vs "Serial Access XML"

14 DOM structure model
[Figure: DOM tree for an invoice document. A Document node contains the Element invoice, whose attributes form="00" and type="estimated" are held in a NamedNodeMap; the descendant Elements addressdata, name, address, streetaddress, and postoffice carry the Text nodes "John Doe", "Pyynpolku 1", and "70460 KUOPIO".]
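A minimal Java sketch of navigating and modifying such a tree through the DOM interfaces, assuming the JAXP parser bundled with the JDK. The exact nesting of the invoice elements is an assumption reconstructed from the element names on the slide, and the class name, the `textOf` helper, and the added `note` element are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomInvoiceSketch {

    // Assumed nesting: invoice > addressdata > (name, address > (...)).
    static final String INVOICE =
          "<invoice form=\"00\" type=\"estimated\">"
        + "<addressdata>"
        + "<name>John Doe</name>"
        + "<address>"
        + "<streetaddress>Pyynpolku 1</streetaddress>"
        + "<postoffice>70460 KUOPIO</postoffice>"
        + "</address>"
        + "</addressdata>"
        + "</invoice>";

    /** Builds the in-memory Document tree for the given XML text. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Navigate: value of the Text node under the first element with this name. */
    static String textOf(Document doc, String tagName) {
        Node element = doc.getElementsByTagName(tagName).item(0);
        return element.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(INVOICE);
        Element invoice = doc.getDocumentElement();

        // The attributes of an Element are exposed as a NamedNodeMap.
        NamedNodeMap attributes = invoice.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            System.out.println(attribute.getNodeName() + "=" + attribute.getNodeValue());
        }
        System.out.println(textOf(doc, "name"));

        // Modify an attribute, then build and attach a new subtree.
        invoice.setAttribute("type", "final");
        Element note = doc.createElement("note");
        note.appendChild(doc.createTextNode("paid"));
        invoice.appendChild(note);
        System.out.println(textOf(doc, "note"));
    }
}
```

Unlike SAX, the whole document stays in memory, so the program can revisit and change any node after parsing.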

15 Overview of XSLT Transformation

16 JAXP (Java API for XML Processing)
- An interface for "plugging in" and using XML processors in Java applications
  – includes packages
    » org.xml.sax: SAX 2.0 interface
    » org.w3c.dom: DOM Level 2 interface
    » javax.xml.parsers: initialization and use of parsers
    » javax.xml.transform: initialization and use of transformers (XSLT processors)
- Included in standard Java

17 JAXP: Using a SAX parser (1)
[Figure: call chain from a parser factory via .newSAXParser() and .getXMLReader() to an XML reader, whose .parse("f.xml") reads the file f.xml.]

18 JAXP: Using a DOM parser (1)
[Figure: a document builder obtained with .newDocumentBuilder() either parses f.xml into a DOM tree with .parse("f.xml") or creates an empty document with .newDocument().]

19 JAXP: Using Transformers (1)
[Figure: .newTransformer(…) creates a transformer from an XSLT script; .transform(.,.) applies it to a source and writes a result.]
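The two calls named on this slide can be sketched as follows, assuming the XSLT 1.0 processor bundled with the JDK. The stylesheet, the `greeting` input element, and the class name are hypothetical and only serve to make the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformSketch {

    // A one-rule stylesheet producing plain text (illustrative only).
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/greeting\">"
        + "Hello, <xsl:value-of select=\"@to\"/>!"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML text. */
    static String transform(String xml) throws Exception {
        // newTransformer(...) compiles the XSLT script into a Transformer.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));

        // transform(source, result) runs it on one input document.
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<greeting to=\"world\"/>"));
    }
}
```

Sources and results need not be streams: the same `transform` call also accepts DOM and SAX wrappers from `javax.xml.transform.dom` and `javax.xml.transform.sax`.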

20 4. Introduction to Style Sheets
- Specify and produce visual representation for structured documents
- by defining a mapping from document structure+content to formatting tasks, and
  – inserting/generating new text
  – numbering
  – rearranging
- by rules based on contextual conditions

21 Process of Transformation (Finnish: muunnos)
[Figure: a style sheet (LaTeX style file, CSS, XSLT) drives a transformation that produces formatter input, e.g. TeX or an FOT (XSL formatting object tree).]

22 Process of Formatting (Finnish: muotoilu)
- Creates a detailed description of presentation
  -> the style sheet may not have complete control of the final formatted presentation!

23 Process of Rendering (Finnish: hahmonnus)
- Display/play the document on an output medium

24 CSS - Cascading Style Sheets
- A stylesheet language
  – mainly to specify the representation of web pages by attaching style (fonts, colours, margins, …) to HTML/XML documents
- Example style rule: H1 {color: blue; font-weight: bold;}

25 CSS Processing Model (simplified)
0. Parse the document
1. Match style rules to elements of the doc tree
   – annotate each element with values assigned for properties
     » inheritance and elaborate "cascade" rules are applied to select which value is assigned
2. Generate a formatting structure
   – of nested rectangular boxes
3. Render the formatting structure
   – display, print, audio-synthesize, ...

26 XSL: Transformation & Formatting
[Figure: a two-stage process, with stages labelled I and II, driven by an XSLT script.]

27 Page regions
- A simple page can contain 1-5 regions, specified by child elements of the simple-page-master

28 Top-level formatting objects
- Slightly simplified:
  fo:root
    fo:layout-master-set
      (fo:simple-page-master | fo:page-sequence-master)+
        fo:simple-page-master children: fo:region-body fo:region-before? fo:region-after? fo:region-start? fo:region-end?
        fo:page-sequence-master: specifies masters for page sequences, by referring to simple-page-masters
    fo:page-sequence+
      fo:flow: contents of pages

29 XQuery in a Nutshell
- Functional expression language
  – a query is a side-effect-free expression
- Operates on sequences of items
  – atomic values or XML nodes
- Strongly typed: (XML Schema) types may be assigned to expressions statically, and results can be validated
- Extends XPath 2.0 (but not all axes required)
  – common for XQuery 1.0 and XPath 2.0:
    » Functions and Operators, W3C Rec. 01/2007
- Roughly: XQuery ≈ XPath 2.0 + XSLT' + SQL'

30 FLWOR ("flower") Expressions
- for, let, where, order by and return clauses (~SQL select-from-where)
- Form: (ForClause | LetClause)+ WhereClause? OrderByClause? "return" Expr
- binds variables to values, and uses these bindings to construct a result (an ordered sequence of nodes)

31 XQuery Example
  for $pn in distinct-values(doc("sp.xml")//pno)
  let $sp := doc("sp.xml")//sp_tuple[pno=$pn]
  where count($sp) >= 3
  order by $pn
  return … {$pn} … {avg($sp/price)} …
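To make the meaning of the clauses concrete, the query's logic can be spelled out imperatively. This is a plain Java/DOM rewrite, not an XQuery engine, and it rests on stated assumptions: every `sp_tuple` has exactly one `pno` and one numeric `price` child, and `pno` elements occur only inside `sp_tuple`. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlworSketch {

    /** Maps each part number with at least minSuppliers tuples to its average price. */
    static Map<String, Double> wellSupplied(String xml, int minSuppliers)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // for $pn in distinct-values(//pno), let $sp := //sp_tuple[pno=$pn]:
        // group the prices by part number. The TreeMap keeps keys sorted,
        // which stands in for "order by $pn" (string order).
        Map<String, List<Double>> pricesByPart = new TreeMap<>();
        NodeList tuples = doc.getElementsByTagName("sp_tuple");
        for (int i = 0; i < tuples.getLength(); i++) {
            Element tuple = (Element) tuples.item(i);
            String pno = tuple.getElementsByTagName("pno")
                .item(0).getTextContent().trim();
            double price = Double.parseDouble(
                tuple.getElementsByTagName("price").item(0).getTextContent().trim());
            pricesByPart.computeIfAbsent(pno, k -> new ArrayList<>()).add(price);
        }

        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : pricesByPart.entrySet()) {
            List<Double> prices = entry.getValue();
            if (prices.size() >= minSuppliers) {   // where count($sp) >= 3
                double sum = 0;
                for (double p : prices) {
                    sum += p;
                }
                // return: pair the part number with avg($sp/price)
                result.put(entry.getKey(), sum / prices.size());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sp>"
            + "<sp_tuple><pno>P1</pno><price>10</price></sp_tuple>"
            + "<sp_tuple><pno>P1</pno><price>20</price></sp_tuple>"
            + "<sp_tuple><pno>P1</pno><price>30</price></sp_tuple>"
            + "<sp_tuple><pno>P2</pno><price>5</price></sp_tuple>"
            + "</sp>";
        System.out.println(wellSupplied(xml, 3));
    }
}
```

The comparison shows what FLWOR buys: the grouping, filtering, ordering, and result construction that take some thirty lines of Java are each a single clause in the query.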

32 Course Main Message
- XML is a universal way to represent information as tree-like data structures
- Specialized and powerful technologies for processing it
  – the worst hype has settled
  – R&D still active

