Presentation is loading. Please wait.

Presentation is loading. Please wait.

Structured-Document Processing Languages (5 cp), Spring 2011 Pekka Kilpeläinen University of Eastern Finland School of Computing

Similar presentations


Presentation on theme: "Structured-Document Processing Languages (5 cp), Spring 2011 Pekka Kilpeläinen University of Eastern Finland School of Computing"— Presentation transcript:

1 Structured-Document Processing Languages (5 cp), Spring 2011 Pekka Kilpeläinen University of Eastern Finland School of Computing Pekka.T.Kilpelainen@uef.fi SDPL 20111 Notes 1: Introduction

2 SDPL 2011 Notes 1: Introduction 2 Introduction First: Overview and Arrangements What is this course about? 1.1 Review of Structured-Documents 2.1 XML & XML Documents

3 SDPL 2011 Notes 1: Introduction 3 Goals of the Course n To get familiar with central models and languages for –Manipulating –transforming and –querying structured documents (or XML) n “Generic XML processing technology” –very little about specific XML applications, or commercial systems

4 SDPL 2011 Notes 1: Introduction 4 NOT an Exhaustive Survey n Bias in selecting course topics: –estimated usefulness/value »centrality (→ longer-lasting value) »maturity: Standards, or stable specifications? Robust implementations? –Lecturer up-to-date? n Emphasis on programmatic access and manipulation of data in the form of documents, rather than describing/modelling it

5 SDPL 2011 Notes 1: Introduction 5Motivation? n Practical relevance of documents and XML n Interest in models of information processing XML Internet / intranet e.g. order e.g. invoice “Programming for XML is very different from programming for other data models” - D. Florescu & D. Kossmann at SIGMOD’06

6 SDPL 2011 Notes 1: Introduction 6 Course Outline 1 Introduction Overview and Arrangements 1.1 Structured Documents 2 XML Basics - XML documents, DTDs, Namespaces 3 Programmatic Manipulation of Structured Documents (XML APIs) - SAX, DOM, StAX, JAXP

7 SDPL 2011 Notes 1: Introduction 7 Course Outline (2) 4 Transforming Structured Documents 4.1 Addressing: XPath 1.0 4.2 XSLT 1.0 5 Querying Structured Documents - W3C XML Query Language XQuery 6 Review of the Course 6 Review of the Course

8 SDPL 2011 Notes 1: Introduction 8 Methodological Goals n Central professional skills –consulting technical specifications –experimenting with SW implementations n Ability to think…? –to find out relationships, reason, explain,... –to apply knowledge in new situations n ("Pidgin English" for scientific communication)

9 SDPL 2011 Notes 1: Introduction 9 Administration n An elective Master-level special course –suitable for all specialisation lines (esp. media technology and SWE) n 5 cp (  133 h 20 min of work!) n Lectures March 9 – May 5, class MT3 –Video broadcast to TB180 in Joensuu –Lecturer: Pekka.T.Kilpelainen@uef.fi Pekka.T.Kilpelainen@uef.fi NB NB!

10 SDPL 2011 Notes 1: Introduction 10 Administration: Exercises n Exercises, March 14 – May 9 –essential for learning the technology –normal homework assignments, hands-on practice; Solutions discussed in class –Grading: one prepared solution -> one activity point n Plan: –some solutions shall be returned to Moodle, for potential feedback and scaling of activity points by solution quality –These will be announced on a case-by-case basis

11 SDPL 2011 Notes 1: Introduction 11 Administration: Grading n Final exam on Friday, May 13 –≥ 50% of exam points required to pass n Grade = round(6*Exam/MaxExam + 2*HomeWork/MaxHomeWork - 2.5) n Retake exam on Friday June 10 –again  50% to pass –better of grades with/without homework credits

12 SDPL 2011 Notes 1: Introduction 12 Material n No single textbook n Reports, specifications, articles n Course home page –http://www.cs.uku.fi/~kilpelai/RDK11/ http://www.cs.uku.fi/~kilpelai/RDK11/ –slides, exercises, references, announcements n Possible background texts (See home page): –Deitel et al; Møller & Schwartzbach; Key; Walmsley

13 SDPL 2011 Notes 1: Introduction 13 Background Check n Basic knowledge of structured documents and document standards –Course ”Introduction to Document standards"? n Programming languages and concepts –Java? OO programming? –Functional programming? SQL? –Unix/Linux vs. Windows? n Formal language theory –Theory of Computation –regular expressions, context-free grammars, parse trees?

14 SDPL 2011 Notes 1: Introduction 14 Students' Expectations?

15 SDPL 2011 Notes 1: Introduction 15 1.1. Structured Documents n Document: –a structured representation of information on some medium (  message) –normally for a human reader »memos, manuals, articles, books, … –also application-to-application messages »e.g., btw client and server in Web Services –"prose-oriented XML" vs "data-oriented XML" –can be treated as a unit »e.g., a web page vs a web site?

16 SDPL 2011 Notes 1: Introduction 16 Text-Based Documents n We concentrate on textual or text-based documents –character data major constituent of information content –as opposed to, say multimedia documents n Next: Presentation vs Structure

17 SDPL 2011 Notes 1: Introduction 17 Presentation vs Structure n Presentation informs the human reader about the meaning of text and the role of its parts n Markup (merkkaus) indicates the presentation or the meaning of different parts of text –originally hand-written annotations for the typesetter –nowadays primarily codes embedded in digital documents; –nowadays primarily codes embedded in digital documents;

18 SDPL 2011 Notes 1: Introduction 18 Markup n Procedural markup –formatting commands (start boldface, produce an empty line, indent 5 mm,...) –proprietary word processor formats, nroff, TeX,... n Descriptive or generic markup –indicating the logical structure of text using chosen names –LaTeX: \begin{abstract}... \end{abstract} –HTML:.... –HTML:.... n Markup language (merkkauskieli) –a fixed set of markup notations (e.g. nroff, TeX, HTML, SVG, …)

19 SDPL 2011 Notes 1: Introduction 19 Structured Documents? Most liberally, any document is structured (punctuation, words, sentences, fields, …) but especially descriptively marked-up documents... (e.g. well-formed XML) especially if they adhere to a rigorous specification of structure (e.g. XML+DTD)

20 SDPL 2011 Notes 1: Introduction 20 Structure in Documents n Hierarchy or nesting is ubiquitous –chapters of books, warnings in maintenance manuals,... n Linear order essential in prose documents –less important in representations of data objects n Hypertext and cross-references n We'll be mainly dealing with manipulation of hierarchical, or tree-like document structures Next: How are these modelled?


Download ppt "Structured-Document Processing Languages (5 cp), Spring 2011 Pekka Kilpeläinen University of Eastern Finland School of Computing"

Similar presentations


Ads by Google