XDuce: A Statically Typed XML Processing Language: Hosoya and Pierce Presented by: Guy Korland Based on presentation by: Tabuchi

Presentation transcript:

1 XDuce: A Statically Typed XML Processing Language: Hosoya and Pierce Presented by: Guy Korland Based on presentation by: Tabuchi

2 Presentation Outline Introduction (pronounced “transduce”) Programming in XDuce.  Values  Regular Expression Types  Subtyping Pattern matching. Conclusions.

3 XDuce: What For? A functional language for XML processing. On the basis of  Regular Expression Types  Pattern Matching Statically Typed i.e. Outputs are statically checked against DTD-conformance etc.

4 Advantages (vs. “untyped”) “Untyped” XML processing: programs using DOM etc.  Little connection between the program and the XML schema.  Validity can be checked only at run time, if at all.

5 Advantages (vs. “embedding”) “Embedding”: mapping an XML schema into the language’s type system. e.g. <!ELEMENT person (name, mail*, tel?)> (DTD) becomes type person = name * mail list * tel option (ML)

6 Advantages (vs. “embedding”) Embedding does not match intuition in some cases. e.g. Intuitively, (name,mail*,tel?) <: (name,mail*,tel*), but in ML name * mail list * tel option is not a subtype of name * mail list * tel list.

7 Values Values are XML Documents (input, output, intermediate). Syntax:  XDuce’s native syntax.  Standard XML syntax document.

8 Values (cont.) Standard XML syntax: <addrbook> <person> <name>Haruo Hosoya</name> … </person> <person> <name>Benjamin Pierce</name> … </person> </addrbook>  let val doc = load_xml("mybook.xml")

9 Values (cont.) XDuce’s native syntax: let val mybook = addrbook[ person[ name["Haruo Hosoya"], … ], person[ name["Benjamin Pierce"], tel[" "]]] The constructor label[…] takes a sequence of other values as its content. Strings are enclosed in double quotes, unlike in XML.

10 Regular Expression Types Types are defined in regular expression form with labels.  Concatenation, alternation (union), and repetition as basic constructors.  Labels correspond to elements of XML (person, name, mail, etc…).

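11 Regular Expression Types (cont.) Example: type Addrbook = addrbook[(Name, Addr, Tel?)*] type Name = name[String] type Addr = addr[String] type Tel = tel[String] Note that Name, Addr and Tel inside addrbook[…] are type names, not labels. Corresponding DTD: <!ELEMENT addrbook (name, addr, tel?)*> <!ELEMENT name (#PCDATA)> <!ELEMENT addr (#PCDATA)> <!ELEMENT tel (#PCDATA)>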

12 Regular Expression Types (cont.) Syntax of Types: T ::= () (* empty sequence type *) | X | L[T] | T,T (* concat. *) | T|T (* alter. *) | T* (* rep. *) where X : Type Variable (String, Int…) L : Label

13 Regular Expression Types (cont.) Syntactic sugar:  T+ ≡ T,T*  T? ≡ T|() Types can be (mutually) recursive: type Folder = Entry* type Entry = name[String], file[String] | name[String], folder[Folder]

14 Regular Expression Types (cont.) Syntax of Labels: L ::= l (* specific label *) | ~ (* wildcard label *) | L|L (* union *) | L\L (* difference *)

15 Regular Expression Types (cont.) The label class ~ represents the set of all labels.  We can define a type Any: type Any = (~[Any] | Int | Float | String)* Label union: type Heading = (h1|h2|h3|h4|h5|h6)[Inline] (HTML headings)

16 Subtyping The meaning of subtyping is as usual: all values t of T are also values of T'. T <: T' ⇔ ∀t. (t ∈ T ⇒ t ∈ T') Examples:  Name,Addr <: Name,Addr,Tel?  Name,Addr,Tel <: Name,Addr,Tel?  addrbook[Name,Addr,Name,Addr,Tel] <: addrbook[(Name,Addr,Tel?)*]

17 Subtyping - Union Types Union (or alternation) type constructor |. Example:  Name <: Name | Tel  Tel <: Name | Tel Forget ordering  (Name,Addr)*,(Name,Tel)* <: ((Name,Addr)|(Name,Tel))* Distributivity  (Name,Tel)|(Name,Addr) <: Name,(Addr|Tel)
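The subtyping examples above can be explored with a small bounded check. This is only a sketch under stated assumptions, not XDuce's algorithm: each label is encoded as one letter (n = Name, a = Addr, t = Tel, a hypothetical encoding), element contents are ignored, and inclusion is tested by brute force on sequences up to a fixed length, so a True result is evidence rather than proof.

```python
import re
from itertools import product

# Hypothetical encoding: one letter per label (n = Name, a = Addr, t = Tel).
ALPHABET = "nat"

def language(regex, max_len=6):
    """All label sequences up to max_len that the regex accepts in full."""
    compiled = re.compile(regex)
    return {
        "".join(seq)
        for length in range(max_len + 1)
        for seq in product(ALPHABET, repeat=length)
        if compiled.fullmatch("".join(seq))
    }

def is_subtype_bounded(sub, sup, max_len=6):
    """Bounded check of T <: T': every value of T (up to max_len) is in T'."""
    return language(sub, max_len) <= language(sup, max_len)

# Name,Addr <: Name,Addr,Tel?
print(is_subtype_bounded("na", "nat?"))               # True
# Forget ordering: (Name,Addr)*,(Name,Tel)* <: ((Name,Addr)|(Name,Tel))*
print(is_subtype_bounded("(na)*(nt)*", "(na|nt)*"))   # True
# Distributivity: (Name,Tel)|(Name,Addr) <: Name,(Addr|Tel)
print(is_subtype_bounded("nt|na", "n(a|t)"))          # True
# The converse of the first example does not hold.
print(is_subtype_bounded("nat?", "na"))               # False
```

The real checker works on tree automata rather than enumerated strings; the enumeration here only illustrates the set-inclusion reading of <:.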

18 Subtyping - Subtagging Allowing subtyping between types with different labels. (beyond the expressive power of DTD) e.g. (HTML) subtag i <: fontstyle subtag b <: fontstyle  i[T] <: fontstyle[T] b[T] <: fontstyle[T]

19 Complexity of Subtyping With unrestricted recursive types, the subtype relation (T <: T') is equivalent to inclusion between context-free grammars (CFGs).  Undecidable!  Need some restrictions on syntax. (next slide…)

20 Well-formedness of Types Syntactic restriction on types to ensure “regularity”. Recursive use of types can only occur  at the tail position of type definition, or  inside labels.

21 Well-formed Types: Examples type X = Int, Y type Y = String, X | () and type Z = String, lab[Z], String |() are well-formed, but type U = Int, U, String |() is not.

22 Complexity of Subtyping, again With well-formedness, checking subtype relation is:  Still EXPTIME-complete, equivalent to inclusion of tree automata [CDG+] but  acceptable in practical cases.

23 Pattern matching (cont.) ML-like pattern matching: "pattern -> expression" Example: val url = match v with www[val s as String] -> "…" ^ s | …[val s as String] -> "mailto:" ^ s | ftp[val s as String] -> "ftp://" ^ s

24 Pattern matching (cont.) Pattern match can also involve regular expression types. e.g. match p with | person[name[String],(val ms as Mail*), (val t as Tel?)] -> …

25 Pattern matching (cont.) Functions: reusable pattern matching. Example: fun make_url(val s as String): String = match s with www[val s as String] -> "…" ^ s | …[val s as String] -> "mailto:" ^ s | ftp[val s as String] -> "ftp://" ^ s

26 Policies of Pattern Matching Pattern matching has two basic policies:  First-match (as in ML): only the first pattern matched is taken.  Longest-match (as usual in regexp. matching on string): matching is done as much as possible.

27 First-match: Example (* p = person[name, mail, tel] *) match p with | person[Name, (val ms as Mail*), Tel] -> (* invoked *) | person[Name, (val ms as Mail*), Tel?] -> (* not invoked *)

28 Longest-match: Example (* p = person[name, mail, mail, tel] *) match p with | … (val m1 as Mail*),(val m2 as Mail*), … -> (* m1 = mail, mail m2 = () *)
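The two policies can be imitated with ordinary string regular expressions. The sketch below assumes a hypothetical one-letter encoding of labels (n = name, m = mail, t = tel) and uses Python's greedy quantifiers as a stand-in for longest-match; it illustrates the policies and is not XDuce's matcher.

```python
import re

# Hypothetical encoding: n = name, m = mail, t = tel.
def first_match(value, clauses):
    """Return the tag of the first clause whose pattern matches the whole value."""
    for pattern, tag in clauses:
        if re.fullmatch(pattern, value):
            return tag
    return None

# person[Name, Mail*, Tel] followed by person[Name, Mail*, Tel?]
clauses = [("nm*t", "first clause"), ("nm*t?", "second clause")]
print(first_match("nmt", clauses))   # first clause
print(first_match("nm", clauses))    # second clause

# Longest match: two adjacent Mail* variables; the leftmost one takes
# as many mails as it can, leaving the empty sequence for the second.
m = re.fullmatch(r"n(?P<m1>m*)(?P<m2>m*)t", "nmmt")
print(m.group("m1"), repr(m.group("m2")))   # mm ''
```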

29 Exhaustiveness and Redundancy Pattern matches are checked for exhaustiveness and redundancy.  Exhaustiveness: no “omission” of values of the input type.  Redundancy: no patterns that can never be matched.

30 Exhaustiveness A pattern match P_1 -> e_1 | … | P_n -> e_n is exhaustive (w.r.t. input type T) ⇔ all values t ∈ T are matched by some P_i, i.e. T <: P_1 | … | P_n

31 Exhaustiveness: Example (1/2) (* type Person = person[Name, Mail*, Tel?] *) match p with | person[Name, Mail*, Tel] ->... | person[Name, Mail*] ->... is exhaustive (w.r.t. Person)

32 Exhaustiveness: Example (2/2) (* type Person = person[Name, Mail*, Tel?] *) match p with | person[Name, Mail*, Tel] ->... | person[Name, Mail+] ->... is NOT exhaustive (wrt. Person): person[name[...]] does not match

33 Redundancy A pattern P_i is redundant in P_1 -> e_1 | … | P_n -> e_n (w.r.t. input type T) ⇔ all values of T matched by P_i are already matched by P_1 | ... | P_{i-1}

34 Redundancy: Example (* type Person = person[Name, Mail*, Tel?] *) match p with | person[Name, Mail*, Tel?] ->... | person[Name, Mail*] ->... The second pattern is redundant: anything that matches the second pattern also matches the first one.
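Both checks reduce to set operations on the values each pattern accepts. The following sketch replays the Person examples under a hypothetical one-letter encoding (n = Name, m = Mail, t = Tel), enumerating sequences up to a fixed length; the helper names are illustrative, and the bounded enumeration is an assumption made for brevity, not how XDuce performs the checks.

```python
import re
from itertools import product

ALPHABET = "nmt"  # hypothetical encoding: n = Name, m = Mail, t = Tel

def values(regex, max_len=5):
    """All sequences over ALPHABET, up to max_len, fully matched by regex."""
    compiled = re.compile(regex)
    out = set()
    for length in range(max_len + 1):
        for seq in product(ALPHABET, repeat=length):
            s = "".join(seq)
            if compiled.fullmatch(s):
                out.add(s)
    return out

def missing_values(input_type, patterns, max_len=5):
    """Values of the input type that no pattern matches (empty = exhaustive)."""
    covered = set().union(*(values(p, max_len) for p in patterns))
    return values(input_type, max_len) - covered

def redundant_patterns(input_type, patterns, max_len=5):
    """Indices of patterns whose matches are all caught by earlier patterns."""
    inputs = values(input_type, max_len)
    seen, redundant = set(), []
    for i, p in enumerate(patterns):
        matched = values(p, max_len) & inputs
        if matched <= seen:
            redundant.append(i)
        seen |= matched
    return redundant

PERSON = "nm*t?"  # person[Name, Mail*, Tel?]
print(missing_values(PERSON, ["nm*t", "nm*"]))        # set()
print(missing_values(PERSON, ["nm*t", "nm+"]))        # {'n'}
print(redundant_patterns(PERSON, ["nm*t?", "nm*"]))   # [1]
```

The value 'n' reported as missing corresponds to person[name[...]] with no mail and no tel.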

35 Complete Example (1/3) type Addrbook = addrbook[Person*] type Person = person[Name, …*, Tel?] type Name = name[String] type … = …[String] type Tel = tel[String] (* and output documents. *) type TelBook = telbook[TelPerson*] type TelPerson = person[Name,Tel] (* load an address book *) let val doc = load_xml("mybook.xml")

36 Complete Example (2/3) (* validate it against the type Addrbook *) let val valid_doc = validate doc with Addrbook (* extract the content of the top label addrbook *) let val out_doc = match valid_doc with addrbook[val persons as Person*] -> telbook[make_tel_book(persons)] (* save out_doc to output.xml *) save_xml("output.xml")(out_doc)

37 Complete Example (3/3) (* take ps of type Person* and return TelPerson* *) fun make_tel_book (val ps as Person*) : TelPerson* = match ps with person[name[val n as String], …*, tel[val t as String]], val rest as Person* -> person[name[n], tel[t]], make_tel_book(rest) (* recursive call *) | person[name[val n as String], …*], val rest as Person* -> make_tel_book(rest) | () -> ()
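For readers more used to a mainstream language, the recursion in make_tel_book can be mirrored in Python. This is a rough analogue under assumptions: a person is modelled as a hypothetical (name, mails, optional tel) tuple instead of an XML value, the sample data is invented for illustration, and none of XDuce's static checking is reproduced.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory model: a person is (name, mails, optional tel).
Person = Tuple[str, List[str], Optional[str]]
TelPerson = Tuple[str, str]

def make_tel_book(persons: List[Person]) -> List[TelPerson]:
    """Keep only the persons that have a tel, dropping their mails."""
    if not persons:                        # the () -> () clause
        return []
    (name, _mails, tel), rest = persons[0], persons[1:]
    if tel is not None:                    # first clause: a tel is present
        return [(name, tel)] + make_tel_book(rest)
    return make_tel_book(rest)             # second clause: no tel, skip

# Illustrative data only; names and numbers are made up.
book = [("A", ["a@example.org"], None), ("B", [], "000")]
print(make_tel_book(book))   # [('B', '000')]
```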

38 Conclusion The expressiveness of regular expression types and pattern matching is useful for XML processing. Type inference (including the subtype relation) is possible and efficient in most practical cases. (Appendix 2)

39 Applications Bookmarks (Mozilla bookmark extraction). Html2Latex. Diff (diff for XML). Each is 300 – 350 lines.

40 Future Work Precise type inference on all variables. Introducing an Any type: not possible in a naïve way.  Breaks the closure properties of tree automata.  Makes type inference impossible.

41 References XDuce: A Statically Typed XML Processing Language, Hosoya and Pierce; XDuce: A Typed XML Processing Language, Hosoya and Pierce; Regular Expression Pattern Matching for XML, Hosoya and Pierce; Regular Expression Types for XML, Hosoya, Vouillon, and Pierce

42 Appendix 1: Type Inference

43 Type Inference (1/2) Infer types of variables in patterns Results are exact types of variables Type of each variable depends on  pattern itself, and  type of input

44 Type Inference (2/2) Type inference is “flow-sensitive”: in P_1 -> e_1 | … | P_n -> e_n, inference on P_i depends on P_1 ... P_{i-1}, because the values matched by P_i are those NOT matched by P_1 ... P_{i-1}

45 Type Inference: Example (1/2) (* p :: person[name[], mail*, tel[]?] *) match p with | person[name[], rest] -> … In this case, the type of rest is inferred as (mail*, tel[]?).

46 Type Inference: Example (2/2) match p with | person[name[], tel[]] -> … | person[name[], rest] -> … In this case, the type of rest becomes (mail+, tel[]?) | (), because person[name[], (), tel[]] is already matched by the first pattern.
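The flow-sensitive result can be reproduced as a set difference: the values reaching the second clause are those of the input type minus those taken by the first clause. The sketch assumes a hypothetical encoding (m = mail, t = tel[]) for the content that follows name[], and compares bounded enumerations rather than running real type inference.

```python
import re
from itertools import product

ALPHABET = "mt"  # hypothetical encoding: m = mail, t = tel[]

def values(regex, max_len=5):
    """All sequences over ALPHABET, up to max_len, fully matched by regex."""
    compiled = re.compile(regex)
    return {
        "".join(seq)
        for length in range(max_len + 1)
        for seq in product(ALPHABET, repeat=length)
        if compiled.fullmatch("".join(seq))
    }

# Content after name[] according to the input type: mail*, tel[]?
after_name = values("m*t?")
# The first clause person[name[], tel[]] takes the case where it is exactly tel[].
taken_by_first = values("t")
# What is left for the variable rest in the second clause.
inferred_rest = after_name - taken_by_first

print(inferred_rest == values("(m+t?)|()"))   # True
print("t" in inferred_rest)                   # False
```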

47 Type Inference: Limitations “Exact” type inference is possible only on  Variables at tail position, or  Inside labels (c.f. well-formedness) Limitation comes from internal representation of patterns (binary trees)

48 Appendix 2: Algorithms for Pattern Matching

49 Algorithms for Pattern Matching Pattern matching takes following steps  Translation of values into internal forms (binary trees).  Translation of types and patterns into internal forms (binary trees and tree automata).  Values are matched by patterns, in terms of tree automata.

50 Internal Forms of Values Values are represented internally as binary trees: t ::= ε (* leaves *) | l(t, t) (* labels *) The first child is the content of the label; the second is the remainder of the sequence.

51 Internal Forms of Values: Example person[name[], mail[], mail[]] is translated into person(name(ε,mail(ε,mail(ε,ε))),ε)
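The translation into binary trees is a short recursion over the sequence. In this sketch a value is assumed to be a Python list of (label, children) pairs, which is a hypothetical representation chosen for illustration; the output format follows the l(t, t) grammar above.

```python
# Assumed representation: a value is a list of (label, children) pairs,
# where children is again such a list.
def encode(seq):
    """Encode a sequence as l(content, remainder), with ε for the empty sequence."""
    if not seq:
        return "ε"
    (label, children), rest = seq[0], seq[1:]
    return f"{label}({encode(children)},{encode(rest)})"

# person[name[], mail[], mail[]]
person = [("person", [("name", []), ("mail", []), ("mail", [])])]
print(encode(person))
# person(name(ε,mail(ε,mail(ε,ε))),ε)
```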

52 Internal Forms of Types Types are also translated into binary trees: T ::= ∅ (* empty *) | ε (* leaves *) | T|T (* union *) | l(X, X) (* label *) X ranges over states, used in tree automata

53 Internal Forms of Types: Tree Automata A tree automaton M is a mapping of States -> Types e.g. M(X) = name(Y, Z) M(Y) = ε M(Z) = mail(Y, Z) | ε...

54 Internal Forms of Types: Example type Person = person[name[], mail*, tel[]?] is translated into  binary tree: person(X1, X0) and  tree automaton M, s.t. M(X0) = ε M(X1) = name(X0, X2) M(X2) = mail(X0, X2) | tel(X0, X0) | ε
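Membership of a binary-tree value in such an automaton can be checked by a direct recursion. The sketch below is an assumption-laden illustration: ε is represented by None, trees by (label, content, remainder) tuples, the state TOP is a hypothetical name for person(X1, X0), and X2 is written as mail(X0, X2) | tel(X0, X0) | ε so that a tel directly after name is accepted, as the type mail*, tel[]? requires.

```python
EPS = None  # the leaf ε

# Tree automaton for person[name[], mail*, tel[]?].
# Each state maps to a list of alternatives: EPS or (label, content, remainder).
M = {
    "X0": [EPS],
    "X1": [("name", "X0", "X2")],
    "X2": [("mail", "X0", "X2"), ("tel", "X0", "X0"), EPS],
    "TOP": [("person", "X1", "X0")],  # hypothetical name for the start state
}

def accepts(state, tree):
    """True if some alternative of the state accepts the binary tree."""
    for alt in M[state]:
        if alt is EPS:
            if tree is EPS:
                return True
        elif tree is not EPS:
            label, left, right = alt
            if tree[0] == label and accepts(left, tree[1]) and accepts(right, tree[2]):
                return True
    return False

def seq(*labels):
    """Binary-tree encoding of a sequence of elements with empty content."""
    tree = EPS
    for label in reversed(labels):
        tree = (label, EPS, tree)
    return tree

def person(*labels):
    return ("person", seq(*labels), EPS)

print(accepts("TOP", person("name", "mail", "mail")))   # True
print(accepts("TOP", person("name", "tel")))            # True
print(accepts("TOP", person("name", "tel", "mail")))    # False
```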

55 Internal Forms of Patterns Patterns are similar to types, with some additions: P ::= (* same as types... *) | x : P (* x as P *) | T (* wildcard *) Wildcards are used for variables that have no “as” annotation.

56 Internal Forms of Patterns: Example Pattern person[name[n], (ms as mail*)] is translated into binary tree person(Y1, Y0) and tree automaton N, s.t. N(Y0) = ε N(Y1) = name(n:T, ms:Y2) N(Y2) = mail(Y0, Y2) | ε

57 Pattern Matching (1/3) Pattern matching has two roles  match input values (of course!)  bind variables to components of input value, if matched Written formally t ∈ D ⇒ V “t is matched by D, yielding V” (V : Vars -> Values)

58 Pattern Matching (2/3) Matching relation t ∈ D ⇒ V is defined by following rules... (next slide) Assumptions:  D is a set of patterns and states  A tree automaton N is implied  (D, N) corresponds to the external pattern

59 Pattern Matching (3/3)
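The relation t ∈ D ⇒ V can be pictured as a matcher that returns the variable bindings V on success. The sketch below reuses the pattern automaton N from the earlier example, person[name[n], (ms as mail*)]; the tuple representation, the ANY marker for the wildcard, and the TOP state name are assumptions made for illustration, and the code does not claim to reproduce the formal rules of the original slide.

```python
EPS = None    # the leaf ε
ANY = "ANY"   # wildcard: matches any tree

# Each alternative is EPS or (label, (var, state), (var, state)),
# where var is a variable name to bind, or None.
N = {
    "Y0": [EPS],
    "Y1": [("name", ("n", ANY), ("ms", "Y2"))],
    "Y2": [("mail", (None, "Y0"), (None, "Y2")), EPS],
    "TOP": [("person", (None, "Y1"), (None, "Y0"))],  # hypothetical start state
}

def match(state, tree):
    """Return the bindings V if the tree is matched by the state, else None."""
    if state == ANY:
        return {}
    for alt in N[state]:
        if alt is EPS:
            if tree is EPS:
                return {}
            continue
        if tree is EPS:
            continue
        label, (lvar, lstate), (rvar, rstate) = alt
        if tree[0] != label:
            continue
        left = match(lstate, tree[1])
        right = match(rstate, tree[2])
        if left is None or right is None:
            continue
        bindings = {**left, **right}
        if lvar:
            bindings[lvar] = tree[1]
        if rvar:
            bindings[rvar] = tree[2]
        return bindings
    return None

# person[name[], mail[], mail[]] in binary-tree form
mails = ("mail", EPS, ("mail", EPS, EPS))
value = ("person", ("name", EPS, mails), EPS)
result = match("TOP", value)
print(result["ms"] == mails)   # True
print(result["n"] is EPS)      # True: name[] has empty content
print(match("TOP", ("person", ("tel", EPS, EPS), EPS)))   # None
```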