XDuce Tabuchi Naoshi, M1, Yonelab.

1 XDuce Tabuchi Naoshi, M1, Yonelab. (

2 Presentation Outline
• XDuce: Introduction
• Regular Expression Types
• Regular Expression Pattern Matching
• Algorithms for Pattern Matching
• Type Inference
• Conclusion / Future Works
• References
• Xperl(?)

3 XDuce: For What?
A functional language for XML processing, built on
• regular expression types, and
• pattern matching.
Statically typed, i.e. outputs are statically checked for conformance to a DTD, etc.

4 Advantages (vs. “untyped”)
“Untyped” XML processing: programs using DOM etc.
• Little connection between the program and the XML schema
• Validity can be checked only at run-time, if at all

5 Advantages (vs. “embedding”)
“Embedding”: mapping an XML schema into the language’s type system, e.g.
(DTD) person with content model (name, mail*, tel?)
↓
(ML) type person = Person of name * mail list * tel option

6 Advantages (vs. “embedding”)
Embedding does not match intuition in some cases, e.g.
intuitively (name, mail*, tel?) <: (name, mail*, tel*),
but name * mail list * tel option <: name * mail list * tel list does not hold in ML.
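
In the regular-expression view, subtyping is plain language inclusion, so the intuition above can be checked mechanically. The Python sketch below is only an illustration, not XDuce's algorithm (which works on tree automata): it assumes a hypothetical one-letter encoding of person's children (n = name, m = mail, t = tel) and searches for counterexamples up to a fixed length. Finding one disproves T <: T’; finding none is only evidence.

```python
import itertools
import re

# Hypothetical encoding: each child element of person is one letter
# (n = name, m = mail, t = tel), so a content model becomes a string regex.
SUB = r"nm*t?"   # (name, mail*, tel?)
SUP = r"nm*t*"   # (name, mail*, tel*)

def words(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

def counterexample(sub, sup, alphabet="nmt", max_len=6):
    """Return a word in L(sub) but not in L(sup), or None if there is none
    up to max_len (bounded search: None is evidence, not a proof)."""
    for w in words(alphabet, max_len):
        if re.fullmatch(sub, w) and not re.fullmatch(sup, w):
            return w
    return None

print(counterexample(SUB, SUP))  # None: every (name, mail*, tel?) is a (name, mail*, tel*)
print(counterexample(SUP, SUB))  # ntt: two tels are not allowed by tel?
```

No such inclusion exists between tel option and tel list in ML, which is exactly the mismatch the slide points out.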

7 Language Features (1/2)
ML-like pattern matching, e.g.
match p with
| person[name[n], (ms as mail*), tel[t]] -> (* case: p has a tel *)
| person[name[n], (ms as mail*)] -> (* case: p has no tel *)
…

8 Language Features (2/2)
Type inference, e.g. if
type Person = person[name[String], mail*, tel[String]?]
and p :: Person, then in
match p with person[name[n], (ms as mail*)] -> …
the types n :: String and ms :: mail* are inferred.

9 Applications
• Bookmarks (Mozilla bookmark extraction)
• Html2Latex
• Diff (diff for XML)
Each is 300–350 lines long.

10 Regular Expression Types
Types are defined in regular expression form, with labels:
• Concatenation, alternation and repetition are the basic constructors
• Labels correspond to XML elements (person, name, mail, etc.)

11 Syntax of Types
T ::= ()
    | X
    | l[T]
    | T, T      (* concat. *)
    | T | T     (* alt. *)
    | T*        (* rep. *)
where X ranges over type variables and l over labels.

12 Recursive Types
Types can be (mutually) recursive, e.g.
type Folder = Entry*
type Entry = name[String], file[File]
           | name[String], folder[Folder]

13 Subtyping
Subtyping has the usual meaning: T <: T’ iff every value t of T is also a value of T’, i.e.
T <: T’ ⇔ (∀t. t ∈ T ⇒ t ∈ T’)

14 Subtagging
Subtagging is a user-defined, “ad-hoc” subtype relation between labels, e.g. (in HTML) the small tag is a special case of tag.

15 Complexity of Subtyping
In general, the subtype relation (T <: T’) is equivalent to inclusion of context-free grammars: undecidable!
Some syntactic restrictions are needed (next slide…)

16 Well-formedness of Types
A syntactic restriction on types to ensure “regularity”: recursive uses of types can occur only
• at the tail position of a type definition, or
• inside labels.

17 Well-formed Types: Examples
type X = Int, Y
type Y = String, X | ()
and
type Z = String, lab[Z], String | ()
are well-formed, but
type U = Int, U, String | ()
is not.
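
The rule on slide 16 can be approximated by a small syntactic check. The sketch below is a hypothetical encoding, not XDuce's implementation: types are Python tuples, Int and String are treated as base types, the body of a repetition is conservatively treated as non-tail, and a set of definitions is rejected when a non-tail use outside labels lies on a cycle of references.

```python
# Hypothetical tuple encoding of the type syntax from slide 11.
INT, STR, EMPTY = ("base", "Int"), ("base", "String"), ("empty",)

def var(name):
    return ("var", name)

def alt(left, right):
    return ("alt", left, right)

def seq(*ts):
    """Right-nested concatenation T1, T2, ..., Tn."""
    t = ts[-1]
    for left in reversed(ts[:-1]):
        t = ("seq", left, t)
    return t

def refs(t, tail, out):
    """Collect (variable, is_tail) for every use of a type variable outside labels."""
    kind = t[0]
    if kind == "var":
        out.append((t[1], tail))
    elif kind == "seq":
        refs(t[1], False, out)  # the left operand of a concatenation is never tail
        refs(t[2], tail, out)
    elif kind == "alt":
        refs(t[1], tail, out)
        refs(t[2], tail, out)
    elif kind == "star":
        refs(t[1], False, out)  # conservative: the body of T* counts as non-tail
    # "label", "base", "empty": uses inside labels are always allowed
    return out

def well_formed(defs):
    """defs maps type names to bodies; reject non-tail recursion outside labels."""
    edges = {name: [(b, tail) for b, tail in refs(body, True, []) if b in defs]
             for name, body in defs.items()}

    def reaches(src, dst):
        seen, stack = set(), [src]
        while stack:
            cur = stack.pop()
            if cur == dst:
                return True
            if cur not in seen:
                seen.add(cur)
                stack.extend(b for b, _ in edges[cur])
        return False

    # A non-tail edge a -> b is fatal only if b can lead back to a (a cycle).
    return all(tail or not reaches(b, a)
               for a, outs in edges.items() for b, tail in outs)

ok = {
    "X": seq(INT, var("Y")),
    "Y": alt(seq(STR, var("X")), EMPTY),
    "Z": alt(seq(STR, ("label", "lab", var("Z")), STR), EMPTY),
}
bad = {"U": alt(seq(INT, var("U"), STR), EMPTY)}
print(well_formed(ok))   # True
print(well_formed(bad))  # False
```

Non-recursive uses in non-tail position (e.g. type W = V, Int with type V = String) are accepted, since no cycle passes through them.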

18 Complexity of Subtyping, again
With well-formedness, checking the subtype relation is
• still EXPTIME-complete, but
• acceptable in practical cases.

19 Pattern Matching
Pattern matching can also involve regular expression types, e.g.
match p with
| person[name[n], (ms as mail*), (t as tel?)] -> …

20 Policies of Pattern Matching
Pattern matching has two basic policies:
• First-match (as in ML): only the first matching pattern is taken
• Longest-match (as usual in regexp matching on strings): each repetition matches as much as possible

21 First-match: Example
(* p = person[name[…], mail, tel[…]] *)
match p with
| person[name[n], (ms as mail*), tel[t]] -> (* invoked *)
| person[name[n], (ms as mail*), (tl as tel?)] -> (* not invoked *)

22 Longest-match: Example
(* p = person[name, mail, mail, tel] *)
match p with
| … (m1 as mail*), (m2 as mail*), … -> (* m1 = mail, mail; m2 = () *)
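
The two policies can be mimicked with Python's re module on a hypothetical one-letter encoding of person's children (n = name, m = mail, t = tel). This is only an analogy over flat sequences, not XDuce's matcher: a greedy * plays the role of longest match for the slide 22 pattern, and trying clauses in order gives first match as on slide 21.

```python
import re

# Hypothetical one-letter encoding: n = name, m = mail, t = tel.
value = "nmmt"   # person[name, mail, mail, tel]

# Longest match: the greedy * lets m1 take every mail, leaving m2 empty.
m = re.fullmatch(r"n(?P<m1>m*)(?P<m2>m*)t", value)
print(m.group("m1"), repr(m.group("m2")))  # mm ''

# First match: try the clauses in order and stop at the first that matches.
clauses = [
    ("has tel", r"nm*t"),
    ("tel optional", r"nm*t?"),
]

def first_match(clauses, value):
    for name, pattern in clauses:
        if re.fullmatch(pattern, value):
            return name
    return None

print(first_match(clauses, "nmt"))  # has tel: the second clause is never tried
```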

23 Exhaustiveness and Redundancy
Pattern matches are checked for exhaustiveness and redundancy.
• Exhaustiveness: no value is “omitted”
• Redundancy: no pattern can never be matched

24 Exhaustiveness
A pattern match $P_1 \to e_1 \mid \dots \mid P_n \to e_n$ is exhaustive (wrt. input type $T$) ⇔ every value $t \in T$ is matched by some $P_i$, i.e. $T <: P_1 \mid \dots \mid P_n$

25 Exhaustiveness: Example (1/2)
match p with
| person[name[n], (ms as mail*), tel[t]] -> ...
| person[name[n], (ms as mail*)] -> ...
is exhaustive (wrt. Person).

26 Exhaustiveness: Example (2/2)
match p with
| person[name[n], (ms as mail*), tel[t]] -> ...
| person[name[n], (ms as mail+)] -> ...
is NOT exhaustive (wrt. Person): person[name[...]] is not matched.
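
Since exhaustiveness is the inclusion T <: P1 | … | Pn, both examples can be replayed with a bounded search over a hypothetical one-letter encoding of person's children (n = name, m = mail, t = tel). The sketch lists the values of Person, up to a length bound, that no clause matches; a non-empty result is a genuine counterexample, while an empty one is only evidence of exhaustiveness.

```python
import itertools
import re

PERSON = r"nm*t?"   # person[name, mail*, tel?] in the one-letter encoding

def words(max_len, alphabet="nmt"):
    """All strings over `alphabet` of length 0..max_len."""
    for k in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=k):
            yield "".join(letters)

def unmatched(type_re, patterns, max_len=6):
    """Values of the input type (up to max_len) matched by none of the patterns."""
    return [w for w in words(max_len)
            if re.fullmatch(type_re, w)
            and not any(re.fullmatch(p, w) for p in patterns)]

print(unmatched(PERSON, [r"nm*t", r"nm*"]))  # []: exhaustive (slide 25)
print(unmatched(PERSON, [r"nm*t", r"nm+"]))  # ['n']: person[name[...]] is missed (slide 26)
```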

27 Redundancy
A pattern $P_i$ is redundant in $P_1 \to e_1 \mid \dots \mid P_n \to e_n$ (wrt. input type $T$) ⇔ every value of $T$ matched by $P_i$ is already matched by $P_1 \mid \dots \mid P_{i-1}$

28 Redundancy: Example
match p with
| person[name[n], (ms as mail*), (tl as tel?)] -> ...
| person[name[n], (ms as mail*)] -> ...
The second pattern is redundant: anything that matches the second pattern also matches the first one.
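
Redundancy can be checked the same way: clause i is redundant when every value of the input type that it matches is already matched by an earlier clause. This sketch reuses a hypothetical one-letter encoding (n = name, m = mail, t = tel) and a bounded search, so it illustrates the definition rather than deciding it.

```python
import itertools
import re

PERSON = r"nm*t?"   # person[name, mail*, tel?] in the one-letter encoding

def words(max_len, alphabet="nmt"):
    """All strings over `alphabet` of length 0..max_len."""
    for k in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=k):
            yield "".join(letters)

def redundant(type_re, patterns, i, max_len=6):
    """True if, up to max_len, every value of the input type matched by
    patterns[i] is already matched by one of patterns[:i]."""
    return all(any(re.fullmatch(p, w) for p in patterns[:i])
               for w in words(max_len)
               if re.fullmatch(type_re, w) and re.fullmatch(patterns[i], w))

print(redundant(PERSON, [r"nm*t?", r"nm*"], 1))  # True: the slide 28 case
print(redundant(PERSON, [r"nm*t", r"nm*"], 1))   # False: 'n' reaches only the second clause
```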

29 Algorithms for Pattern Matching
Pattern matching takes the following steps:
• Translate values into internal forms (binary trees)
• Translate types and patterns into internal forms (binary trees and tree automata)
• Match values against patterns in terms of tree automata

30 Internal Forms of Values
Values are represented internally as binary trees:
t ::= ε          (* leaves *)
    | l(t, t)    (* labels *)
The first child is the content of the label; the second is the remainder of the sequence.

31 Internal Forms of Values: Example
person[name[], mail[], mail[]]
is translated into
person(name(ε, mail(ε, mail(ε, ε))), ε)
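
The translation above is a right-to-left fold over each sequence. A minimal Python sketch, assuming a hypothetical representation of external values as lists of (label, content) pairs and of internal trees as (label, first, rest) tuples, with the string "ε" for leaves:

```python
EPS = "ε"  # leaf of the internal binary tree

def encode(seq):
    """Encode a sequence of (label, content) nodes as a binary tree:
    first child = encoded content, second child = rest of the sequence."""
    tree = EPS
    for label, content in reversed(seq):
        tree = (label, encode(content), tree)
    return tree

def show(tree):
    """Print a binary tree in the l(t, t) notation of slide 30."""
    if tree == EPS:
        return EPS
    label, first, rest = tree
    return f"{label}({show(first)}, {show(rest)})"

value = [("person", [("name", []), ("mail", []), ("mail", [])])]
print(show(encode(value)))  # person(name(ε, mail(ε, mail(ε, ε))), ε)
```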

32 Internal Forms of Types
Types are also translated into binary trees:
T ::= φ          (* empty *)
    | ε          (* leaves *)
    | T | T
    | l(X, X)
where X ranges over states, used in tree automata.

33 Internal Forms of Types: Tree Automata
A tree automaton M is a mapping from states to types, e.g.
M(X) = name(Y, Z)
M(Y) = ε
M(Z) = mail(Y, Z) | ε
...

34 Internal Forms of Types: Example
type Person = person[name[], mail*, tel[]?]
is translated into
• the binary tree person(X1, X0), and
• a tree automaton M such that
  M(X0) = ε
  M(X1) = name(X0, X2)
  M(X2) = mail(X0, X2) | tel(X0, X0) | ε
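
Running a tree automaton is a direct recursion over the binary tree. The Python sketch below uses a hypothetical dictionary encoding (each state maps to a list of alternatives, each either ε or (label, content_state, rest_state)) plus an added start state Top for the root person(X1, X0). It writes M(X2) = mail(X0, X2) | tel(X0, X0) | ε so that person[name[], tel[]], which has no mail, is also accepted.

```python
EPS = "ε"

# Hypothetical encoding of the automaton for person[name[], mail*, tel[]?];
# "Top" is an added start state for the root tree person(X1, X0).
M = {
    "Top": [("person", "X1", "X0")],
    "X0": [EPS],
    "X1": [("name", "X0", "X2")],
    "X2": [("mail", "X0", "X2"), ("tel", "X0", "X0"), EPS],
}

def accepts(state, tree):
    """True if the automaton M, started in `state`, accepts the binary tree."""
    for alternative in M[state]:
        if alternative == EPS:
            if tree == EPS:
                return True
        elif tree != EPS:
            label, first, rest = tree
            want, first_state, rest_state = alternative
            if label == want and accepts(first_state, first) and accepts(rest_state, rest):
                return True
    return False

two_mails = ("person", ("name", EPS, ("mail", EPS, ("mail", EPS, EPS))), EPS)
tel_only = ("person", ("name", EPS, ("tel", EPS, EPS)), EPS)
tel_then_mail = ("person", ("name", EPS, ("tel", EPS, ("mail", EPS, EPS))), EPS)
print(accepts("Top", two_mails), accepts("Top", tel_only), accepts("Top", tel_then_mail))
# True True False
```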

35 Internal Forms of Patterns
Patterns are similar to types, with some additions:
P ::= (* same as types... *)
    | x : P    (* x as P *)
    | T        (* wildcard *)
Wildcards are used for variables bound without “as” (e.g. n in name[n]).

36 Internal Forms of Patterns: Example
The pattern person[name[n], (ms as mail*)] is translated into
• the binary tree person(Y1, Y0), and
• a tree automaton N such that
  N(Y0) = ε
  N(Y1) = name(n : T, ms : Y2)
  N(Y2) = mail(Y0, Y2) | ε

37 Pattern Matching (1/3)
Pattern matching has two roles:
• match input values (of course!)
• if matched, bind variables to components of the input value
Written formally: t ∈ D ⇒ V, “t is matched by D, yielding V” (V : Vars -> Values)

38 Pattern Matching (2/3)
The matching relation t ∈ D ⇒ V is defined by the following rules... (next slide)
Assumptions:
• D is a set of patterns and states
• A tree automaton N is implied
• (D, N) corresponds to the external pattern

39 Pattern Matching (3/3)
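
The rules of this slide did not survive in the transcript. As a hypothetical reconstruction in Python (not the rules from the talk), the relation t ∈ D ⇒ V can be read as a generator that yields every binding environment V. The pattern automaton N from the earlier example is encoded as a dictionary of tuple terms, with the wildcard written ("any",). The sketch enumerates all matches and does not implement the first-match or longest-match policies.

```python
EPS = "ε"

def match(term, tree, N):
    """Yield every binding environment V such that tree ∈ term ⇒ V."""
    kind = term[0]
    if kind == "eps":
        if tree == EPS:
            yield {}
    elif kind == "any":            # wildcard: matches any tree, binds nothing
        yield {}
    elif kind == "state":          # try each alternative of the state
        for alt in N[term[1]]:
            yield from match(alt, tree, N)
    elif kind == "bind":           # x : P binds x to the whole subtree
        for env in match(term[2], tree, N):
            yield {**env, term[1]: tree}
    elif kind == "node":           # l(P1, P2) against l(t1, t2)
        if tree != EPS and tree[0] == term[1]:
            for env1 in match(term[2], tree[1], N):
                for env2 in match(term[3], tree[2], N):
                    yield {**env1, **env2}

# Pattern person[name[n], (ms as mail*)] in internal form.
N = {
    "Y0": [("eps",)],
    "Y1": [("node", "name", ("bind", "n", ("any",)), ("bind", "ms", ("state", "Y2")))],
    "Y2": [("node", "mail", ("state", "Y0"), ("state", "Y2")), ("eps",)],
}
pattern = ("node", "person", ("state", "Y1"), ("state", "Y0"))

value = ("person", ("name", EPS, ("mail", EPS, ("mail", EPS, EPS))), EPS)
print(next(match(pattern, value, N), None))
# {'n': 'ε', 'ms': ('mail', 'ε', ('mail', 'ε', 'ε'))}
```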

40 Type Inference (1/2)
Infer the types of variables in patterns.
The results are exact types of the variables.
The type of each variable depends on
• the pattern itself, and
• the type of the input

41 Type Inference (2/2)
Type inference is “flow-sensitive”: in $P_1 \to e_1 \mid \dots \mid P_n \to e_n$, inference on $P_i$ depends on $P_1, \dots, P_{i-1}$, because the values matched by $P_i$ are only those NOT matched by $P_1, \dots, P_{i-1}$.

42 Type Inference: Example (1/2)
(* p :: person[name[], mail*, tel[]?] *)
match p with
| person[name[], rest] -> …
In this case, the inferred type of rest is mail*, tel[]?

43 Type Inference: Example (2/2)
match p with
| person[name[], tel[]] -> …
| person[name[], rest] -> …
In this case, the type of rest becomes (mail+, tel[]?) | (), because person[name[], (), tel[]] (zero mails followed by tel[]) is already matched by the first pattern.
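
The result above can be sanity-checked by brute force: enumerate what may follow name[] in Person, drop what the first clause already takes, and compare with (mail+, tel[]?) | (). The Python sketch uses a hypothetical one-letter encoding (m = mail[], t = tel[]) and a length bound, so it is evidence for the slide's result, not the tree-automaton inference XDuce performs.

```python
import itertools
import re

# Hypothetical one-letter encoding of what follows name[] inside person[...]:
# m = mail[], t = tel[].
AFTER_NAME = r"m*t?"     # mail*, tel[]?  (from the input type)
FIRST_CLAUSE = r"t"      # person[name[], tel[]] takes exactly tel[]
CLAIMED = r"(m+t?)?"     # (mail+, tel[]?) | ()

def words(max_len, alphabet="mt"):
    """All strings over `alphabet` of length 0..max_len."""
    for k in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=k):
            yield "".join(letters)

# Values bound to `rest` by the second clause: in the input type,
# but not already taken by the first clause.
reaching = {w for w in words(6)
            if re.fullmatch(AFTER_NAME, w) and not re.fullmatch(FIRST_CLAUSE, w)}
claimed = {w for w in words(6) if re.fullmatch(CLAIMED, w)}
print(reaching == claimed)  # True
```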

44 Type Inference: Limitations
“Exact” type inference is possible only for variables
• at tail position, or
• inside labels (c.f. well-formedness)
The limitation comes from the internal representation of patterns (binary trees).

45 Conclusion
The expressiveness of regular expression types and pattern matching is useful for XML processing.
Type inference (including the subtype relation) is possible and efficient in most practical cases.

46 Future Works
• Precise type inference on all variables
• Introducing an Any type: not possible in a naïve way, since it
  • breaks the closure property of tree automata
  • makes type inference impossible

47 References
• “Regular Expression Pattern Matching for XML”, Hosoya and Pierce
• “Regular Expression Types for XML”, Hosoya, Vouillon, and Pierce
Available @

48 Xperl(?)
My own current research: regular expression types for Perl.
Motivation: scripting languages
• are used more widely, and
• will live longer than XML

49 Features (in mind)
• Regular expression (but not tree) types
• Infer the outputs of scripts, etc.
• Detect possible run-time errors

50 Progress Report (1/3)
Parsing: a nightmare! Fortunately, ASTs can be extracted through the debug interface :-p

51 Progress Report (2/3)
Semantics: no specification, only the implementation.
• Trying from scratch, step by step
• Queer, esp. around side effects and data structures
• First attempt in the world?

52 Progress Report (3/3)
Type System: working along with the semantics.
Types are regular expressions: τ ::= ε | α | ττ | τ|τ | τ* …
Preliminary implementation of inference; still VERY trivial...

53 Resources
No documentation yet. A working note is placed @ AS-IS.
