XDuce: A Statically Typed XML Processing Language (Hosoya and Pierce). Presented by: Guy Korland. Based on a presentation by: Tabuchi Naoshi

1. XDuce: A Statically Typed XML Processing Language (Hosoya and Pierce)
Presented by: Guy Korland
Based on a presentation by: Tabuchi Naoshi (tabee@yl.is.s.u-tokyo.ac.jp)

2. Presentation Outline
- Introduction (XDuce is pronounced "transduce")
- Programming in XDuce:
  - Values
  - Regular expression types
  - Subtyping
  - Pattern matching
- Conclusions

3. XDuce: What For?
A functional language for XML processing, based on:
- regular expression types
- pattern matching
Statically typed, i.e., outputs are checked at compile time for conformance to a DTD, etc.

4. Advantages (vs. "Untyped" Processing)
"Untyped" XML processing means programs using DOM, etc.:
- Little connection between the program and the XML schema.
- Validity can be checked only at run time, if at all.

5. Advantages (vs. "Embedding")
"Embedding" means mapping an XML schema into the host language's type system, e.g.
  DTD:  <!ELEMENT person (name, mail*, tel?)>
  ML:   type person = name * mail list * tel option

6. Advantages (vs. "Embedding"), cont.
Embedding does not match intuition in some cases. Intuitively,
  (name, mail*, tel?) <: (name, mail*, tel*)
but in ML
  name * mail list * tel option <: name * mail list * tel list
does not hold.

7. Values
Values are XML documents (input, output, and intermediate). Two syntaxes are available:
- XDuce's native syntax
- standard XML document syntax

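8. Values (cont.)
Standard XML syntax (file mybook.xml):
  <addrbook>
    <person>
      <name>Haruo Hosoya</name>
      <email>hahosoya@kyoto-u</email>
      <email>hahosoya@upenn</email>
    </person>
    <person>
      <name>Benjamin Pierce</name>
      <email>bcpierce@upenn</email>
      <tel>123-456-789</tel>
    </person>
  </addrbook>
Loading it into XDuce:
  let val doc = load_xml("mybook.xml")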

9. Values (cont.)
XDuce's native syntax:
  let val mybook =
    addrbook[
      person[name["Haruo Hosoya"],
             email["hahosoya@kyoto-u"],
             email["hahosoya@upenn"]],
      person[name["Benjamin Pierce"],
             email["bcpierce@upenn"],
             tel["123-456-789"]]]
A constructor has the form label[…], where … is a sequence of other values. Strings are enclosed in double quotes, unlike in XML.
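To make the correspondence between the two syntaxes concrete, here is a small Python sketch that prints a native-syntax value as standard XML. The in-memory representation (a list whose items are strings or (label, children) pairs) and the name to_xml are assumptions made for illustration; they are not part of XDuce.

```python
from xml.sax.saxutils import escape

# Hypothetical in-memory form of an XDuce value (not XDuce's own API):
# a value is a list of items; each item is either a str (character data)
# or a (label, children) pair, where children is again such a list.

def to_xml(value):
    """Serialize a sequence value into standard XML text."""
    parts = []
    for item in value:
        if isinstance(item, str):
            parts.append(escape(item))  # escape &, <, > in character data
        else:
            label, children = item
            parts.append(f"<{label}>{to_xml(children)}</{label}>")
    return "".join(parts)

# The address book from the slide, written in this representation.
mybook = [
    ("addrbook", [
        ("person", [("name", ["Haruo Hosoya"]),
                    ("email", ["hahosoya@kyoto-u"]),
                    ("email", ["hahosoya@upenn"])]),
        ("person", [("name", ["Benjamin Pierce"]),
                    ("email", ["bcpierce@upenn"]),
                    ("tel", ["123-456-789"])]),
    ])
]

print(to_xml(mybook))
```

Each label[…] constructor becomes a start tag, its content, and an end tag; a sequence is simply the concatenation of its items.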

10. Regular Expression Types
Types are defined in regular-expression form, with labels:
- Concatenation, alternation (union), and repetition are the basic constructors.
- Labels correspond to XML elements (person, name, mail, etc.).

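11. Regular Expression Types (cont.)
Example:
  type Addrbook = addrbook[(Name, Addr, Tel?)*]
  type Name = name[String]
  type Addr = addr[String]
  type Tel = tel[String]
Corresponding DTD:
  <!ELEMENT addrbook ((name, addr, tel?)*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT addr (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
Note that Name, Addr, and Tel inside addrbook[...] are types, not labels.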

12. Regular Expression Types (cont.)
Syntax of types:
  T ::= ()      (* empty sequence type *)
      | X       (* type variable *)
      | L[T]    (* labeled type *)
      | T,T     (* concatenation *)
      | T|T     (* alternation *)
      | T*      (* repetition *)
where X is a type variable (String, Int, ...) and L is a label.

13. Regular Expression Types (cont.)
Syntactic sugar:
- T+ ≡ T,T*
- T? ≡ T|()
Types can be (mutually) recursive:
  type Folder = Entry*
  type Entry = name[String], file[String]
             | name[String], folder[Folder]
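The type syntax and its sugar can be read as a membership test on value sequences. The Python sketch below is a simple backtracking checker written for illustration, not XDuce's tree-automata algorithm; the AST class names are hypothetical, and it assumes every recursive reference either consumes input or sits under a label (true for Folder/Entry), otherwise it may not terminate.

```python
from dataclasses import dataclass

# Hypothetical AST for the type syntax (names are ours, not XDuce's).
@dataclass(frozen=True)
class Empty:            # ()
    pass

@dataclass(frozen=True)
class Base:             # String or Int
    name: str

@dataclass(frozen=True)
class Var:              # reference to a named type, e.g. Folder
    name: str

@dataclass(frozen=True)
class Lab:              # l[T]
    label: str
    body: object

@dataclass(frozen=True)
class Seq:              # T,T
    left: object
    right: object

@dataclass(frozen=True)
class Alt:              # T|T
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # T*
    body: object

def plus(t):            # sugar: T+ = T,T*
    return Seq(t, Star(t))

def opt(t):             # sugar: T? = T|()
    return Alt(t, Empty())

def ends(t, seq, i, defs):
    """Set of positions j such that seq[i:j] is a value of type t.
    Items of seq are str/int or (label, children) pairs."""
    if isinstance(t, Empty):
        return {i}
    if isinstance(t, Base):
        py = {"String": str, "Int": int}[t.name]
        return {i + 1} if i < len(seq) and isinstance(seq[i], py) else set()
    if isinstance(t, Var):
        return ends(defs[t.name], seq, i, defs)
    if isinstance(t, Lab):
        if i < len(seq) and isinstance(seq[i], tuple) and seq[i][0] == t.label:
            children = seq[i][1]
            if len(children) in ends(t.body, children, 0, defs):
                return {i + 1}
        return set()
    if isinstance(t, Seq):
        return {k for j in ends(t.left, seq, i, defs)
                for k in ends(t.right, seq, j, defs)}
    if isinstance(t, Alt):
        return ends(t.left, seq, i, defs) | ends(t.right, seq, i, defs)
    if isinstance(t, Star):
        result, todo = {i}, [i]
        while todo:                     # iterate until no new end positions
            for k in ends(t.body, seq, todo.pop(), defs):
                if k not in result:
                    result.add(k)
                    todo.append(k)
        return result
    raise TypeError(f"unknown type {t!r}")

def member(t, value, defs):
    """Is the whole sequence value an inhabitant of type t?"""
    return len(value) in ends(t, value, 0, defs)

# The mutually recursive Folder / Entry types from the slide.
name = Lab("name", Base("String"))
defs = {
    "Folder": Star(Var("Entry")),
    "Entry": Alt(Seq(name, Lab("file", Base("String"))),
                 Seq(name, Lab("folder", Var("Folder")))),
}
```

For example, member(Var("Folder"), [("name", ["docs"]), ("folder", [("name", ["a.txt"]), ("file", ["a.txt"])])], defs) is true, while a lone name entry is rejected.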

14. Regular Expression Types (cont.)
Syntax of labels:
  L ::= l       (* specific label *)
      | ~       (* wildcard label *)
      | L|L     (* union *)
      | L\L     (* difference *)

15. Regular Expression Types (cont.)
The label class ~ denotes the set of all labels. With it we can define a type Any:
  type Any = (~[Any] | Int | Float | String)*
Label unions, e.g. the HTML headings:
  type Heading = (h1|h2|h3|h4|h5|h6)[Inline]
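A label class is just a set of labels, so it can be read as a membership predicate. The following Python sketch assumes its own class names (Label, Wildcard, LabelUnion, LabelDiff) for the four forms in the label grammar; it only decides whether a concrete label belongs to a class.

```python
from dataclasses import dataclass
from functools import reduce

# Hypothetical representation of label classes (names are ours).
@dataclass(frozen=True)
class Label:        # l: one specific label
    name: str

@dataclass(frozen=True)
class Wildcard:     # ~: every label
    pass

@dataclass(frozen=True)
class LabelUnion:   # L|L
    left: object
    right: object

@dataclass(frozen=True)
class LabelDiff:    # L\L
    left: object
    right: object

def contains(cls, label):
    """Does the label class cls include the concrete label?"""
    if isinstance(cls, Label):
        return cls.name == label
    if isinstance(cls, Wildcard):
        return True
    if isinstance(cls, LabelUnion):
        return contains(cls.left, label) or contains(cls.right, label)
    if isinstance(cls, LabelDiff):
        return contains(cls.left, label) and not contains(cls.right, label)
    raise TypeError(f"unknown label class {cls!r}")

# (h1|h2|h3|h4|h5|h6) from the Heading type, built by folding unions.
heading = reduce(LabelUnion, [Label(f"h{i}") for i in range(1, 7)])
# A hypothetical "any label except script" class: ~ \ script
not_script = LabelDiff(Wildcard(), Label("script"))
```

Difference combined with ~ gives "every label except ...", which a plain list of alternatives cannot express when the label set is open.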

16. Subtyping
Subtyping has the usual meaning: every value of T is also a value of T'.
  T <: T' ⇔ ∀t. (t ∈ T ⇒ t ∈ T')
Examples:
- Name,Addr <: Name,Addr,Tel?
- Name,Addr,Tel <: Name,Addr,Tel?
- addrbook[Name,Addr,Name,Addr,Tel] <: addrbook[(Name,Addr,Tel?)*]

17. Subtyping: Union Types
Union (alternation) type constructor |. Examples:
- Name <: Name | Tel
- Tel <: Name | Tel
Forgetting order:
- (Name,Addr)*,(Name,Tel)* <: ((Name,Addr)|(Name,Tel))*
Distributivity:
- (Name,Tel)|(Name,Addr) <: Name,(Addr|Tel)
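For flat types (no nested labels) the subtype examples can be sanity-checked by brute force. This Python sketch assumes an encoding of one letter per element type (n = Name, a = Addr, t = Tel, m = Mail) and writes types as Python regexes; it only searches sequences up to a length bound, so a "True" is evidence rather than a proof. XDuce itself decides inclusion with tree automata.

```python
import re
from itertools import product

# Assumed flat encoding: one letter per element type
# (n = Name, a = Addr, t = Tel, m = Mail); a flat regular expression
# type becomes a Python regex over such letter strings.

def bounded_subtype(t1, t2, alphabet="nat", max_len=6):
    """Check t1 <: t2 on all sequences of length <= max_len.

    Returns (True, None) if no counterexample was found, otherwise
    (False, witness). This is evidence, not a proof of inclusion."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if re.fullmatch(t1, s) and not re.fullmatch(t2, s):
                return False, s
    return True, None

print(bounded_subtype(r"na", r"nat?"))                # Name,Addr <: Name,Addr,Tel?
print(bounded_subtype(r"(na)*(nt)*", r"(na|nt)*"))    # forgetting order
print(bounded_subtype(r"nt|na", r"n(a|t)"))           # distributivity
print(bounded_subtype(r"nat?", r"na"))                # fails, witness "nat"
```

The same check also covers the intuition behind the "embedding" comparison: bounded_subtype(r"nm*t?", r"nm*t*", alphabet="nmt") finds no counterexample.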

18. Subtyping: Subtagging
Subtagging allows subtyping between types with different labels (beyond the expressive power of DTDs). E.g. (HTML), the declarations
  subtag i <: fontstyle
  subtag b <: fontstyle
give
  i[T] <: fontstyle[T]
  b[T] <: fontstyle[T]
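At the label level, subtagging is a declared relation closed under reflexivity and transitivity; l1[T] <: l2[T] then follows whenever l1 is a subtag of l2. The Python sketch below assumes a simple set of (sub, super) pairs mirroring the slide's declarations and computes that closure by search; the extra "inline" label in the usage note is hypothetical.

```python
# Hypothetical subtag declarations, mirroring the slide's HTML example.
SUBTAGS = {("i", "fontstyle"), ("b", "fontstyle")}

def is_subtag(a, b, decls=SUBTAGS):
    """Reflexive-transitive closure of the declared subtag relation."""
    seen, todo = {a}, [a]
    while todo:
        x = todo.pop()
        if x == b:
            return True
        for sub, sup in decls:
            if sub == x and sup not in seen:
                seen.add(sup)
                todo.append(sup)
    return False

print(is_subtag("i", "fontstyle"))   # so i[T] <: fontstyle[T]
print(is_subtag("fontstyle", "i"))   # the relation is not symmetric
```

With an additional hypothetical declaration fontstyle <: inline, is_subtag("i", "inline", SUBTAGS | {("fontstyle", "inline")}) also holds by transitivity.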

19. Complexity of Subtyping
Without restrictions on recursion, the subtype relation (T <: T') is equivalent to inclusion between context-free grammars:
- undecidable!
- some restrictions on the syntax are needed (next slide).

20. Well-formedness of Types
A syntactic restriction on types that ensures "regularity". Recursive uses of types may occur only:
- in tail position of a type definition, or
- inside labels.

21. Well-formed Types: Examples
  type X = Int, Y
  type Y = String, X | ()
and
  type Z = String, lab[Z], String | ()
are well-formed, but
  type U = Int, U, String | ()
is not.
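One way to mechanize the restriction: collect the type references that are not under a label, mark each as tail or non-tail, and reject any cycle of such references that passes through a non-tail one. The Python sketch below is our own formalization for illustration, not XDuce's checker; types are tuples, and references under a repetition * are conservatively treated as non-tail, which is an assumption.

```python
# Types as tuples (our encoding): ("empty",), ("base", "Int"), ("var", "X"),
# ("lab", "l", T), ("seq", T1, T2), ("alt", T1, T2), ("star", T).

def refs(t, tail, out):
    """Collect (name, is_tail) for type references not under a label."""
    kind = t[0]
    if kind == "var":
        out.append((t[1], tail))
    elif kind == "seq":
        refs(t[1], False, out)       # left part is followed by more content
        refs(t[2], tail, out)
    elif kind == "alt":
        refs(t[1], tail, out)
        refs(t[2], tail, out)
    elif kind == "star":
        refs(t[1], False, out)       # assumption: repetition is never tail
    # "lab", "base", "empty": references inside labels are always allowed
    return out

def reachable(defs, start, goal):
    """Can goal be reached from start through top-level references?"""
    seen, todo = set(), [start]
    while todo:
        x = todo.pop()
        if x == goal:
            return True
        if x in seen or x not in defs:
            continue
        seen.add(x)
        todo.extend(name for name, _ in refs(defs[x], True, []))
    return False

def well_formed(defs):
    """Reject a definition whose non-tail top-level reference leads back to it."""
    for name, body in defs.items():
        for target, is_tail in refs(body, True, []):
            if not is_tail and reachable(defs, target, name):
                return False
    return True

XY = {"X": ("seq", ("base", "Int"), ("var", "Y")),
      "Y": ("alt", ("seq", ("base", "String"), ("var", "X")), ("empty",))}
Z = {"Z": ("alt", ("seq", ("base", "String"),
                   ("seq", ("lab", "lab", ("var", "Z")), ("base", "String"))),
           ("empty",))}
U = {"U": ("alt", ("seq", ("base", "Int"),
                   ("seq", ("var", "U"), ("base", "String"))),
           ("empty",))}
print(well_formed(XY), well_formed(Z), well_formed(U))
```

The three definitions are the slide's examples: X/Y and Z pass, while U fails because U is followed by String.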

22. Complexity of Subtyping, Again
With well-formedness, checking the subtype relation is:
- decidable, but still EXPTIME-complete, being equivalent to inclusion of tree automata [CDG+];
- yet acceptable in practical cases.

23. Pattern Matching
ML-like pattern matching: "pattern -> expression". Example:
  val url =
    match v with
      www[val s as String]   -> "http://" ^ s
    | email[val s as String] -> "mailto:" ^ s
    | ftp[val s as String]   -> "ftp://" ^ s

24. Pattern Matching (cont.)
Patterns can also involve regular expression types, e.g.
  match p with
  | person[name[String], (val ms as Mail*), (val t as Tel?)] -> …

25. Pattern Matching (cont.)
Functions provide reusable pattern matching. Example:
  fun make_url (val v as (www[String] | email[String] | ftp[String])) : String =
    match v with
      www[val s as String]   -> "http://" ^ s
    | email[val s as String] -> "mailto:" ^ s
    | ftp[val s as String]   -> "ftp://" ^ s

26. Policies of Pattern Matching
Pattern matching follows two basic policies:
- First match (as in ML): only the first matching pattern is taken.
- Longest match (as usual in regular-expression matching on strings): each part of a pattern matches as much as possible.

27. First Match: Example
  (* p = person[name, mail, tel] *)
  match p with
  | person[Name, (val ms as Mail*), Tel]  -> (* invoked *)
  | person[Name, (val ms as Mail*), Tel?] -> (* not invoked *)

28. Longest Match: Example
  (* p = person[name, mail, mail, tel] *)
  match p with
  | … (val m1 as Mail*), (val m2 as Mail*), … ->
      (* m1 = mail, mail;  m2 = () *)
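Both policies can be imitated on a flat encoding of the person content. This Python sketch assumes one letter per element (n = name, m = mail, t = tel) and uses Python regexes as stand-ins for patterns; it is an illustration, not XDuce's matcher. Note that Python's greedy backtracking quantifiers agree with longest match in this example but are not POSIX leftmost-longest matching in general.

```python
import re

# Assumed encoding: one letter per element (n = name, m = mail, t = tel);
# a pattern over the content of person[...] becomes a Python regex.

def first_match(patterns, value):
    """First-match policy: index of the first pattern matching the whole value."""
    for i, p in enumerate(patterns):
        if re.fullmatch(p, value):
            return i
    return None

def split_mails(value):
    """Bind m1 and m2 in  name, (m1 as Mail*), (m2 as Mail*), tel."""
    m = re.fullmatch(r"n(?P<m1>m*)(?P<m2>m*)t", value)
    return (m.group("m1"), m.group("m2")) if m else None

# First-match example: person[name, mail, tel] is caught by the first pattern.
print(first_match([r"nm*t", r"nm*t?"], "nmt"))   # 0
# Longest-match example: m1 takes both mails, m2 gets the empty sequence.
print(split_mails("nmmt"))                        # ('mm', '')
```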

29. Exhaustiveness and Redundancy
Pattern matches are checked for exhaustiveness and redundancy:
- Exhaustiveness: no input value is left unmatched.
- Redundancy: no pattern is one that can never be matched.

30. Exhaustiveness
A pattern match P_1 -> e_1 | … | P_n -> e_n is exhaustive (w.r.t. input type T) ⇔ every value t ∈ T is matched by some P_i, i.e., T <: P_1 | … | P_n.

31. Exhaustiveness: Example (1/2)
  (* type Person = person[Name, Mail*, Tel?] *)
  match p with
  | person[Name, Mail*, Tel] -> ...
  | person[Name, Mail*]      -> ...
These patterns are exhaustive (w.r.t. Person).

32. Exhaustiveness: Example (2/2)
  (* type Person = person[Name, Mail*, Tel?] *)
  match p with
  | person[Name, Mail*, Tel] -> ...
  | person[Name, Mail+]      -> ...
These patterns are NOT exhaustive (w.r.t. Person): person[name[...]] matches neither.

33. Redundancy
A pattern P_i is redundant in P_1 -> e_1 | … | P_n -> e_n (w.r.t. input type T) ⇔ every value of T matched by P_i is already matched by P_1 | … | P_{i-1}.

34. Redundancy: Example
  (* type Person = person[Name, Mail*, Tel?] *)
  match p with
  | person[Name, Mail*, Tel?] -> ...
  | person[Name, Mail*]       -> ...
The second pattern is redundant: anything that matches the second pattern also matches the first.
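The two definitions can be tested by enumeration on flat types. The Python sketch below assumes the encoding n = Name, m = Mail, t = Tel, represents the input type and the patterns as Python regexes, and only examines values up to a length bound; XDuce performs these checks exactly, via the inclusion test T <: P_1 | … | P_n.

```python
import re
from itertools import product

# Assumed flat encoding of the content of person[...]: n = Name, m = Mail,
# t = Tel. Types and patterns become Python regexes over letter strings.

def values_of(type_re, alphabet="nmt", max_len=5):
    """All encoded values of the type up to max_len letters."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if re.fullmatch(type_re, s):
                yield s

def missing_value(type_re, patterns, **kw):
    """A value of the type matched by no pattern, or None (bounded check)."""
    for v in values_of(type_re, **kw):
        if not any(re.fullmatch(p, v) for p in patterns):
            return v
    return None

def is_redundant(type_re, patterns, i, **kw):
    """True if every value of the type matched by patterns[i] is already
    matched by some earlier pattern (bounded check)."""
    for v in values_of(type_re, **kw):
        if re.fullmatch(patterns[i], v) and not any(
                re.fullmatch(p, v) for p in patterns[:i]):
            return False
    return True

PERSON = r"nm*t?"                                     # person[Name, Mail*, Tel?]
print(missing_value(PERSON, [r"nm*t", r"nm*"]))      # None: exhaustive
print(missing_value(PERSON, [r"nm*t", r"nm+"]))      # n, i.e. person[name[...]]
print(is_redundant(PERSON, [r"nm*t?", r"nm*"], 1))   # True
```

The three calls reproduce the exhaustive example, the non-exhaustive example with its witness, and the redundant second pattern.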

35. Complete Example (1/3)
  type Addrbook = addrbook[Person*]
  type Person = person[Name, Email*, Tel?]
  type Name = name[String]
  type Email = email[String]
  type Tel = tel[String]
  (* types of output documents *)
  type TelBook = telbook[TelPerson*]
  type TelPerson = person[Name, Tel]
  (* load an address book *)
  let val doc = load_xml("mybook.xml")

36. Complete Example (2/3)
  (* validate it against the type Addrbook *)
  let val valid_doc = validate doc with Addrbook
  (* extract the content of the top label addrbook *)
  let val out_doc =
    match valid_doc with
      addrbook[val persons as Person*] -> telbook[make_tel_book(persons)]
  (* save out_doc to output.xml *)
  save_xml("output.xml")(out_doc)

37. Complete Example (3/3)
  (* take ps of type Person* and return TelPerson* *)
  fun make_tel_book (val ps as Person*) : TelPerson* =
    match ps with
      person[name[val n as String], Email*, tel[val t as String]], val rest as Person*
        -> person[name[n], tel[t]], make_tel_book(rest)   (* recursive call *)
    | person[name[val n as String], Email*], val rest as Person*
        -> make_tel_book(rest)                              (* recursive call *)
    | () -> ()
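For comparison, here is the same transformation in Python over a hypothetical list-and-tuple representation of values (not XDuce's). Without pattern matching on regular expression types, the sketch has to rely on the input having been validated against Person* and pick components by position, which is exactly the kind of unchecked assumption the XDuce version avoids.

```python
# Assumed value representation (not XDuce's): a sequence is a Python list;
# an element is a (label, children) pair; character data is a str.

def make_tel_book(ps):
    """Keep only persons that have a tel, as person[name, tel].

    Assumes ps was already validated against Person*, so each person's
    content is name, email*, tel? in that order."""
    if not ps:                                   # () -> ()
        return []
    (_, content), rest = ps[0], ps[1:]
    name, last = content[0], content[-1]
    if last[0] == "tel":                         # person[name, Email*, tel]
        return [("person", [name, last])] + make_tel_book(rest)
    return make_tel_book(rest)                   # person[name, Email*]

persons = [
    ("person", [("name", ["Haruo Hosoya"]),
                ("email", ["hahosoya@kyoto-u"]),
                ("email", ["hahosoya@upenn"])]),
    ("person", [("name", ["Benjamin Pierce"]),
                ("email", ["bcpierce@upenn"]),
                ("tel", ["123-456-789"])]),
]
telbook = ("telbook", make_tel_book(persons))
print(telbook)
```

Only Benjamin Pierce has a tel, so the resulting telbook contains a single person.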

38. Conclusion
- The expressiveness of regular expression types and pattern matching is useful for XML processing.
- Type inference (including checking the subtype relation) is possible and efficient in most practical cases (Appendix 2).

39. Applications
- Bookmarks (Mozilla bookmark extraction)
- Html2Latex
- Diff (diff for XML)
Each is about 300 to 350 lines.

40. Future Work
- Precise type inference for all variables.
- Introducing an Any type: not possible in a naïve way, because it
  - breaks the closure properties of tree automata, and
  - makes type inference impossible.

41. References
- Hosoya and Pierce. XDuce: A Statically Typed XML Processing Language.
- Hosoya and Pierce. XDuce: A Typed XML Processing Language.
- Hosoya and Pierce. Regular Expression Pattern Matching for XML.
- Hosoya, Vouillon, and Pierce. Regular Expression Types for XML.
Available at http://xduce.sourceforge.net

42. Appendix 1: Type Inference

43. Type Inference (1/2)
The types of variables in patterns are inferred, and the results are exact types. The type of each variable depends on:
- the pattern itself, and
- the type of the input.

44. Type Inference (2/2)
Type inference is "flow-sensitive": in P_1 -> e_1 | … | P_n -> e_n, inference on P_i depends on P_1 … P_{i-1}, because P_i only sees values NOT matched by P_1 … P_{i-1}.

45. Type Inference: Example (1/2)
  (* p :: person[name[], mail*, tel[]?] *)
  match p with
  | person[name[], rest] -> …
In this case, the type of rest is inferred as mail*, tel[]?.

46. Type Inference: Example (2/2)
  match p with
  | person[name[], tel[]] -> …
  | person[name[], rest]  -> …
In this case, the type of rest becomes (mail+, tel[]?) | (), because person[name[], (), tel[]] (no mail followed by a tel) is matched by the first pattern.
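The effect of flow sensitivity can be observed by enumerating the values that actually reach the second branch. This Python sketch assumes the flat encoding n = name[], m = mail, t = tel[] and works only up to a length bound; it collects the values bound to rest and checks them against the two inferred types, which is a sanity check rather than XDuce's inference algorithm.

```python
import re
from itertools import product

# Assumed flat encoding of person content: n = name[], m = mail, t = tel[].

def rest_values(patterns_before, max_len=5):
    """Bounded set of values bound to rest in person[name[], rest],
    after removing the values caught by earlier patterns."""
    rests = set()
    for n in range(max_len + 1):
        for letters in product("nmt", repeat=n):
            v = "".join(letters)
            if not re.fullmatch(r"nm*t?", v):      # p :: person[name[], mail*, tel[]?]
                continue
            if any(re.fullmatch(p, v) for p in patterns_before):
                continue                           # handled by an earlier branch
            rests.add(v[1:])                       # rest = everything after name[]
    return rests

alone = rest_values([])            # only person[name[], rest]
after_tel = rest_values([r"nt"])   # person[name[], tel[]] comes first
print(sorted(alone))
print(sorted(after_tel))
```

Every element of alone fits mail*, tel[]? and includes the lone tel sequence "t"; every element of after_tel fits (mail+, tel[]?) | (), and "t" is gone because the first pattern caught it.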

47. Type Inference: Limitations
"Exact" type inference is possible only for variables:
- at tail position, or
- inside labels (cf. well-formedness).
The limitation comes from the internal representation of patterns (binary trees).

48. Appendix 2: Algorithms for Pattern Matching

49. Algorithms for Pattern Matching
Pattern matching takes the following steps:
- Values are translated into internal forms (binary trees).
- Types and patterns are translated into internal forms (binary trees and tree automata).
- Values are matched against patterns in terms of tree automata.

50. Internal Forms of Values
Values are represented internally as binary trees:
  t ::= ε          (* leaf *)
      | l(t, t)    (* label *)
The first child is the content of the label; the second is the remainder of the sequence.

51. Internal Forms of Values: Example
  person[name[], mail[], mail[]]
is translated into
  person(name(ε, mail(ε, mail(ε, ε))), ε)
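The translation is a direct recursion: the first element of a sequence becomes a node whose left child encodes its content and whose right child encodes the rest of the sequence. A minimal Python sketch, assuming values are lists of (label, children) pairs and ignoring character data as the slide's example does:

```python
def to_binary(seq):
    """Encode a sequence of labeled nodes as a binary tree l(content, rest),
    rendered as a string; the empty sequence becomes the leaf ε."""
    if not seq:
        return "ε"
    label, children = seq[0]
    return f"{label}({to_binary(children)},{to_binary(seq[1:])})"

value = [("person", [("name", []), ("mail", []), ("mail", [])])]
print(to_binary(value))   # person(name(ε,mail(ε,mail(ε,ε))),ε)
```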

52. Internal Forms of Types
Types are also translated into binary trees:
  T ::= φ          (* empty *)
      | ε          (* leaf *)
      | T|T        (* union *)
      | l(X, X)    (* label *)
where X ranges over states, used in tree automata.

53. Internal Forms of Types: Tree Automata
A tree automaton M is a mapping from states to types, e.g.
  M(X) = name(Y, Z)
  M(Y) = ε
  M(Z) = mail(Y, Z) | ε
  ...

54. Internal Forms of Types: Example
  type Person = person[name[], mail*, tel[]?]
is translated into
- the binary tree person(X1, X0), and
- a tree automaton M such that
    M(X0) = ε
    M(X1) = name(X0, X2)
    M(X2) = mail(X0, X2) | tel(X0, X0) | ε
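Checking a value against a type then amounts to running the automaton over the binary tree. The Python sketch below is a naive top-down acceptance check, not XDuce's implementation; trees are (label, left, right) tuples with None for ε, and the automaton encodes Person = person[name[], mail*, tel[]?] using the slide's state names, with X2 given the alternatives mail(X0, X2) | tel(X0, X0) | ε so that a tel may follow the name directly.

```python
EPS = None   # the leaf ε

def encode(seq):
    """Binary-tree encoding: l[content], rest  ->  (l, content, rest)."""
    if not seq:
        return EPS
    label, children = seq[0]
    return (label, encode(children), encode(seq[1:]))

# Tree automaton for the content of person[...]; each state maps to its
# alternatives, written "eps" or (label, left_state, right_state).
M = {
    "X0": ["eps"],
    "X1": [("name", "X0", "X2")],
    "X2": [("mail", "X0", "X2"), ("tel", "X0", "X0"), "eps"],
}

def accepts(state, tree):
    """Top-down check that tree is in the language of state."""
    for alt in M[state]:
        if alt == "eps":
            if tree is EPS:
                return True
        elif tree is not EPS and tree[0] == alt[0]:
            if accepts(alt[1], tree[1]) and accepts(alt[2], tree[2]):
                return True
    return False

def is_person(value):
    """The whole value must be the binary tree person(X1, X0)."""
    t = encode(value)
    return (t is not EPS and t[0] == "person"
            and accepts("X1", t[1]) and accepts("X0", t[2]))

print(is_person([("person", [("name", []), ("mail", []), ("tel", [])])]))  # True
print(is_person([("person", [("name", []), ("tel", []), ("mail", [])])]))  # False
```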

55. Internal Forms of Patterns
Patterns are similar to types, with some additions:
  P ::= ...        (* same as types *)
      | x : P      (* x as P *)
      | T          (* wildcard: matches any value *)
Wildcards are used for variables bound without an "as" type, such as n in name[n].

56. Internal Forms of Patterns: Example
The pattern
  person[name[n], (ms as mail*)]
is translated into
- the binary tree person(Y1, Y0), and
- a tree automaton N such that
    N(Y0) = ε
    N(Y1) = name(n:T, ms:Y2)
    N(Y2) = mail(Y0, Y2) | ε

57. Pattern Matching (1/3)
Pattern matching has two roles:
- match input values (of course!), and
- bind variables to components of the input value, if matched.
Written formally: t ∈ D ⇒ V, "t is matched by D, yielding V" (V : Vars -> Values).

58. Pattern Matching (2/3)
The matching relation t ∈ D ⇒ V is defined by the following rules (next slide). Assumptions:
- D is a set of patterns and states.
- A tree automaton N is implied.
- (D, N) corresponds to the external pattern.
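The following Python sketch shows one way to compute t ∈ D ⇒ V for the pattern automaton N of slide 56; it is our own reading, not the formal rules. Assumptions: trees are (label, left, right) tuples with None for ε, a binder x : P is written ("bind", x, P), the wildcard is the string "ANY", "ROOT" stands for the top-level tree person(Y1, Y0), and alternatives are tried in order with the first success kept, which ignores the longest-match policy.

```python
EPS = None   # the leaf ε

def encode(seq):
    """Binary-tree encoding: l[content], rest  ->  (l, content, rest)."""
    if not seq:
        return EPS
    label, children = seq[0]
    return (label, encode(children), encode(seq[1:]))

ANY = "ANY"  # wildcard: matches any tree

# Pattern automaton for person[name[n], (ms as mail*)].
N = {
    "ROOT": [("person", "Y1", "Y0")],
    "Y0": ["eps"],
    "Y1": [("name", ("bind", "n", ANY), ("bind", "ms", "Y2"))],
    "Y2": [("mail", "Y0", "Y2"), "eps"],
}

def match(p, tree):
    """Return the bindings V with tree ∈ p ⇒ V, or None if there is no match."""
    if p == ANY:
        return {}
    if isinstance(p, tuple) and p[0] == "bind":        # x : P
        _, var, inner = p
        v = match(inner, tree)
        return None if v is None else {**v, var: tree}
    for alt in N[p]:                                   # p is a state
        if alt == "eps":
            if tree is EPS:
                return {}
        elif tree is not EPS and tree[0] == alt[0]:
            left = match(alt[1], tree[1])
            right = match(alt[2], tree[2]) if left is not None else None
            if right is not None:
                return {**left, **right}
    return None

v = match("ROOT", encode([("person", [("name", []), ("mail", []), ("mail", [])])]))
print(v)   # n is bound to ε (None); ms to the binary tree of the two mails
```

A tel in the content makes the match fail, since the pattern admits only mails after the name.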

59. Pattern Matching (3/3)

