Typing semistructured data
Serge Abiteboul
Web Data Management, Master Informatique, 10/9/2007


2 Organization
Motivations
Automata
– Automata on words
– Ranked tree automata
– Unranked tree automata
– Automata and monadic second-order logic
– Automata to compute
XML typing: DTD, XML Schema
Graphs and bisimulation

3 Motivation

4 XML typing
Not compulsory
Simplify writing software for XML
– Improves interoperability between programs, consistency and efficiency
Improve storage and performance
Ease querying: data guide
Simplify data protection
– Reject illegal updates
– Like relational dependencies

5 Improve storage
[Figure: a lower-bound schema with nodes Root, Company, Employee and string; edge labels company, person, works-for, c.e.o., address, name, managed-by.]
Store Company and Employee according to the lower-bound schema
Store the rest in an overflow graph

6 Improve performance
[Figure: schema of Bib with labels paper, book, journal, title, year (int), author (first name, last name), address (street, city, zip), string.]
Generic query:
select X.title from Bib._ X where X.*.zip = "12345"
Rewritten using the schema:
select X.title from Bib.book X where X.address.zip = "12345"
Cheaper to evaluate if the schema is known
No schema: first compute a data guide

7 Type checking
Who checks?
– XML editor: checks that the data conforms to its type
– XML exchange, e.g., with a Web service
  Server: when delivering the data
  Client/application: when receiving it
Dynamic verification: after the data is produced
Static verification: verification of the program that generates the data
– Verify that a program receiving data of the proper input type only generates data of the proper output type
– More complicated

8 Static verification
Input: input type T and code of function f
– f is XQuery, XPath, XSLT, etc.
Verification of T'
– Is it true that ∀d ⊨ T, f(d) ⊨ T'?
– i.e., is it true that for every document d valid against the input type T, f(d) is valid against the output type T'?
Type inference
– Find the smallest T' such that ∀d ⊨ T, f(d) ⊨ T'
– With no knowledge of the input type, the inference problem becomes: find the smallest T' such that f(d) ⊨ T' for every document d
Verifying or inferring an output type for a program is, in full generality, an undecidable problem because of "joins"

9 Example
for $p in doc("parts.xml")//part[color="red"]
return $p/name/text() $p/desc/node()
Result type: (part (name (string) desc (any)))*
If the type of parts.xml//part/desc is string: (part (name (string) desc (string)))*
Smallest regular language that describes the output of the query

10 Difficulty
for $X in Input, $Y in Input do { print ( … ) }
Input: n elements. Result: one b printed per pair ($X, $Y), i.e., b^(n^2)
Problem: { b^i | i = n^2 for n ≥ 0 } is not regular, so it cannot be described in XML Schema
There is no « best » result: every regular approximation can be refined
– b*
– ε + b b*
– ε + b + b^4 b*
– ε + b + b^4 + b^9 b*
– …
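A minimal sketch of this difficulty, assuming the loop body prints a single b per pair (the printed element is not recoverable from the slide). The function name run_query and the list APPROXIMATIONS are hypothetical; each regular expression is a regular over-approximation of the output language, and each one is strictly tighter than the previous one.

```python
import re

def run_query(inputs):
    """One 'b' per pair (x, y) of input items, as in the nested loop."""
    return "".join("b" for _x in inputs for _y in inputs)

# Successive regular over-approximations of { b^(n^2) | n >= 0 }.
# An empty alternative stands for the empty word (epsilon).
APPROXIMATIONS = [
    r"b*",
    r"|bb*",             # epsilon + b b*
    r"|b|b{4}b*",        # epsilon + b + b^4 b*
    r"|b|b{4}|b{9}b*",   # epsilon + b + b^4 + b^9 b*
]

def accepts(pattern, word):
    """True if the whole word belongs to the language of the pattern."""
    return re.fullmatch(pattern, word) is not None
```

Every output of run_query is accepted by every approximation, but a word such as bb, which is never produced, is rejected only from the third approximation on.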

11 Why tree automata?
XML = unranked trees
No theory for XML
Rich theory for strings: automata
Extends to a rich theory for ranked trees: tree automata
– Nice algorithms
– Nice theorems
– Can this carry over to unranked trees and XML? Yes!

12 From strings to trees
Word: finite-state automata
Binary tree: ranked tree automata
Unranked tree (no bound on the number of children): unranked tree automata
[Figure: a word, a binary tree and an unranked tree over the labels a and b.]

13 Why not then use unranked tree automata?
Missing practical gadgets
Complexity of verification
– Goal: typing at reasonable cost

14 Automata on words

15 Finite state automata on words
Alphabet
States
Initial state
Accepting states
Transitions

16 Nondeterministic automaton: Example
[Figure: runs of a nondeterministic automaton with states q0, q1 and q2 on input over {a, b}; one run is rejected (KO) and one is accepted (OK).]

17 Reminder
Deterministic
– No ε-transition
– No alternative transitions, i.e., no two transitions from the same state on the same letter
Determinization
– It is possible to obtain an equivalent deterministic automaton
– State of the new automaton = set of states of the original one
– Possible exponential blow-up
Minimization
Limitations
– Cannot handle context-free languages in general
Essential tool
– e.g., lexical analysis
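The determinization step can be sketched as follows. This is a minimal subset construction, assuming an NFA without ε-transitions; the function names and the example automaton NFA_DELTA are hypothetical and not taken from the slides.

```python
from itertools import chain

def determinize(alphabet, delta, start, accepting):
    """Subset construction for an NFA without epsilon-transitions.
    delta maps (state, letter) to a set of states."""
    start_set = frozenset([start])
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        for letter in alphabet:
            # The new state is the set of all NFA states reachable by this letter.
            target = frozenset(chain.from_iterable(
                delta.get((q, letter), ()) for q in current))
            dfa_delta[(current, letter)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return start_set, dfa_delta, dfa_accepting

def dfa_accepts(dfa, word):
    state, delta, accepting = dfa
    for letter in word:
        state = delta[(state, letter)]
    return state in accepting

# Hypothetical NFA: words over {a, b} whose second-to-last letter is a.
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q1", "b"): {"q2"},
}
DFA = determinize("ab", NFA_DELTA, "q0", {"q2"})
```

For the language "the k-th letter from the end is a", the NFA has k+1 states while the deterministic automaton needs 2^k states, which illustrates the possible exponential blow-up.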

18 Reminder (2)
L(A) = set of words accepted by automaton A
Regular languages
– Can be described by regular expressions, e.g., a(b+c)*d, using concatenation, union and Kleene closure
– Closed under complement
– Closed under union, intersection
  Product automaton with states (s,s') where s is from A and s' is from A'
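A minimal sketch of the product construction for intersection, assuming two complete deterministic automata given as (start, transitions, accepting states). The automata EVEN_A and ENDS_B are hypothetical examples. For union, the accepting condition would be "p in f1 or q in f2" instead.

```python
def product(dfa1, dfa2, alphabet):
    """Product of two complete DFAs; accepts the intersection of their languages."""
    (s1, d1, f1), (s2, d2, f2) = dfa1, dfa2
    start = (s1, s2)
    delta, todo, seen = {}, [start], {start}
    while todo:
        p, q = todo.pop()
        for letter in alphabet:
            # Both automata move in parallel on the same letter.
            target = (d1[(p, letter)], d2[(q, letter)])
            delta[((p, q), letter)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    accepting = {(p, q) for (p, q) in seen if p in f1 and q in f2}
    return start, delta, accepting

def accepts(dfa, word):
    state, delta, accepting = dfa
    for letter in word:
        state = delta[(state, letter)]
    return state in accepting

# Hypothetical examples over {a, b}: even number of a's, and words ending with b.
EVEN_A = ("even", {("even", "a"): "odd", ("even", "b"): "even",
                   ("odd", "a"): "even", ("odd", "b"): "odd"}, {"even"})
ENDS_B = ("no", {("no", "a"): "no", ("no", "b"): "yes",
                 ("yes", "a"): "no", ("yes", "b"): "yes"}, {"yes"})
BOTH = product(EVEN_A, ENDS_B, "ab")
```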

19 Automata on words versus trees
Words: left to right or right to left, no difference
Trees: bottom-up or top-down, differences
[Figure: a word and a tree over the labels a and b.]

20 Automata on ranked trees

21 Binary tree automata
Parallel evaluation
Bottom-up: separate transition rules for leaves and for other nodes
[Figure: a binary tree over a and b evaluated bottom-up, with states q, q', q'', q1, q2 attached to its nodes.]

22 Bottom-up tree automata
Bottom-up: if a node labeled a has its children in states q, q', then the node moves nondeterministically to state r or r'
Accepts if the root is in some state in F
Not deterministic if there are alternatives or ε-transitions

23 Example: deterministic bottom-up

24 Boolean circuit evaluation
[Figure: a tree of Boolean gates with 0/1 leaves, evaluated bottom-up; the root evaluates to 1 (OK).]
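Boolean circuit evaluation can be sketched as a deterministic bottom-up tree automaton. The encoding below is an assumption made for illustration: leaves are the strings "0" and "1", inner nodes are tuples (gate, left, right) with gates "or" and "and", and the states q0 and q1 stand for the values 0 and 1.

```python
# Rules for leaves: the label alone determines the state.
LEAF_RULES = {"0": "q0", "1": "q1"}

# Rules for other nodes: (label, state of left child, state of right child) -> state.
NODE_RULES = {
    ("or", "q0", "q0"): "q0", ("or", "q0", "q1"): "q1",
    ("or", "q1", "q0"): "q1", ("or", "q1", "q1"): "q1",
    ("and", "q0", "q0"): "q0", ("and", "q0", "q1"): "q0",
    ("and", "q1", "q0"): "q0", ("and", "q1", "q1"): "q1",
}
FINAL = {"q1"}

def run(tree):
    """State reached at the root of the tree, computed bottom-up."""
    if isinstance(tree, str):
        return LEAF_RULES[tree]
    label, left, right = tree
    return NODE_RULES[(label, run(left), run(right))]

def accepts(tree):
    """The tree is accepted if the circuit evaluates to 1."""
    return run(tree) in FINAL
```

Because each (label, state, state) triple has exactly one rule, the automaton is deterministic: every tree has a single run.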

25 Regular tree language = set of trees accepted by a bottom-up tree automaton

26 Regular tree languages
The following are equivalent
– L is a regular tree language
– L is accepted by a nondeterministic bottom-up automaton
– L is accepted by a deterministic bottom-up automaton
– L is accepted by a nondeterministic top-down automaton
Deterministic top-down is weaker

27 Top-down tree automata
Top-down: if a node labeled a is in state q'', then its left child moves to state q and its right child to state q'
Accepts if all leaves are in states in F when the root is in some given initial state q
Not deterministic if some pair (label, state) has alternative transitions

28 Why is deterministic top-down weaker?
Consider the language
– L = { f(a,b), f(b,a) }
It can be accepted by a bottom-up TA
– Exercise: write a bottom-up TA A such that L = L(A)
Suppose that B is a deterministic top-down TA with L = L(B)
– Exercise: show that B also accepts f(a,a)
– A contradiction
Fact: no deterministic top-down tree automaton accepts L

29 Ranked tree automata: Properties
Like for words
Determinization
Minimization
Closed under
– Complement
– Intersection
– Union

30 But…
XML documents are unranked: book (intro,section*,conclusion)

31 Automata on unranked trees

32 Unranked tree automata
Issue: represent an infinite set of transitions
Solution: a regular language

33 Unranked tree automata (2)
Rule: for a label a, a regular language L(Q) over states and a set of target states {r1,…,rm}
Meaning: if the states of the children of some node labeled a form a word in L(Q), this node moves to some state in {r1,…,rm}
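A minimal sketch of such rules, assuming that states are single characters so that the word formed by the children's states is a plain string and L(Q) can be written as a Python regular expression. The automaton RULES for book (intro, section*, conclusion) and all names are hypothetical. The run is nondeterministic: it tries every choice of child states, which is exponential in general and only meant as an illustration.

```python
import re
from itertools import product

# Each rule: (label, regular expression over child states, resulting states).
RULES = [
    ("intro", r"", {"i"}),
    ("section", r"", {"s"}),
    ("conclusion", r"", {"c"}),
    ("book", r"is*c", {"B"}),
]
FINAL = {"B"}

def states(tree):
    """Set of states that a node can reach; tree = (label, [children])."""
    label, children = tree
    child_states = [sorted(states(c)) for c in children]
    reachable = set()
    # Try every word of states that the children can form.
    for word in product(*child_states):
        for rule_label, regex, targets in RULES:
            if rule_label == label and re.fullmatch(regex, "".join(word)):
                reachable |= targets
    return reachable

def accepts(tree):
    return bool(states(tree) & FINAL)
```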

34 Building on ranked trees
F: encoding of an unranked tree into a ranked tree (FirstChild-NextSibling)
F is a bijection
F^-1: decoding
[Figure: an unranked tree over a and b and its FirstChild-NextSibling encoding as a ranked tree.]
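The FirstChild-NextSibling encoding can be sketched as follows. The representation is an assumption: an unranked tree is a pair (label, [children]), and a binary tree is a triple (label, first_child, next_sibling) with None for the empty tree. The names encode and decode stand for F and F^-1, applied here to lists of trees so that siblings can be handled uniformly.

```python
def encode(forest):
    """Encode a list of unranked trees as one binary tree:
    the left subtree holds the first child, the right subtree the next sibling."""
    if not forest:
        return None
    (label, children), *siblings = forest
    return (label, encode(children), encode(siblings))

def decode(binary):
    """Inverse of encode: rebuild the list of unranked trees."""
    if binary is None:
        return []
    label, first_child, next_sibling = binary
    return [(label, decode(first_child))] + decode(next_sibling)
```

For example, encode([("a", [("b", []), ("c", [])])]) gives ("a", ("b", None, ("c", None, None)), None): b is the first child of a, and c is the next sibling of b.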

35 Building on bottom-up ranked trees (2)
For each unranked TA A, there is a ranked TA accepting F(L(A))
For each ranked TA A, there is an unranked TA accepting F^-1(L(A))
Both are easy to construct
Consequence: unranked TA are closed under union, intersection, complement

36 Determinization
Determinization is always possible for bottom-up
Can we use the FirstChild-NextSibling encoding?
– No: it does not preserve determinism

37 Top-down?
This is more delicate
Transition δ(a,q) = A(a,q)
– The state of the automaton A(a,q) when reading the labels of the children of a node labeled a determines the states of the children of that node
– Accepts if all the leaves are in accepting states

38 Boolean circuit evaluation
[Figure: a Boolean circuit tree with 0/1 values, evaluated top-down.]
A tree is accepted if, for some possible run, the states of all leaves are final

39 Automata and monadic second-order logic

40 Monadic second-order logic
Representation of a tree as a logical structure
E(1,2), E(1,3)… E(3,9)
S(2,3), S(3,4), S(4,5)…S(8,9)
a(1), a(4), a(8)
b(2), b(3), b(5), b(6), b(7), b(9)
[Figure: a tree over a and b with nodes numbered 1 to 9.]

41 Monadic second-order logic
MSO syntax
– Set variables
– Quantification over a set variable

42 Example of MSO
Each a node has a b-descendant
This corresponds to the formula: for each node x labeled a, each set X that (α) contains x and (β) is closed under descendants contains some y labeled b
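The formula itself did not survive in the transcript. One plausible reconstruction from the wording above, assuming E is the child relation of the logical structure and reading "closed under descendants" as "closed under children", is:

```latex
\forall x\, \Bigl( a(x) \rightarrow
  \forall X\, \bigl( (\underbrace{x \in X}_{(\alpha)} \wedge
    \underbrace{\forall y\, \forall z\, ( y \in X \wedge E(y,z) \rightarrow z \in X )}_{(\beta)})
  \rightarrow \exists y\, ( y \in X \wedge b(y) ) \bigr) \Bigr)
```

The smallest set X satisfying (α) and (β) is x together with all its descendants, and x itself is labeled a, so the formula holds exactly when some proper descendant of x is labeled b.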

43 Bridge
Theorem: for a set L of trees, the following are equivalent
1. L = L(A) for some bottom-up tree automaton A, i.e., L is definable with bottom-up tree automata
2. L = {T | T satisfies φ} for some MSO formula φ, i.e., L is definable in MSO

44 XML typing: DTDs

45 DTD
Describe the children of a node with label a by a regular expression
Bizarre syntax

46 DTD and determinism
Regular expressions in DTDs should be deterministic
– Complicated definition
Intuition: the corresponding automaton should be deterministic
– (a+b)*a is not deterministic
– When reading an a, one cannot tell whether it is an a from (a+b) or the final a
– (b*a)(b*a)* is an equivalent expression that is deterministic

47 Very efficient validation
It suffices to verify, for each node labeled a, that the word formed by the labels of its children is accepted by the finite state automaton A_a
Possible to type check the document while scanning it, e.g., with a SAX parser
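A minimal sketch of validation during a SAX scan, using Python's xml.sax module. The dictionary DTD is a hypothetical stand-in for a real DTD: it maps each label to a regular expression over child labels, with every child label followed by a space so that labels stay separated. For brevity, the sketch collects the child labels of each open element on a stack and matches the regular expression when the element closes, instead of advancing the automaton A_a label by label.

```python
import re
import xml.sax

DTD = {
    "book": r"intro (section )*conclusion ",
    "intro": r"",
    "section": r"",
    "conclusion": r"",
}

class DTDChecker(xml.sax.ContentHandler):
    def __init__(self, dtd):
        super().__init__()
        self.dtd = dtd
        self.stack = []    # one list of child labels per open element
        self.valid = True

    def startElement(self, name, attrs):
        if self.stack:
            self.stack[-1].append(name)   # record this label for the parent
        self.stack.append([])

    def endElement(self, name):
        children = self.stack.pop()
        word = "".join(label + " " for label in children)
        regex = self.dtd.get(name)
        # Unknown labels and children words outside the language are invalid.
        if regex is None or not re.fullmatch(regex, word):
            self.valid = False

def validate(document, dtd=DTD):
    checker = DTDChecker(dtd)
    xml.sax.parseString(document.encode("utf-8"), checker)
    return checker.valid
```

A call such as validate("<book><intro/><section/><conclusion/></book>") scans the document once; the stack depth is bounded by the depth of the document.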

48 Very efficient validation (2)
[Figure: a tree over the labels a, b, c, d validated during a scan with the word automata A_a (states s, t, u) and A_b (states s', t'), ending in Accept.]

49 Warning
The previous example can be checked with a simple automaton on words
But not the following one
The stack is needed for accepting
[Figure: an example parameterized by n.]

50 Some bad news for DTD
Not closed under union
– DTD1: … DTD2: …
– L(DTD1) ∪ L(DTD2) cannot be described by a DTD but can be described easily by a tree automaton
– Problem: the type of ad depends on its parent
Also not closed under complement
Limited expressive power

51 Car example continued
The best DTD we can choose does not distinguish between ads for used and new cars
[Figure: a Car tree with Used and New subtrees; leaves Brand "Renault", Year "2008" and Brand "BMW".]

52 Decoupled types in XML Schema
Each type corresponds to a label, not conversely
car: [car] (used + new)*
used: [used] (ad1*)
new: [new] (ad2*)
ad1: [ad] (year, brand)
ad2: [ad] (brand)
Type names are on the left; tags are in brackets
Nice closure properties
Many other « gadgets » in XML Schema
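A minimal sketch of validation against decoupled types, transcribing the car example. Each type name maps to a tag and a regular expression over child type names (each followed by a space). The leaf types year and brand are hypothetical additions that make the example complete. The check is bottom-up and computes, for every node, all the types it can take.

```python
import re
from itertools import product

TYPES = {
    "car": ("car", r"((used|new) )*"),
    "used": ("used", r"(ad1 )*"),
    "new": ("new", r"(ad2 )*"),
    "ad1": ("ad", r"year brand "),
    "ad2": ("ad", r"brand "),
    "year": ("year", r""),
    "brand": ("brand", r""),
}

def matches(regex, child_types):
    """True if some choice of one type per child forms a word in the language."""
    return any(re.fullmatch(regex, "".join(t + " " for t in word))
               for word in product(*child_types))

def possible_types(tree):
    """All type names that a node (tag, [children]) can take."""
    tag, children = tree
    child_types = [possible_types(c) for c in children]
    return {name for name, (type_tag, regex) in TYPES.items()
            if type_tag == tag and matches(regex, child_types)}

def valid(tree):
    return "car" in possible_types(tree)
```

The same tag ad receives type ad1 under used and type ad2 under new, which a DTD cannot express because a DTD ties each tag to a single content model.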

53 XML typing: XML Schemas

54 XML Schema
Often criticized and unnecessarily complicated
Boosted by Web services
Richer than DTD
– Decoupled types
Close to deterministic top-down tree automata
XML schemas are extensible
Many other useful functionalities
– Namespaces
– Atomic types
– Integrity constraints, etc.

55 An XML schema is an XML document
Since it uses XML syntax, it can use XML tools
– Editor
– Type checker
– Etc.
The type of all XML schemas can be described with an XML schema

56
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.net-language.com">
  <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
    <xs:element name="friend-of" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    …

57 Simple elements and atomic types
Definition of a simple element, with common types: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time
Example instances of such elements: Refsnes, 34, 1968-03-27

58 Attributes
Definition and example
Instance of such an attribute: Smith

59 Complex elements
Empty element
Contains only other elements: John Smith
Contains only text: Ice cream
Contains both elements and text: It happened on 03.03.99....

60 Restriction of simple elements
Other restrictions: enumerated types, patterns, etc.

61 Restriction on complex elements

62 Possible to name a type
Only the "employee" element can use the specified complex type (<xs:sequence> indicates an order on child elements)
Alternative: give the complex type a name so that other elements can refer to it

63 Other gadgets
Import of types associated to a namespace
– <import namespace="http://..." schemaLocation="http://..."/>
Possible to include an existing schema
Possible to extend/redefine an existing schema
Extensions
...

64 Example: a DTD
<!ATTLIST EMAIL
  LANGUAGE (Western|Greek|Latin|Universal) "Western"
  ENCRYPTED CDATA #IMPLIED
  PRIORITY (NORMAL|LOW|HIGH) "NORMAL">
<!ATTLIST BCC HIDDEN CDATA #FIXED "TRUE">

65 The same in a variant of XML Schema (more verbose)
<Schema name="email" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="language" dt:type="enumeration" dt:values="Western Greek Latin Universal"/>
  …

66 Where to place XML schemas
Some bizarre restriction
– Inside an element, no two types with the same tag
Closer to DTDs than to tree automata
Efficient type validation
[Figure: relative expressive power of tree automata, deterministic top-down tree automata, XML Schema and DTD.]

67 Exercise: coupled vs decoupled
Write a realistic DTD1 for new cars
– With make, model, engine…
Write a realistic DTD2 for used cars
– Also year, miles, zipcode
Write an XML schema for L(DTD1) ∪ L(DTD2)
– Using a decoupled schema

68 Automata to compute

69 Another use of automata: XPath
Example: $x in //a/b

70–83 Example: //a/b
[Animation: the NFA for //a/b and the equivalent DFA, with states (0), (01) and (02), are run over a tree with a and b nodes. The current DFA state is pushed when the scan enters a node and popped when it leaves it; $x is bound to each node reached in state (02).]
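The evaluation of //a/b can be sketched as follows. The DFA states "0", "01" and "02" are named after the sets of NFA states they contain, as in the animation; the tree representation (label, [children]) and the function names are assumptions. The recursion stack plays the role of the stack of DFA states kept while scanning the document.

```python
def step(state, label):
    """Transition function of the DFA for //a/b."""
    if label == "a":
        return "01"          # an a was just read
    if label == "b" and state == "01":
        return "02"          # a b whose parent is an a: a match
    return "0"

def select(tree, state="0", path=()):
    """Yield the paths (child indexes from the root) of the nodes selected by //a/b,
    in document order."""
    label, children = tree
    state = step(state, label)
    if state == "02":
        yield path
    for index, child in enumerate(children):
        # Each child starts from the state of its parent.
        yield from select(child, state, path + (index,))
```

Each node is visited once, and the only memory needed besides the output is one DFA state per open element.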

84 Determinization: exponential blow-up
Example: //a/*/*/b

85 Proposal: k-pebble transducers
With a stack [Milo, Suciu, Vianu]

86 k-pebble transducers: result
[Figure: a tree with a root and nodes labeled a, b, c.]
Capture a core aspect of XQuery but not the data management part

87 Graphs and bisimulation

88 Graph
Graph semistructured data
Graph simulation
Graph bisimulation
Data guides

89 Semistructured data = Labeled graph
Possibly a root (in red)
[Figure: a graph with root &r, company node &c and person nodes &p1 to &p8; edge labels company, employee, worksfor, manages, managedby.]

90 Rooted graph
OEM = Object Exchange Model
With ID-IDREF, XML is a graph model as well
Labeled (rooted) graph (E,r)
– Set N of nodes
– A finite ternary relation E ⊆ N × N × Label
– E(s,t,l) = there is an edge from s to t labeled l
– r is a node in the graph

91 Equality revisited
{1,2,2,1,5} = {1,2,5}
– Ignores order and duplicates
For trees, if we ignore the order of siblings and use a "set" semantics, trees that differ only by sibling order or by duplicated subtrees are equal
[Figure: equal trees over the labels a, b, c, d.]

92 Simulation
A simulation ≈ of (E,r) with (E',r') is a relation between the nodes of E and E' such that
1. ≈(r,r')
2. if ≈(s,s') and E(s,t,l) for some l, then there exists t' with ≈(t,t') and E'(s',t',l)
(we simulate a move in E by a move in E')

93 Bisimulation
Given ≈, E, E': ≈ is a bisimulation if ≈ is a simulation of E with E' and ≈^-1 is a simulation of E' with E

94 Examples
[Figure: three graphs G, G' and G'' with edges labeled a and d; one pair is related by a bisimulation, another pair is not.]
They all have the same paths from the root

95–101 A more complex example of graph bisimulation
[Animation: a data graph with a root, employees e1 to e4, projects p1 to p9 (with string values "exercise", "lecture", "finance", "adminstr.", "PR", "undergrad", "grad", "postgrad", "web") and nodes c1, c2 (programmer, statistician), connected by edges employee, project, leads, workson and consults. A relation R is built step by step between this graph and a type graph with nodes t1 and t2 and labels employee, projects, programmer | statistician and STRING.]

102 Computing bisimulation in ptime
Start with ≈ = N × N' (for N, N' the sets of nodes)
While there exists (x,x') in ≈ that violates the definition of simulation (in either direction), remove (x,x') from ≈
This computes the maximal bisimulation in ptime
(Note: this maximal bisimulation exists because ∅ is a bisimulation, and if ≈1, ≈2 are bisimulations, ≈1 ∪ ≈2 is also one)
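A minimal sketch of this refinement loop, assuming edges are given as triples (source, target, label) as in E(s,t,l). It is a naive fixpoint computation, polynomial but not optimized; the function names and the example graphs G1, G2, G3 are hypothetical. The root condition is checked afterwards by testing whether the pair of roots survives.

```python
def moves_matched(edges_a, edges_b, s, s2, relation, flip):
    """Every edge s -l-> t of the first graph is matched by an edge
    s2 -l-> t2 of the second graph whose targets are related.
    flip tells in which order the pair is stored in the relation."""
    for (src, t, label) in edges_a:
        if src != s:
            continue
        if not any(src2 == s2 and label2 == label and
                   ((t2, t) if flip else (t, t2)) in relation
                   for (src2, t2, label2) in edges_b):
            return False
    return True

def maximal_bisimulation(nodes, edges, nodes2, edges2):
    relation = {(x, y) for x in nodes for y in nodes2}   # start with N x N'
    changed = True
    while changed:
        changed = False
        for (x, y) in sorted(relation):
            ok = (moves_matched(edges, edges2, x, y, relation, False) and
                  moves_matched(edges2, edges, y, x, relation, True))
            if not ok:
                relation.discard((x, y))   # the pair violates the definition
                changed = True
    return relation

# Three graphs with the same paths from the root: a and a.d
G1 = ({"r", "x1", "x2", "y"}, {("r", "x1", "a"), ("r", "x2", "a"), ("x1", "y", "d")})
G2 = ({"r2", "z", "y2"}, {("r2", "z", "a"), ("z", "y2", "d")})
G3 = ({"r3", "z1", "z2", "w1", "w2"},
      {("r3", "z1", "a"), ("r3", "z2", "a"), ("z1", "w1", "d"), ("z2", "w2", "d")})
```

Here G2 and G3 are bisimilar (their roots remain related), while G1 and G2 are not: the node x2 of G1 has no outgoing d edge, so it cannot be related to z.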

103 What does this have to do with typing?
Take a very complex graph E
How do you describe it?
By a "smaller" graph T that is bisimilar to E
There may be several such graphs, with more and more details

104 Rough bisimulation
Root: &r
Company: &c
Bosses: &p1, &p4, &p6
Regulars: &p2, &p3, &p5, &p7, &p8
[Figure: type graph with edges company, employee, manages, managedby, worksfor.]

105 More precise one
Root: &r
Company: &c
Employees: &p1, &p2, &p3, &p4, &p5, &p6, &p7, &p8
Bosses: &p1, &p4, &p6
Regulars: &p2, &p3, &p5, &p7, &p8
[Figure: type graph with edges company, employee, manages, managedby, worksfor.]

106 Other "typing": data guide
See the graph as an automaton with the root as the start state and only accepting states
This automaton accepts all the paths from the root
Obtain an equivalent, minimal, deterministic automaton
– This is the data guide for the graph
– It can be used for describing the data
– It can be used to support graphical query interfaces
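A minimal sketch of the determinization part of this construction, assuming edges are triples (source, target, label). It applies the subset construction to the graph seen as an automaton; the minimization step mentioned above is left out, and the function name data_guide and the example graph are hypothetical.

```python
from collections import defaultdict

def data_guide(edges, root):
    """Determinize a rooted labeled graph.
    Returns the start state and the transitions {(node_set, label): node_set}."""
    successors = defaultdict(set)
    for source, target, label in edges:
        successors[(source, label)].add(target)
    labels = sorted({label for (_s, _t, label) in edges})
    start = frozenset([root])
    guide, todo, seen = {}, [start], {start}
    while todo:
        current = todo.pop()
        for label in labels:
            target = frozenset(t for node in current
                               for t in successors.get((node, label), ()))
            if not target:
                continue   # no path with this label: no transition
            guide[(current, label)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start, guide

# Hypothetical fragment of an employee/project graph.
EDGES = [("root", "e1", "employee"), ("root", "e2", "employee"),
         ("e1", "p1", "workson"), ("e2", "p2", "workson"), ("e2", "p1", "leads")]
START, GUIDE = data_guide(EDGES, "root")
```

Each state of the guide is the set of nodes reached by one path from the root, so every path of the data appears exactly once in the guide.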

107 Data guide
[Figure: the data graph of the bisimulation example and its data guide, obtained by automata minimization. The states of the data guide include {root}, {e1,e2,e3,e4} (employee), {p1,p2,p3,p4,p5,p6,p7,p8,p9} (project), {c1} (programmer) and {c2} (statistician), followed by sets of projects and employees reached via workson, leads, consults and employee.]
Gives all the paths from the root

108 What you should remember
Tree automata = theoretical foundation for XML
Bottom-up tree automata are nice
Top-down and determinism together ⇒ limitations
XML documents do not have to be typed
Typing may be very useful for XML
– In particular for software managing XML data
DTD: simple but limited
XML Schema: more expressive but still limited
Graph data: bisimulation is the answer

109 Thank you

110 Bibliography
TATA: Tree Automata Techniques and Applications, tata.gforge.inria.fr/
– The book on the topic, and it is free
XML Schema: see http://w3.org and http://www.w3schools.com/schema/

