Published by Carmella Francis. Modified over 8 years ago.
1
Master Informatique 10/9/2007 1 Typing semistructured data Serge Abiteboul Web Data Management Typing semistructured data
2
Organization
Motivations
Automata
– Automata on words
– Ranked tree automata
– Unranked tree automata
– Automata and monadic second-order logic
– Automata to compute
XML typing: DTD, XML schema
Graphs and bisimulation
3
Motivation
4
XML typing
Not compulsory
Simplifies writing software for XML
– Improves interoperability between programs, consistency and efficiency
Improves storage and performance
Eases querying: data guide
Simplifies data protection
– Rejects illegal updates, like relational dependencies
5
Improve storage
[Figure: a data graph with nodes Root, Company and Employee, string leaves, and edges labeled company, person, works-for, c.e.o., address, name, managed-by]
Lower-bound schema: Company, Employee
Store the rest in an overflow graph
6
Improve performance
[Figure: schema rooted at Bib with children paper and book; elements year, journal, title, address, author, zip, city, street, last name, first name; leaf types int and string]
Query without schema knowledge:
select X.title from Bib._ X where X.*.zip = "12345"
Query using the schema:
select X.title from Bib.book X where X.address.zip = "12345"
Cheaper to evaluate if the schema is known
No schema: first compute a data guide
7
Type checking
Who checks
– XML editor: checks that the data conforms to its type
– XML exchange, e.g., with a Web service
  Server: when delivering the data
  Client/application: when receiving it
Dynamic verification: after the data is produced
Static verification: verification of the program that generates the data
– Verify that a program receiving data of the proper input type only generates data of the proper output type
– More complicated
8
Static verification
Input: input type T and code of function f
– f is XQuery, XPath, XSLT, etc.
Verification of T'
– Is it true that $\forall d \models T,\ f(d) \models T'$?
– i.e., is it true that for all documents d valid against the input type T, f(d) is valid against the given output type T'?
Type inference
– Find the smallest T' such that $\forall d \models T,\ f(d) \models T'$
– With no knowledge of the input type, the inference problem becomes: find the smallest T' such that $\forall d,\ f(d) \models T'$
Verifying or inferring an output type for a program is, in full generality, an undecidable problem because of "joins"
9
Example
for $p in doc("parts.xml")//part[color="red"]
return <part> <name> $p/name/text() </name> <desc> $p/desc/node() </desc> </part>
Result type: (part (name (string) desc (any)))*
If the type of parts.xml//part/desc is string: (part (name (string) desc (string)))*
Smallest regular language that describes the output of the query
10
Difficulty
for $X in Input, $Y in Input do { print ( … ) }
Input: …
Result: …
Problem: $\{ b^i \mid i = n^2 \text{ for } n \ge 0 \}$ cannot be described in XML schema
There is no « best » result, only ever tighter regular approximations
– $b^*$
– $\varepsilon + b\,b^*$
– $\varepsilon + b + b^4 b^*$
– $\varepsilon + b + b^4 + b^9 b^*$
– …
11
Why tree automata?
XML = unranked trees
No theory for XML
Rich theory for strings: automata
Extends to a rich theory for ranked trees: tree automata
– Nice algorithms
– Nice theorems
– Can this carry over to unranked trees and XML? Yes!
12
From strings to trees
[Figure: a word, a binary tree and an unranked tree over the labels a and b]
Word: finite state automata
Binary tree: ranked tree automata
Unranked tree (no bound on the number of children): unranked tree automata
13
Why not then use unranked tree automata?
Missing practical gadgets
Complexity of verification
– Goal: typing at reasonable cost
14
Automata
Automata on words
15
Finite state automata on words
Alphabet
States
Initial state
Accepting states
Transitions
16
Nondeterministic automaton: Example
[Figure: two runs of a nondeterministic automaton with states q0, q1 and q2 on words over {a, b}; one run is rejected (KO), the other accepted (OK)]
17
Reminder
Deterministic
– No ε-transition
– No alternative transitions
Determinization
– It is possible to obtain an equivalent deterministic automaton
– State of the new automaton = set of states of the original one
– Possible exponential blow-up
Minimization
Limitations
– Cannot describe context-free languages in general
Essential tool
– e.g., lexical analysis
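The determinization step can be sketched as the classical subset construction. This is a minimal illustration, not taken from the slides: the dictionary encoding of transitions and the sample automaton (words over {a, b} ending in "ab") are assumptions made for the example.

```python
from itertools import chain

def determinize(alphabet, delta, start, accepting):
    """Subset construction. delta maps (state, symbol) to a set of states."""
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    todo = [start_set]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # A DFA state is the set of NFA states reachable on this symbol.
            target = frozenset(chain.from_iterable(
                delta.get((q, symbol), ()) for q in current))
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A set of states accepts if it contains an accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return seen, dfa_delta, start_set, dfa_accepting

def dfa_accepts(dfa_delta, start_set, dfa_accepting, word):
    state = start_set
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accepting

# Hypothetical NFA (not the one on the slides): words ending in "ab".
NFA_DELTA = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
STATES, DELTA, START, ACCEPTING = determinize("ab", NFA_DELTA, "q0", {"q2"})
```

Only the reachable subsets are built, so small automata stay small; the exponential blow-up appears only when many subsets are actually reachable.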
18
Reminder (2)
L(A) = set of words accepted by automaton A
Regular languages
– Can be described by regular expressions, e.g., a(b+c)*d, using concatenation, union and Kleene closure
– Closed under complement
– Closed under union and intersection
  Product automaton with states (s,s') where s is from A and s' is from A'
19
Automata on words versus trees
[Figure: a word and a tree over the labels a and b]
Words: left to right or right to left, no difference
Trees: bottom-up or top-down, differences
20
Automata
Automata on ranked trees
21
Binary tree automata
Bottom-up, parallel evaluation
Transitions for leaves and transitions for other nodes
[Figure: bottom-up run on a binary tree over the labels a and b, with states q, q', q'', q1, q2]
22
Bottom-up tree automata
Bottom-up: if a node labeled a has its children in states q, q', then the node moves nondeterministically to state r or r'
Accepts if the root is in some state in F
Not deterministic if there are alternatives or ε-transitions
23
Example: deterministic bottom-up
24
Boolean circuit evaluation
[Figure: bottom-up evaluation of a Boolean circuit whose leaves are 0 and 1; the run ends with OK]
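Boolean circuit evaluation can be written as a deterministic bottom-up automaton on binary trees. The sketch below is an illustration under assumptions: the tuple encoding of trees, the gate labels "and" and "or", and the sample circuit are not taken from the slides.

```python
# A tree is a leaf "0" or "1", or a tuple (gate, left, right).
LEAF = {"0": 0, "1": 1}  # transitions for leaves
DELTA = {                # transitions for inner nodes: (label, q, q') -> r
    ("and", 0, 0): 0, ("and", 0, 1): 0, ("and", 1, 0): 0, ("and", 1, 1): 1,
    ("or", 0, 0): 0, ("or", 0, 1): 1, ("or", 1, 0): 1, ("or", 1, 1): 1,
}
FINAL = {1}

def run(tree):
    """State reached at the root; children are evaluated first (bottom-up)."""
    if isinstance(tree, str):
        return LEAF[tree]
    label, left, right = tree
    return DELTA[(label, run(left), run(right))]

def accepts(tree):
    return run(tree) in FINAL

# Hypothetical circuit: (1 and 0) or (1 and (0 or 1))
CIRCUIT = ("or", ("and", "1", "0"), ("and", "1", ("or", "0", "1")))
```

The automaton is deterministic because each triple (label, q, q') has exactly one target state, so every tree has exactly one run.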
25
Regular tree language = set of trees accepted by a bottom-up tree automaton
26
Regular tree languages
The following are equivalent
– L is a regular tree language
– L is accepted by a nondeterministic bottom-up automaton
– L is accepted by a deterministic bottom-up automaton
– L is accepted by a nondeterministic top-down automaton
Deterministic top-down is weaker
27
Top-down tree automata
Top-down: if a node labeled a is in state q'', then its left child moves to state q (and its right child to q')
Accepts if all leaves are in states in F when the root starts in a given initial state
Not deterministic if several transitions apply to the same label and state
28
Why is deterministic top-down weaker?
Consider the language
– L = { f(a,b), f(b,a) }
It can be accepted by a bottom-up TA
– Exercise: write a BUTA A such that L = L(A)
Suppose that B is a deterministic top-down TA with L = L(B)
– Exercise: show that B also accepts f(a,a)
– A contradiction
Fact: no deterministic top-down tree automaton accepts L
29
Ranked tree automata: properties
Like for words
Determinization
Minimization
Closed under
– Complement
– Intersection
– Union
30
But…
XML documents are unranked: book (intro, section*, conclusion)
31
Automata
Automata on unranked trees
32
Unranked tree automata
Issue: represent an infinite set of transitions
Solution: a regular language
33
Unranked tree automata (2)
Rule: for a label a, a regular expression Q over states and target states $r_1, \dots, r_m$
Meaning: if the states of the children of some node labeled a form a word in L(Q), this node moves to some state in $\{r_1, \dots, r_m\}$
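A rule of this kind can be sketched with Python regular expressions over the word of child states. The example is an assumption made for illustration: Boolean circuits with gates of unbounded fan-in, states written as the characters "0" and "1", and trees encoded as (label, children) pairs.

```python
import re

# Each rule is (label, regular expression over child states, target state).
RULES = [
    ("0", r"", "0"),               # leaves: the word of child states is empty
    ("1", r"", "1"),
    ("and", r"1+", "1"),           # all children are in state 1
    ("and", r"[01]*0[01]*", "0"),  # some child is in state 0
    ("or", r"[01]*1[01]*", "1"),   # some child is in state 1
    ("or", r"0+", "0"),            # all children are in state 0
]
FINAL = {"1"}

def states(tree):
    """Set of states the root can reach; tree = (label, list of children)."""
    label, children = tree
    # Enumerate every word of child states (exponential if nondeterministic).
    words = [""]
    for child in children:
        child_states = states(child)
        words = [w + q for w in words for q in child_states]
    return {target
            for rule_label, regex, target in RULES if rule_label == label
            for w in words if re.fullmatch(regex, w)}

def accepts(tree):
    return bool(states(tree) & FINAL)

# Hypothetical circuit: or(0, and(1, 1, 1), 0)
CIRCUIT = ("or", [("0", []), ("and", [("1", []), ("1", []), ("1", [])]), ("0", [])])
```

A finite set of regular expressions covers infinitely many transitions, one for each possible number of children.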
34
Building on ranked trees
[Figure: an unranked tree over the labels a and b and its encoding as a ranked tree]
Ranked tree: FirstChild-NextSibling
F: encoding into a ranked tree
F is a bijection
$F^{-1}$: decoding
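The FirstChild-NextSibling encoding F and its inverse can be sketched as follows. The representation is an assumption for illustration: an unranked tree is a pair (label, list of children), a binary node is a triple (label, first child, next sibling), and None stands for a missing child.

```python
def encode(forest):
    """F: encode a list of sibling trees as one binary tree."""
    if not forest:
        return None
    (label, children), *siblings = forest
    # Left branch: first child; right branch: next sibling.
    return (label, encode(children), encode(siblings))

def decode(binary):
    """Inverse of F: rebuild the list of sibling trees."""
    if binary is None:
        return []
    label, first_child, next_sibling = binary
    return [(label, decode(first_child))] + decode(next_sibling)

# Hypothetical unranked tree a(b, a(b), b)
TREE = ("a", [("b", []), ("a", [("b", [])]), ("b", [])])
ENCODED = encode([TREE])
```

Calling decode(encode([TREE])) returns [TREE], which illustrates on one example that the encoding loses no information.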
35
Building on bottom-up ranked trees (2)
For each unranked TA A, there is a ranked TA accepting F(L(A))
For each ranked TA A, there is an unranked TA accepting $F^{-1}(L(A))$
Both are easy to construct
Consequence: unranked TA are closed under union, intersection, complement
36
Determinization
Determinization is always possible for bottom-up
Can we use the FirstChild-NextSibling encoding?
– No: it does not preserve determinism
37
Top-down? This is more delicate
Transition: δ(a,q) = A(a,q)
– The state of the automaton A(a,q) when reading the labels of the children of a node labeled a determines the states of the children of that node
– Accepts if all the leaves are in accepting states
38
Boolean circuit evaluation
[Figure: top-down run on a Boolean circuit, with states 0 and 1 assigned from the root down to the leaves]
A tree is accepted if, for some possible run, the states of all leaves are final
39
Automata
Automata and monadic second-order logic
40
Monadic second-order logic
Representation of a tree as a logical structure
– E(1,2), E(1,3), …, E(3,9)
– S(2,3), S(3,4), S(4,5), …, S(8,9)
– a(1), a(4), a(8)
– b(2), b(3), b(5), b(6), b(7), b(9)
[Figure: the tree with nodes numbered 1 to 9, each labeled a or b]
41
Monadic second-order logic (2)
Formulas over the relations E and S and the label predicates a and b
MSO syntax
– Set variables
– Quantification over a set variable
42
Example of MSO
Each a node has a b-descendant
This corresponds to the formula:
$\forall x\, \big( a(x) \rightarrow \forall X\, ( (x \in X \wedge \forall y\, \forall z\, ((y \in X \wedge E(y,z)) \rightarrow z \in X)) \rightarrow \exists y\, (y \in X \wedge b(y)) ) \big)$
For each node x labeled a: each set X that (1) contains x and (2) is closed under descendant contains some y labeled b
43
Bridge
Theorem: for a set L of trees, the following are equivalent
1. L = L(A) for some bottom-up tree automaton A, i.e., L is definable with bottom-up tree automata
2. L = {T | T satisfies φ} for some MSO formula φ, i.e., L is definable in MSO
44
XML typing
DTDs
45
DTD
Describes the children of a node with label a by a regular expression
Bizarre syntax
46
DTD and determinism
Regular expressions in DTD should be deterministic
– Complicated definition
Intuition: the corresponding automaton should be deterministic
– (a+b)*a is not
– When reading an a, one cannot tell whether it is an a from (a+b) or the final a
– (b*a)(b*a)* is an equivalent expression that is deterministic
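The claimed equivalence of the two expressions can be checked by brute force on short words. This is a bounded sanity check under assumptions, not a proof: the length bound is arbitrary, and the union written + on the slide is written | in Python's `re` syntax.

```python
import re
from itertools import product

AMBIGUOUS = re.compile(r"(a|b)*a")            # (a+b)*a on the slide
DETERMINISTIC = re.compile(r"(b*a)(b*a)*")

def same_language_up_to(max_len):
    """Compare the two expressions on every word over {a, b} up to max_len."""
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            word = "".join(letters)
            in_first = AMBIGUOUS.fullmatch(word) is not None
            in_second = DETERMINISTIC.fullmatch(word) is not None
            if in_first != in_second:
                return False
    return True
```

Both expressions describe the words over {a, b} that end with a; the second one splits such a word into blocks of the form b*a, so each letter read has only one possible position in the expression.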
47
Very efficient validation
It suffices to verify, for each node labeled a, that the word formed by the labels of its children is accepted by the finite state automaton $A_a$
Possible to type check the document while scanning it, e.g., with a SAX parser
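The per-node check can be sketched with one regular expression per label. The encoding is an assumption for illustration: trees are (label, children) pairs, each child label is followed by a comma in the word to match, and the sample DTD reuses the content model book (intro, section*, conclusion) from an earlier slide.

```python
import re

# Hypothetical DTD: label -> regular expression over the child labels.
DTD = {
    "book": r"intro,(section,)*conclusion,",
    "intro": r"",
    "section": r"",
    "conclusion": r"",
}

def valid(tree):
    """Check every node: the word of its child labels must match its model."""
    label, children = tree
    model = DTD.get(label)
    if model is None:
        return False  # undeclared element
    word = "".join(child[0] + "," for child in children)
    if re.fullmatch(model, word) is None:
        return False
    return all(valid(child) for child in children)

GOOD = ("book", [("intro", []), ("section", []), ("section", []), ("conclusion", [])])
BAD = ("book", [("intro", []), ("conclusion", []), ("section", [])])
```

Each node is checked against the model of its own label only, which is why a single scan of the document suffices.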
48
Very efficient validation (2)
[Figure: a document tree with labels a, b, c, d validated by the word automata $A_a$ and $A_b$, with states s, t, u and s', t'; the run ends with Accept]
49
Warning
The previous example can be checked with a simple automaton on words
But not the following one
[Figure: a document with n levels of nesting]
The stack is needed for accepting
50
Some bad news for DTD
Not closed under union
– DTD1 … DTD2 …
– $L(DTD1) \cup L(DTD2)$ cannot be described by a DTD but can be described easily by a tree automaton
– Problem with the type of ad, which depends on its parent
Also not closed under complement
Limited expressive power
51
Car example continued
The best DTD we can choose does not distinguish between ads for used and new cars
[Figure: a tree rooted at Car with children Used and New; the ads contain Brand and Year elements, with values "Renault", "2008" and "BMW"]
52
Decoupled types in XML schema
Each type corresponds to a label, not conversely
– car: [car] (used + new)*
– used: [used] (ad1*)
– new: [new] (ad2*)
– ad1: [ad] (year, brand)
– ad2: [ad] (brand)
Tags are written in brackets; the other names are type names
Nice closure properties
Many other « gadgets » in XML schemas
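The decoupled types above can be checked top-down, because the tag of a child identifies its type once the type of the parent is known. The sketch below is an illustration under assumptions: trees are (label, children) pairs, content models are regular expressions over type names followed by commas, and the leaf types year and brand are added so that the example is complete.

```python
import re

# type name -> (tag, content model over child type names, allowed child types)
TYPES = {
    "car":   ("car",   r"((used|new),)*", ["used", "new"]),
    "used":  ("used",  r"(ad1,)*",        ["ad1"]),
    "new":   ("new",   r"(ad2,)*",        ["ad2"]),
    "ad1":   ("ad",    r"year,brand,",    ["year", "brand"]),
    "ad2":   ("ad",    r"brand,",         ["brand"]),
    "year":  ("year",  r"",               []),   # assumed leaf type
    "brand": ("brand", r"",               []),   # assumed leaf type
}

def valid(tree, type_name="car"):
    label, children = tree
    tag, model, child_type_names = TYPES[type_name]
    if label != tag:
        return False
    # Under a given type, each tag maps to at most one child type.
    by_tag = {TYPES[t][0]: t for t in child_type_names}
    child_types = [by_tag.get(child[0]) for child in children]
    if None in child_types:
        return False
    word = "".join(t + "," for t in child_types)
    if re.fullmatch(model, word) is None:
        return False
    return all(valid(c, t) for c, t in zip(children, child_types))

GOOD = ("car", [("used", [("ad", [("year", []), ("brand", [])])]),
                ("new", [("ad", [("brand", [])])])])
BAD = ("car", [("new", [("ad", [("year", []), ("brand", [])])])])
```

The same tag ad receives type ad1 under used and type ad2 under new, which is exactly what a DTD cannot express.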
53
XML typing
XML Schemas
54
XML Schema
Often criticized; unnecessarily complicated
Boosted by Web services
Richer than DTD: decoupled types
(Close to) deterministic top-down tree automata
XML schemas are extensible
Many other useful functionalities
– Namespaces
– Atomic types
– Integrity constraints, etc.
55
An XML schema is an XML document
Since it uses an XML syntax, it can use XML tools
– Editor
– Type checker
– Etc.
The type of all XML schemas can be described with an XML schema
56
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.net-language.com">
  …
  <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
    …
    <xs:element name="friend-of" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    …
57
Simple elements and atomic types
Definition: …
– With common types: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time
Examples: …
Instances of such elements, with values Refsnes, 34 and 1968-03-27
58
Attributes
Definition: …
Example: …
Instance of such an attribute, on an element with content Smith
59
Complex elements
Empty element
Contains only other elements: John Smith
Contains only text: Ice cream
Contains both elements and text: It happened on 03.03.99 ....
60
Restriction of simple elements
Other restrictions: enumerated types, patterns, etc.
61
Restriction on complex elements
62
Possible to name a type
Only the "employee" element can use the specified complex type
(the sequence indicator specifies an order on child elements)
Alternative
63
Other gadgets
Import of types associated with a namespace
– <import namespace = "http://..." schemaLocation = "http://..." />
Possible to include an existing schema
Possible to extend/redefine an existing schema
Extensions...
64
Example: a DTD
<!ATTLIST EMAIL
  LANGUAGE (Western|Greek|Latin|Universal) "Western"
  ENCRYPTED CDATA #IMPLIED
  PRIORITY (NORMAL|LOW|HIGH) "NORMAL">
<!ATTLIST BCC HIDDEN CDATA #FIXED "TRUE">
65
The same in a variant of XML schema (more verbose)
<Schema name="email" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="language" dt:type="enumeration"
                 dt:values="Western Greek Latin Universal" />
  …
66
Where to place XML schemas
Some bizarre restriction
– Inside an element, no two types with the same tag
Closer to DTDs than to tree automata
Efficient type validation
[Figure: diagram placing DTD, XML schema, deterministic top-down tree automata and tree automata relative to each other]
67
Exercise: coupled vs decoupled
Write a realistic DTD1 for new cars
– With make, model, engine…
Write a realistic DTD2 for used cars
– Also year, miles, zipcode
Write an XML schema for $L(DTD1) \cup L(DTD2)$
– Using decoupled schema
68
Automata
Automata to compute
69
Another use of automata: XPath
$x in //a/b
[Figure: a tree over the labels a and b; the NFA for //a/b and the equivalent DFA; the stack of DFA states initially holds (0)]
70
Example: //a/b
Slides 70 to 83 animate the run on the same tree, one step per slide. The stack of DFA states starts as (0). Entering an a node pushes (01), entering a b child of an a node pushes (02) and binds $x to that node, and leaving a node pops its state.
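The run shown in the animation can be sketched as a stack of DFA states driven by open and close events, in the style of a SAX parser. The code is an illustration under assumptions: trees are (label, children) pairs, DFA states are sets of NFA states computed on the fly, and the sample tree is not the one on the slides.

```python
def events(tree):
    """Stream a tree as ("open", label) and ("close", label) events."""
    label, children = tree
    yield ("open", label)
    for child in children:
        yield from events(child)
    yield ("close", label)

def step(states, label):
    """NFA for //a/b: 0 loops on any label, 0 -a-> 1, 1 -b-> 2 (accepting)."""
    result = {0}
    if label == "a" and 0 in states:
        result.add(1)
    if label == "b" and 1 in states:
        result.add(2)
    return frozenset(result)

def matches(tree):
    """Document-order positions (starting at 1) of the nodes bound to $x."""
    stack = [frozenset({0})]
    found = []
    position = 0
    for kind, label in events(tree):
        if kind == "open":
            position += 1
            stack.append(step(stack[-1], label))  # push the child's state
            if 2 in stack[-1]:
                found.append(position)
        else:
            stack.pop()                           # restore the parent's state
    return found

# Hypothetical tree a(b, a(b), b(b))
TREE = ("a", [("b", []), ("a", [("b", [])]), ("b", [("b", [])])])
```

On this tree the stack holds only the states written (0), (01) and (02) on the slides, plus (0) again for the innermost b, whose parent is not an a.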
84
Determinization: exponential blow-up
//a/*/*/b
85
Proposal: k-pebble transducers
[Figure: a k-pebble transducer with its stack]
[Milo, Suciu, Vianu]
86
k-pebble transducers: result
[Figure: an output tree with a root and nodes labeled a, b and c]
Capture a core aspect of XQuery but not the data management part
87
Graphs and bisimulation
88
Graph
Graph semistructured data
Graph simulation
Graph bisimulation
Data guides
89
Semistructured data = labeled graph
Possibly a root (in red)
[Figure: graph with root &r, company node &c and person nodes &p1 to &p8; edges labeled company, employee, worksfor, manages, managedby]
90
Rooted graph
OEM = Object Exchange Model
With ID-IDREF, XML is a graph model as well
Labeled (rooted) graph (E,r)
– Set N of nodes
– A finite ternary relation $E \subseteq N \times N \times \mathit{Label}$
– E(s,t,l) = there is an edge from s to t labeled l
– r is a node in the graph
91
Equality revisited
{1,2,2,1,5} = {1,2,5}
– Ignores the order and the duplicates
For trees, if we ignore the order of siblings and use a "set" semantics:
[Figure: two trees over the labels a, b, c and d that are equal under this semantics]
92
Simulation
A simulation of (E,r) with (E',r') is a relation R between the nodes of E and E' such that
1. R(r,r')
2. if R(s,s') and E(s,t,l) for some l, then there exists t' with R(t,t') and E'(s',t',l)
(we simulate a move in E by a move in E' with the same label)
93
Bisimulation
Given R, E, E', R is a bisimulation if R is a simulation of E with E' and $R^{-1}$ is a simulation of E' with E
94
Examples
[Figure: three graphs G, G' and G'' with edges labeled a and d; one pair is marked "bisimulation", another "not bisimulation"]
They all have the same paths from the root
95
A more complex example of graph bisimulation
[Figure: data graph with a root, employees e1 to e4, projects p1 to p9, nodes c1 and c2, and string leaves such as "exercise", "lecture", "finance", "PR", "web"; edges labeled employee, project, leads, workson, consults, programmer, statistician]
[Figure: type graph with root R and types t1 and t2; edges labeled employee, projects, programmer | statistician, and _ ; leaf type STRING]
96
Graph bisimulation
Slides 96 to 101 repeat the figure of slide 95 as an animation, relating step by step the nodes of the data graph to the nodes R, t1 and t2 of the type graph.
102
Computing bisimulation in ptime
Start with $R = N \times N'$ (for N, N' the sets of nodes)
While there exists (x,x') in R that violates the definition of bisimulation (in either direction), remove (x,x') from R
This computes the maximal bisimulation in ptime
(Note: this maximal bisimulation exists because $\emptyset$ is a bisimulation, and if $R_1$, $R_2$ are bisimulations, $R_1 \cup R_2$ is also one)
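The refinement loop can be sketched directly from this description. It is a naive version under assumptions: edges are triples (source, target, label) as in E(s,t,l), the root condition is checked separately by looking up the pair of roots, and the sample graphs are not the ones on the slides.

```python
def maximal_bisimulation(nodes1, edges1, nodes2, edges2):
    def successors(edges):
        succ = {}
        for source, target, label in edges:
            succ.setdefault(source, []).append((label, target))
        return succ

    succ1, succ2 = successors(edges1), successors(edges2)
    relation = {(x, y) for x in nodes1 for y in nodes2}  # start with N x N'

    def respects(x, y):
        out1, out2 = succ1.get(x, []), succ2.get(y, [])
        # Every move of x is matched by a move of y with the same label...
        forth = all(any(l2 == l1 and (t1, t2) in relation for l2, t2 in out2)
                    for l1, t1 in out1)
        # ...and every move of y is matched by a move of x.
        back = all(any(l1 == l2 and (t1, t2) in relation for l1, t1 in out1)
                   for l2, t2 in out2)
        return forth and back

    changed = True
    while changed:  # remove violating pairs until none is left
        changed = False
        for pair in list(relation):
            if not respects(*pair):
                relation.discard(pair)
                changed = True
    return relation

# Hypothetical graphs with the same paths a.b and a.c from the root.
G1 = [("r1", "u", "a"), ("u", "v", "b"), ("u", "w", "c")]
G2 = [("r2", "u1", "a"), ("r2", "u2", "a"), ("u1", "v1", "b"), ("u2", "w1", "c")]
G3 = [("r3", "p", "a"), ("r3", "q", "a"), ("p", "s", "b"), ("p", "t", "c"),
      ("q", "s", "b"), ("q", "t", "c")]
R12 = maximal_bisimulation({"r1", "u", "v", "w"}, G1, {"r2", "u1", "u2", "v1", "w1"}, G2)
R13 = maximal_bisimulation({"r1", "u", "v", "w"}, G1, {"r3", "p", "q", "s", "t"}, G3)
```

Two rooted graphs are bisimilar when the pair of their roots survives: here ("r1", "r3") is in R13, whereas ("r1", "r2") is not in R12 even though both graphs have the same paths from the root.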
103
What does this have to do with typing?
Take a very complex graph E
How do you describe it?
By a "smaller" graph T that is bisimilar to E
There may be several bisimulations, with more and more detail
104
Rough bisimulation
[Figure: type graph with nodes Root (&r), Company (&c), Bosses (&p1, &p4, &p6) and Regulars (&p2, &p3, &p5, &p7, &p8); edges labeled company, employee, manages, managedby, worksfor]
105
More precise one
[Figure: type graph with nodes Root (&r), Company (&c), Employees (&p1, &p2, &p3, &p4, &p5, &p6, &p7, &p8), Bosses (&p1, &p4, &p6) and Regulars (&p2, &p3, &p5, &p7, &p8); edges labeled company, employee, manages, managedby, worksfor]
106
Other "typing": data guide
See the graph as an automaton with the root as the start state and only accepting states
This automaton accepts all the paths from the root
Obtain an equivalent, minimal, deterministic automaton
– This is the data guide for the graph
– It can be used for describing the data
– It can be used to support graphical query interfaces
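The deterministic part of this construction can be sketched as a subset construction over the data graph. It is a partial illustration under assumptions: edges are triples (source, target, label), the sample graph is a small hypothetical fragment, and the minimization step mentioned above is not performed.

```python
def data_guide(edges, root):
    """Determinize the graph: each guide node is a set of data nodes."""
    succ = {}
    for source, target, label in edges:
        succ.setdefault(source, {}).setdefault(label, set()).add(target)
    start = frozenset([root])
    guide = {}
    todo = [start]
    while todo:
        current = todo.pop()
        if current in guide:
            continue
        # Group the targets of all nodes in the current set by edge label.
        targets = {}
        for node in current:
            for label, nodes in succ.get(node, {}).items():
                targets.setdefault(label, set()).update(nodes)
        guide[current] = {label: frozenset(nodes) for label, nodes in targets.items()}
        todo.extend(guide[current].values())
    return start, guide

# Hypothetical fragment of an employee/project graph.
EDGES = [
    ("root", "e1", "employee"), ("root", "e2", "employee"),
    ("root", "p1", "project"), ("root", "p2", "project"),
    ("e1", "p1", "workson"), ("e2", "p2", "workson"), ("e2", "p2", "leads"),
]
START, GUIDE = data_guide(EDGES, "root")
```

Every label path from the root of the data graph appears exactly once in the guide, and the set attached to a guide node lists the data nodes reached by that path.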
107
Data guide
[Figure: the data graph of slide 95 and its data guide; each guide node is a set of data nodes, e.g., {root}, {e1,e2,e3,e4} reached by employee, {p1,…,p9} reached by project, {c1} reached by programmer, {c2} reached by statistician, and sets of projects reached by workson, leads and consults]
Gives all the paths from the root
Automata minimization
108
What you should remember
Tree automata = theoretical foundation for XML
Bottom-up tree automata are nice
Top-down and determinism together bring limitations
XML documents do not have to be typed
Typing may be very useful for XML
– In particular for software managing XML data
DTD: simple but limited
XML Schema: more expressive but still limited
Graph data: bisimulation is the answer
109
Thank you
110
Bibliography
TATA: the book, Tree Automata Techniques and Applications, tata.gforge.inria.fr/
– The book on the topic, and it is free
XML schema: see http://w3.org and http://www.w3schools.com/schema/