Management of XML and Semistructured Data Lecture 10: Schemas Monday, April 30, 2001.


2 Overview
Schema extraction for SS data
Schemas for XML
– DTDs
– XML Schema

3 Review of Schemas so far
Upper bound schema S
– tells us what labels are allowed
– conformance test: $D \preceq S$ (S simulates D)
– in practice: need deterministic schemas
Lower bound schema S
– tells us what labels are required
– conformance test: $S \preceq D$ (D simulates S)
Alternative formulation: datalog programs, maximal fixpoint
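A minimal sketch of both conformance tests, assuming data and schema are edge-labelled graphs given as (source, label, target) triples and that schema labels must match data labels exactly (a simplification; schema edges often carry label predicates). The function names and the tiny graphs D, UPPER and LOWER are hypothetical. The greatest simulation is computed by starting from all pairs of nodes and deleting pairs that violate the simulation condition until nothing changes.

```python
from collections import defaultdict


def max_simulation(g1, root1, g2, root2):
    """Greatest relation R such that (x, y) in R and x -l-> x' in g1 imply
    some y -l-> y' in g2 with (x', y') in R.  Naive polynomial fixpoint."""
    out1, out2 = defaultdict(list), defaultdict(list)
    nodes1, nodes2 = {root1}, {root2}
    for src, label, dst in g1:
        out1[src].append((label, dst))
        nodes1.update((src, dst))
    for src, label, dst in g2:
        out2[src].append((label, dst))
        nodes2.update((src, dst))
    # Start from all pairs; repeatedly delete pairs that break the condition.
    rel = {(x, y) for x in nodes1 for y in nodes2}
    changed = True
    while changed:
        changed = False
        for x, y in list(rel):
            ok = all(
                any(l2 == l1 and (x2, y2) in rel for l2, y2 in out2[y])
                for l1, x2 in out1[x]
            )
            if not ok:
                rel.discard((x, y))
                changed = True
    return rel


def simulated(g1, root1, g2, root2):
    """True iff g1 <= g2, i.e. the root of g2 simulates the root of g1."""
    return (root1, root2) in max_simulation(g1, root1, g2, root2)


# Hypothetical example graphs (not from the lecture).
D = [("r", "employee", "p1"), ("p1", "name", "n1")]
UPPER = [("s", "employee", "e"), ("e", "name", "n"), ("e", "phone", "t")]
LOWER = [("s", "employee", "e"), ("e", "name", "n")]

print(simulated(D, "r", UPPER, "s"))  # upper bound D <= S  → True
print(simulated(LOWER, "s", D, "r"))  # lower bound S <= D  → True
print(simulated(UPPER, "s", D, "r"))  # → False: D has no phone edge
```

The same function serves both directions: for an upper bound the data must be simulated by the schema, for a lower bound the schema must be simulated by the data.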

4 Schema Extraction (From Data)
Problem statement: given a data instance D, find the “most specific” schema S for D
In practice: S is too large, so we need to relax it
[Nestorov, Abiteboul, Motwani 1998]

5 Schema Extraction: Sample Data
Example database D = [figure: graph with root &r, company object &c and employee objects &p1, ..., &p8; edge labels company, employee, worksfor, manages, managedby]

6 Lower Bound Schema Extraction
[NAM’98] approach:
Start with the schema given by the data (S = D):
– each node = a predicate = a class
Compute the maximal fixpoint (PTIME)
Declare two classes equal iff they are equal sets
– e.g. p4 = {&p1,&p4,&p6} and p6 = {&p1,&p4,&p6}, hence p4 = p6
Equivalently, p = p’ iff p(&p’) and p’(&p)
...
p4(x) :- link(x, manages, y), p5(y), link(x, worksfor, z), c(z)
p5(x) :- link(x, managedby, y), p4(y), link(x, worksfor, z), c(z)
...

7 Lower Bound Schema Extraction
Result S = [figure: schema graph with classes Root = {&r}, Bosses = {&p1,&p4,&p6}, Regulars = {&p2,&p3,&p5,&p7,&p8}, Company = {&c}; edge labels company, employee, manages, managedby, worksfor]
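The NAM’98 construction can be sketched as follows, under two assumptions: the database is a list of (source, label, target) triples, and, because the figure on slide 5 is lost, the sample graph is a hypothetical reconstruction in which only the classes of slide 7 come from the lecture (who manages whom is invented). Each node o gets a predicate p_o whose extent starts as all nodes; the loop removes nodes that violate p_o’s rule until the maximal fixpoint is reached, and predicates with equal extents are merged into one class.

```python
from collections import defaultdict


def sample_database():
    """Hypothetical reconstruction of the example database D (figure lost)."""
    manages = {"p1": ["p2", "p3"], "p4": ["p5"], "p6": ["p7", "p8"]}
    edges = [("r", "company", "c")]
    for boss, regulars in manages.items():
        edges += [("r", "employee", boss), (boss, "worksfor", "c")]
        for reg in regulars:
            edges += [("r", "employee", reg), (reg, "worksfor", "c"),
                      (boss, "manages", reg), (reg, "managedby", boss)]
    return edges


def extract_lower_bound(edges, root):
    """p_o(x) holds iff every edge o -l-> o' is matched by some edge
    x -l-> y with p_o'(y).  Computes the maximal fixpoint, then merges
    predicates whose extents are equal sets."""
    out = defaultdict(list)
    nodes = {root}
    for src, label, dst in edges:
        out[src].append((label, dst))
        nodes.update((src, dst))
    ext = {o: set(nodes) for o in nodes}  # maximal fixpoint: start from "true"
    changed = True
    while changed:
        changed = False
        for o in nodes:
            keep = {
                x for x in ext[o]
                if all(any(l2 == l and y in ext[o2] for l2, y in out[x])
                       for l, o2 in out[o])
            }
            if keep != ext[o]:
                ext[o] = keep
                changed = True
    classes = defaultdict(list)
    for o in nodes:
        classes[frozenset(ext[o])].append(o)
    return sorted(sorted(group) for group in classes.values())


for group in extract_lower_bound(sample_database(), "r"):
    print(group)
```

On this graph the sketch prints four classes, [c], [p1, p4, p6], [p2, p3, p5, p7, p8] and [r] (node names written without the &), matching Company, Bosses, Regulars and Root above.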

8 Lower Bound Schema Extraction
Equivalently: compute the maximal simulation $\preceq$ of D with itself
– can be done in time $O(m^2)$
Two nodes p, p’ are equivalent iff $p \preceq p'$ and $p' \preceq p$
The schema consists of the equivalence classes
Remark: we could use the bisimulation relation instead (perhaps even better)
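The bisimulation alternative from the remark can be sketched with naive partition refinement (not the faster Paige and Tarjan algorithm): start with a single block and repeatedly split blocks by the set of (label, target block) pairs until no block splits. The sample graph is the same hypothetical reconstruction of D as above, assumed because the original figure is lost.

```python
from collections import defaultdict


def sample_database():
    """Hypothetical reconstruction of the example database D (figure lost)."""
    manages = {"p1": ["p2", "p3"], "p4": ["p5"], "p6": ["p7", "p8"]}
    edges = [("r", "company", "c")]
    for boss, regulars in manages.items():
        edges += [("r", "employee", boss), (boss, "worksfor", "c")]
        for reg in regulars:
            edges += [("r", "employee", reg), (reg, "worksfor", "c"),
                      (boss, "manages", reg), (reg, "managedby", boss)]
    return edges


def bisimulation_classes(edges, root):
    """Naive partition refinement: split blocks by their outgoing
    (label, target block) pairs until the partition is stable."""
    out = defaultdict(list)
    nodes = {root}
    for src, label, dst in edges:
        out[src].append((label, dst))
        nodes.update((src, dst))
    block = {n: 0 for n in nodes}
    while True:
        # Keep the old block id in the signature so blocks only ever split.
        sig = {n: (block[n], frozenset((l, block[d]) for l, d in out[n]))
               for n in nodes}
        ids = {}
        new_block = {n: ids.setdefault(sig[n], len(ids)) for n in nodes}
        if len(ids) == len(set(block.values())):
            break  # no block was split: stable
        block = new_block
    classes = defaultdict(list)
    for n in nodes:
        classes[block[n]].append(n)
    return sorted(sorted(group) for group in classes.values())


print(bisimulation_classes(sample_database(), "r"))
```

On the sample graph this gives the same four classes as simulation. In general bisimulation can be finer: if u has a-edges to u1 (which has a b-edge) and to a leaf u2, while v has a single a-edge to v1 (which has a b-edge), then u and v simulate each other but are not bisimilar.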

9 Upper Bound Schema Extraction
The extracted lower bound schema S is also an upper bound schema!
But: it is nondeterministic
Convert $S \to S_d$ (determinize)
Alternatively, convert directly $D \to D_d = S_d$
– these are data guides [McHugh and Widom]

10 Upper Bound Schema Extraction
Result $S_d$ = [figure: deterministic schema graph with nodes Root = {&r}, Employees = {&p1,&p2,&p3,&p4,&p5,&p6,&p7,&p8}, Bosses = {&p1,&p4,&p6}, Regulars = {&p2,&p3,&p5,&p7,&p8}, Company = {&c}; edge labels company, employee, manages, managedby, worksfor]
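A data guide is the subset (powerset) construction applied to the data graph, starting from the root: each guide node is the set of all data nodes reached from the root by some label path, and it has at most one outgoing edge per label. A minimal sketch, again assuming the hypothetical reconstruction of D (the original figure is lost):

```python
from collections import defaultdict


def sample_database():
    """Hypothetical reconstruction of the example database D (figure lost)."""
    manages = {"p1": ["p2", "p3"], "p4": ["p5"], "p6": ["p7", "p8"]}
    edges = [("r", "company", "c")]
    for boss, regulars in manages.items():
        edges += [("r", "employee", boss), (boss, "worksfor", "c")]
        for reg in regulars:
            edges += [("r", "employee", reg), (reg, "worksfor", "c"),
                      (boss, "manages", reg), (reg, "managedby", boss)]
    return edges


def data_guide(edges, root):
    """Subset construction from the root.  Returns the start node and a map
    guide node (frozenset of data nodes) -> {label: guide node}."""
    out = defaultdict(list)
    for src, label, dst in edges:
        out[src].append((label, dst))
    start = frozenset([root])
    guide = {}
    todo = [start]
    while todo:
        state = todo.pop()
        if state in guide:
            continue  # already expanded
        targets = defaultdict(set)
        for node in state:
            for label, dst in out[node]:
                targets[label].add(dst)
        guide[state] = {label: frozenset(ds) for label, ds in targets.items()}
        todo.extend(guide[state].values())
    return start, guide


start, guide = data_guide(sample_database(), "r")
print(len(guide))  # → 5
```

The five guide nodes correspond to Root, Company, Employees, Bosses and Regulars above. Like any subset construction, a data guide can be exponentially larger than the data in the worst case.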

11 XML Document Type Definitions
part of the original XML specification
an XML document may have a DTD
terminology for XML:
– well-formed: if tags are correctly nested and closed
– valid: if it has a DTD and conforms to it
validation is useful in data exchange
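Well-formedness can be checked by any XML parser. A minimal sketch with Python's standard xml.etree.ElementTree, which raises ParseError on malformed input; it does not validate against a DTD, so checking validity needs another tool (for example the third-party lxml package, whose etree.DTD class has a validate method). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Well-formed = parses as XML (tags properly nested and closed).
    This does NOT check validity against a DTD."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<person><name>John</name></person>"))  # → True
print(is_well_formed("<person><name>John</person></name>"))  # → False
```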

12 Very Simple DTD
<!DOCTYPE company [ ]>

13 Very Simple DTD
Example of valid XML document:
123456789 John B432 1234 987654321 Jim B123 ...

14 Content Model
Element content: what we can put in an element (aka content model)
Content model:
– Complex = a regular expression over other elements
– Text-only = #PCDATA
– Empty = EMPTY
– Any = ANY
– Mixed content = (#PCDATA | A | B | C)* (i.e. very restricted)
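A sketch of how the five kinds of content model could be checked for a single element, assuming ElementTree elements as input. Complex models are translated into Python regular expressions over the sequence of child tags, which makes "a regular expression over other elements" literal. The element names ssn, name, office and phone are hypothetical (the DTD on slide 12 was lost); the values are the ones on slide 13. Entities, comments and processing instructions are ignored.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """'(ssn, name, office, phone?)' -> regex over '<tag>' tokens: names become
    literal tokens, ',' (sequence) and whitespace disappear, and '|', '?',
    '*', '+' and parentheses keep their regex meaning."""
    def token(match):
        return "(?:" + re.escape("<" + match.group(0) + ">") + ")"
    pattern = re.sub(r"[A-Za-z_][\w.\-]*", token, model)
    return re.sub(r"[\s,]+", "", pattern)


def check_content(elem, model):
    children = [child.tag for child in elem]
    has_text = bool((elem.text or "").strip()) or any(
        (child.tail or "").strip() for child in elem)
    if model == "EMPTY":
        return not children and not elem.text
    if model == "ANY":
        return True
    if model == "#PCDATA":
        return not children
    if model.startswith("(#PCDATA"):
        # Mixed content (#PCDATA | A | B)*: any text, children from the list.
        allowed = {name.strip() for name in model.strip("()*").split("|")[1:]}
        return all(tag in allowed for tag in children)
    # Element content: no text besides whitespace, children match the regex.
    tokens = "".join(f"<{tag}>" for tag in children)
    return not has_text and re.fullmatch(model_to_regex(model), tokens) is not None


person = ET.fromstring("<person><ssn>123456789</ssn><name>John</name>"
                       "<office>B432</office><phone>1234</phone></person>")
print(check_content(person, "(ssn, name, office, phone?)"))  # → True
```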

15 Attributes in DTDs
...

16 Attributes in DTDs
<!ATTLIST person
   age      CDATA   #REQUIRED
   id       ID      #REQUIRED
   manager  IDREF   #REQUIRED
   manages  IDREFS  #REQUIRED
>
<person age="25" id="p29432" manager="p48293" manages="p34982 p423234">
   ...

17 Attributes in DTDs
Types:
– CDATA = string
– ID = key
– IDREF = foreign key
– IDREFS = foreign keys separated by space
– (Monday | Wednesday | Friday) = enumeration
– NMTOKEN = a name token (name characters only, but need not start with a letter)
– NMTOKENS = multiple name tokens separated by space
– ENTITY = you don’t want to know this
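The key/foreign-key reading of ID, IDREF and IDREFS can be sketched as a check over a parsed document. This assumes the attribute types are known in advance (id is the ID attribute, manager and manages are the references, as in the ATTLIST on slide 16); a real validator takes them from the DTD and also checks that ID values are valid XML names, which this sketch skips. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_references(root, id_attr="id", idref_attrs=("manager",),
                     idrefs_attrs=("manages",)):
    """ID = key: values unique across the whole document.
    IDREF / IDREFS = foreign key(s): every (space-separated) value must
    be the ID of some element."""
    ids = set()
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in ids:
                return False  # duplicate key
            ids.add(value)
    for elem in root.iter():
        refs = [elem.get(a) for a in idref_attrs]
        refs += [t for a in idrefs_attrs for t in (elem.get(a) or "").split()]
        if any(r is not None and r not in ids for r in refs):
            return False  # dangling reference
    return True


doc = ET.fromstring('<company><person id="p1" manages="p2 p3"/>'
                    '<person id="p2" manager="p1"/><person id="p3" manager="p1"/></company>')
print(check_references(doc))  # → True
```

Nothing here restricts which element an IDREF points to: manager could refer to any element that carries an ID.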

18 Attributes in DTDs
Kind:
– #REQUIRED = must be present
– #IMPLIED = optional
– value = default value
– #FIXED value = the only value allowed
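The four attribute kinds can be sketched as a function that fills in defaults and rejects violations. The (kind, value) encoding and the attribute names nick, country and version are hypothetical; the behaviour follows the XML rule that an omitted attribute with a declared default value is treated as present with that value.

```python
def apply_attribute_decls(attrs, decls):
    """decls maps attribute name -> (kind, value), kind being '#REQUIRED',
    '#IMPLIED', '#FIXED' or 'default'.  Returns the attributes with
    defaults filled in, or raises ValueError on a violation."""
    result = dict(attrs)
    for name, (kind, value) in decls.items():
        present = name in attrs
        if kind == "#REQUIRED" and not present:
            raise ValueError(f"missing required attribute {name!r}")
        if kind == "#FIXED":
            if present and attrs[name] != value:
                raise ValueError(f"{name!r} must be {value!r}")
            result[name] = value
        elif kind == "default" and not present:
            result[name] = value
        # '#IMPLIED': optional, nothing is supplied when it is absent.
    return result


DECLS = {"age": ("#REQUIRED", None), "nick": ("#IMPLIED", None),
         "country": ("default", "US"), "version": ("#FIXED", "1.0")}
print(apply_attribute_decls({"age": "25"}, DECLS))
# → {'age': '25', 'country': 'US', 'version': '1.0'}
```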

19 Using DTDs
The DTD must be declared inside the XML document:
– either include the entire DTD: <!DOCTYPE company [ ... ]>
– or include a reference to it: <!DOCTYPE company SYSTEM "...">
– or mix the two (e.g. to override the external definition)

20 DTDs as Grammars
<!DOCTYPE paper [ ]> …

21 DTDs as Grammars
A DTD = a grammar
A valid XML document = a parse tree for that grammar
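Reading the DTD as a grammar, validation checks that the document tree is a parse tree: the root matches the DOCTYPE name and the children of every element spell a word of that element's production. A minimal recursive sketch with a hypothetical company grammar (the element names are guesses, since the slides' DTDs were lost) that handles only #PCDATA and element-content productions:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD written as a grammar: one production per element name.
GRAMMAR = {
    "company": "(person)*",
    "person": "(ssn, name, office, phone?)",
    "ssn": "#PCDATA", "name": "#PCDATA", "office": "#PCDATA", "phone": "#PCDATA",
}


def model_to_regex(model):
    """Content model -> regex over '<tag>' tokens (',' means sequence)."""
    def token(match):
        return "(?:" + re.escape("<" + match.group(0) + ">") + ")"
    return re.sub(r"[\s,]+", "", re.sub(r"[A-Za-z_][\w.\-]*", token, model))


def is_valid(elem, grammar):
    """Does the subtree rooted at elem parse with the grammar?"""
    model = grammar.get(elem.tag)
    if model is None:
        return False  # undeclared element
    children = [child.tag for child in elem]
    if model == "#PCDATA":
        return not children
    text = (elem.text or "").strip() or any((c.tail or "").strip() for c in elem)
    tokens = "".join(f"<{tag}>" for tag in children)
    if text or re.fullmatch(model_to_regex(model), tokens) is None:
        return False
    return all(is_valid(child, grammar) for child in elem)


def validate(doc_text, doctype, grammar):
    root = ET.fromstring(doc_text)
    return root.tag == doctype and is_valid(root, grammar)


DOC = ("<company><person><ssn>123456789</ssn><name>John</name><office>B432</office>"
       "<phone>1234</phone></person><person><ssn>987654321</ssn><name>Jim</name>"
       "<office>B123</office></person></company>")
print(validate(DOC, "company", GRAMMAR))  # → True
```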

22 DTDs as Schemas
Not so well suited:
– they impose unwanted constraints on order
– references cannot be constrained (an IDREF may point to any element with an ID)
– they can be too vague:

23 XML Schemas
http://www.w3.org/TR/xmlschema-1/ 10/2000
generalizes DTDs
uses XML syntax
two documents: structure and datatypes
– http://www.w3.org/TR/xmlschema-1
– http://www.w3.org/TR/xmlschema-2
XML Schema is very complex
– often criticized
– some alternative proposals

24 XML Schemas DTD:

25 Elements vs. Types in XML Schema
DTD:

26 Types:
– Simple types (integers, strings, ...)
– Complex types (regular expressions, like in DTDs)
Element-type-element alternation:
– Root element has a complex type
– That type is a regular expression of elements
– Those elements have their complex types...
– ...
– On the leaves we have simple types

27 Local and Global Types in XML Schema
Local type: [define locally the person’s type]
Global type: [define here the type ttt]
Global types: can be reused in other elements

28 Local vs. Global Elements in XML Schema
Local element: ...
Global element: ...
Global elements: like in DTDs

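29 Regular Expressions in XML Schema
Recall the element-type-element alternation: [regular expression on elements]
Regular expressions:
<xsd:sequence> A B C </xsd:sequence> = A B C
<xsd:choice> A B C </xsd:choice> = A | B | C
<xsd:group> A B C </xsd:group> = (A B C)
<... minOccurs="0" maxOccurs="unbounded"> .. </...> = (...)*
<... minOccurs="0" maxOccurs="1"> .. </...> = (...)?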
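The correspondence between XML Schema particles and regular expressions can be sketched by translating a schema fragment back into DTD-style notation. This assumes the W3C XML Schema namespace URI http://www.w3.org/2001/XMLSchema, handles only xsd:element, xsd:sequence and xsd:choice, and supports only the occurrence bounds that map to nothing, ?, * and +; general bounds such as maxOccurs="3" are rejected rather than expanded. The function name and the fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
OCCURS = {("1", "1"): "", ("0", "1"): "?", ("0", "unbounded"): "*",
          ("1", "unbounded"): "+"}


def particle_to_regex(node):
    """Translate xsd:element / xsd:sequence / xsd:choice (with minOccurs and
    maxOccurs) into DTD-style regular expression notation."""
    kind = node.tag.replace(XS, "")
    if kind == "element":
        core = node.get("name") or node.get("ref")
    elif kind == "sequence":
        core = "(" + " ".join(particle_to_regex(c) for c in node) + ")"
    elif kind == "choice":
        core = "(" + " | ".join(particle_to_regex(c) for c in node) + ")"
    else:
        raise ValueError(f"not handled in this sketch: {kind}")
    occurs = (node.get("minOccurs", "1"), node.get("maxOccurs", "1"))
    if occurs not in OCCURS:
        raise ValueError(f"unsupported occurrence bounds: {occurs}")
    return core + OCCURS[occurs]


FRAGMENT = """
<xsd:sequence xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="A"/>
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="B"/>
    <xsd:element name="C"/>
  </xsd:choice>
  <xsd:element name="D" minOccurs="0"/>
</xsd:sequence>
""".strip()
print(particle_to_regex(ET.fromstring(FRAGMENT)))  # → (A (B | C)* D?)
```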

30 Local Names in XML Schema
...
name has different meanings in person and in product

31 Subtle Use of Local Names
Arbitrarily deep binary tree with A elements and a single B element
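The schema on this slide was lost, so the following is a hypothetical reading: the root is A, every A has either no children or exactly two, B is a leaf, and the tree contains exactly one B. Checking it needs two kinds of A, one whose subtree contains the B and one whose subtree does not, which is what local types provide. The sketch computes this bottom-up; the return values 0 and 1 play the role of the two types (call them T0 and T1 for illustration).

```python
import xml.etree.ElementTree as ET


def b_type(elem):
    """0 = valid subtree without B (type T0), 1 = valid subtree with
    exactly one B (type T1), None = invalid subtree."""
    if elem.tag == "B":
        return 1 if len(elem) == 0 else None
    if elem.tag != "A" or len(elem) not in (0, 2):
        return None
    types = [b_type(child) for child in elem]
    if None in types or sum(types) > 1:
        return None
    return sum(types)


def in_language(text):
    root = ET.fromstring(text)
    return root.tag == "A" and b_type(root) == 1


print(in_language("<A><A><B/><A/></A><A/></A>"))  # → True
print(in_language("<A><B/><B/></A>"))            # → False (two Bs)
```

Under this reading a DTD cannot express the language: it gives A a single content model wherever A occurs, so it cannot track whether the B has already been used elsewhere in the tree.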

32 Summary of XML Schema
Formal expressive power:
– can express precisely the regular tree languages (over unranked trees)
Lots of other stuff:
– some form of inheritance
– a “null” value
– large collection of data types

33 Summary of Schemas
in SS data:
– graph theoretic
– data and schema are decoupled
– used in data processing
in XML:
– from grammar to object-oriented
– schema wired with the data
– emphasis on semantics for exchange

