Management of XML and Semistructured Data Lecture 10: Schemas Monday, April 30, 2001.


Overview
– Schema extraction for semistructured (SS) data
– Schemas for XML:
   – DTDs
   – XML Schema

Review of Schemas So Far
– Upper bound schema S: tells us which labels are allowed
   – Conformance test: D ≼ S (S simulates D)
   – In practice: we need deterministic schemas
– Lower bound schema S: tells us which labels are required
   – Conformance test: S ≼ D (D simulates S)
– Alternative formulation: datalog programs, maximal fixpoint

Schema Extraction (From Data)
– Problem statement: given a data instance D, find the "most specific" schema S for D
– In practice S is too large, so we need to relax it
[Nestorov, Abiteboul, Motwani 1998]

Schema Extraction: Sample Data
Example database D = [figure: a graph with root &r, a company node &c and employee nodes &p1 through &p8; edge labels: company, employee, worksfor, manages, managedby]

Lower Bound Schema Extraction
[NAM'98] approach:
– Start with the schema given by the data (S = D): each node = a predicate = a class
– Compute the maximal fixpoint (PTIME)
– Declare two classes equal iff they are equal sets
   – E.g. p4 = {&p1,&p4,&p6} and p6 = {&p1,&p4,&p6}, hence p4 = p6
   – Equivalently, p = p' iff p(&p') and p'(&p)
Typing rules (excerpt):
   ...
   p4(x) :- link(x, manages, y), p5(y), link(x, worksfor, z), c(z)
   p5(x) :- link(x, managed-by, y), p4(y), link(x, worksfor, z), c(z)
   ...
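The maximal-fixpoint step can be sketched in Python as follows. This is a minimal sketch, not the NAM'98 implementation. It assumes each rule body is encoded as (label, predicate) pairs, and it uses a small hypothetical graph rather than the eight-employee database from the figure. Every predicate starts out holding every node, and nodes that violate their rule are removed until nothing changes. A least fixpoint would instead leave the mutually recursive p4 and p5 empty.

```python
# Hypothetical data: &p1 manages &p2; &p3 is a regular employee without a manager.
NODES = ["&r", "&c", "&p1", "&p2", "&p3"]
EDGES = [
    ("&r", "company", "&c"),
    ("&r", "employee", "&p1"), ("&r", "employee", "&p2"), ("&r", "employee", "&p3"),
    ("&p1", "manages", "&p2"), ("&p2", "managed-by", "&p1"),
    ("&p1", "worksfor", "&c"), ("&p2", "worksfor", "&c"), ("&p3", "worksfor", "&c"),
]

# Each body atom link(x, label, y), q(y) is written as (label, q).
RULES = {
    "p4": [("manages", "p5"), ("worksfor", "c")],
    "p5": [("managed-by", "p4"), ("worksfor", "c")],
    "c": [],  # &c has no outgoing edges: empty body, so c holds for every node
}

def greatest_fixpoint(nodes, edges, rules):
    """Start with every node in every predicate, then drop nodes that fail their rule."""
    ext = {p: set(nodes) for p in rules}
    changed = True
    while changed:
        changed = False
        for p, body in rules.items():
            keep = {x for x in ext[p]
                    if all(any(s == x and l == lab and y in ext[q] for s, l, y in edges)
                           for lab, q in body)}
            if keep != ext[p]:
                ext[p], changed = keep, True
    return ext

ext = greatest_fixpoint(NODES, EDGES, RULES)
print(sorted(ext["p4"]), sorted(ext["p5"]))  # ['&p1'] ['&p2']
```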

Lower Bound Schema Extraction: Result S =
[figure: classes Root = {&r}, Bosses = {&p1,&p4,&p6}, Regulars = {&p2,&p3,&p5,&p7,&p8}, Company = {&c}; edge labels: company, employee, manages, managedby, worksfor]

Lower Bound Schema Extraction
Equivalently:
– Compute the maximal simulation ≼ of D with itself (D ≼ D); this can be done in time O(m²), where m is the number of edges
– Two nodes x, x' are equivalent iff x ≼ x' and x' ≼ x
– The schema consists of the equivalence classes
Remark: we could use the bisimulation relation instead (perhaps it is even better)
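A naive way to compute the maximal simulation and its equivalence classes is sketched below. It assumes the data graph is given as (source, label, target) triples and uses a hypothetical four-employee graph. It is a plain fixpoint iteration, not the O(m²) algorithm mentioned on the slide.

```python
from itertools import product

def maximal_simulation(nodes, edges):
    """Largest R such that (x, y) in R and x -l-> x' imply some y -l-> y' with (x', y') in R.
    A pair (x, y) in R means y simulates x."""
    out = {n: [] for n in nodes}
    for src, label, dst in edges:
        out[src].append((label, dst))
    sim = set(product(nodes, nodes))  # start from all pairs, shrink to the maximal fixpoint
    changed = True
    while changed:
        changed = False
        for x, y in list(sim):
            matched = all(any(l2 == l and (x2, y2) in sim for l2, y2 in out[y])
                          for l, x2 in out[x])
            if not matched:
                sim.discard((x, y))
                changed = True
    return sim

def equivalence_classes(nodes, sim):
    """Group x and x' iff both (x, x') and (x', x) are in sim (sim is a preorder)."""
    classes = []
    for n in nodes:
        for cls in classes:
            if (n, cls[0]) in sim and (cls[0], n) in sim:
                cls.append(n)
                break
        else:
            classes.append([n])
    return classes

# Hypothetical graph: two bosses (&p1, &p3), each managing one regular employee.
NODES = ["&r", "&c", "&p1", "&p2", "&p3", "&p4"]
EDGES = [("&r", "company", "&c")]
EDGES += [("&r", "employee", p) for p in ("&p1", "&p2", "&p3", "&p4")]
EDGES += [("&p1", "manages", "&p2"), ("&p2", "managedby", "&p1"),
          ("&p3", "manages", "&p4"), ("&p4", "managedby", "&p3")]
EDGES += [(p, "worksfor", "&c") for p in ("&p1", "&p2", "&p3", "&p4")]

classes = equivalence_classes(NODES, maximal_simulation(NODES, EDGES))
print(classes)  # [['&r'], ['&c'], ['&p1', '&p3'], ['&p2', '&p4']]
```

On this graph the two bosses collapse into one class and the two regular employees into another, mirroring the Bosses/Regulars split in the result above.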

Upper Bound Schema Extraction
– The extracted lower bound schema S is also an upper bound schema!
– But it is nondeterministic: convert S → S_d
– Alternatively, convert D directly: D → D_d = S_d
   – These are data guides [McHugh and Widom]
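Converting the data directly into a deterministic schema is essentially the subset construction from automata theory. Each schema node is the set of data nodes reachable by one label path. Below is a minimal sketch, assuming the same (source, label, target) triple encoding and a small hypothetical graph. In the worst case the number of subsets is exponential in the number of data nodes.

```python
from collections import deque

def data_guide(root, edges):
    """Subset construction: each guide node is the set of data nodes reachable from root
    by one label path, so every label path in the data appears exactly once."""
    out = {}
    for src, label, dst in edges:
        out.setdefault(src, []).append((label, dst))
    start = frozenset([root])
    guide = {}  # guide node -> {label: successor guide node}
    todo = deque([start])
    while todo:
        state = todo.popleft()
        if state in guide:
            continue
        succ = {}
        for n in state:
            for label, dst in out.get(n, []):
                succ.setdefault(label, set()).add(dst)
        guide[state] = {label: frozenset(ds) for label, ds in succ.items()}
        todo.extend(guide[state].values())
    return start, guide

# Hypothetical data: &p1 manages &p2; &p3 has no manager.
EDGES = [("&r", "company", "&c")] + [("&r", "employee", p) for p in ("&p1", "&p2", "&p3")]
EDGES += [("&p1", "manages", "&p2"), ("&p2", "managedby", "&p1")]
EDGES += [(p, "worksfor", "&c") for p in ("&p1", "&p2", "&p3")]

start, guide = data_guide("&r", EDGES)
for state, succ in guide.items():
    print(sorted(state), {label: sorted(t) for label, t in succ.items()})
```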

Upper Bound Schema Extraction: Result S_d =
[figure: classes Root = {&r}, Employees = {&p1,&p2,&p3,&p4,&p5,&p6,&p7,&p8}, Bosses = {&p1,&p4,&p6}, Regulars = {&p2,&p3,&p5,&p7,&p8}, Company = {&c}; edge labels: company, employee, manages, managedby, worksfor]

XML Document Type Definitions
– Part of the original XML specification
– An XML document may have a DTD
– Terminology for XML:
   – well-formed: if tags are correctly nested and closed
   – valid: if it has a DTD and conforms to it
– Validation is useful in data exchange
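Any XML parser can check well-formedness, but checking validity needs a validating parser. As a small illustration, Python's standard-library ElementTree parser (built on the non-validating expat parser) rejects badly nested tags. It still accepts a document that violates its own internal DTD. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True iff the standard-library (non-validating) parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<company><person>John</person></company>"))  # True
print(is_well_formed("<company><person>John</company></person>"))  # False: bad nesting
# Well-formed but not valid: the DTD allows only person children, yet the parser accepts it.
print(is_well_formed("<!DOCTYPE company [<!ELEMENT company (person*)>]><company><product/></company>"))  # True
```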

Very Simple DTD
<!DOCTYPE company [ ... ]>

Very Simple DTD
Example of a valid XML document: John B, Jim B ...

Content Model
– Element content: what we can put in an element (aka content model)
– Content model:
   – Complex = a regular expression over other elements
   – Text-only = #PCDATA
   – Empty = EMPTY
   – Any = ANY
   – Mixed content = (#PCDATA | A | B | C)* (i.e. very restricted)
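A complex content model is a regular expression over child element names, so checking one element's children reduces to regular-expression matching. The sketch below makes three assumptions: content models are written as in a DTD, mixed content has no space right after the opening parenthesis, and the caller reports whether the element has non-whitespace text. It translates a model into a Python regex over space-terminated child names. The function names are hypothetical.

```python
import re

NAME = r"[A-Za-z_][\w.-]*"

def dtd_model_to_regex(model: str) -> str:
    """Translate a DTD element-content model, e.g. '(a, (b | c)*, d?)', into a Python regex
    over the child names, each followed by one space."""
    parts = []
    for tok in re.findall(NAME + r"|[(),|?*+]", model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue  # ',' is concatenation, so it needs no regex operator
        elif tok in ")|?*+":
            parts.append(tok)
        else:
            parts.append(f"(?:{re.escape(tok)} )")
    return "".join(parts)

def content_ok(model: str, children: list, has_text: bool) -> bool:
    """Check one element's list of child names (and whether it has non-whitespace text)."""
    if model == "EMPTY":
        return not children and not has_text
    if model == "ANY":
        return True
    if model.startswith("(#PCDATA"):
        # Text-only (#PCDATA) or mixed (#PCDATA | A | B)*: any interleaving of text and the
        # listed elements; order and counts are not constrained.
        allowed = set(re.findall(NAME, model)) - {"PCDATA"}
        return all(c in allowed for c in children)
    if has_text:
        return False  # element content: only child elements allowed
    return re.fullmatch(dtd_model_to_regex(model), "".join(c + " " for c in children)) is not None

print(content_ok("(ssn, name, office, phone?)", ["ssn", "name", "office"], False))  # True
print(content_ok("(ssn, name, office, phone?)", ["ssn", "office"], False))          # False
```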

Attributes in DTDs

Attributes in DTDs
<!ATTLIST person
    age      CDATA   #REQUIRED
    id       ID      #REQUIRED
    manager  IDREF   #REQUIRED
    manages  IDREFS  #REQUIRED>
<person age="25" id="p29432" manager="p48293" manages="p34982 p423234">

Attributes in DTDs
Types:
– CDATA = string
– ID = key
– IDREF = foreign key
– IDREFS = foreign keys separated by spaces
– (Monday | Wednesday | Friday) = enumeration
– NMTOKEN = a name token (like an XML name, but it may start with any name character)
– NMTOKENS = multiple name tokens separated by spaces
– ENTITY = you don't want to know this
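The ID/IDREF types give key and foreign-key constraints, which a validating parser enforces. ID values must be unique within the document, and every IDREF or IDREFS token must name an existing ID. Below is a minimal sketch of that check. It assumes the attribute types are supplied as a hypothetical dictionary that mirrors the person ATTLIST, rather than parsed from the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute-type table mirroring <!ATTLIST person ... id ID, manager IDREF, manages IDREFS>.
ATTR_TYPES = {"person": {"id": "ID", "manager": "IDREF", "manages": "IDREFS"}}

def check_ids(doc: str, attr_types=ATTR_TYPES) -> list:
    """Return error messages for duplicate IDs and for IDREF/IDREFS tokens with no matching ID."""
    ids, refs, errors = set(), [], []
    for el in ET.fromstring(doc).iter():
        for name, value in el.attrib.items():
            kind = attr_types.get(el.tag, {}).get(name)
            if kind == "ID":
                if value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
            elif kind == "IDREF":
                refs.append(value)
            elif kind == "IDREFS":
                refs.extend(value.split())  # space-separated list of references
    # References are checked only after all IDs are known, so forward references are fine.
    errors += [f"dangling reference {r!r}" for r in refs if r not in ids]
    return errors

print(check_ids('<company><person id="p1" manages="p2"/><person id="p2" manager="p1"/></company>'))  # []
```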

Attributes in DTDs
Kind:
– #REQUIRED
– #IMPLIED = optional
– "value" = default value
– #FIXED "value" = the only value allowed
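The four kinds determine what a validating parser does when an attribute is missing. The sketch below assumes a hypothetical encoding where each declaration is a (kind, value) pair, with "default" marking a plain default value. The attribute names other than age are also hypothetical. The sketch rejects a missing #REQUIRED attribute or a wrong #FIXED value and fills in declared values. It does not check for undeclared attributes.

```python
def complete_attributes(given: dict, decls: dict) -> dict:
    """Apply DTD attribute defaults. decls maps name -> (kind, value), where kind is
    '#REQUIRED', '#IMPLIED', '#FIXED', or 'default' (a plain default value)."""
    result = dict(given)
    for name, (kind, value) in decls.items():
        if kind == "#REQUIRED" and name not in given:
            raise ValueError(f"missing required attribute {name!r}")
        if kind == "#FIXED" and name in given and given[name] != value:
            raise ValueError(f"attribute {name!r} must be {value!r}")
        if kind in ("#FIXED", "default") and name not in given:
            result[name] = value  # the parser supplies the declared value
        # '#IMPLIED': optional; nothing is supplied when it is missing
    return result

DECLS = {
    "age": ("#REQUIRED", None),
    "nickname": ("#IMPLIED", None),
    "country": ("default", "US"),
    "version": ("#FIXED", "1.0"),
}
print(complete_attributes({"age": "25"}, DECLS))
# {'age': '25', 'country': 'US', 'version': '1.0'}
```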

Using DTDs
– The DTD must be included in the XML document, in its <!DOCTYPE ...> declaration
– Either include the entire DTD inline
– Or include a reference to an external DTD
– Or mix the two (e.g. to override the external definition)

DTDs as Grammars
<!DOCTYPE paper [ ... ]>
…

DTDs as Grammars
– A DTD = a grammar
– A valid XML document = a parse tree for that grammar
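Under the grammar view, validation means checking that the document tree is a derivation: each element's sequence of children must belong to the language of its production. Below is a minimal recursive sketch. It assumes a hypothetical paper grammar (title, author, section and para are made up; the slide's own declarations are not shown here), written directly as Python regexes.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical grammar: one production per element, given as a regex over the children's
# names (each followed by a space); None means #PCDATA only.
GRAMMAR = {
    "paper": r"title (?:author )+(?:section )*",
    "section": r"title (?:para |section )*",
    "title": None,
    "author": None,
    "para": None,
}

def derives(node: ET.Element) -> bool:
    """True iff the subtree at node is a parse tree of GRAMMAR rooted at node.tag."""
    if node.tag not in GRAMMAR:
        return False  # undeclared element
    rule, kids = GRAMMAR[node.tag], list(node)
    if rule is None:
        return not kids  # text-only leaf
    if (node.text or "").strip() or any((k.tail or "").strip() for k in kids):
        return False  # element content may contain whitespace but no other text
    word = "".join(k.tag + " " for k in kids)
    return re.fullmatch(rule, word) is not None and all(derives(k) for k in kids)

def is_valid(doc: str, start: str = "paper") -> bool:
    root = ET.fromstring(doc)
    return root.tag == start and derives(root)
```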

DTDs as Schemas
Not so well suited:
– they impose unwanted constraints on order
– references cannot be constrained (an IDREF may point to any element that has an ID, of any type)
– they can be too vague

XML Schemas
– W3C, 10/2000
– Generalizes DTDs
– Uses XML syntax
– Two documents: structure and datatypes
– XML Schema is very complex:
   – often criticized
   – some alternative proposals

XML Schemas DTD:

Elements vs. Types in XML Schema
DTD:

Types:
– Simple types (integers, strings, ...)
– Complex types (regular expressions, like in DTDs)
Element-type-element alternation:
– The root element has a complex type
– That type is a regular expression of elements
– Those elements have their own complex types...
– ...
– At the leaves we have simple types
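The element-type-element alternation can be sketched as a recursive check that switches back and forth between element names and types. In this hypothetical schema all type names are made up for illustration. Each complex type carries its own local element declarations, which also lets name mean a structured name under person but a plain string under product.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical schema. Simple types: predicates on text. Complex types: a regex over the
# space-terminated sequence of child names, plus the element declarations local to the type.
SIMPLE = {
    "string": lambda s: True,
    "integer": lambda s: re.fullmatch(r"[+-]?\d+", s) is not None,
}
COMPLEX = {
    "CompanyType":    (r"(?:person |product )*", {"person": "PersonType", "product": "ProductType"}),
    "PersonType":     (r"name (?:age )?", {"name": "PersonNameType", "age": "integer"}),
    "PersonNameType": (r"first last ", {"first": "string", "last": "string"}),
    "ProductType":    (r"name price ", {"name": "string", "price": "integer"}),
}

def valid(el: ET.Element, type_name: str) -> bool:
    """Check el against type_name, alternating element -> type -> child elements."""
    text = (el.text or "").strip()
    if type_name in SIMPLE:
        return len(el) == 0 and SIMPLE[type_name](text)  # leaf: text checked by the simple type
    pattern, local_decls = COMPLEX[type_name]
    kids = list(el)
    if text or any((k.tail or "").strip() for k in kids):
        return False  # complex types here allow element-only content
    if re.fullmatch(pattern, "".join(k.tag + " " for k in kids)) is None:
        return False
    # Each child's type comes from the declaration local to *this* complex type.
    return all(valid(k, local_decls[k.tag]) for k in kids)
```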

Local and Global Types in XML Schema
– Local type: [define locally the person's type]
– Global type: [define here the type ttt]
– Global types can be reused in other elements

Local vs. Global Elements in XML Schema
– Local element: ...
– Global element: ...
– Global elements: like in DTDs

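Regular Expressions in XML Schema
Recall the element-type-element alternation: a complex type is a regular expression on elements.
– a sequence of A, B, C = A B C
– a choice among A, B, C = A | B | C
– a nested group of A, B, C = (A B C)
– minOccurs="0" maxOccurs="unbounded" on a particle = (...)*
– minOccurs="0" maxOccurs="1" on a particle = (...)?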
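The correspondence can be made mechanical. The sketch below assumes a hypothetical tuple encoding (kind, payload, minOccurs, maxOccurs), with None standing for "unbounded". It prints a particle in the regular-expression notation used on this slide. Occurrence ranges other than the four common ones are rejected rather than expanded.

```python
def el(name, lo=1, hi=1):
    """Hypothetical encoding of an element particle: (kind, payload, minOccurs, maxOccurs)."""
    return ("element", name, lo, hi)

SUFFIX = {(1, 1): "", (0, None): "*", (0, 1): "?", (1, None): "+"}  # None = "unbounded"

def to_regex(particle, top=True) -> str:
    kind, payload, lo, hi = particle
    if kind == "element":
        core, atomic = payload, True
    elif kind in ("sequence", "choice"):
        sep = " " if kind == "sequence" else " | "
        core, atomic = sep.join(to_regex(p, top=False) for p in payload), False
    else:
        raise ValueError(f"unknown particle kind {kind!r}")
    if (lo, hi) not in SUFFIX:
        raise ValueError(f"minOccurs={lo}, maxOccurs={hi} is not handled in this sketch")
    suffix = SUFFIX[(lo, hi)]
    if not atomic and (suffix or not top):
        core = f"({core})"  # parenthesize groups that are nested or repeated
    return core + suffix

ABC = [el("A"), el("B"), el("C")]
print(to_regex(("sequence", ABC, 1, 1)))     # A B C
print(to_regex(("choice", ABC, 1, 1)))       # A | B | C
print(to_regex(("sequence", ABC, 0, None)))  # (A B C)*
print(to_regex(("sequence", ABC, 0, 1)))     # (A B C)?
```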

Local Names in XML Schema
The element name "name" has different meanings (different types) in person and in product.

Subtle Use of Local Names
An arbitrarily deep binary tree with A elements and a single B element
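Assume every A has either zero or two children, B is a leaf, and the tree contains exactly one B. Under those assumptions, a bottom-up tree automaton with two states, noB and oneB, can recognize the language. The two states play the role of two different types for the same element name A. A DTD cannot express this, because it assigns a single content model to each element name. Below is a minimal sketch; the state names are hypothetical.

```python
import xml.etree.ElementTree as ET

NO_B, ONE_B = "noB", "oneB"  # hypothetical state/type names

def state(el: ET.Element):
    """Bottom-up run: NO_B or ONE_B for an accepted subtree, None if it is rejected."""
    if el.tag == "B":
        return ONE_B if len(el) == 0 else None  # B is a leaf
    if el.tag != "A":
        return None
    kids = [state(k) for k in el]
    if len(kids) not in (0, 2) or None in kids:
        return None  # binary: every A has zero or two children
    return {0: NO_B, 1: ONE_B}.get(kids.count(ONE_B))  # two B's below: rejected

def accepts(doc: str) -> bool:
    root = ET.fromstring(doc)
    return root.tag == "A" and state(root) == ONE_B

print(accepts("<A><A/><B/></A>"))  # True
print(accepts("<A><A/><A/></A>"))  # False: no B
print(accepts("<A><B/><B/></A>"))  # False: two B's
```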

Summary of XML Schema
– Formal expressive power: can express precisely the regular tree languages (over unranked trees)
– Lots of other stuff:
   – some form of inheritance
   – a "null" value
   – a large collection of data types

Summary of Schemas
– In SS data:
   – graph theoretic
   – data and schema are decoupled
   – used in data processing
– In XML:
   – from grammar to object-oriented
   – schema wired with the data
   – emphasis on semantics for exchange