Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.


Outline
– XML Schema
– Types in XDuce
– Regular tree languages

Attributes in XML Schema
Attributes are associated with the type, not with the element. They can be attached only to complex types; adding attributes to a simple type takes more work.

“Mixed” Content, “Any” Type
– Mixed content: better than in DTDs. The type can still be enforced, but text may now appear between any elements.
– Any type: means anything is permitted there.

“All” Group
A restricted form of & in SGML. Restrictions:
– Only at top level
– Has only elements
– Each element occurs at most once
E.g. “comment” occurs 0 or 1 times.

Derived Types by Extensions Corresponds to inheritance

Derived Types by Restrictions
(*): may restrict cardinalities, e.g. (0,∞) to (1,1); may restrict choices; other restrictions…
… [rewrite the entire content, with restrictions] ...
Corresponds to set inclusion.

Simple Types
string, token, byte, unsignedByte, integer, positiveInteger, int (a bounded subset of integer), unsignedInt, long, short, ..., time, dateTime, duration, date, ID, IDREF, IDREFS

Facets of Simple Types
Facets = additional properties restricting a simple type. 15 facets are defined by XML Schema. Examples:
– length, minLength, maxLength
– pattern
– enumeration
– whiteSpace
– maxInclusive, maxExclusive, minInclusive, minExclusive
– totalDigits, fractionDigits

Facets of Simple Types
A simple type can be further restricted by changing some of its facets. Restriction = subset.

Not so Simple Types
– List types
– Union types
– Restriction types

Types in XDuce
XDuce = a functional programming language (like ML)
Emphasis: type checking for its functions
Data model = ordered trees
– Captures XML elements and attributes
Types = regular expressions
– Same expressive power as XML Schema
– Simpler concept
– Closer connection to regular tree languages

Values in XDuce
An XML bibliography entry (title "ML for the Working Programmer", author "Paulson") is written as an XDuce value:
val x = bib[book[title["ML for the Working Programmer"],
                 author["Paulson"],
                 year["1991"] ],
            paper[....],
            ... ]

Types in XDuce
type Bib = bib[(Book|Paper)*]
type Book = book[Title, Author*, Year, Publisher?]
type Title = title[String]
...
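The idea that a type is a regular expression over the children of an element can be sketched in Python. This is only an illustration under stated assumptions: values are encoded as hypothetical `(label, children)` pairs, each element name gets exactly one content model (a simplification that XDuce itself does not require), and `Author`, `Year` and `Publisher` are assumed to hold a single string like `Title`. `Paper` is elided in the slide, so it has no model here and is rejected.

```python
import re

# Hypothetical encoding: a value is (label, children); text leaves are plain str.
# Each element name maps to a regular expression over the labels of its children,
# where every label is followed by one space.
CONTENT_MODEL = {
    "bib": r"((book|paper) )*",
    "book": r"title (author )*year (publisher )?",
    "title": r"#text ",
    "author": r"#text ",      # assumption: same shape as Title
    "year": r"#text ",        # assumption
    "publisher": r"#text ",   # assumption
}


def label(node):
    """Label of a node; text leaves are given the pseudo-label '#text'."""
    return "#text" if isinstance(node, str) else node[0]


def conforms(node):
    """Check a value against the content model of its label, recursively."""
    if isinstance(node, str):
        return True
    name, children = node
    model = CONTENT_MODEL.get(name)
    if model is None:
        return False  # no model known for this element (e.g. paper)
    word = "".join(label(child) + " " for child in children)
    if re.fullmatch(model, word) is None:
        return False
    return all(conforms(child) for child in children)


book = ("book", [("title", ["ML for the Working Programmer"]),
                 ("author", ["Paulson"]),
                 ("year", ["1991"])])
print(conforms(("bib", [book])))
```

Swapping the order of `title` and `author` in `book` makes the check fail, because the child labels no longer match the expression for `Book`.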

Types in XDuce
Important idea:
– Types are first class citizens
– Element names are second class
This is consistent with regular expressions and automata:
– Type = state (we will see later)

Example of Types in XDuce
type T1 = b[] | a[T1, T0] | a[T0, T1]
type T0 = a[] | a[T0, T0]

Formal Definition of Types in XDuce
T ::= variable
  ::= base type
  ::= ()       /* empty sequence */
  ::= T, T     /* concatenation */
  ::= T | T    /* alternation */
Where are “*” and “?” ?

Types in XDuce
Derived types: given T, the type T* is an abbreviation for:
– type X = T, X | ()
Similarly, T+ and T? are abbreviations for:
– type X = T, T*
– type Y = T | ()

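Types in XDuce
Danger with recursion:
– type X = a[], X, b[] | ()
– What is it? It denotes the sequences of n copies of a[] followed by n copies of b[], which is a context-free language and not a regular one
Need to restrict to tail-recursive types.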
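A small Python sketch makes the problem with this recursive type visible by unfolding it a bounded number of times. The function name `expand` and the encoding of sequences as tuples of labels are illustrative assumptions, not part of XDuce.

```python
def expand(depth):
    """Sequences generated by  type X = a[], X, b[] | ()  with at most `depth` unfoldings."""
    if depth == 0:
        return {()}
    inner = expand(depth - 1)
    # Either the empty sequence, or a[] followed by an X followed by b[].
    return {()} | {("a",) + seq + ("b",) for seq in inner}


for seq in sorted(expand(3), key=len):
    print(" ".join(seq) or "()")
```

Every generated sequence has as many a's as b's, with all a's first, which no regular expression over a and b can enforce.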

Subsumption in XDuce Types
Definition. T1 <: T2 if the set defined by T1 is a subset of that defined by T2.
Examples:
– Name, Addr <: Name, Addr, Tel?
– Name, Addr, Tel <: Name, Addr, Tel?
– T, T, T <: T*
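The three examples can be sanity-checked with a bounded enumeration in Python. This is not the decision procedure used by XDuce: it treats element types such as Name as atomic labels, enumerates all sequences up to a length bound `n`, and compares the two sets, so it can refute a subsumption but only supports one up to that bound. The tuple encoding of type expressions and all function names are assumptions made for this sketch.

```python
def seq(*types):
    """Left-nested concatenation of one or more type expressions."""
    result = types[0]
    for t in types[1:]:
        result = ("seq", result, t)
    return result


def words(t, n):
    """All label sequences of length <= n denoted by type expression t."""
    kind = t[0]
    if kind == "lab":
        return {(t[1],)} if n >= 1 else set()
    if kind == "alt":
        return words(t[1], n) | words(t[2], n)
    if kind == "seq":
        return {u + v for u in words(t[1], n) for v in words(t[2], n)
                if len(u) + len(v) <= n}
    if kind == "opt":
        return {()} | words(t[1], n)
    if kind == "star":
        base = words(t[1], n)
        result, frontier = {()}, {()}
        while frontier:
            # Extend the newest words by one more non-empty repetition.
            frontier = {u + v for u in frontier for v in base
                        if v and len(u) + len(v) <= n} - result
            result |= frontier
        return result
    raise ValueError(kind)


def subsumed_up_to(t1, t2, n):
    """Bounded check of T1 <: T2 on sequences of length <= n."""
    return words(t1, n) <= words(t2, n)


NAME, ADDR, TEL, T = ("lab", "Name"), ("lab", "Addr"), ("lab", "Tel"), ("lab", "T")
print(subsumed_up_to(seq(NAME, ADDR), seq(NAME, ADDR, ("opt", TEL)), 4))
print(subsumed_up_to(seq(T, T, T), ("star", T), 4))
print(subsumed_up_to(("star", T), seq(T, T, T), 4))
```

The last call shows the relation is not symmetric: T* contains sequences, such as the empty one, that T, T, T does not.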

XDuce
Main goal: given a function, check that it is type correct
– Come to Benjamin Pierce’s talk on Monday
One note:
– The type checking algorithm in XDuce is incomplete (we will see why in a couple of lectures)
Important piece of typechecking:
– Checking if T1 <: T2
– This cannot be done for context-free languages, where inclusion is undecidable
– But it can be done for regular languages (next)

Regular Tree Languages
Given a ranked alphabet $L = L_0 \cup L_1 \cup \dots \cup L_k$, ranked trees are $T ::= a[T_1, \dots, T_i]$ with $a \in L_i$.
Definition. A bottom-up tree automaton is $A = (L, Q, \delta, Q_F)$ where:
– $L$ = ranked alphabet
– $Q$ = set of states
– $\delta$ = transition relation, $\delta : \left( \bigcup_{i=0}^{k} L_i \times Q^i \right) \to Q$
– $Q_F$ = terminal states

Bottom-Up Tree Automata
Computation on a tree $t$:
– For each node $t = a[t_1, \dots, t_i]$, if the roots of $t_1, \dots, t_i$ are labeled with states $q_1, \dots, q_i$ and $q \in \delta(a, q_1, \dots, q_i)$, then label $t$ with $q$
– If the root is labeled with a state in $Q_F$, then accept
The language accepted by $A$ consists of all trees $t$ accepted by $A$.
A regular tree language is a set of trees accepted by some automaton $A$.

Example of Tree Automaton
$L_0 = \{b\}$, $L_2 = \{a\}$
$Q = \{q_1, q_2\}$
$\delta(b) = q_1$, $\delta(a, q_1, q_1) = q_2$, $\delta(a, q_2, q_2) = q_1$
What does this accept?
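To experiment with this automaton, the bottom-up computation can be sketched in Python. Trees are encoded as tuples `(label, child, ...)`, which is an assumption of this sketch, and the slide does not give $Q_F$, so the code only reports the state reached at the root and leaves the choice of accepting states open.

```python
# Transition table from the slide: (label, child states...) -> state.
DELTA = {
    ("b",): "q1",
    ("a", "q1", "q1"): "q2",
    ("a", "q2", "q2"): "q1",
}


def state_of(tree):
    """Label the root of `tree` bottom-up; None means no transition applies."""
    label, *children = tree
    child_states = [state_of(child) for child in children]
    if None in child_states:
        return None
    return DELTA.get((label, *child_states))


b = ("b",)
height1 = ("a", b, b)
height2 = ("a", height1, height1)
print(state_of(b), state_of(height1), state_of(height2))
print(state_of(("a", b, height1)))  # unbalanced tree: the automaton gets stuck
```

Only complete binary trees reach a state at all, and the state alternates between q1 and q2 with the height, so the accepted language depends on which of the two states is placed in $Q_F$.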

Properties of Regular Tree Languages
If $T_1$, $T_2$ are regular, then so are:
– $T_1 \cup T_2$
– $T_1 - T_2$
– $T_1 \cap T_2$
If $A$ is a nondeterministic bottom-up tree automaton, then there exists an equivalent deterministic one
– Not true for “top-down” automata
If $T_1$, $T_2$ are regular, then it is decidable whether $T_1 \subseteq T_2$
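Closure under intersection can be illustrated with the standard product construction, sketched here in Python for deterministic (possibly partial) bottom-up automata stored as dictionaries. The slide does not say which proof the lecture used, so this is one common construction and not necessarily the one presented. The two example automata, their state names, and the tuple encoding of trees are assumptions chosen for the demonstration.

```python
from itertools import product


def run(delta, tree):
    """State reached at the root of `tree`, or None if the automaton gets stuck."""
    label, *kids = tree
    states = [run(delta, kid) for kid in kids]
    if None in states:
        return None
    return delta.get((label, *states))


def intersect(delta1, final1, delta2, final2):
    """Product automaton: states are pairs, and a pair is final if both parts are."""
    delta = {}
    for (key1, s1), (key2, s2) in product(delta1.items(), delta2.items()):
        if key1[0] == key2[0] and len(key1) == len(key2):
            key = (key1[0], *zip(key1[1:], key2[1:]))
            delta[key] = (s1, s2)
    final = {(p, q) for p in final1 for q in final2}
    return delta, final


# Hypothetical automaton 1: trees over leaves a, b and binary a with exactly one b leaf.
D1 = {("a",): "T0", ("b",): "T1", ("a", "T0", "T0"): "T0",
      ("a", "T1", "T0"): "T1", ("a", "T0", "T1"): "T1"}
F1 = {"T1"}
# Hypothetical automaton 2: trees whose leftmost leaf is b (the state copies the left child).
D2 = {("a",): "A", ("b",): "B", ("a", "B", "B"): "B", ("a", "B", "A"): "B",
      ("a", "A", "B"): "A", ("a", "A", "A"): "A"}
F2 = {"B"}

D, F = intersect(D1, F1, D2, F2)
left_b = ("a", ("b",), ("a",))
right_b = ("a", ("a",), ("b",))
print(run(D, left_b) in F, run(D, right_b) in F)
```

The product accepts exactly the trees accepted by both components: `left_b` has a single b in leftmost position, while `right_b` passes the first automaton only.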

Top-down Automata
Defined similarly, just the computation differs:
– Start from the root at an initial state, move downwards
– If all leaves end in an accepting state, then accept
Here deterministic automata are strictly weaker
– e.g. cannot recognize the set {a[a,b], a[b,a]}
Nondeterministic bottom-up = deterministic bottom-up = nondeterministic top-down
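The claim about {a[a,b], a[b,a]} can be checked by brute force in Python. The modelling is an assumption of this sketch: a deterministic top-down automaton reading the root a in its initial state must send one fixed state to the left child and one to the right child, and acceptance at a leaf is a set of (state, leaf label) pairs. Up to renaming, the two child states are either equal or distinct, so both cases are enumerated.

```python
from itertools import chain, combinations

LEAVES = ["a", "b"]
TARGET = {("a", "b"), ("b", "a")}  # leaf pairs of a[a,b] and a[b,a]


def accepted_pairs(q_left, q_right, accepting):
    """Leaf pairs (x, y) such that the tree a[x, y] is accepted."""
    return {(x, y) for x in LEAVES for y in LEAVES
            if (q_left, x) in accepting and (q_right, y) in accepting}


def search():
    """Try every choice of child states and leaf acceptance; return a witness or None."""
    for q_left, q_right in [("p", "p"), ("p", "q")]:
        states = sorted({q_left, q_right})
        options = [(s, x) for s in states for x in LEAVES]
        subsets = chain.from_iterable(
            combinations(options, r) for r in range(len(options) + 1))
        for accepting in map(set, subsets):
            if accepted_pairs(q_left, q_right, accepting) == TARGET:
                return q_left, q_right, accepting
    return None


print(search())
```

No witness exists because the accepted leaf pairs always form a product of a set of left leaves and a set of right leaves: accepting both a[a,b] and a[b,a] forces a[a,a] and a[b,b] to be accepted as well.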

Example of a Bottom-up Automaton
$A = (L, Q, \delta, Q_F)$ where
– $L = L_0 \cup L_2$, $L_0 = \{a, b\}$, $L_2 = \{a\}$
– $Q = \{T0, T1\}$
– $\delta(a) = T0$, $\delta(b) = T1$
– $\delta(a, T0, T0) = T0$, $\delta(a, T1, T0) = T1$, $\delta(a, T0, T1) = T1$
This corresponds to the XDuce types:
type T1 = b[] | a[T1, T0] | a[T0, T1]
type T0 = a[] | a[T0, T0]
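The correspondence between the automaton states and the types T0 and T1 can be tested in Python. Two things are assumptions of this sketch: the accepting set is taken to be {T1}, which the slide does not state, and trees are encoded as tuples `(label, child, ...)`. Under that reading, T0 describes trees with no b leaf and T1 describes trees with exactly one b leaf, so the automaton is compared against a direct count of b leaves on all small trees.

```python
from itertools import product

DELTA = {
    ("a",): "T0",
    ("b",): "T1",
    ("a", "T0", "T0"): "T0",  # from  type T0 = a[] | a[T0, T0]
    ("a", "T1", "T0"): "T1",  # from  type T1 = ... | a[T1, T0]
    ("a", "T0", "T1"): "T1",  # from  type T1 = ... | a[T0, T1]
}
FINAL = {"T1"}  # assumption: the slide does not list the terminal states


def state_of(tree):
    """State at the root, or None when no transition applies."""
    label, *kids = tree
    states = [state_of(kid) for kid in kids]
    if None in states:
        return None
    return DELTA.get((label, *states))


def count_b(tree):
    """Number of leaves labeled b."""
    label, *kids = tree
    return (label == "b" and not kids) + sum(count_b(kid) for kid in kids)


def trees(depth):
    """All trees over leaves a, b and binary a of height <= depth (with repeats)."""
    if depth == 0:
        return [("a",), ("b",)]
    smaller = trees(depth - 1)
    return smaller + [("a", left, right) for left, right in product(smaller, smaller)]


print(all((state_of(t) in FINAL) == (count_b(t) == 1) for t in trees(2)))
```

Trees with two or more b leaves reach no state, since no transition combines two T1 children.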

Regular Tree Languages and XDuce Types
For ranked alphabets, tail-recursive XDuce types correspond precisely to regular tree languages.
The same is true for unranked alphabets, but there the definition of regular tree languages is more complex.

Conclusion for Schemas
A Theoretical View
– XML Schemas = XDuce types = regular tree languages
– DTDs = strictly weaker
A Practical View
– XML Schemas still too complex