Managing XML and Semistructured Data Lecture 13: XDuce and Regular Tree Languages Prof. Dan Suciu Spring 2001.


In this lecture
Introduction to XDuce
–types in XDuce
–subsumption and typechecking in XDuce
Regular tree languages
–tree automata
Connection between regular languages and XDuce types
Resources
–XDuce: A typed XML processing language, by Hosoya and Pierce

Types in XDuce
XDuce = a functional programming language (like ML)
Emphasis: type checking for its functions
Data model = ordered trees
–Captures XML elements and attributes
Types = regular expressions
–Same expressive power as XML Schema
–Simpler concept
–Closer connection to regular tree languages

Values in XDuce
val x = bib[book[title["ML for the Working Programmer"],
                 author["Paulson"],
                 year["1991"] ],
            paper[....],
            ... ]

Types in XDuce
type Bib = bib[(Book|Paper)*]
type Book = book[Title, Author*, Year, Publisher?]
type Title = title[String]
...

Types in XDuce
Important idea:
–Types are first class citizens
–Element names are second class
This is consistent with regular expressions and automata:
–Type = state (we will see later)

Example of Types in XDuce
type T1 = b[] | a[T1, T0] | a[T0, T1]
type T0 = a[] | a[T0, T0]

Formal Definition of Types in XDuce
T ::= variable
  ::= base type
  ::= ()       /* empty sequence */
  ::= T,T      /* concatenation */
  ::= T | T    /* alternation */
Where are "*" and "?" ?

Types in XDuce
Derived types:
Given T, the type T* is an abbreviation for:
–type X = T, X | ()
Similarly, T+ and T? are abbreviations for:
–type X = T, T*
–type Y = T | ()

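Types in XDuce
Danger with recursion:
–type X = a[], X, b[] | ()
–What is it? It denotes the sequences of n copies of a[] followed by n copies of b[], which is a context free language and not a regular one
Need to restrict to tail recursive types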
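
The danger can be made concrete with a minimal Python sketch. It assumes element sequences are modelled as lists of label strings; the function names `unfold` and `in_X` are illustrative and do not come from XDuce. Unfolding the recursive definition n times always yields n a's followed by n b's, so membership requires counting, which no finite automaton over sequences can do.

```python
def unfold(n):
    """Unfold  type X = a[], X, b[] | ()  n times, then take the () branch."""
    if n == 0:
        return []                      # the () alternative: empty sequence
    return ["a"] + unfold(n - 1) + ["b"]


def in_X(seq):
    """Membership test for X: exactly n a's followed by exactly n b's."""
    n, remainder = divmod(len(seq), 2)
    return remainder == 0 and seq == ["a"] * n + ["b"] * n


print(unfold(2))                       # ['a', 'a', 'b', 'b']
print(in_X(["a", "b", "a", "b"]))      # False: the a's and b's are not nested
```

In a tail recursive type the recursive variable may only appear at the end of a sequence, which rules out this kind of nesting at the sequence level.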

Subsumption in XDuce Types
Definition. T1 <: T2 if the set of values defined by T1 is a subset of the set defined by T2
Examples
–Name, Addr <: Name, Addr, Tel?
–Name, Addr, Tel <: Name, Addr, Tel?
–T, T, T <: T*
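
The three examples can be sanity-checked with a rough Python sketch. It rests on strong assumptions: each element type is abbreviated to one letter (N = Name, A = Addr, T = Tel or T), each XDuce type is written as an ordinary string regular expression, and inclusion is only tested on strings up to a length bound. This is an illustration of the subset reading of `<:`, not XDuce's subtyping algorithm, and `bounded_subtype` is a made-up name.

```python
import re
from itertools import product


def bounded_subtype(r1, r2, alphabet, max_len):
    """Check that every string of length <= max_len matching r1 also matches r2."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if re.fullmatch(r1, s) and not re.fullmatch(r2, s):
                return False           # s is a counterexample to r1 <: r2
    return True


# Name, Addr <: Name, Addr, Tel?
print(bounded_subtype("NA", "NAT?", "NAT", 4))
# Name, Addr, Tel? is not a subtype of Name, Addr
print(bounded_subtype("NAT?", "NA", "NAT", 4))
```

A bounded search can refute a subsumption but cannot prove it in general; an exact decision needs the automata-based inclusion test discussed later in the lecture.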

XDuce
Main goal: given a function, check that it is type correct
–Come to Benjamin Pierce’s talk on Monday
One note:
–The type checking algorithm in XDuce is incomplete (we will see why in a couple of lectures)
Important piece of typechecking:
–Checking if T1 <: T2
This cannot be done for context free languages, where inclusion is undecidable
But it can be done for regular languages (next)

Regular Tree Languages
Given a ranked alphabet, $L = L_0 \cup L_1 \cup \dots \cup L_k$
Ranked trees are $T ::= a[T_1, \dots, T_i]$, $a \in L_i$
Definition. A bottom-up tree automaton is $A = (L, Q, \delta, Q_F)$ where:
–$L$ = ranked alphabet
–$Q$ = set of states
–$\delta$ = transition relation, $\delta : \bigcup_{i=0}^{k} (L_i \times Q^i) \to Q$
–$Q_F$ = terminal states

Bottom Up Tree Automata
Computation on a tree $t$:
–For each node $t = a[t_1, \dots, t_i]$, if the roots of $t_1, \dots, t_i$ are labeled with states $q_1, \dots, q_i$ and $q \in \delta(a, q_1, \dots, q_i)$, then label $t$ with $q$
–If the root is labeled with a state in $Q_F$, then accept
The language accepted by $A$ consists of all trees $t$ accepted by $A$
A regular tree language is a set of trees accepted by some automaton $A$

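Example of Tree Automaton
$L_0 = \{b\}$, $L_2 = \{a\}$
$Q = \{q_1, q_2\}$
$\delta(b) = q_1$, $\delta(a, q_1, q_1) = q_2$, $\delta(a, q_2, q_2) = q_1$
$Q_F = \{q_1\}$
What does this accept? Trees such that each leaf is at even height, counted as the number of edges below the root (a single leaf b is at height 0)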
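
The bottom-up computation on this example can be sketched in a few lines of Python. The encoding is an assumption made for illustration: a tree is a pair `(label, children)`, the transition function is a dictionary keyed by the label and the tuple of child states, and a missing entry means the automaton gets stuck and rejects.

```python
# Transition function of the example automaton; states are plain strings.
DELTA = {
    ("b", ()): "q1",
    ("a", ("q1", "q1")): "q2",
    ("a", ("q2", "q2")): "q1",
}
FINAL = {"q1"}


def run(tree):
    """Return the state labeling the root of tree, or None if no transition applies."""
    label, children = tree
    child_states = []
    for child in children:
        state = run(child)
        if state is None:              # a subtree got stuck, so the whole run fails
            return None
        child_states.append(state)
    return DELTA.get((label, tuple(child_states)))


def accepts(tree):
    return run(tree) in FINAL


# Small helpers for building trees over {a (binary), b (leaf)}.
b = ("b", [])
def a(left, right):
    return ("a", [left, right])

print(accepts(a(a(b, b), a(b, b))))    # True: every leaf is two edges below the root
print(accepts(a(b, b)))                # False: the leaves are one edge below the root
```

Note that the leaves do not all have to sit at the same level: a tree mixing leaves two and four edges below the root is also accepted, because both subtrees of the root end in state q2.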

Properties of Regular Tree Languages
If T1, T2 are regular, then so are:
–$T_1 \cup T_2$
–$T_1 - T_2$
–$T_1 \cap T_2$
If A is a nondeterministic bottom up tree automaton, then there exists an equivalent deterministic one
–Not true for “top-down” automata
If T1, T2 are regular, then it is decidable whether $T_1 \subseteq T_2$
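
One way to see why inclusion is decidable is a product construction, sketched below in Python under several assumptions: both automata are deterministic, each is given as a dictionary from (label, tuple of child states) to a state, and missing transitions are completed with a rejecting state that is here called `SINK` (a name chosen for this sketch). The second automaton `ALL`, which accepts every tree over the alphabet, is an extra example and not part of the lecture. The idea is to compute all state pairs that some tree reaches in both automata at once; inclusion fails exactly when a reachable pair is accepting in the first automaton and rejecting in the second.

```python
from itertools import product

SINK = "sink"                          # added rejecting state for missing transitions


def step(delta, label, states):
    """Deterministic transition, completed with the sink state."""
    if SINK in states:
        return SINK
    return delta.get((label, tuple(states)), SINK)


def reachable_pairs(symbols, d1, d2):
    """All (p, q) such that one and the same tree reaches p in A1 and q in A2."""
    pairs = set()
    changed = True
    while changed:
        changed = False
        for label, rank in symbols:
            # list(pairs) is a snapshot, so adding to pairs below is safe
            for kids in product(list(pairs), repeat=rank):
                p = step(d1, label, [k[0] for k in kids])
                q = step(d2, label, [k[1] for k in kids])
                if (p, q) not in pairs:
                    pairs.add((p, q))
                    changed = True
    return pairs


def included(symbols, d1, f1, d2, f2):
    """True iff every tree accepted by A1 is also accepted by A2."""
    return all(not (p in f1 and q not in f2)
               for p, q in reachable_pairs(symbols, d1, d2))


SYMBOLS = [("b", 0), ("a", 2)]
EVEN = {("b", ()): "q1", ("a", ("q1", "q1")): "q2", ("a", ("q2", "q2")): "q1"}
EVEN_F = {"q1"}
ALL = {("b", ()): "t", ("a", ("t", "t")): "t"}
ALL_F = {"t"}

print(included(SYMBOLS, EVEN, EVEN_F, ALL, ALL_F))   # True
print(included(SYMBOLS, ALL, ALL_F, EVEN, EVEN_F))   # False: a[b,b] is a witness
```

The loop terminates because the set of pairs is finite and only grows. For nondeterministic automata one would first determinize, which is possible for bottom-up automata as stated above.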

Top-down Automata
Defined similarly, just the computation differs:
–Start from the root at an initial state, move downwards
–If all leaves end in an accepting state, then accept
Here deterministic automata are strictly weaker
–e.g. cannot recognize the set {a[a,b], a[b,a]}
Nondeterministic bottom up = deterministic bottom up = nondeterministic top down

Example of a Bottom-up Automaton
$A = (L, Q, \delta, Q_F)$ where
–$L = L_0 \cup L_2$, $L_0 = \{a, b\}$, $L_2 = \{a\}$
–$Q = \{T0, T1\}$
–$\delta(a) = T0$, $\delta(b) = T1$,
–$\delta(a, T1, T0) = T1$, $\delta(a, T0, T1) = T1$, $\delta(a, T0, T0) = T0$
type T1 = b[] | a[T1, T0] | a[T0, T1]
type T0 = a[] | a[T0, T0]
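
The reading "type = state" can be illustrated with a short Python sketch that turns the two type declarations into a transition table and then runs it bottom-up. The encoding is assumed for illustration only: each type is a list of alternatives `(label, child types)`, trees are `(label, children)` pairs, and the names `to_delta`, `states_of` and `has_type` are hypothetical. Transitions map to sets of states so that the same code would also work for declarations that make the automaton nondeterministic.

```python
from itertools import product

# type T1 = b[] | a[T1, T0] | a[T0, T1]
# type T0 = a[] | a[T0, T0]
TYPES = {
    "T1": [("b", ()), ("a", ("T1", "T0")), ("a", ("T0", "T1"))],
    "T0": [("a", ()), ("a", ("T0", "T0"))],
}


def to_delta(types):
    """Each alternative l[X1,...,Xn] of type X becomes a transition (l, X1..Xn) -> X."""
    delta = {}
    for name, alternatives in types.items():
        for label, kids in alternatives:
            delta.setdefault((label, kids), set()).add(name)
    return delta


def states_of(tree, delta):
    """Set of states (types) that can label the root of tree."""
    label, children = tree
    child_sets = [states_of(child, delta) for child in children]
    result = set()
    for combo in product(*child_sets):   # one empty combo for a leaf
        result |= delta.get((label, combo), set())
    return result


def has_type(tree, name, delta):
    return name in states_of(tree, delta)


DELTA = to_delta(TYPES)
a0, b = ("a", []), ("b", [])
print(has_type(("a", [b, a0]), "T1", DELTA))   # True: exactly one b leaf
print(has_type(("a", [b, b]), "T1", DELTA))    # False: two b leaves
```

Read this way, T0 describes the trees with no b leaf and T1 the trees with exactly one b leaf.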

Regular Tree Languages and XDuce types
For ranked alphabets, tail-recursive XDuce types correspond precisely to regular tree languages
The same is true for unranked alphabets, but there the definition of regular tree languages is more complex

Conclusion for Schemas
A Theoretical View
–XML Schemas = XDuce types = regular tree languages
–DTDs = strictly weaker
A Practical View
–XML Schemas still too complex