Lecture 7: Foundations of Query Languages Tuesday, January 23, 2001.



A History of DB Theory: In the Beginning...
Up to 1970, a “database” was a file of records
– COBOL/CODASYL
– Network model, with low-level navigational interface
Codd proposed the relational model in 1970
– Database = a first-order structure
This was a great vision; it took 10 years for the community to adopt it
Today: relational databases are heralded as a major success of theory

The Golden Years
The 80s: rich research work on foundations
Relational model and algebra:
– Theory of functional dependencies
– Transaction processing
Study other data models:
– Complex objects, object oriented
Study other query languages:
– Query complexity, descriptive complexity
Study other applications:
– Distributed query processing, semijoin reduction
– Partial information

But practical databases were interested only in:
– one particular language (SQL)
– one particular application (OLTP queries)
– one particular architecture (client-server)
Transaction processing = useful
Functional dependencies = somewhat useful
The rest = great but useless

Database Theory in the Web Age
Sudden interest in changing everything
– Web data is not relational: what is it?
  The XML-Schema has a few hundred pages; how to understand it?
– New query languages are not relational algebra: what are they?
  W3C is designing a new XML query language; how to proceed?
– New architectures that are not client-server:
  distributed data, incomplete information, etc.

Our Goal
Talk about fundamental concepts in the theory of the relational model and relational query languages
Use AHV’s book (Abiteboul, Hull, Vianu: Foundations of Databases) liberally

First Order Logic
Given:
– a vocabulary, R_1, …, R_k
– an arity, ar(R_i), for each i = 1, …, k
– an infinite supply of variables x_1, x_2, x_3, …
FO formulas, φ, are: [grammar not preserved]
Sometimes we also allow constants
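
The slide's own grammar did not survive in the transcript. A standard textbook grammar for first-order formulas over the vocabulary above, which may differ in detail from what the lecture showed, is sketched here. The symbol φ' is just a second formula, and equality atoms are assumed to be part of the language.

```latex
\[
\varphi \;::=\; R_i(x_1, \ldots, x_{ar(R_i)})
  \;\mid\; x_i = x_j
  \;\mid\; \varphi \wedge \varphi'
  \;\mid\; \varphi \vee \varphi'
  \;\mid\; \neg \varphi
  \;\mid\; \exists x_i.\, \varphi
  \;\mid\; \forall x_i.\, \varphi
\]
```

When constants are allowed, they may appear wherever a variable appears in an atom.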

Examples of FO Formulas
[example formulas not preserved]
x is a free variable
As usual, “a ⇒ b” abbreviates “¬a ∨ b”
Bound and free variables are defined the usual way

Models for FO
Given a vocabulary R_1, …, R_k
A model is D = (D, R_1, …, R_k)
– D = a set, called the domain, or universe
– R_i ⊆ D × D × ... × D (ar(R_i) times), for i = 1, ..., k
The model is finite if R_1, ..., R_k are finite
E.g. D = int, while R_1, ..., R_k are finite sets

Remarks
Vocabulary R_1, …, R_k = database schema
Model = database instance
Abuse of notation: R_i denotes both the relation symbol and its interpretation in the model
Abuse of notation: D denotes both the model and its domain
We are interested in finite models, but we will consider infinite models too, for a while

Meaning (Semantics) of FO formulas
Given:
– A formula φ, with free variables x_1, ..., x_n (we write φ(x_1, ..., x_n))
– A model (D, R_1, ..., R_k)
We say that φ is true on a_1, ..., a_n ∈ D:
– In notation: D |= φ(a_1, ..., a_n)
– Defined inductively (next)

Meaning of FO formulas
[inductive definition not preserved]
(similarly for OR and NOT)

FO Formulas as Queries
Given:
– An FO formula φ(x_1, ..., x_n)
– A (finite) model D = (D, R_1, ..., R_k)
The answer of evaluating φ on D is:
φ(D) = {(a_1, ..., a_n) | D |= φ(a_1, ..., a_n)}
Hence: an FO formula defines a function mapping a database to a relation
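
The inductive definition of truth and the resulting answer set can be sketched as a small recursive evaluator over a finite domain. This is a minimal sketch under stated assumptions, not the lecture's own formulation: the nested-tuple encoding of formulas and the names `holds` and `answer` are hypothetical, variables are Python strings, and constants are assumed to be non-string values such as integers.

```python
from itertools import product

# Assumed formula encoding (not from the lecture):
#   ("rel", name, (t1, ..., tn))   atom R(t1, ..., tn)
#   ("eq", t1, t2)                 equality of two terms
#   ("and", f, g), ("or", f, g), ("not", f)
#   ("exists", var, f), ("forall", var, f)
# A term is a variable (a str) or a constant (any non-str value).


def term_value(term, env):
    """Look up a variable in env; return a constant unchanged."""
    return env[term] if isinstance(term, str) else term


def holds(formula, domain, relations, env):
    """Return True iff the model (domain, relations) satisfies formula under env."""
    op = formula[0]
    if op == "rel":
        _, name, terms = formula
        return tuple(term_value(t, env) for t in terms) in relations[name]
    if op == "eq":
        return term_value(formula[1], env) == term_value(formula[2], env)
    if op == "and":
        return (holds(formula[1], domain, relations, env)
                and holds(formula[2], domain, relations, env))
    if op == "or":
        return (holds(formula[1], domain, relations, env)
                or holds(formula[2], domain, relations, env))
    if op == "not":
        return not holds(formula[1], domain, relations, env)
    if op == "exists":
        _, var, body = formula
        return any(holds(body, domain, relations, {**env, var: a}) for a in domain)
    if op == "forall":
        _, var, body = formula
        return all(holds(body, domain, relations, {**env, var: a}) for a in domain)
    raise ValueError(f"unknown connective: {op!r}")


def answer(formula, free_vars, domain, relations):
    """Set of tuples over the domain on which the formula is true."""
    return {
        values
        for values in product(domain, repeat=len(free_vars))
        if holds(formula, domain, relations, dict(zip(free_vars, values)))
    }


# Example: pairs connected by a path of length 2 in a small illustrative graph.
path2 = ("exists", "z", ("and", ("rel", "R", ("x", "z")), ("rel", "R", ("z", "y"))))
model_domain = {1, 2, 3}
model_relations = {"R": {(1, 2), (2, 3)}}
print(answer(path2, ("x", "y"), model_domain, model_relations))
```

With no free variables, `answer` returns either the set containing the empty tuple (true) or the empty set (false), which matches the usual reading of a boolean query as a relation of arity 0.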

Examples of Formulas = Queries
Vocabulary: single relation R
D = [not preserved]
R = [not preserved]
Graphs are the most “common” models

Examples of Formulas = Queries
q_1: [formula not preserved]
– Notice: uses a constant, 1
– Looks for successors of 1
– Answer: q_1(D) = {2, 4}
q_2: [formula not preserved]
– Looks for pairs (x, y) connected by paths of length 2
– Answer: q_2(D) = {(1,1), (2,2), (1,3), (2,4)}
q_3: [formula not preserved]
– Answer: q_3(D) = {1}
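
The first two queries can be tried out directly. The sketch below assumes the natural readings q_1(x) = R(1, x) and q_2(x, y) = ∃z. R(x, z) ∧ R(z, y). The slide's graph is not preserved, so the edge set here is hypothetical, chosen only because it agrees with the answers stated for q_1 and q_2; other graphs would agree as well. q_3 is omitted because its formula is unknown.

```python
# Hypothetical graph: NOT the slide's original, merely consistent with the
# stated answers q_1(D) = {2, 4} and q_2(D) = {(1,1), (2,2), (1,3), (2,4)}.
D = {1, 2, 3, 4}
R = {(1, 2), (1, 4), (2, 1), (2, 3)}


def q1(D, R):
    """q1(x) = R(1, x): successors of the constant 1."""
    return {x for x in D if (1, x) in R}


def q2(D, R):
    """q2(x, y) = exists z. R(x, z) and R(z, y): paths of length 2."""
    return {
        (x, y)
        for x in D
        for y in D
        if any((x, z) in R and (z, y) in R for z in D)
    }


print(sorted(q1(D, R)))
print(sorted(q2(D, R)))
```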

Boolean Queries
A boolean query is one without free variables
Its answer is true or false
[formula not preserved]
Tests for a clique

More Examples
Vocabulary (= schema):
– Employee(name, office, mgr), Manager(name, office)
Queries:
– Find offices: [formula not preserved]
– Find offices with at least two employees: [formula not preserved]
– Find managers that share office with all their employees: [formula not preserved]
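
The formulas for these three queries are missing from the transcript. One plausible way to write them over the schema above is sketched below. These are reconstructions, not the slide's own formulas: the query names are made up, "offices" is read as offices of employees only, names are assumed to identify employees, and n ≠ n' abbreviates ¬(n = n').

```latex
\begin{align*}
q_{\text{offices}}(o) &= \exists n\, \exists m\; \mathit{Employee}(n, o, m) \\
q_{\geq 2}(o) &= \exists n\, \exists m\, \exists n'\, \exists m'\;
  \bigl(\mathit{Employee}(n, o, m) \wedge \mathit{Employee}(n', o, m') \wedge n \neq n'\bigr) \\
q_{\text{share}}(m) &= \exists o\;
  \bigl(\mathit{Manager}(m, o) \wedge
  \forall n\, \forall o'\; (\mathit{Employee}(n, o', m) \Rightarrow o' = o)\bigr)
\end{align*}
```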

Properties of Queries
– Decidable
– Generic
– Domain-independent
They make more sense if we think of queries in general, not just FO queries
Next: define general queries

Queries
A query, q, is a function from models to relations, s.t. for every model (D, R_1, ..., R_k):
– q(D, R_1, ..., R_k) = R, s.t. R ⊆ D^n
Here n is called the arity of q; when n = 0, q is called a boolean query

Property 1: Decidable Queries
q is decidable if there exists a Turing Machine that, for some encoding of D, given R_1, ..., R_k on its input tape, computes q(D, R_1, ..., R_k)

Property 2: Domain Independence
In English:
– q only depends on R_1, ..., R_k, not on D!
– Intuition: a database consists only of R_1, ..., R_k, not of D
Formally: a query q is domain independent if
– for any model (D, R_1, ..., R_k)
– for any set D’ s.t. R_1 ⊆ (D’)^ar(R_1), ..., R_k ⊆ (D’)^ar(R_k)
– the following holds: q(D, R_1, ..., R_k) = q(D’, R_1, ..., R_k)

Property 2: Domain Independence
Examples:
Queries that are domain independent:
– “Find pairs of nodes connected by a path of length 2”
– “Find the manager of Smith”
– “Find the largest salary in the database”
Queries that are not domain independent:
– “Find all nodes that are not in the graph”
– “Find the average salary”
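
The definition can be probed by evaluating the same query on the same relation under two different domains. The sketch below does that for two of the example queries. The graph, the two domains, and the helper name `same_answer` are illustrative assumptions. Agreement on one pair of domains can only refute domain independence, never prove it.

```python
def path2(D, R):
    """Pairs connected by a path of length 2; quantifier ranges over D."""
    return {
        (x, y)
        for x in D
        for y in D
        if any((x, z) in R and (z, y) in R for z in D)
    }


def not_in_graph(D, R):
    """Nodes of D that touch no edge of R."""
    return {x for x in D if not any((x, y) in R or (y, x) in R for y in D)}


def same_answer(q, R, D1, D2):
    """True iff q gives the same answer over both domains (R fixed)."""
    return q(D1, R) == q(D2, R)


R = {(1, 2), (2, 3)}
D_small = {1, 2, 3}           # just the nodes that occur in R
D_large = {1, 2, 3, 4, 5}     # same relation, larger domain

print(same_answer(path2, R, D_small, D_large))
print(same_answer(not_in_graph, R, D_small, D_large))
print(not_in_graph(D_small, R), not_in_graph(D_large, R))
```

The second query changes its answer when the domain grows although the relation is unchanged, which is exactly the failure the definition rules out.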

Property 3: Genericity
In English:
– q does not depend on the particular encoding of the database
Formally:
– for every h: (D, R_1, ..., R_k) → (D’, R’_1, ..., R’_k)
– s.t. h is injective, h(D) = D’, h(R_1) = R’_1, ..., h(R_k) = R’_k
– it follows that: h(q(D, R_1, ..., R_k)) = q(D’, R’_1, ..., R’_k)

Property 3: Genericity
Example:
D = [not preserved]
D’ = [not preserved]
q(D) = {1, 3}
q(D’) = ??

Property 3: Genericity
Examples:
Queries that are generic:
– “Find pairs of nodes connected by a path of length 2”
– “Find all employees having the same office as their manager”
– “Find all nodes that are not in the graph”
Queries that are not generic:
– “Find the manager of Smith”
  (we often relax the definition to allow this to be generic: C-genericity, for a set of constants C)
– “Find the largest salary in the database”
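
The genericity condition h(q(D)) = q(h(D)) can be checked mechanically for a given renaming h. The sketch below is illustrative only: the graph, the two renamings, and the helper names `apply_h` and `commutes` are assumptions. A query mentioning the constant 1 plays the role of "Find the manager of Smith", and answers are represented as sets of tuples so that h can be applied uniformly.

```python
def apply_h(h, tuples):
    """Apply the renaming h (a dict) componentwise to a set of tuples."""
    return {tuple(h[a] for a in t) for t in tuples}


def path2(D, R):
    """Pairs connected by a path of length 2."""
    return {
        (x, y)
        for x in D
        for y in D
        if any((x, z) in R and (z, y) in R for z in D)
    }


def succ_of_1(D, R):
    """Successors of the constant 1, as 1-tuples."""
    return {(x,) for x in D if (1, x) in R}


def commutes(q, h, D, R):
    """Check h(q(D, R)) == q(h(D), h(R)) for one injective renaming h."""
    D2 = {h[a] for a in D}
    R2 = apply_h(h, R)
    return apply_h(h, q(D, R)) == q(D2, R2)


D = {1, 2, 3}
R = {(1, 2), (2, 3)}
h_moves_1 = {1: 2, 2: 3, 3: 1}   # bijection that renames the constant 1
h_fixes_1 = {1: 1, 2: 3, 3: 2}   # bijection that leaves 1 in place

print(commutes(path2, h_moves_1, D, R))
print(commutes(succ_of_1, h_moves_1, D, R))
print(commutes(succ_of_1, h_fixes_1, D, R))
```

The last check illustrates the relaxation to C-genericity with C = {1}: the constant query behaves generically once only renamings that fix the constants in C are considered.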

Property 3: Genericity
Another example:
D = [not preserved]
q(D) = {4}
This query cannot be generic (why?)

Back to FO Queries
1. All FO queries are computable
2. NOT all FO queries are domain independent
   – Why? Next...
3. All FO queries are generic
   – In particular, the query on the previous slide is not expressible in FO

FO Queries and Domain Independence
Find all nodes that are not in the graph: [formula not preserved]
Find all nodes that are connected to “everything”: [formula not preserved]
Find all pairs of employees or offices: [formula not preserved]
We don’t want such queries!

FO Queries and Domain Independence
Domain independent FO queries are also called safe queries
Definition. The active domain of (D, R_1, ..., R_k) is D_a = the set of all constants in R_1, ..., R_k
E.g. for graphs, D_a = [not preserved]
Very important:
– If a query is safe, it suffices to range quantifiers only over the active domain (why?)
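
The active domain is easy to compute from the relations alone. The sketch below assumes a database given as a dict from relation names to sets of tuples (a hypothetical representation). It also compares one safe query evaluated over a large domain and over the active domain only.

```python
def active_domain(relations):
    """Set of all constants occurring in some tuple of some relation."""
    return {a for tuples in relations.values() for t in tuples for a in t}


def path2(D, R):
    """Pairs connected by a path of length 2; all variables range over D."""
    return {
        (x, y)
        for x in D
        for y in D
        if any((x, z) in R and (z, y) in R for z in D)
    }


relations = {"R": {(1, 2), (2, 3)}}
D = set(range(10))                # a larger domain containing unused values
D_a = active_domain(relations)

print(sorted(D_a))
print(path2(D, relations["R"]) == path2(D_a, relations["R"]))
```

For this query, restricting every variable to the active domain loses nothing, because any witness for x, y, or z must occur in some tuple of R.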

FO Queries and Domain Independence
The bad news:
– Theorem. It is undecidable whether a given FO query is safe.
The good news:
– no big deal
– we can define a subset of FO queries that we know are safe = range-restricted queries (rr-queries)
– any safe query is equivalent to some rr-query

Range-restriction
Syntactic, rather ad-hoc definition (several exist):
– OK: [example not preserved]
– not OK: [example not preserved]
If a query q is safe, it is equivalent to an rr-query: [formula not preserved]

FO = Relational Algebra
Recall the 5 operators in the relational algebra:
– ∪, −, ×, σ, π
Theorem. A domain independent query is expressible in FO iff it is expressible in the relational algebra
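
To make one direction of the theorem concrete, the five operators can be sketched over relations stored as sets of tuples, and the path-of-length-2 query can then be written as an algebra expression. The function names, the positional (unnamed) columns, and the sample relation are assumptions made for illustration.

```python
def union(r, s):
    """r ∪ s (both relations must have the same arity)."""
    return r | s


def difference(r, s):
    """r − s."""
    return r - s


def cross(r, s):
    """r × s: concatenate every tuple of r with every tuple of s."""
    return {t + u for t in r for u in s}


def select(r, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return {t for t in r if predicate(t)}


def project(r, columns):
    """π: keep the listed column positions, in the given order."""
    return {tuple(t[i] for i in columns) for t in r}


R = {(1, 2), (2, 3)}

# FO:  q(x, y) = exists z. R(x, z) and R(z, y)
# RA:  project columns 0 and 3 of select[col1 = col2](R × R)
path2 = project(select(cross(R, R), lambda t: t[1] == t[2]), (0, 3))
print(path2)
```

The existential quantifier becomes a projection that drops the joined column, and the shared variable z becomes an equality selection on the product.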