1 CMPS 277 – Principles of Database Systems https://courses.soe.ucsc.edu/courses/cmps277/Fall11/01 Lecture #8.

2 Summary
The Query Evaluation Problem for Relational Calculus is PSPACE-complete.
The Query Equivalence Problem for Relational Calculus is undecidable.
The Query Containment Problem for Relational Calculus is undecidable.
Similar results hold for the Relational Algebra versions of these problems.

3 The Query Evaluation Problem Revisited
Theorem: Let φ be a fixed relational calculus formula. Then the following problem is solvable in polynomial time: given a database instance I, find φ^adom(I). In fact, this problem is in LOGSPACE.
Proof: Let φ be a fixed relational calculus formula ∀x_1 ∃x_2 … ∀x_m ψ.
• The exhaustive search algorithm has running time O(|I|^|φ|), which is polynomial in |I|, since |φ| is a constant.
• Moreover, the algorithm can be implemented using only logarithmic space, since we need to maintain only a constant number of memory blocks, each of logarithmic size: for each quantified variable x_i (i = 1,…,m), one block holding an element a_i of adom(I) written in binary.
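
The exhaustive search in the proof can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: an instance is a dict from relation names to sets of tuples, the quantifier prefix is a list of ("forall" | "exists", variable) pairs, and the quantifier-free part ψ is passed as a Python predicate. The names `adom` and `holds` are hypothetical.

```python
def adom(instance):
    """Active domain: every value occurring in some tuple of the instance."""
    return {a for facts in instance.values() for fact in facts for a in fact}

def holds(prefix, matrix, instance, env=None):
    """Evaluate a prenex sentence Q1 x1 ... Qm xm psi over adom(instance).

    prefix: list of ("forall" | "exists", variable_name) pairs.
    matrix: function (instance, env) -> bool for the quantifier-free part psi.
    Each quantifier loops over the active domain, so psi is evaluated at most
    |adom|^m times, where m is the (fixed) number of quantifiers.
    """
    env = env or {}
    if not prefix:
        return matrix(instance, env)
    (quantifier, var), rest = prefix[0], prefix[1:]
    results = (holds(rest, matrix, instance, {**env, var: a}) for a in adom(instance))
    return all(results) if quantifier == "forall" else any(results)

# Example sentence: forall x exists y E(x, y) ("every node has an outgoing edge").
I = {"E": {(1, 2), (2, 3), (3, 1)}}
prefix = [("forall", "x"), ("exists", "y")]
psi = lambda inst, env: (env["x"], env["y"]) in inst["E"]
print(holds(prefix, psi, I))
```

The sketch mirrors only the polynomial-time bound. It keeps Python dictionaries rather than m binary counters, so it does not demonstrate the LOGSPACE bound.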

4 Vardi’s Taxonomy of the Query Evaluation Problem
M.Y. Vardi, “The Complexity of Relational Query Languages”, 1982
Definition: Let L be a database query language.
• The combined complexity of L is the decision problem: given an L-sentence φ and a database instance I, is φ true on I? (Does I satisfy φ? In symbols, does I ⊨ φ?)
• The data complexity of L is the family of decision problems Q_φ, where φ is an L-sentence: given a database instance I, does I ⊨ φ?
• The query complexity of L is the family of decision problems Q_I, where I is a database instance: given an L-sentence φ, does I ⊨ φ?

5 Vardi’s Taxonomy of the Query Evaluation Problem
Definition: Let L be a database query language and let C be a computational complexity class.
• The data complexity of L is in C if for each L-sentence φ, the decision problem Q_φ is in C.
• The query complexity of L is in C if for every database instance I, the decision problem Q_I is in C.
Vardi’s discovery: For most query languages L:
• The data complexity of L is lower than both the combined complexity of L and the query complexity of L.
• The query complexity of L can be as hard as the combined complexity of L.

6 Taxonomy of the Query Evaluation Problem for Relational Calculus
The Query Evaluation Problem for Relational Calculus:
Problem               Complexity
Combined Complexity   PSPACE-complete
Query Complexity      In PSPACE; it can be PSPACE-complete
Data Complexity       In LOGSPACE
Computational complexity classes: LOGSPACE ⊆ NLOGSPACE ⊆ P ⊆ NP ⊆ PSPACE

7 Sublanguages of Relational Calculus
Question: Are there interesting sublanguages of relational calculus for which the Query Containment Problem and the Query Evaluation Problem are “easier” than for the full relational calculus?
Answer:
• Yes, the language of conjunctive queries is such a sublanguage.
• Moreover, conjunctive queries are the most frequently asked queries against relational databases.

8 Conjunctive Queries
Definition: A conjunctive query is a query expressible by a relational calculus formula in prenex normal form built from atomic formulas R(y_1,…,y_n), and ∧ and ∃ only:
{ (x_1,…,x_k): ∃z_1 … ∃z_m φ(x_1,…,x_k, z_1,…,z_m) },
where φ(x_1,…,x_k, z_1,…,z_m) is a conjunction of atomic formulas of the form R(y_1,…,y_n).
• Equivalently, a conjunctive query is a query expressible by a relational algebra expression of the form π_X(σ_Θ(R_1 × … × R_n)), where Θ is a conjunction of equality atomic formulas (equijoin).
• Equivalently, a conjunctive query is a query expressible by an SQL expression of the form SELECT … FROM … WHERE …, with a conjunction of equalities in the WHERE clause.

9 Conjunctive Queries
Definition: A conjunctive query is a query expressible by a relational calculus formula in prenex normal form built from atomic formulas R(y_1,…,y_n), and ∧ and ∃ only:
{ (x_1,…,x_k): ∃z_1 … ∃z_m φ(x_1,…,x_k, z_1,…,z_m) }
• A conjunctive query can be written as a logic-programming rule:
Q(x_1,…,x_k) :-- R_1(u_1), …, R_n(u_n), where
  • Each variable x_i occurs in the right-hand side of the rule.
  • Each u_i is a tuple of variables (not necessarily distinct).
  • The variables occurring in the right-hand side (the body), but not in the left-hand side (the head), of the rule are existentially quantified (but the quantifiers are not displayed).
  • “,” stands for conjunction.

10 Conjunctive Queries
Examples:
• Path of Length 2 (binary query): { (x,y): ∃z (E(x,z) ∧ E(z,y)) }
  As a relational algebra expression: π_{1,4}(σ_{$2 = $3}(E × E))
  As a rule: q(x,y) :-- E(x,z), E(z,y)
• Cycle of Length 3 (Boolean query): ∃x ∃y ∃z (E(x,y) ∧ E(y,z) ∧ E(z,x))
  As a rule (the head has no variables): Q :-- E(x,y), E(y,z), E(z,x)
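
Both example queries can be evaluated directly on a small edge relation. The sketch below follows the relational algebra reading (a selection over a Cartesian product, then a projection). The sample relation `E` and the variable names are assumptions made for illustration.

```python
from itertools import product

# A sample instance: the directed triangle a -> b -> c -> a.
E = {("a", "b"), ("b", "c"), ("c", "a")}

# Path of length 2: keep the tuples of E x E with $2 = $3, project on columns 1 and 4.
path2 = {(x, y) for (x, z1), (z2, y) in product(E, E) if z1 == z2}

# Cycle of length 3 (Boolean query): are there edges x -> y, y -> z, z -> x?
cycle3 = any(
    y1 == y2 and z1 == z2 and x1 == x2
    for (x1, y1), (y2, z1), (z2, x2) in product(E, E, E)
)

print(sorted(path2), cycle3)
```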

11 Conjunctive Queries
Every relational join is a conjunctive query. Let P(A,B,C) and R(B,C,D) be two relation symbols.
• P ⋈ R = { (x,y,z,w): P(x,y,z) ∧ R(y,z,w) }
• q(x,y,z,w) :-- P(x,y,z), R(y,z,w) (no variables are existentially quantified)
• SELECT P.A, P.B, P.C, R.D FROM P, R WHERE P.B = R.B AND P.C = R.C
• Conjunctive queries are also known as SPJ-queries (SELECT-PROJECT-JOIN queries).

12 Conjunctive Query Evaluation and Containment
Definition: Two fundamental problems about CQs
• Conjunctive Query Evaluation (CQE): Given a conjunctive query q and an instance I, find q(I).
• Conjunctive Query Containment (CQC):
  • Given two k-ary conjunctive queries q_1 and q_2, is it true that q_1 ⊆ q_2 (i.e., for every instance I, we have that q_1(I) ⊆ q_2(I))?
  • Given two Boolean conjunctive queries q_1 and q_2, is it true that q_1 ⊨ q_2 (that is, for all I, if I ⊨ q_1, then I ⊨ q_2)?
CQC is logical implication.

13 CQE vs. CQC
• Recall that for relational calculus queries:
  • The Query Evaluation Problem is PSPACE-complete (combined complexity).
  • The Query Containment Problem is undecidable.
• Theorem (Chandra & Merlin, 1977):
  • CQE and CQC are the “same” problem.
  • Moreover, each is an NP-complete problem.
• Question: What is the common link?
• Answer: The Homomorphism Problem

14 Isomorphisms Between Database Instances
Definition: Let I and J be two database instances over the same relational schema S.
• An isomorphism h: I → J is a function h: adom(I) → adom(J) such that
  • h is one-to-one and onto;
  • for every relation symbol P of S and every (a_1,…,a_m), we have that (a_1,…,a_m) ∈ P^I if and only if (h(a_1),…,h(a_m)) ∈ P^J.
• I and J are isomorphic if an isomorphism h from I to J exists.
Note: Intuitively, two database instances are isomorphic if one can be obtained from the other by renaming the elements of its active domain in a 1-1 way.

15 A Digression to the Isomorphism Problem
The Isomorphism Problem: Given two database instances I and J over the same relational schema, is I isomorphic to J?
Fact: The exact computational complexity of the isomorphism problem is not known at present.
• The isomorphism problem is in NP (this is easy).
• The isomorphism problem is not known to be NP-complete (and there is evidence that it is not NP-complete).
• The isomorphism problem is not known to be in P. Finding a polynomial-time algorithm for the isomorphism problem would be a major breakthrough.
• Special cases of the isomorphism problem are known to be in P (for example, the isomorphism problem on planar graphs).

16 Homomorphisms
Definition: Let I and J be two database instances over the same relational schema S. A homomorphism h: I → J is a function h: adom(I) → adom(J) such that for every relation symbol P of S and every (a_1,…,a_m), we have that if (a_1,…,a_m) ∈ P^I, then (h(a_1),…,h(a_m)) ∈ P^J.
• Note: The concept of homomorphism is a relaxation of the concept of isomorphism, since every isomorphism is also a homomorphism, but not vice versa.
• Example: A graph G = (V,E) is 3-colorable if and only if there is a homomorphism h: G → K_3.
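
The definition translates into a one-line check. In this sketch an instance is assumed to be a dict from relation names to sets of tuples and a candidate map h is a dict. The name `is_homomorphism` and the sample instances are hypothetical.

```python
def is_homomorphism(h, I, J):
    """True if h sends every fact P(a1,...,am) of I to a fact P(h(a1),...,h(am)) of J."""
    return all(
        tuple(h[a] for a in fact) in J.get(P, set())
        for P, facts in I.items()
        for fact in facts
    )

# A directed path u -> v -> w, and K3 on the colors r, g, b (edges between distinct colors).
path = {"E": {("u", "v"), ("v", "w")}}
colors = ["r", "g", "b"]
K3 = {"E": {(c, d) for c in colors for d in colors if c != d}}

print(is_homomorphism({"u": "r", "v": "g", "w": "r"}, path, K3))  # proper coloring
print(is_homomorphism({"u": "r", "v": "r", "w": "g"}, path, K3))  # u and v clash
```

The first map is a homomorphism that is not one-to-one, which illustrates why a homomorphism need not be an isomorphism.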

17 Homomorphisms
Fact: Homomorphisms compose, i.e., if f: I → J and g: J → K are homomorphisms, then g ∘ f: I → K is a homomorphism, where (g ∘ f)(a) = g(f(a)).
Definition:
• Two database instances I and I’ are homomorphically equivalent if there is a homomorphism h: I → I’ and a homomorphism h’: I’ → I.
• I ≡_h I’ means that I and I’ are homomorphically equivalent.
Note: I ≡_h I’ does not imply that I and I’ are isomorphic.

18 Homomorphisms
[Same text as slide 17, with a figure of two instances labeled I and I’.]

19 The Homomorphism Problem
• Definition: The Homomorphism Problem: Given two database instances I and J, is there a homomorphism h: I → J?
• Notation: I → J denotes that a homomorphism from I to J exists.
• Theorem: The Homomorphism Problem is NP-complete.
  Proof: Easy reduction from 3-Colorability: G is 3-colorable if and only if G → K_3.
• Exercise: Formulate 3SAT as a special case of the Homomorphism Problem.
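
A brute-force decision procedure makes the reduction from 3-Colorability concrete. The sketch below simply tries every function from adom(I) to adom(J), so it takes exponential time, which is consistent with (but of course does not prove) NP-hardness. The helper names `adom`, `find_homomorphism`, and `graph`, and the dict-of-sets representation, are assumptions made for this example.

```python
from itertools import product

def adom(instance):
    """Active domain of an instance given as {relation name: set of tuples}."""
    return {a for facts in instance.values() for fact in facts for a in fact}

def find_homomorphism(I, J):
    """Return some homomorphism I -> J as a dict, or None if none exists.

    Tries all |adom(J)|^|adom(I)| candidate functions.
    """
    source, target = sorted(adom(I)), sorted(adom(J))
    for images in product(target, repeat=len(source)):
        h = dict(zip(source, images))
        if all(
            tuple(h[a] for a in fact) in J.get(P, set())
            for P, facts in I.items()
            for fact in facts
        ):
            return h
    return None

def graph(edges):
    """Undirected graph as an instance with one symmetric binary relation E."""
    return {"E": {(u, v) for u, v in edges} | {(v, u) for u, v in edges}}

K3 = graph([("r", "g"), ("g", "b"), ("b", "r")])
C5 = graph([(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)])
K4 = graph([(i, j) for i in range(1, 5) for j in range(i + 1, 5)])

print(find_homomorphism(C5, K3))  # a proper 3-coloring of the 5-cycle
print(find_homomorphism(K4, K3))  # None: K4 is not 3-colorable
```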

20 The Homomorphism Problem
Note: The Homomorphism Problem is a fundamental algorithmic problem:
• Satisfiability can be viewed as a special case of it.
• k-Colorability can be viewed as a special case of it.
• Many AI problems, such as planning, can be viewed as a special case of it.
• In fact, every constraint satisfaction problem can be viewed as a special case of the Homomorphism Problem (Feder and Vardi, 1993).

21 The Homomorphism Problem and Conjunctive Queries
• Theorem (Chandra & Merlin, 1977):
  • CQE and CQC are the “same” problem.
  • Moreover, each is an NP-complete problem.
• Question: What is the common link?
• Answer:
  • Both CQE and CQC are “equivalent” to the Homomorphism Problem.
  • The link is established by bringing into the picture canonical conjunctive queries and canonical database instances.

22 Canonical CQs and Canonical Instances
Definition: Canonical Conjunctive Query. Given an instance I = (R_1,…,R_m), the canonical CQ of I is the Boolean conjunctive query Q_I with (a renaming of) the elements of I as variables and the facts of I as conjuncts, where a fact of I is an expression R_i(a_1,…,a_m) such that (a_1,…,a_m) ∈ R_i.
Example: I consists of E(a,b), E(b,c), E(c,a).
• Q_I is given by the rule: Q_I :-- E(x,z), E(z,y), E(y,x)
• Alternatively, Q_I is ∃x ∃y ∃z (E(x,z) ∧ E(z,y) ∧ E(y,x))

23 Canonical Conjunctive Query
Example: K_3, the complete graph with 3 nodes.
• K_3 is a database instance with one binary relation E, where E = { (b,r), (r,b), (b,g), (g,b), (r,g), (g,r) }.
• The canonical conjunctive query Q_{K_3} of K_3 is given by the rule: Q_{K_3} :-- E(x,y), E(y,x), E(x,z), E(z,x), E(y,z), E(z,y)
• The canonical conjunctive query Q_{K_3} of K_3 is also given by the relational calculus expression: ∃x,y,z (E(x,y) ∧ E(y,x) ∧ E(x,z) ∧ E(z,x) ∧ E(y,z) ∧ E(z,y))
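
Building the canonical conjunctive query is a mechanical renaming, as the following sketch suggests. The representation of a query body as a list of (relation, variables) atoms, the fresh variable names x1, x2, …, and the helper names `canonical_query` and `as_rule` are all assumptions made for illustration.

```python
def canonical_query(instance):
    """Body of the canonical Boolean CQ Q_I as a sorted list of (relation, variables) atoms."""
    elements = sorted({a for facts in instance.values() for fact in facts for a in fact})
    # Rename each element of the active domain to its own variable x1, x2, ...
    var = {a: f"x{i}" for i, a in enumerate(elements, start=1)}
    return sorted(
        (relation, tuple(var[a] for a in fact))
        for relation, facts in instance.items()
        for fact in facts
    )

def as_rule(name, body):
    """Render a Boolean rule in the ':--' notation used on the slides."""
    return f"{name} :-- " + ", ".join(f"{r}({','.join(vs)})" for r, vs in body)

# The instance with facts E(a,b), E(b,c), E(c,a).
I = {"E": {("a", "b"), ("b", "c"), ("c", "a")}}
print(as_rule("Q_I", canonical_query(I)))
```

Up to the choice of variable names, the printed rule is the 3-cycle query given for this instance on slide 22.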

24 Canonical Database Instance
Definition: Canonical Instance. Given a CQ Q, the canonical instance of Q is the instance I_Q with the variables of Q as elements and the conjuncts of Q as facts.
• Example: Conjunctive query Q :-- E(x,y), E(y,z), E(z,w)
• Canonical instance I_Q consists of the facts E(x,y), E(y,z), E(z,w).
• In other words, E^{I_Q} = { (x,y), (y,z), (z,w) }.

25 Canonical Database Instance
• Example: Conjunctive query Q :-- E(x,z), E(z,y), P(z) or, equivalently, ∃x ∃y ∃z (E(x,z) ∧ E(z,y) ∧ P(z))
• Canonical instance I_Q consists of the facts E(x,z), E(z,y), P(z).
• In other words, E^{I_Q} = { (x,z), (z,y) } and P^{I_Q} = { z }.
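
In the other direction, the canonical instance just reads the conjuncts as facts. A minimal sketch, assuming a query body is a list of (relation, variables) atoms and an instance is a dict of sets of tuples (the name `canonical_instance` is hypothetical):

```python
def canonical_instance(body):
    """Canonical instance I_Q: variables become elements, conjuncts become facts."""
    instance = {}
    for relation, variables in body:
        instance.setdefault(relation, set()).add(tuple(variables))
    return instance

# Q :-- E(x,z), E(z,y), P(z)
Q = [("E", ("x", "z")), ("E", ("z", "y")), ("P", ("z",))]
I_Q = canonical_instance(Q)
print(sorted(I_Q["E"]), sorted(I_Q["P"]))
```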

26 Canonical Conjunctive Queries and Canonical Instances
Fact:
• For every database instance I, we have that I ⊨ Q_I.
• For every Boolean conjunctive query Q, we have that I_Q ⊨ Q.
Fact: Let I be a database instance, let Q_I be its canonical conjunctive query, and let I_{Q_I} be the canonical instance of Q_I. Then I is isomorphic to I_{Q_I}.

27 Canonical Conjunctive Queries and Canonical Instances
Magic Lemma: Assume that Q is a Boolean conjunctive query and J is a database instance. Then the following statements are equivalent:
1. J ⊨ Q.
2. There is a homomorphism h: I_Q → J.
Proof: Let Q be ∃x_1 … ∃x_m φ(x_1,…,x_m).
1. ⇒ 2. Assume that J ⊨ Q. Hence, there are elements a_1,…,a_m in adom(J) such that J ⊨ φ(a_1,…,a_m). The function h with h(x_i) = a_i, for i = 1,…,m, is a homomorphism from I_Q to J, because every fact of I_Q is a conjunct of φ and is therefore mapped by h to a fact of J.
2. ⇒ 1. Assume that there is a homomorphism h: I_Q → J. Then the values a_i = h(x_i), for i = 1,…,m, serve as witnesses in adom(J) for the existential quantifiers ∃x_i of Q, so that J ⊨ φ(a_1,…,a_m).
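
Read operationally, the Magic Lemma says that evaluating a Boolean conjunctive query on J amounts to searching for a homomorphism from I_Q to J. The sketch below does this by backtracking over the atoms of the body. The function name `satisfies` and the list-of-atoms and dict-of-sets representations are assumptions made for the example.

```python
def satisfies(J, body, h=None):
    """Decide J |= Q for a Boolean CQ given as a list of (relation, variables) atoms.

    Extends the partial assignment h one atom at a time; a complete h is
    exactly a homomorphism from the canonical instance I_Q to J.
    """
    h = h or {}
    if not body:
        return True
    (relation, variables), rest = body[0], body[1:]
    for fact in J.get(relation, set()):
        if len(fact) != len(variables):
            continue
        extended = dict(h)
        # Bind each variable to the matching component; reject conflicting bindings.
        if all(extended.setdefault(v, a) == a for v, a in zip(variables, fact)):
            if satisfies(J, rest, extended):
                return True
    return False

# Q: cycle of length 3, i.e. E(x,y), E(y,z), E(z,x).
cycle3 = [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]
print(satisfies({"E": {(1, 2), (2, 3), (3, 1)}}, cycle3))  # directed triangle
print(satisfies({"E": {(1, 2), (2, 3)}}, cycle3))          # directed path
```

Note that a single self-loop E(1,1) also satisfies the query, via the homomorphism that sends x, y, and z to 1.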

28 Homomorphisms, CQE, and CQC
The Homomorphism Theorem (Chandra & Merlin, 1977): For Boolean CQs Q and Q’, the following are equivalent:
• Q ⊆ Q’
• There is a homomorphism h: I_{Q’} → I_Q
• I_Q ⊨ Q’.
In dual form:
The Homomorphism Theorem (Chandra & Merlin, 1977): For instances I and I’, the following are equivalent:
• There is a homomorphism h: I → I’
• I’ ⊨ Q_I
• Q_{I’} ⊆ Q_I

29 Homomorphisms, CQE, and CQC
The Homomorphism Theorem (Chandra & Merlin, 1977): For Boolean CQs Q and Q’, the following are equivalent:
1. Q ⊆ Q’
2. There is a homomorphism h: I_{Q’} → I_Q
3. I_Q ⊨ Q’.
Proof:
1. ⇒ 2. Assume Q ⊆ Q’. Since I_Q ⊨ Q, we have that I_Q ⊨ Q’. Hence, by the Magic Lemma, there is a homomorphism from I_{Q’} to I_Q.
2. ⇒ 3. This follows from the other direction of the Magic Lemma.
3. ⇒ 1. Assume that I_Q ⊨ Q’. So, by the Magic Lemma, there is a homomorphism h: I_{Q’} → I_Q. We have to show that if J ⊨ Q, then J ⊨ Q’. Well, if J ⊨ Q, then (by the Magic Lemma) there is a homomorphism h’: I_Q → J. The composition h’ ∘ h: I_{Q’} → J is a homomorphism, hence (once again by the Magic Lemma!) we have that J ⊨ Q’.

30 Illustrating the Homomorphism Theorem
Example:
• Q: ∃x_1 ∃x_2 ∃x_3 ∃x_4 (E(x_1,x_2) ∧ E(x_2,x_3) ∧ E(x_3,x_4))
• Q’: ∃x_1 ∃x_2 ∃x_3 (E(x_1,x_2) ∧ E(x_2,x_3))
Then:
• Q ⊆ Q’. Homomorphism h: I_{Q’} → I_Q with h(x_1) = x_1, h(x_2) = x_2, h(x_3) = x_3.
• Q’ ⊈ Q (why?).

31 Illustrating the Homomorphism Theorem
Example:
• Q: ∃x_1 ∃x_2 (E(x_1,x_2) ∧ E(x_2,x_1))
• Q’: ∃x_1 ∃x_2 ∃x_3 ∃x_4 (E(x_1,x_2) ∧ E(x_2,x_1) ∧ E(x_2,x_3) ∧ E(x_3,x_2) ∧ E(x_3,x_4) ∧ E(x_4,x_3) ∧ E(x_4,x_1) ∧ E(x_1,x_4))
Then:
• Q ⊆ Q’. Homomorphism h: I_{Q’} → I_Q with h(x_1) = x_1, h(x_2) = x_2, h(x_3) = x_1, h(x_4) = x_2.
• Q’ ⊆ Q. Homomorphism h’: I_Q → I_{Q’} with h’(x_1) = x_1, h’(x_2) = x_2.
• Hence, Q ≡ Q’.
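
The examples on slides 30 and 31 can be checked mechanically using the Homomorphism Theorem: Q ⊆ Q’ holds exactly when some map from the variables of Q’ to the variables of Q sends every conjunct of Q’ to a conjunct of Q. The following is a brute-force sketch, not an efficient algorithm. The name `contained_in`, the list-of-atoms representation, and the helper `path` are assumptions.

```python
from itertools import product

def contained_in(q, q_prime):
    """Decide Q ⊆ Q' for Boolean CQs by searching for a homomorphism I_Q' -> I_Q."""
    facts_q = set(q)  # the facts of the canonical instance I_Q
    vars_q = sorted({v for _, vs in q for v in vs})
    vars_qp = sorted({v for _, vs in q_prime for v in vs})
    for images in product(vars_q, repeat=len(vars_qp)):
        h = dict(zip(vars_qp, images))
        if all((rel, tuple(h[v] for v in vs)) in facts_q for rel, vs in q_prime):
            return True
    return False

def path(n):
    """Directed path with n edges on variables x1, ..., x(n+1)."""
    return [("E", (f"x{i}", f"x{i + 1}")) for i in range(1, n + 1)]

# Slide 30: Q is a path with 3 edges, Q' a path with 2 edges.
print(contained_in(path(3), path(2)), contained_in(path(2), path(3)))

# Slide 31: Q is a 2-cycle, Q' a 4-cycle with edges in both directions.
two_cycle = [("E", ("x1", "x2")), ("E", ("x2", "x1"))]
pairs = [("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x1")]
four_cycle = [("E", p) for p in pairs] + [("E", (b, a)) for a, b in pairs]
print(contained_in(two_cycle, four_cycle), contained_in(four_cycle, two_cycle))
```

The second call on slide 30's queries returns False, which answers the "why?": a directed path with three edges has no homomorphism into a directed path with two edges.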

32 Illustrating the Homomorphism Theorem
Example: 3-Colorability. For a graph G = (V,E), the following are equivalent:
• G is 3-colorable
• There is a homomorphism h: G → K_3
• K_3 ⊨ Q_G
• Q_{K_3} ⊆ Q_G.

33 The Homomorphism Theorem for non-Boolean Conjunctive Queries
So far, we have focused on Boolean conjunctive queries.
However, the Homomorphism Theorem easily extends to conjunctive queries of arbitrary arities by considering homomorphisms that are the identity on the variables in the head of the conjunctive queries (written as rules).
Moreover, the Homomorphism Theorem also extends to conjunctive queries with constants in some of the conjuncts.
Example: Find all cities one can reach by flying from San Jose with two stops:
{ x: ∃x_1 ∃x_2 (FLIGHT(SanJose, x_1) ∧ FLIGHT(x_1,x_2) ∧ FLIGHT(x_2,x)) }.

34 The Homomorphism Theorem for non-Boolean Conjunctive Queries
The Homomorphism Theorem (Chandra & Merlin, 1977): Consider two k-ary conjunctive queries
Q(x_1,…,x_k) :-- R_1(u_1), …, R_n(u_n) and
Q’(x_1,…,x_k) :-- T_1(v_1), …, T_m(v_m).
Then the following are equivalent:
• Q ⊆ Q’
• There is a homomorphism h: I_{Q’} → I_Q such that h(x_1) = x_1, h(x_2) = x_2, …, h(x_k) = x_k.
• I_Q, x_1,…,x_k ⊨ Q’.
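
For queries with a head, the only change to a brute-force containment test is that the candidate map is forced to be the identity on the head variables. A sketch under the same assumptions as the theorem (both queries share the head tuple, and every head variable occurs in each body). The function name `contained_in` and the two sample queries are hypothetical.

```python
from itertools import product

def contained_in(head, q, q_prime):
    """Decide Q ⊆ Q' for two CQs with the same head variables `head`.

    Searches for a homomorphism I_Q' -> I_Q that is the identity on the head.
    """
    facts_q = set(q)
    vars_q = sorted({v for _, vs in q for v in vs})
    # Only the existentially quantified variables of Q' may be mapped freely.
    free = sorted({v for _, vs in q_prime for v in vs} - set(head))
    for images in product(vars_q, repeat=len(free)):
        h = {x: x for x in head}  # identity on the head variables
        h.update(zip(free, images))
        if all((rel, tuple(h[v] for v in vs)) in facts_q for rel, vs in q_prime):
            return True
    return False

# Q(x,y)  :-- E(x,z), E(z,y)   (there is a path of length 2 from x to y)
# Q'(x,y) :-- E(x,z), E(w,y)   (x has an outgoing edge and y has an incoming edge)
head = ("x", "y")
Q = [("E", ("x", "z")), ("E", ("z", "y"))]
Q_prime = [("E", ("x", "z")), ("E", ("w", "y"))]
print(contained_in(head, Q, Q_prime), contained_in(head, Q_prime, Q))
```

Here Q ⊆ Q’ is witnessed by h(z) = z, h(w) = z, while no head-preserving homomorphism exists in the other direction.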