CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC.


1 CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC

2 4/14/2015 Problems Studied. Efficient computation of relational queries. One of the earliest and pioneering works. Focus on logical optimization. Complexity issues investigated.

3 Key contributions
How can we (logically) optimize conjunctive queries?
– Optimize = throw away redundant parts.
– How can we detect/prove that some parts are redundant?
– How hard is it to do so, in general, for relational (aka TRC/DRC/RA) queries?
– How hard is it for CQ?
– Is the minimized version of a (C)Q unique?
An elegant tool for reasoning about CQ minimization.

4 What are Conjunctive Queries? Simply put, RA queries involving select, project, join/product. In terms of logic, queries that involve conjunctions among database predicates and existential quantifiers. In terms of Datalog, queries (without recursion or negation) of the form p(X1, …, Xn) ← q1(Y1, …, Yk), …, qm(Z1, …, Zp), where the Xs must be among the Ys and Zs. Additional (built-in) predicates may be allowed: constant arguments, comparisons with constants, and comparisons between variables.

5 Conjunctive Queries. CM’77 uses (much) older notation using additional relational operators incl. generalized projection. We will explain their results using the much simpler Datalog notation. So, what is query minimization? Let Q(D) denote the output of query Q on input database D. Q1 ⊆ Q2 ⇔ Q1(D) ⊆ Q2(D) on all input DBs D.

6 What is query minimization? Q1 ≡ Q2 ⇔ Q1(D) = Q2(D) on all input DBs D.
Note:
– Set semantics; works when we don’t care about multiplicity of occurrences.
– Doesn’t work when we care about aggregates ⇒ need bag semantics.
Goal: Given Q, find a (logically) simpler expression Q’ s.t. Q’ ≡ Q. Usually achieved by looking for redundant parts of Q which can be tossed.

7 Motivation for CQ Containment
select S.starName, M.title from Movie M, Movie N, starsIn S where M.title=S.title and N.year=M.year
can be shown to be equivalent to
select S.starName, M.title from Movie M, Movie N, starsIn S where M.title=S.title
Intuition? How does this help?

8 Example of query minimization. E.g.: q() ← r(X1,X2), r(X3,X2), r(X5,X2), r(X1,X4), r(X3,X4), r(X5,X4), r(X1,X6), r(X3,X6), r(X5,X6). I.e., “do there exist tuples in r which fit the above pattern?” ⇒ a trivial O(n^6) algorithm, where n = #constants in the DB! Can show the answer to this query is true iff r contains at least one tuple. If so, can evaluate q() in O(1) time.

9 Example of query minimization. Proof: Suppose q() is “true”. Trivially, r must be non-empty. Conversely, suppose r contains at least one tuple, say r(1,2). Then by mapping X1, X3, X5 ↦ 1 and X2, X4, X6 ↦ 2, every atom in the body becomes r(1,2), so we can derive the answer “true”. [] Lemma 1 in the paper illustrates some technical difficulties. Is ≡ always an equivalence relation? How can we minimize queries? We have covered Sections 1-3, from a different perspective.

10 A model of query eval. Q: p(X1, …, Xn) ← q1(Y1, …, Yk), …, qm(Z1, …, Zp). Input DB: D. A valuation = a function that maps variables to constants in D, and constants in Q, if any, to themselves. Under a valuation, the atoms in Q’s body are true OR false in D. Q(D) = { ν(p(X1, …, Xn)) | ν is a valuation that makes all atoms in Q’s body true in D }.

11 An example. p(X,Y) ← q(X,Z), r(Z,Y), q(X,W). D = {q(1,2), r(2,3)}. ν: X ↦ 1, Y ↦ 3, Z ↦ 2, W ↦ 2. ν makes Q’s body true in D. Is there any other such valuation? Q(D) = {p(1,3)}.
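The evaluation model on slides 10 and 11 can be sketched as a brute-force enumeration of valuations. This is only an illustration, not anything from CM’77: the function names `is_var` and `evaluate`, the encoding of atoms as `(predicate, args)` tuples, and the convention that variables are capitalized strings are all assumptions made for this sketch.

```python
from itertools import product

def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def evaluate(head, body, db):
    """Return Q(D): the images of the head under every valuation
    that makes all body atoms true in db (a set of ground atoms)."""
    variables = sorted({t for _, args in body for t in args if is_var(t)})
    constants = list({c for _, args in db for c in args})
    answers = set()
    # Try every assignment of DB constants to the query's variables.
    for values in product(constants, repeat=len(variables)):
        nu = dict(zip(variables, values))

        def image(atom):
            pred, args = atom
            return (pred, tuple(nu[t] if is_var(t) else t for t in args))

        if all(image(atom) in db for atom in body):
            answers.add(image(head))
    return answers

# The query and database from slide 11.
head = ("p", ("X", "Y"))
body = [("q", ("X", "Z")), ("r", ("Z", "Y")), ("q", ("X", "W"))]
db = {("q", (1, 2)), ("r", (2, 3))}
print(evaluate(head, body, db))  # {('p', (1, 3))}
```

The enumeration is exponential in the number of variables, which mirrors the "trivial O(n^6) algorithm" remark on slide 8; it is meant to make the definition concrete, not to be efficient.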

12 Getting back to business. Recall, we are trying to figure out what it takes for Q1 ⊆ Q2, where both are CQs. Note: Q1 ⊆ Q2 iff on every DB D, for every valuation ν1 that makes Q1’s body true, there is a valuation ν2 that makes Q2’s body true, and further both map their respective heads to the same tuple.

13 Key result of CM’77. Theorem: Q1 ⊆ Q2 iff there exists a homomorphism from Q2 to Q1. Q1: p(X1,…,Xn) ← g1(Y1,…,Ym), …, gj(Z1,…,Zk). Q2: p(U1,…,Un) ← s1(V1,…,Vr), …, si(W1,…,Wt). What is a homomorphism? h: Vars(Q2) → Vars(Q1) s.t. h turns each atom in Q2’s body into an atom in Q1’s body and turns Q2’s head into Q1’s head.

14 Example revisited. Q: p(X,Y) ← q(X,Z), r(Z,Y), q(X,W). Q’: p(X,Y) ← q(X,Z), r(Z,Y). Claim: Q’ ≡ Q. Proof: Q ⊆ Q’ trivially (why?). Now, Q’ ⊆ Q as well, since h: X ↦ X, Y ↦ Y, Z ↦ Z, W ↦ Z is a homomorphism from Q to Q’. [] Exercise: Show that the nine-atom CQ q() from the query-minimization example (slide 8) is equivalent to q() ← r(X1,X2).
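The homomorphism test behind the theorem can be sketched as a small backtracking search. This is an illustrative sketch under assumptions, not the paper's procedure: queries are encoded as `(head, body)` pairs of `(predicate, args)` tuples, variables are capitalized strings, and `find_homomorphism` and `contained_in` are hypothetical names.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def unify(atom, target, h):
    """Extend mapping h so that h(atom) == target; return None if impossible."""
    (pred, args), (tpred, targs) = atom, target
    if pred != tpred or len(args) != len(targs):
        return None
    h = dict(h)
    for a, b in zip(args, targs):
        if is_var(a):
            if h.setdefault(a, b) != b:   # a is already mapped elsewhere
                return None
        elif a != b:                      # constants must map to themselves
            return None
    return h

def find_homomorphism(src, dst):
    """Look for h with h(src head) == dst head and h(src body) inside dst body."""
    (s_head, s_body), (d_head, d_body) = src, dst

    def extend(i, h):
        if i == len(s_body):
            return h
        for target in d_body:             # try every candidate image atom
            h2 = unify(s_body[i], target, h)
            if h2 is not None:
                result = extend(i + 1, h2)
                if result is not None:
                    return result
        return None

    h0 = unify(s_head, d_head, {})
    return None if h0 is None else extend(0, h0)

def contained_in(q1, q2):
    # Q1 ⊆ Q2 iff there is a homomorphism from Q2 to Q1.
    return find_homomorphism(q2, q1) is not None

Q = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y")), ("q", ("X", "W"))])
Q_PRIME = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y"))])
print(find_homomorphism(Q, Q_PRIME))  # the mapping h from slide 14
print(contained_in(Q_PRIME, Q), contained_in(Q, Q_PRIME))
```

The search tries, for each body atom of the source query, every atom of the target body as its image, which is where the exponential worst case comes from.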

15 Proof of Homomorphism Theorem. Proof of (⇐): Suppose a homomorphism h exists from Q2 to Q1. Let D be any DB and p(a1, …, an) be a tuple in Q1(D). Let ν be the valuation that bears witness to this. Consider the function ν∘h: Vars(Q2) → Constants in D. It’s a valuation that makes Q2’s body true in D, since h sends each atom of Q2’s body to an atom of Q1’s body, which ν makes true. Further, ν∘h(Q2’s head) = p(a1, …, an).

16 Proof (contd.) Proof of (⇒): Suppose Q1 ⊆ Q2. Then Q1(D) ⊆ Q2(D), on every input DB D. Make up a special DB by “freezing” the vars in Q1’s body. Think of each var as a distinct constant. Q1(D) contains p(x1, …, xn) trivially. So does Q2(D). ⇒ ∃ a val. ν: Vars(Q2) → Constants in D that witnesses this. But the constants are frozen versions of Vars(Q1). Unfreeze them. Then ν with constants unfrozen into vars is a homomorphism from Q2 to Q1. []

17 Revisit our toy example. Q: p(X,Y) ← q(X,Z), r(Z,Y), q(X,W). Q’: p(X,Y) ← q(X,Z), r(Z,Y). Frozen db D = {q(x,z), r(z,y)}. Q(D) contains p(x,y). The val. witnessing this is essentially a homomorphism. Note: The proof gives us a nice (if macabre) test for “Q1 ⊆ Q2”: freeze the body of Q1 to get a special db D. Then run Q2 on D and check whether Q2(D) contains the frozen head of Q1. The homomorphism is sometimes called a containment mapping (c.m.).
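The frozen-database test can be sketched as follows. This is a minimal illustration under assumptions: atoms are `(predicate, args)` tuples, variables are capitalized strings, a frozen variable `V` is represented by the tuple `("frozen", V)`, and the names `freeze_atom`, `evaluate`, and `contained_in` are hypothetical.

```python
from itertools import product

def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def freeze_atom(atom):
    # Turn each variable V into the distinct constant ("frozen", V).
    pred, args = atom
    return (pred, tuple(("frozen", t) if is_var(t) else t for t in args))

def evaluate(head, body, db):
    """Brute-force Q(D): try every valuation of the body's variables."""
    variables = sorted({t for _, args in body for t in args if is_var(t)})
    constants = list({c for _, args in db for c in args})
    answers = set()
    for values in product(constants, repeat=len(variables)):
        nu = dict(zip(variables, values))

        def image(atom):
            pred, args = atom
            return (pred, tuple(nu[t] if is_var(t) else t for t in args))

        if all(image(atom) in db for atom in body):
            answers.add(image(head))
    return answers

def contained_in(q1, q2):
    """Q1 ⊆ Q2 iff Q2, run on the frozen body of Q1, returns Q1's frozen head."""
    (head1, body1), (head2, body2) = q1, q2
    frozen_db = {freeze_atom(a) for a in body1}
    return freeze_atom(head1) in evaluate(head2, body2, frozen_db)

Q = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y")), ("q", ("X", "W"))])
Q_PRIME = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y"))])
# The nine-atom Boolean query from slide 8 and its one-atom equivalent.
NINE = (("q", ()), [("r", (a, b)) for a in ("X1", "X3", "X5")
                    for b in ("X2", "X4", "X6")])
SINGLE = (("q", ()), [("r", ("X1", "X2"))])

print(contained_in(Q_PRIME, Q), contained_in(Q, Q_PRIME))
print(contained_in(SINGLE, NINE), contained_in(NINE, SINGLE))
```

Representing frozen variables as tuples keeps them from being mistaken for variables again when the second query is evaluated; any other scheme that yields fresh, distinct constants would serve equally well.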

18 Summing up CM’77. Key goal: Minimizing CQs by removing redundant subgoals (and hence joins). Main test: is Q1 ⊆ Q2? Can check by looking for existence of a homomorphism, OR by running Q2 on the frozen DB, i.e., the frozen body of Q1, and checking whether the result contains the frozen head of Q1 (called chase).
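Removing redundant subgoals can be sketched as a greedy loop around the homomorphism test: drop an atom whenever the full query still maps homomorphically into what remains. This is one straightforward way to do it, offered as an assumption-laden sketch rather than the paper's algorithm; the `(head, body)` tuple encoding, the capitalized-variable convention, and the names `find_homomorphism` and `minimize` are hypothetical.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def unify(atom, target, h):
    """Extend mapping h so that h(atom) == target; return None if impossible."""
    (pred, args), (tpred, targs) = atom, target
    if pred != tpred or len(args) != len(targs):
        return None
    h = dict(h)
    for a, b in zip(args, targs):
        if is_var(a):
            if h.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return h

def find_homomorphism(src, dst):
    """Backtracking search for a homomorphism from query src to query dst."""
    (s_head, s_body), (d_head, d_body) = src, dst

    def extend(i, h):
        if i == len(s_body):
            return h
        for target in d_body:
            h2 = unify(s_body[i], target, h)
            if h2 is not None:
                result = extend(i + 1, h2)
                if result is not None:
                    return result
        return None

    h0 = unify(s_head, d_head, {})
    return None if h0 is None else extend(0, h0)

def minimize(query):
    """Greedily drop body atoms while the query stays equivalent."""
    head, body = query
    body = list(body)
    i = 0
    while i < len(body):
        candidate = body[:i] + body[i + 1:]
        # Dropping an atom can only enlarge the answer, so equivalence holds
        # iff the current query maps homomorphically into the smaller one.
        if find_homomorphism((head, body), (head, candidate)) is not None:
            body = candidate          # atom i was redundant; recheck index i
        else:
            i += 1
    return head, body

Q = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y")), ("q", ("X", "W"))])
NINE = (("q", ()), [("r", (a, b)) for a in ("X1", "X3", "X5")
                    for b in ("X2", "X4", "X6")])
print(minimize(Q))
print(len(minimize(NINE)[1]))
```

On the toy query the loop keeps q(X,Z) and r(Z,Y) and drops q(X,W); on the nine-atom query it leaves a single r atom, matching the claim on slide 8.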

19 Summary (contd.) Paper contains a no. of complexity results. Most relevant to us are:
– Testing containment of arbitrary CQs: NP-complete. Where does the complexity come from?
– When no predicate repeats in the body of Q1 (smaller query): PTIME.
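One way to see why the no-repeated-predicate case is easy: each atom of Q2 has at most one possible image in Q1, so the homomorphism is forced and only needs a consistency check. The sketch below illustrates that intuition under assumptions (atoms as `(predicate, args)` tuples, capitalized strings as variables, hypothetical name `contained_in_no_repeats`); it is not taken from the paper.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def contained_in_no_repeats(q1, q2):
    """Decide Q1 ⊆ Q2 in one pass, assuming no predicate repeats in Q1's body."""
    (head1, body1), (head2, body2) = q1, q2
    target = {}
    for pred, args in body1:
        if pred in target:
            raise ValueError("a predicate repeats in Q1's body")
        target[pred] = args
    if head1[0] != head2[0]:
        return False
    # Pair up each argument list of Q2 with its only possible image in Q1.
    pairs = [(head2[1], head1[1])]
    for pred, args in body2:
        if pred not in target:
            return False              # no atom of Q1 can be the image
        pairs.append((args, target[pred]))
    h = {}
    for src_args, dst_args in pairs:
        if len(src_args) != len(dst_args):
            return False
        for s, d in zip(src_args, dst_args):
            if is_var(s):
                if h.setdefault(s, d) != d:
                    return False      # variable forced to two different images
            elif s != d:
                return False          # constants must map to themselves
    return True

Q = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y")), ("q", ("X", "W"))])
Q_PRIME = (("p", ("X", "Y")), [("q", ("X", "Z")), ("r", ("Z", "Y"))])
print(contained_in_no_repeats(Q_PRIME, Q))  # True
```

With repeated predicates in Q1 an atom of Q2 may have several candidate images, and choosing among them consistently is where the search, and the NP-completeness, comes from.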

20 Final Remarks. CQ containment is fundamental to logical QO. Containment has been studied for CQs with all kinds of bells and whistles added. Containment reasoning plays a pivotal role in answering queries using views.

