Containment of Conjunctive Queries on Annotated Relations TJ Green University of Pennsylvania Symposium on Database Provenance University of Edinburgh.

Presentation transcript:

Containment of Conjunctive Queries on Annotated Relations TJ Green University of Pennsylvania Symposium on Database Provenance University of Edinburgh May 21, 2008

Provenance and Query Optimization
Many kinds of semiring-based provenance annotations to choose from:
- lineage
- why-provenance
- minimal witness why-provenance
- provenance polynomials
- ...
These seem to keep track of more or less information.
A fundamental question: how does this affect query optimization?

Conjunctive Queries on K-Relations
Datalog-style syntax for conjunctive queries (CQs):
    Q(x,y) :- R(x,z), R(z,y)
Semantics of applying the CQ to a K-relation $R : D \times D \to K$:
    $Q(a,b) = \sum_{z \in D} R(a,z) \cdot R(z,b)$
The number of repetitions of an atom in the body matters.
For unions of conjunctive queries (UCQs), which are equivalent to positive RA, sum over CQs:
    P(x,y) :- R(x,z), R(z,y)
    P(x,y) :- R(x,w), R(y,w)
Semantics of the UCQ applied to R, a sum over its CQs:
    $P(a,b) = \sum_{z \in D} R(a,z) \cdot R(z,b) + \sum_{w \in D} R(a,w) \cdot R(b,w)$
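The semantics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `eval_path2`, the dictionary encoding of a K-relation, and the sample data are hypothetical and not taken from the talk; the semiring is passed in as plain `plus`/`times`/`zero` arguments.

```python
from itertools import product

def eval_path2(R, domain, plus, times, zero):
    """Evaluate Q(x,y) :- R(x,z), R(z,y) over a K-relation R.

    R maps pairs (a, b) to annotations in K; absent tuples count as `zero`.
    """
    out = {}
    for a, b in product(domain, repeat=2):
        total = zero
        for z in domain:  # the sum over z in D
            total = plus(total, times(R.get((a, z), zero), R.get((z, b), zero)))
        if total != zero:
            out[(a, b)] = total
    return out

domain = ["a", "b", "c"]

# K = N (bag semantics): annotations are multiplicities.
R_bag = {("a", "b"): 2, ("b", "c"): 3, ("a", "a"): 1}
bag_result = eval_path2(R_bag, domain, lambda x, y: x + y, lambda x, y: x * y, 0)

# K = B (set semantics): annotations are truth values.
R_set = {t: True for t in R_bag}
set_result = eval_path2(R_set, domain, lambda x, y: x or y, lambda x, y: x and y, False)
```

The same loop serves both semirings; only the operations and the zero element change.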

Choice of K Affects Query Optimization
$K = \mathbb{N}$ (bag semantics) differs from $K = \mathbb{B}$ (set semantics); e.g., the conjunctive queries
    Q1(x) :- R(x,y), R(x,z)
    Q2(u) :- R(u,v)
are set-equivalent, but not bag-equivalent.

                                                        | Conjunctive Queries (CQs)                | Unions of Conjunctive Queries (UCQs)
Bag Semantics Containment ($\sqsubseteq_{\mathbb{N}}$)  | ? ($\Pi_2^p$-hard) [Chaudhuri&Vardi 93]  | undecidable [Ioannidis&Ramakrishnan 95]
Bag Semantics Equivalence ($\equiv_{\mathbb{N}}$)       | isomorphism ($\cong$) [CV 93]            | ?
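To see the difference concretely, the two queries can be evaluated on a small instance under both semirings. The sketch below is an assumption-laden illustration: the helper names `q1` and `q2` and the sample relation are hypothetical.

```python
def q1(R, domain, plus, times, zero):
    """Q1(x) :- R(x,y), R(x,z)"""
    out = {}
    for x in domain:
        total = zero
        for y in domain:
            for z in domain:
                total = plus(total, times(R.get((x, y), zero), R.get((x, z), zero)))
        out[x] = total
    return out

def q2(R, domain, plus, zero):
    """Q2(u) :- R(u,v)"""
    out = {}
    for u in domain:
        total = zero
        for v in domain:
            total = plus(total, R.get((u, v), zero))
        out[u] = total
    return out

domain = ["a", "b", "c"]
add, mul = (lambda x, y: x + y), (lambda x, y: x * y)
lor, land = (lambda x, y: x or y), (lambda x, y: x and y)

R_bag = {("a", "b"): 2, ("a", "c"): 1}   # K = N
R_set = {t: True for t in R_bag}         # K = B

q1_bag, q2_bag = q1(R_bag, domain, add, mul, 0), q2(R_bag, domain, add, 0)
q1_set, q2_set = q1(R_set, domain, lor, land, False), q2(R_set, domain, lor, False)
```

On this instance the set answers coincide, while the bag multiplicities for `a` are 9 for Q1 and 3 for Q2.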

Our Contributions
We make a systematic study of query containment and query equivalence for various provenance models.
We show that K-containment and K-equivalence of CQs and UCQs are decidable for lineage, why-provenance, and the provenance polynomials $\mathbb{N}[X]$, as well as a new model, $\mathbb{B}[X]$.
The decision procedures are based on interesting variations of containment mappings.
We analyze the complexity in each case.

Our Contributions
As a corollary of the decidability result for $\mathbb{N}[X]$-equivalence of UCQs, we also fill in a gap in the chart for bag semantics:

                                                        | Conjunctive Queries (CQs)                | Unions of Conjunctive Queries (UCQs)
Bag Semantics Containment ($\sqsubseteq_{\mathbb{N}}$)  | ? ($\Pi_2^p$-hard) [Chaudhuri&Vardi 93]  | undecidable [Ioannidis&Ramakrishnan 95]
Bag Semantics Equivalence ($\equiv_{\mathbb{N}}$)       | isomorphism ($\cong$) [CV 93]            | isomorphism ($\cong$)

K-Containment for Queries
For a semiring K, define $a \le_K b \iff \exists c.\ a + c = b$.
If $\le_K$ is a partial order, it is called the natural order, and K is said to be naturally ordered.
$\mathbb{B}$, $\mathbb{N}$, lineage, why-provenance, $\mathbb{B}[X]$, and $\mathbb{N}[X]$ are all naturally ordered.
We define K-containment using the natural order:
    $Q_1 \sqsubseteq_K Q_2 \iff \forall I\ \forall t.\ Q_1(I)(t) \le_K Q_2(I)(t)$
    $Q_1 \equiv_K Q_2 \iff \forall I\ \forall t.\ Q_1(I)(t) = Q_2(I)(t)$

A Hierarchy of Semiring Provenance (1)
Provenance polynomials $(\mathbb{N}[X], +, \cdot, 0, 1)$: track calculations abstractly; most general
    e.g., $2p^2r + 3ps + ps^3$
Drop coefficients to get $(\mathbb{B}[X], +, \cdot, 0, 1)$
    $p^2r + ps + ps^3$
Drop exponents to get why-provenance $(\mathcal{P}(\mathcal{P}(X)), \cup, \Cup, \emptyset, \{\emptyset\})$
    $\{\{p,r\}, \{p,s\}\}$
Flatten the set of sets to get lineage $(\mathcal{P}(X), +, \cdot, \bot, \emptyset)$
    $\{p,r,s\}$
Drop, flatten, etc. correspond to surjective semiring homomorphisms.
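The three steps down the hierarchy can be traced on the slide's example polynomial. The following is a minimal sketch, assuming a hypothetical encoding in which an $\mathbb{N}[X]$ polynomial is a dictionary from monomials (tuples of `(variable, exponent)` pairs) to coefficients; the function names are illustrative only, and the sketch does not model the special element $\bot$ of the lineage semiring.

```python
# 2*p^2*r + 3*p*s + p*s^3 as {monomial: coefficient}
poly = {
    (("p", 2), ("r", 1)): 2,
    (("p", 1), ("s", 1)): 3,
    (("p", 1), ("s", 3)): 1,
}

def drop_coefficients(poly):
    """N[X] -> B[X]: keep only the set of monomials with nonzero coefficient."""
    return frozenset(m for m, c in poly.items() if c != 0)

def drop_exponents(bpoly):
    """B[X] -> why-provenance: each monomial becomes its set of variables."""
    return frozenset(frozenset(v for v, _ in m) for m in bpoly)

def flatten(why):
    """Why-provenance -> lineage: union of all witness sets."""
    return frozenset().union(*why)

b_poly = drop_coefficients(poly)   # p^2 r + p s + p s^3
why = drop_exponents(b_poly)       # {{p, r}, {p, s}}
lineage = flatten(why)             # {p, r, s}
```

Note how the monomials $ps$ and $ps^3$ collapse to the same witness set once exponents are dropped.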

A Hierarchy of Semiring Provenance (2)
Suppose $h : K_1 \to K_2$ is a semiring homomorphism. Then $a \le_{K_1} b$ implies $h(a) \le_{K_2} h(b)$. If h is also surjective, then $h(a) \le_{K_2} h(b)$ implies $a \le_{K_1} b$.
Definition: $K_1 \preceq K_2$ means $P \sqsubseteq_{K_2} Q$ implies $P \sqsubseteq_{K_1} Q$.
Proposition: for any positive K, $\mathbb{B} \preceq K \preceq \mathbb{N}[X]$. (All those we consider are positive.)
Moreover:
Proposition (Provenance Hierarchy): $\mathbb{B} \preceq \text{lineage} \preceq \text{why-prov.} \preceq \mathbb{B}[X] \preceq \mathbb{N}[X]$

Containment Mappings
A containment mapping from CQ Q to CQ P is a function $h : \mathrm{Vars}(Q) \to \mathrm{Vars}(P)$ such that
- the head of Q is mapped to the head of P
- every atom in the body of Q is mapped to an atom in the body of P
Theorem [CM77]: For CQs P, Q we have $P \sqsubseteq_{\mathbb{B}} Q$ iff there is a containment mapping from Q to P.
- e.g. Q1(x) :- R(x,y), R(x,z) and Q2(u) :- R(u,v)
- h which sends $u \mapsto x$ and $v \mapsto y$ is a containment mapping
Checking for the existence of a containment mapping is NP-complete.
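A brute-force search makes the definition concrete. This is only a sketch under assumptions: a CQ is encoded (hypothetically) as a pair `(head, body)` whose body is a list of `(relation, args)` atoms over variables only, and the function name `containment_mappings` is illustrative. The enumeration is exponential, which is unsurprising given the NP-completeness noted above.

```python
from itertools import product

def containment_mappings(Q, P):
    """Yield every containment mapping h : Vars(Q) -> Vars(P), by brute force."""
    (q_head, q_body), (p_head, p_body) = Q, P
    q_vars = sorted(set(q_head) | {v for _, args in q_body for v in args})
    p_vars = sorted(set(p_head) | {v for _, args in p_body for v in args})
    p_atoms = set(p_body)
    for image in product(p_vars, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        # The head of Q must be mapped to the head of P.
        if tuple(h[v] for v in q_head) != tuple(p_head):
            continue
        # Every body atom of Q must land on some body atom of P.
        if all((rel, tuple(h[v] for v in args)) in p_atoms for rel, args in q_body):
            yield h

Q1 = (("x",), [("R", ("x", "y")), ("R", ("x", "z"))])
Q2 = (("u",), [("R", ("u", "v"))])

maps_q2_to_q1 = list(containment_mappings(Q2, Q1))  # witnesses Q1 set-contained in Q2
maps_q1_to_q2 = list(containment_mappings(Q1, Q2))  # witnesses Q2 set-contained in Q1
```

Mappings exist in both directions, which matches the earlier observation that Q1 and Q2 are set-equivalent.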

Canonical Databases
Take the body of a CQ, freeze it into a database instance [CM77], and tag each tuple with a tuple id.
We'll denote by $\mathrm{can}_K(Q)$ the canonical database for Q with abstract tags from K.
e.g., Q(w) :- R(u,v), R(v,w)

$\mathrm{can}_{\mathbb{N}[X]}(Q) = \mathrm{can}_{\mathbb{B}[X]}(Q)$ = R:
    u  v  | $x_1$
    v  w  | $x_2$
$\mathrm{can}_{\mathrm{lin}}(Q)$ = R:
    u  v  | $\{x_1\}$
    v  w  | $\{x_2\}$
$\mathrm{can}_{\mathrm{why}}(Q)$ = R:
    u  v  | $\{\{x_1\}\}$
    v  w  | $\{\{x_2\}\}$
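The construction can be sketched as follows. The function name `canonical_database`, the nested-dictionary encoding, and the tagging callbacks are assumptions made for illustration; the sketch also assumes the body contains no repeated atoms, since a repeated atom would overwrite its earlier tag here.

```python
def canonical_database(body, tag):
    """Freeze a CQ body into a K-relation, tagging the i-th atom with tag('x<i>').

    Assumes the body has no repeated atoms (a repeat would overwrite its tag).
    """
    db = {}
    for i, (rel, args) in enumerate(body, start=1):
        db.setdefault(rel, {})[args] = tag(f"x{i}")
    return db

# Body of Q(w) :- R(u,v), R(v,w)
body = [("R", ("u", "v")), ("R", ("v", "w"))]

can_poly = canonical_database(body, lambda x: x)                        # N[X] / B[X]: x1, x2
can_lin = canonical_database(body, lambda x: frozenset({x}))            # lineage: {x1}, {x2}
can_why = canonical_database(body, lambda x: frozenset({frozenset({x})}))  # why: {{x1}}, {{x2}}
```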

Lineage-Containment of CQs
Covering set of containment mappings: for every atom A in the body of P there is a containment mapping $h : Q \to P$ with A in the image of h.
Theorem: For CQs P, Q the following are equivalent:
1. $P \sqsubseteq_{\mathrm{lin}} Q$
2. $P(\mathrm{can}_{\mathrm{lin}}(P)) \subseteq_{\mathrm{lin}} Q(\mathrm{can}_{\mathrm{lin}}(P))$
3. there is a covering set of containment mappings from Q to P
Note: covering sets of containment mappings were identified in [CV 93] as a necessary (but not sufficient) condition for bag-containment of CQs.

Why-Containment of CQs
A containment mapping is onto if it induces a surjection on atoms.
Theorem: For CQs P, Q the following are equivalent:
1. $P \sqsubseteq_{\mathrm{why}} Q$
2. $P(\mathrm{can}_{\mathrm{why}}(P)) \subseteq_{\mathrm{why}} Q(\mathrm{can}_{\mathrm{why}}(P))$
3. there is an onto containment mapping $h : Q \to P$
Note: onto containment mappings were identified in [CV 93] as a sufficient (but not necessary) condition for bag-containment of CQs.

$\mathbb{B}[X]$-, $\mathbb{N}[X]$-Containment of CQs
A containment mapping is exact if it induces a bijection on atoms.
Theorem: For CQs P, Q and for $K \in \{\mathbb{B}[X], \mathbb{N}[X]\}$ the following are equivalent:
1. $P \sqsubseteq_K Q$
2. $P(\mathrm{can}_K(P)) \subseteq_K Q(\mathrm{can}_K(P))$
3. there is an exact containment mapping $h : Q \to P$
Another way to think of exact containment mappings: by unifying variables in Q, you get a query isomorphic to P.

So Far
K-containment of CQs is decidable for all the provenance models in the hierarchy.
Next, we indicate which steps in the hierarchy are strict, and which collapse:
    $\mathbb{B} \prec \text{lineage} \prec \text{why-prov.} \prec \mathbb{B}[X] \approx \mathbb{N}[X]$

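Separating the Models for $\sqsubseteq$ of CQs
$\mathbb{B} \prec$ lineage:
    Q1(x,y) :- R(x,y), R(x,z)
    Q2(x,y) :- R(x,y)
    $Q_1 \sqsubseteq_{\mathbb{B}} Q_2$ but $Q_1 \not\sqsubseteq_{\mathrm{lin}} Q_2$
lineage $\prec$ why:
    Q1(x) :- R(x,y), R(x,z)
    Q2(x) :- R(x,y)
    $Q_1 \sqsubseteq_{\mathrm{lin}} Q_2$ but $Q_1 \not\sqsubseteq_{\mathrm{why}} Q_2$
why $\prec \mathbb{B}[X]$:
    Q1(x,y) :- R(x,y)
    Q2(x,y) :- R(x,y), R(x,z)
    $Q_1 \sqsubseteq_{\mathrm{why}} Q_2$ but $Q_1 \not\sqsubseteq_{\mathbb{B}[X]} Q_2$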
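These three separations can be checked mechanically using the covering, onto, and exact criteria from the preceding theorems. The sketch below is a brute-force illustration, not the talk's algorithm: the `(head, body)` encoding and all function names are hypothetical, and bodies are treated as multisets of atoms when testing the onto and exact conditions.

```python
from collections import Counter
from itertools import product

def image_atoms(h, body):
    """Apply a variable mapping h to every atom of a body."""
    return [(rel, tuple(h[v] for v in args)) for rel, args in body]

def containment_mappings(Q, P):
    """Yield every containment mapping from CQ Q to CQ P (brute force)."""
    (q_head, q_body), (p_head, p_body) = Q, P
    q_vars = sorted(set(q_head) | {v for _, args in q_body for v in args})
    p_vars = sorted(set(p_head) | {v for _, args in p_body for v in args})
    for image in product(p_vars, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        head_ok = tuple(h[v] for v in q_head) == tuple(p_head)
        if head_ok and all(a in p_body for a in image_atoms(h, q_body)):
            yield h

def has_mapping(Q, P):       # criterion for P set-contained in Q
    return any(True for _ in containment_mappings(Q, P))

def has_covering_set(Q, P):  # criterion for lineage-containment of P in Q
    covered = set()
    for h in containment_mappings(Q, P):
        covered.update(image_atoms(h, Q[1]))
    return set(P[1]) <= covered

def has_onto(Q, P):          # criterion for why-containment of P in Q
    need = Counter(P[1])
    return any(all(Counter(image_atoms(h, Q[1]))[a] >= n for a, n in need.items())
               for h in containment_mappings(Q, P))

def has_exact(Q, P):         # criterion for B[X]- and N[X]-containment of P in Q
    return any(Counter(image_atoms(h, Q[1])) == Counter(P[1])
               for h in containment_mappings(Q, P))

two_xy = (("x", "y"), [("R", ("x", "y")), ("R", ("x", "z"))])
one_xy = (("x", "y"), [("R", ("x", "y"))])
two_x = (("x",), [("R", ("x", "y")), ("R", ("x", "z"))])
one_x = (("x",), [("R", ("x", "y"))])

# In each pair, mappings go from Q2 to Q1 (first argument to second).
sep_set_vs_lin = (has_mapping(one_xy, two_xy), has_covering_set(one_xy, two_xy))
sep_lin_vs_why = (has_covering_set(one_x, two_x), has_onto(one_x, two_x))
sep_why_vs_bx = (has_onto(two_xy, one_xy), has_exact(two_xy, one_xy))
```

Each pair evaluates to `(True, False)`: the weaker containment holds and the next stronger one fails.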

From Containment to Equivalence
{Onto | exact} containment mappings in both directions imply that the CQs are isomorphic, so why-provenance, $\mathbb{B}[X]$, and $\mathbb{N}[X]$ collapse to:
    $P \equiv_{\mathrm{why}} Q \iff P \equiv_{\mathbb{B}[X]} Q \iff P \equiv_{\mathbb{N}[X]} Q \iff P \cong Q$
In contrast, for lineage, having covering sets of containment mappings in both directions does not imply isomorphism (but equivalence is still decidable).

From CQs to UCQs
For idempotent semirings (where + is idempotent) this is easy. $\mathbb{B}$, PosBool(B), lineage, why-provenance, and $\mathbb{B}[X]$ are idempotent; $\mathbb{N}[X]$ is not (omitted).
Proposition [after SY80]: If K is idempotent, then for UCQs P, Q we have $P \sqsubseteq_K Q$ iff for every CQ $P'$ in P there is a CQ $Q'$ in Q such that $P' \sqsubseteq_K Q'$.
Corollary: For idempotent K, the problems of checking K-equivalence of CQs and K-equivalence of UCQs are polynomially equivalent.

$\mathbb{N}[X]$- and Bag-Equivalence of UCQs
As with CQs, $\mathbb{N}[X]$-equivalence of UCQs turns out to be the same as isomorphism:
Theorem: For UCQs P, Q, $P \equiv_{\mathbb{N}[X]} Q$ iff $P \cong Q$.
But it turns out that $\mathbb{N}[X]$-equivalence and $\mathbb{N}$-equivalence of UCQs are intimately related:
Theorem: For UCQs P, Q, $P \equiv_{\mathbb{N}[X]} Q$ iff $P \equiv_{\mathbb{N}} Q$.
Thus:
Corollary: For UCQs P, Q, $P \equiv_{\mathbb{N}} Q$ iff $P \cong Q$.

Summary: Complexity Results
Theorem: checking for {covering set of | onto | exact} containment mappings is NP-complete.
Checking for query isomorphism: believed > P, < NP.

Columns: $\mathbb{B}$, PosBool(B), $\mathbb{N}$, Lineage, Why-Pr., $\mathbb{B}[X]$, $\mathbb{N}[X]$
CQs,  $\sqsubseteq_K$: NP [CM 77]; NP [PODS 07]; ? ($\Pi_2^p$-hard) [CV 93]; NP-ct
CQs,  $\equiv_K$: NP ibid.; NP ibid.; ibid.; NP-ct
UCQs, $\sqsubseteq_K$: NP [SY 80]; NP ibid.; undec [IR 95]; NP-ct; PSPACE
UCQs, $\equiv_K$: NP ibid.; NP ibid.; NP-ct

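Summary: Provenance Hierarchy
CQs:
    $\sqsubseteq_K$: $\mathbb{B} \approx$ PosB.(B) $\prec$ Lineage $\prec \mathbb{N} \prec$ Why-Pr. $\prec \mathbb{B}[X] \approx \mathbb{N}[X]$
    $\equiv_K$: $\mathbb{B} \approx$ PosB.(B) $\prec$ Lineage $\prec \mathbb{N} \approx$ Why-Pr. $\approx \mathbb{B}[X] \approx \mathbb{N}[X]$
UCQs:
    $\sqsubseteq_K$: $\mathbb{B} \approx$ PosB.(B) $\prec$ Lineage $\prec$ Why-Pr. $\prec \mathbb{B}[X] \prec \mathbb{N}[X]$
    $\equiv_K$: $\mathbb{B} \approx$ PosB.(B) $\prec$ Lineage $\prec$ Why-Pr. $\prec \mathbb{B}[X] \prec \mathbb{N}[X]$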

Related Work
Already mentioned:
- Set-containment and equivalence of CQs [Chandra&Merlin 77]
- Set-containment and equivalence of UCQs [Sagiv&Yannakakis 80]
- Bag-containment of UCQs [Ioannidis&Ramakrishnan 95]
- Bag-equivalence of CQs [Chaudhuri&Vardi 93]
Containment of CQs with where-provenance [Tan 03]
Bag-set semantics [CV 93], combined semantics [Cohen 06]
- For K-relations: the support operator of [Geerts&Poggi 08] generalizes duplicate elimination
Bag-containment of CQ$^{\neq}$s [Jayram+ 06]

Future Work
Loose ends:
- Lower bound for $\mathbb{N}[X]$-containment of UCQs (we gave only a PSPACE upper bound)
- Generalize results for specific semirings to semirings with certain properties?
Beyond UCQs: Datalog
- Is K-containment of Datalog programs the same as set-containment when K is a distributive lattice?
- Is bag-equivalence / $\mathbb{N}[X]$-equivalence undecidable for Datalog?
Could the semiring framework give any insight into bag-containment of CQs?
Query optimization for annotated XML


$\mathbb{N}[X]$-Containment of UCQs
Surprisingly, the natural ideas based on exact containment mappings / canonical databases fail here:
- Pair each CQ $P'$ in P with a unique CQ $Q'$ in Q such that $P' \sqsubseteq_{\mathbb{N}[X]} Q'$? Nope.
- Test $P(\mathrm{can}_{\mathbb{N}[X]}(P)) \subseteq_{\mathbb{N}[X]} Q(\mathrm{can}_{\mathbb{N}[X]}(P))$? Nope.
However, we can at least show the problem is decidable.
Theorem: if P is not $\mathbb{N}[X]$-contained in Q, then $P(I) \not\subseteq_{\mathbb{N}[X]} Q(I)$ for some abstractly-tagged $\mathbb{N}[X]$-instance I containing at most |P| tuples.
This yields a PSPACE upper bound on the complexity. Lower bound?

Minimal-Witness Why-Prov. [Bun.+ 01]
Minimal-witness why-provenance [Bun.+ 01]: keep the set of sets of tuples minimal (throw out any member which contains another member)
- $\{\{p,r,s\}, \{p,q\}, \{r,s\}\} \Rightarrow \{\{p,q\}, \{r,s\}\}$
Turns out to be isomorphic to the semiring of positive Boolean formulae over variables B: $(\mathrm{PosBool}(B), \lor, \land, \bot, \top)$ [Val Tannen]
Natural order corresponds to logical entailment: $\varphi \le_{PB} \psi$ iff $\varphi \models \psi$
Theorem [Bun.+ 01, PODS 07]: For UCQs P, Q we have $P \sqsubseteq_{PB} Q$ iff $P \sqsubseteq_{\mathbb{B}} Q$
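The minimization step described above can be sketched in Python. The function name `minimal_witnesses` and the frozenset encoding are assumptions for illustration.

```python
def minimal_witnesses(witnesses):
    """Keep only the witness sets that do not strictly contain another witness set."""
    ws = {frozenset(w) for w in witnesses}
    return {w for w in ws if not any(other < w for other in ws)}

# {{p,r,s}, {p,q}, {r,s}}  =>  {{p,q}, {r,s}}
result = minimal_witnesses([{"p", "r", "s"}, {"p", "q"}, {"r", "s"}])
```

Here `{p, r, s}` is discarded because it strictly contains `{r, s}`.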