# Containment of Conjunctive Queries on Annotated Relations TJ Green University of Pennsylvania Symposium on Database Provenance University of Edinburgh.

Containment of Conjunctive Queries on Annotated Relations. TJ Green, University of Pennsylvania. Symposium on Database Provenance, University of Edinburgh, May 21, 2008.

### Provenance and Query Optimization

Many kinds of semiring-based provenance annotations to choose from:

- lineage
- why-provenance
- minimal witness why-provenance
- provenance polynomials
- ...

These seem to keep track of more or less information. A fundamental question: how does this affect query optimization?

### Conjunctive Queries on K-Relations

- Datalog-style syntax for conjunctive queries (CQs): `Q(x,y) :- R(x,z), R(z,y)`
- Semantics of applying the CQ to a $K$-relation $R : D \times D \to K$: $Q(a,b) = \sum_{z \in D} R(a,z) \cdot R(z,b)$
- The number of repetitions of an atom in the body matters.
- For unions of conjunctive queries (UCQs), which are equivalent to positive RA, sum over the CQs: `P(x,y) :- R(x,z), R(z,y)` and `P(x,y) :- R(x,w), R(y,w)`
- Semantics of the UCQ applied to $R$ is a sum over the CQs: $P(a,b) = \sum_{z \in D} R(a,z) \cdot R(z,b) + \sum_{w \in D} R(a,w) \cdot R(b,w)$
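The semantics of the path query `Q(x,y) :- R(x,z), R(z,y)` can be sketched directly from the sum-of-products formula. The following is a minimal illustration, not code from the talk: the function name `eval_path_query`, the dictionary encoding of a K-relation, and the sample data are all assumptions made for the example. The semiring is passed in as its `plus`, `times`, and `zero`.

```python
from itertools import product

def eval_path_query(R, domain, plus, times, zero):
    """Q(a,b) = sum over z in D of R(a,z) * R(z,b), computed in the semiring K.

    R is a dict mapping tuples to annotations; missing tuples are annotated zero.
    """
    result = {}
    for a, b in product(domain, repeat=2):
        total = zero
        for z in domain:
            total = plus(total, times(R.get((a, z), zero), R.get((z, b), zero)))
        if total != zero:
            result[(a, b)] = total
    return result

domain = ["a", "b", "c"]

# K = N (bag semantics): annotations are multiplicities.
bag_R = {("a", "a"): 1, ("a", "b"): 2, ("b", "c"): 3}
bag_result = eval_path_query(bag_R, domain, lambda x, y: x + y, lambda x, y: x * y, 0)

# K = B (set semantics): annotations are truth values.
set_R = {t: True for t in bag_R}
set_result = eval_path_query(set_R, domain, lambda x, y: x or y, lambda x, y: x and y, False)

print(bag_result)
print(set_result)
```

Both runs return the same output tuples; only the annotations differ, which is the sense in which the choice of K changes what a query "means".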

### Choice of K Affects Query Optimization

$K = \mathbb{N}$ (bag semantics) differs from $K = \mathbb{B}$ (set semantics). For example, the conjunctive queries `Q1(x) :- R(x,y), R(x,z)` and `Q2(u) :- R(u,v)` are set-equivalent, but not bag-equivalent.

| | Conjunctive Queries (CQs) | Unions of Conjunctive Queries (UCQs) |
|---|---|---|
| Bag Semantics Containment ($\sqsubseteq_{\mathbb{N}}$) | ? ($\Pi_2^p$-hard) [Chaudhuri&Vardi 93] | undecidable [Ioannidis&Ramakrishnan 95] |
| Bag Semantics Equivalence ($\equiv_{\mathbb{N}}$) | isomorphism ($\cong$) [CV 93] | ? |
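The gap between set and bag semantics for `Q1` and `Q2` can be checked on a small instance. This is a hedged sketch: `eval_cq`, the tuple encoding of atoms (each atom is the tuple of variables of a binary `R` atom), and the sample relation are assumptions introduced for illustration only.

```python
from itertools import product

def eval_cq(head, body, R, plus, times, zero, one):
    """Evaluate a CQ over one binary K-relation R (dict: tuple -> annotation)."""
    variables = sorted({v for atom in body for v in atom})
    domain = sorted({c for tup in R for c in tup})
    result = {}
    # Sum, over all valuations of the variables, the product of the atom annotations.
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        term = one
        for atom in body:
            term = times(term, R.get(tuple(env[v] for v in atom), zero))
        key = tuple(env[v] for v in head)
        result[key] = plus(result.get(key, zero), term)
    return {k: v for k, v in result.items() if v != zero}

Q1 = (("x",), [("x", "y"), ("x", "z")])   # Q1(x) :- R(x,y), R(x,z)
Q2 = (("u",), [("u", "v")])               # Q2(u) :- R(u,v)

bag_ops = (lambda a, b: a + b, lambda a, b: a * b, 0, 1)
set_ops = (lambda a, b: a or b, lambda a, b: a and b, False, True)

bag_R = {("a", "b"): 2, ("a", "c"): 1}
set_R = {t: True for t in bag_R}

q1_bag = eval_cq(*Q1, bag_R, *bag_ops)
q2_bag = eval_cq(*Q2, bag_R, *bag_ops)
q1_set = eval_cq(*Q1, set_R, *set_ops)
q2_set = eval_cq(*Q2, set_R, *set_ops)
print(q1_bag, q2_bag, q1_set, q2_set)
```

On this instance the two queries agree under set semantics, while under bag semantics `Q1` returns the square of the multiplicity that `Q2` returns.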

### Our Contributions

- We make a systematic study of query containment and query equivalence for various provenance models.
- We show that K-containment and K-equivalence of CQs and UCQs are decidable for lineage, why-provenance, and the provenance polynomials $\mathbb{N}[X]$, as well as a new model, $\mathbb{B}[X]$.
- The decision procedures are based on interesting variations of containment mappings.
- We analyze the complexity in each case.

### Our Contributions

As a corollary of the decidability result for $\mathbb{N}[X]$-equivalence of UCQs, we also fill in a gap in the chart for bag semantics:

| | Conjunctive Queries (CQs) | Unions of Conjunctive Queries (UCQs) |
|---|---|---|
| Bag Semantics Containment ($\sqsubseteq_{\mathbb{N}}$) | ? ($\Pi_2^p$-hard) [Chaudhuri&Vardi 93] | undecidable [Ioannidis&Ramakrishnan 95] |
| Bag Semantics Equivalence ($\equiv_{\mathbb{N}}$) | isomorphism ($\cong$) [CV 93] | isomorphism ($\cong$) |

### K-Containment for Queries

- For a semiring $K$, define $a \le_K b \iff \exists c.\ a + c = b$.
- If $\le_K$ is a partial order, it is called the natural order, and $K$ is said to be naturally ordered.
- $\mathbb{B}$, $\mathbb{N}$, lineage, why-provenance, $\mathbb{B}[X]$, and $\mathbb{N}[X]$ are all naturally ordered.
- We define K-containment using the natural order:
  - $Q_1 \sqsubseteq_K Q_2 \iff \forall I\ \forall t\ \ Q_1(I)(t) \le_K Q_2(I)(t)$
  - $Q_1 \equiv_K Q_2 \iff \forall I\ \forall t\ \ Q_1(I)(t) = Q_2(I)(t)$

### A Hierarchy of Semiring Provenance (1)

- Provenance polynomials $(\mathbb{N}[X], +, \cdot, 0, 1)$: tracks calculations abstractly; most general. For example, $2p^2r + 3ps + ps^3$.
- Drop coefficients to get $(\mathbb{B}[X], +, \cdot, 0, 1)$: $p^2r + ps + ps^3$.
- Drop exponents to get why-provenance $(\mathcal{P}(\mathcal{P}(X)), \cup, \Cup, \emptyset, \{\emptyset\})$: $\{\{p,r\}, \{p,s\}\}$.
- Flatten the set of sets to get lineage $(\mathcal{P}(X), +, \cdot, \bot, \emptyset)$: $\{p,r,s\}$.

Drop, flatten, etc. correspond to surjective semiring homomorphisms.
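The chain of "drop" and "flatten" steps can be traced on the example polynomial. The sketch below is illustrative only: the encoding of a polynomial as a dict from monomials to coefficients, the function names, and the use of `None` for the lineage zero element are assumptions, not notation from the talk.

```python
# The N[X] polynomial 2*p^2*r + 3*p*s + p*s^3, encoded (by assumption) as a dict
# from monomials to coefficients; a monomial is a tuple of (variable, exponent) pairs.
poly = {
    (("p", 2), ("r", 1)): 2,
    (("p", 1), ("s", 1)): 3,
    (("p", 1), ("s", 3)): 1,
}

def drop_coefficients(p):
    """N[X] -> B[X]: keep only the set of monomials."""
    return frozenset(p)

def drop_exponents(bx):
    """B[X] -> why-provenance: each monomial becomes its set of variables."""
    return frozenset(frozenset(var for var, _ in mono) for mono in bx)

def flatten(why):
    """Why-provenance -> lineage: union of all witness sets.

    None stands in for the lineage zero element (an encoding assumption).
    """
    if not why:
        return None
    return frozenset().union(*why)

b_x = drop_coefficients(poly)
why = drop_exponents(b_x)
lin = flatten(why)
print(sorted(sorted(w) for w in why))
print(sorted(lin))
```

Note that the two monomials $ps$ and $ps^3$ collapse to the same witness set once exponents are dropped, which is why the why-provenance has two members rather than three.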

### A Hierarchy of Semiring Provenance (2)

- Suppose $h : K_1 \to K_2$ is a semiring homomorphism. Then $a \le_{K_1} b$ implies $h(a) \le_{K_2} h(b)$. If $h$ is also surjective, then $h(a) \le_{K_2} h(b)$ implies $a \le_{K_1} b$.
- Definition: $K_1 \preceq K_2$ means $P \sqsubseteq_{K_2} Q$ implies $P \sqsubseteq_{K_1} Q$.
- Proposition: for any positive $K$, $\mathbb{B} \preceq K \preceq \mathbb{N}[X]$. (All those we consider are positive.)
- Moreover, Proposition (Provenance Hierarchy): $\mathbb{B} \preceq \text{lineage} \preceq \text{why-prov.} \preceq \mathbb{B}[X] \preceq \mathbb{N}[X]$.

### Containment Mappings

A containment mapping from CQ $Q$ to CQ $P$ is a function $h : \mathrm{Vars}(Q) \to \mathrm{Vars}(P)$ such that

- the head of $Q$ is mapped to the head of $P$
- every atom in the body of $Q$ is mapped to an atom in the body of $P$

Theorem [CM77]: For CQs $P, Q$ we have $P \sqsubseteq_{\mathbb{B}} Q$ iff there is a containment mapping from $Q$ to $P$.

- For example, take `Q1(x) :- R(x,y), R(x,z)` and `Q2(u) :- R(u,v)`.
- The $h$ which sends $u \mapsto x$ and $v \mapsto y$ is a containment mapping from $Q_2$ to $Q_1$.

Checking for the existence of a containment mapping is NP-complete.
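The definition translates into a brute-force search over all functions from the variables of one query to the variables of the other. This is a minimal sketch under stated assumptions: the `(head, body)` encoding of a CQ and the name `containment_mappings` are hypothetical, and the exhaustive enumeration is exponential, in line with the NP-completeness noted above.

```python
from itertools import product

def containment_mappings(q, p):
    """Yield every containment mapping from CQ q to CQ p.

    A CQ is (head, body): head is a tuple of variables and body is a list of
    (relation_name, args) atoms, where args is a tuple of variables.
    """
    q_head, q_body = q
    p_head, p_body = p
    q_vars = sorted({v for _, args in q_body for v in args} | set(q_head))
    p_vars = sorted({v for _, args in p_body for v in args} | set(p_head))
    p_atoms = set(p_body)
    for image in product(p_vars, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        # The head of q must be mapped to the head of p.
        if tuple(h[v] for v in q_head) != tuple(p_head):
            continue
        # Every body atom of q must land on some body atom of p.
        if all((rel, tuple(h[v] for v in args)) in p_atoms for rel, args in q_body):
            yield h

Q1 = (("x",), [("R", ("x", "y")), ("R", ("x", "z"))])   # Q1(x) :- R(x,y), R(x,z)
Q2 = (("u",), [("R", ("u", "v"))])                       # Q2(u) :- R(u,v)

q2_to_q1 = list(containment_mappings(Q2, Q1))
q1_to_q2 = list(containment_mappings(Q1, Q2))
print(q2_to_q1)
print(q1_to_q2)
```

Mappings exist in both directions here, which matches the earlier observation that `Q1` and `Q2` are set-equivalent.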

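### Canonical Databases

- Take the body of the CQ, freeze it into a database instance [CM77], and tag each tuple with a tuple id.
- We'll denote by $\mathrm{can}_K(Q)$ the canonical database for $Q$ with abstract tags from $K$.

For example, for `Q(w) :- R(u,v), R(v,w)` the relation $R$ in each canonical database is:

| Tuple of $R$ | $\mathrm{can}_{\mathbb{N}[X]}(Q) = \mathrm{can}_{\mathbb{B}[X]}(Q)$ | $\mathrm{can}_{lin}(Q)$ | $\mathrm{can}_{why}(Q)$ |
|---|---|---|---|
| $(u, v)$ | $x_1$ | $\{x_1\}$ | $\{\{x_1\}\}$ |
| $(v, w)$ | $x_2$ | $\{x_2\}$ | $\{\{x_2\}\}$ |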

### Lineage-Containment of CQs

Covering set of containment mappings: for every atom $A$ in the body of $P$ there is a containment mapping $h : Q \to P$ with $A$ in the image of $h$.

Theorem: For CQs $P, Q$ the following are equivalent:

1. $P \sqsubseteq_{lin} Q$
2. $P(\mathrm{can}_{lin}(P)) \subseteq_{lin} Q(\mathrm{can}_{lin}(P))$
3. there is a covering set of containment mappings from $Q$ to $P$

Note: covering sets of containment mappings were identified in [CV 93] as a necessary (but not sufficient) condition for bag-containment of CQs.

### Why-Containment of CQs

A containment mapping is onto if it induces a surjection on atoms.

Theorem: For CQs $P, Q$ the following are equivalent:

1. $P \sqsubseteq_{why} Q$
2. $P(\mathrm{can}_{why}(P)) \subseteq_{why} Q(\mathrm{can}_{why}(P))$
3. there is an onto containment mapping $h : Q \to P$

Note: onto containment mappings were identified in [CV 93] as a sufficient (but not necessary) condition for bag-containment of CQs.

### $\mathbb{B}[X]$-, $\mathbb{N}[X]$-Containment of CQs

A containment mapping is exact if it induces a bijection on atoms.

Theorem: For CQs $P, Q$ and for $K \in \{\mathbb{B}[X], \mathbb{N}[X]\}$ the following are equivalent:

1. $P \sqsubseteq_K Q$
2. $P(\mathrm{can}_K(P)) \subseteq_K Q(\mathrm{can}_K(P))$
3. there is an exact containment mapping $h : Q \to P$

Another way to think of exact containment mappings: by unifying variables in $Q$, you get a query isomorphic to $P$.

### So Far

K-containment of CQs is decidable for all the provenance models in the hierarchy.

Next, we indicate which steps in the hierarchy are strict, and which collapse: $\mathbb{B} \prec \text{lineage} \prec \text{why-prov.} \prec \mathbb{B}[X] \approx \mathbb{N}[X]$

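### Separating the Models for $\sqsubseteq$ of CQs

- $\mathbb{B} \prec$ lineage: with `Q1(x,y) :- R(x,y), R(x,z)` and `Q2(x,y) :- R(x,y)`, we have $Q_1 \sqsubseteq_{\mathbb{B}} Q_2$ but $Q_1 \not\sqsubseteq_{lin} Q_2$.
- lineage $\prec$ why: with `Q1(x) :- R(x,y), R(x,z)` and `Q2(x) :- R(x,y)`, we have $Q_1 \sqsubseteq_{lin} Q_2$ but $Q_1 \not\sqsubseteq_{why} Q_2$.
- why $\prec \mathbb{B}[X]$: with `Q1(x,y) :- R(x,y)` and `Q2(x,y) :- R(x,y), R(x,z)`, we have $Q_1 \sqsubseteq_{why} Q_2$ but $Q_1 \not\sqsubseteq_{\mathbb{B}[X]} Q_2$.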
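The three separating examples can be checked mechanically by enumerating containment mappings and classifying them as covering, onto, or exact. The sketch below is an illustration under assumptions: queries use a single binary relation, bodies contain no repeated atoms, and the names `atom_images` and `strongest_mapping` are hypothetical. Recall that $P \sqsubseteq_K Q$ is tested with mappings from $Q$ to $P$.

```python
from itertools import product

def atom_images(q, p):
    """Yield, for each containment mapping from q to p, the list of images of q's atoms.

    A CQ is (head, body); each body atom is the tuple of variables of an R atom.
    """
    q_head, q_body = q
    p_head, p_body = p
    q_vars = sorted({v for atom in q_body for v in atom} | set(q_head))
    p_vars = sorted({v for atom in p_body for v in atom} | set(p_head))
    for image in product(p_vars, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        if tuple(h[v] for v in q_head) != tuple(p_head):
            continue
        mapped = [tuple(h[v] for v in atom) for atom in q_body]
        if all(atom in p_body for atom in mapped):
            yield mapped

def strongest_mapping(q, p):
    """Name the strongest kind of containment mapping (or set of mappings) from q to p."""
    p_atoms = set(p[1])
    images = list(atom_images(q, p))
    if not images:
        return "none"
    if any(set(m) == p_atoms and len(set(m)) == len(m) for m in images):
        return "exact"      # bijection on atoms
    if any(set(m) == p_atoms for m in images):
        return "onto"       # surjection on atoms
    if set().union(*map(set, images)) == p_atoms:
        return "covering"   # the mappings jointly hit every atom of p
    return "plain"          # set-containment only

# B vs lineage: Q1(x,y) :- R(x,y), R(x,z) and Q2(x,y) :- R(x,y)
b_vs_lin = strongest_mapping((("x", "y"), [("x", "y")]),
                             (("x", "y"), [("x", "y"), ("x", "z")]))
# lineage vs why: Q1(x) :- R(x,y), R(x,z) and Q2(x) :- R(x,y)
lin_vs_why = strongest_mapping((("x",), [("x", "y")]),
                               (("x",), [("x", "y"), ("x", "z")]))
# why vs B[X]: Q1(x,y) :- R(x,y) and Q2(x,y) :- R(x,y), R(x,z)
why_vs_bx = strongest_mapping((("x", "y"), [("x", "y"), ("x", "z")]),
                              (("x", "y"), [("x", "y")]))
print(b_vs_lin, lin_vs_why, why_vs_bx)
```

Each example stops exactly one rung short of the next model: a plain mapping with no covering set, a covering set with no onto mapping, and an onto mapping that is not exact.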

### From Containment to Equivalence

- {Onto|exact} containment mappings in both directions imply that the CQs are isomorphic, so why-provenance, $\mathbb{B}[X]$, and $\mathbb{N}[X]$ collapse to: $P \equiv_{why} Q \iff P \equiv_{\mathbb{B}[X]} Q \iff P \equiv_{\mathbb{N}[X]} Q \iff P \cong Q$
- In contrast, for lineage, having covering sets of containment mappings in both directions does not imply isomorphism (but equivalence is still decidable).

### From CQs to UCQs

- For idempotent semirings (where $+$ is idempotent) this is easy.
- $\mathbb{B}$, PosBool(B), lineage, why-provenance, and $\mathbb{B}[X]$ are idempotent; $\mathbb{N}[X]$ is not (omitted).
- Proposition [after SY80]: If $K$ is idempotent, then for UCQs $P, Q$ we have $P \sqsubseteq_K Q$ iff for every CQ $P'$ in $P$ there is a CQ $Q'$ in $Q$ such that $P' \sqsubseteq_K Q'$.
- Corollary: For idempotent $K$, the problems of checking K-equivalence of CQs and K-equivalence of UCQs are polynomially equivalent.
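For the idempotent case the UCQ test reduces to a nested loop over the member CQs. The sketch below instantiates the proposition for $K = \mathbb{B}$ only, where CQ containment is the plain containment-mapping test; for another idempotent semiring one would swap in that semiring's CQ test. The query encoding (atoms as tuples of variables of a single binary relation) and all function names are assumptions made for the example.

```python
from itertools import product

def has_containment_mapping(q, p):
    """True if some containment mapping from CQ q to CQ p exists.

    A CQ is (head, body); each body atom is the tuple of variables of an R atom.
    """
    q_head, q_body = q
    p_head, p_body = p
    q_vars = sorted({v for atom in q_body for v in atom} | set(q_head))
    p_vars = sorted({v for atom in p_body for v in atom} | set(p_head))
    for image in product(p_vars, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        if tuple(h[v] for v in q_head) != tuple(p_head):
            continue
        if all(tuple(h[v] for v in atom) in p_body for atom in q_body):
            return True
    return False

def cq_contained(p, q):
    """Set-containment of CQs: p is contained in q iff a mapping from q to p exists."""
    return has_containment_mapping(q, p)

def ucq_contained(P, Q):
    """Every CQ of P must be contained in some CQ of Q (idempotent case, K = B)."""
    return all(any(cq_contained(p, q) for q in Q) for p in P)

path = (("x", "y"), [("x", "z"), ("z", "y")])     # P(x,y) :- R(x,z), R(z,y)
cospan = (("x", "y"), [("x", "w"), ("y", "w")])   # P(x,y) :- R(x,w), R(y,w)

forward = ucq_contained([path], [path, cospan])
backward = ucq_contained([path, cospan], [path])
print(forward, backward)
```

The two CQs are the ones from the earlier UCQ example; neither is contained in the other, so adding the second rule strictly enlarges the union.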

### $\mathbb{N}[X]$- and Bag-Equivalence of UCQs

- As with CQs, $\mathbb{N}[X]$-equivalence of UCQs turns out to be the same as isomorphism. Theorem: for UCQs $P, Q$, $P \equiv_{\mathbb{N}[X]} Q$ iff $P \cong Q$.
- But it turns out that $\mathbb{N}[X]$-equivalence and $\mathbb{N}$-equivalence of UCQs are intimately related. Theorem: for UCQs $P, Q$, $P \equiv_{\mathbb{N}[X]} Q$ iff $P \equiv_{\mathbb{N}} Q$.
- Thus, Corollary: for UCQs $P, Q$, $P \equiv_{\mathbb{N}} Q$ iff $P \cong Q$.

### Summary: Complexity Results

- Theorem: checking for {covering set of|onto|exact} containment mappings is NP-complete.
- Checking for query isomorphism: believed > P, < NP.

Semirings, in column order: $\mathbb{B}$, PosBool(B), $\mathbb{N}$, Lineage, Why-Pr., $\mathbb{B}[X]$, $\mathbb{N}[X]$.

- CQs, $\sqsubseteq_K$: NP [CM 77]; NP [PODS 07]; ? ($\Pi_2^p$-hard) [CV 93]; NP-ct
- CQs, $\equiv_K$: NP ibid.; NP ibid.; ibid.; NP-ct
- UCQs, $\sqsubseteq_K$: NP [SY 80]; NP ibid.; undec [IR 95]; NP-ct; PSPACE
- UCQs, $\equiv_K$: NP ibid.; NP ibid.; NP-ct

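### Summary: Provenance Hierarchy

- CQs, $\sqsubseteq_K$: $\mathbb{B} \approx$ PosBool(B) $\prec$ Lineage $\prec \mathbb{N} \prec$ Why-Pr. $\prec \mathbb{B}[X] \approx \mathbb{N}[X]$
- CQs, $\equiv_K$: $\mathbb{B} \approx$ PosBool(B) $\prec$ Lineage $\prec \mathbb{N} \approx$ Why-Pr. $\approx \mathbb{B}[X] \approx \mathbb{N}[X]$
- UCQs, $\sqsubseteq_K$: $\mathbb{B} \approx$ PosBool(B) $\prec$ Lineage $\prec$ Why-Pr. $\prec \mathbb{B}[X] \prec \mathbb{N}[X]$
- UCQs, $\equiv_K$: $\mathbb{B} \approx$ PosBool(B) $\prec$ Lineage $\prec$ Why-Pr. $\prec \mathbb{B}[X] \prec \mathbb{N}[X]$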

### Related Work

- Already mentioned:
  - Set-containment and equivalence of CQs [Chandra&Merlin 77]
  - Set-containment and equivalence of UCQs [Sagiv&Yannakakis 80]
  - Bag-containment of UCQs [Ioannidis&Ramakrishnan 95]
  - Bag-equivalence of CQs [Chaudhuri&Vardi 93]
- Containment of CQs with where-provenance [Tan 03]
- Bag-set semantics [CV 93], combined semantics [Cohen 06]
  - For K-relations: the support operator of [Geerts&Poggi 08] generalizes duplicate elimination
- Bag-containment of CQs [Jayram+ 06]

### Future Work

- Loose ends:
  - Lower bound for $\mathbb{N}[X]$-containment of UCQs (we gave only a PSPACE upper bound)
  - Generalize results for specific semirings to semirings with certain properties?
- Beyond UCQs: Datalog
  - Is K-containment of Datalog programs the same as set-containment when $K$ is a distributive lattice?
  - Is bag-equivalence/$\mathbb{N}[X]$-equivalence undecidable for Datalog?
- Could the semiring framework give any insight into bag-containment of CQs?
- Query optimization for annotated XML

### $\mathbb{N}[X]$-Containment of UCQs

- Surprisingly, the natural ideas based on exact containment mappings / canonical databases fail here:
  - Pair each CQ $P'$ in $P$ with a unique CQ $Q'$ in $Q$ such that $P' \sqsubseteq_{\mathbb{N}[X]} Q'$? Nope.
  - Test $P(\mathrm{can}_{\mathbb{N}[X]}(P)) \subseteq_{\mathbb{N}[X]} Q(\mathrm{can}_{\mathbb{N}[X]}(P))$? Nope.
- However, we can at least show the problem is decidable.
- Theorem: if $P$ is not $\mathbb{N}[X]$-contained in $Q$, then $P(I) \nsubseteq_{\mathbb{N}[X]} Q(I)$ for some abstractly-tagged $\mathbb{N}[X]$-instance $I$ containing at most $|P|$ tuples.
- This yields a PSPACE upper bound on the complexity. Lower bound?

### Minimal-Witness Why-Prov. [Bun.+ 01]

- Minimal-witness why-provenance [Bun.+ 01]: keep the set of sets of tuples minimal (throw out any member which contains another member). For example, $\{\{p,r,s\}, \{p,q\}, \{r,s\}\} \Rightarrow \{\{p,q\}, \{r,s\}\}$.
- Turns out to be isomorphic to the semiring of positive Boolean formulae over variables $B$: $(\mathrm{PosBool}(B), \lor, \land, \bot, \top)$ [Val Tannen]
- Natural order corresponds to logical entailment: $\varphi \le_{PB} \psi$ iff $\varphi \models \psi$
- Theorem [Bun.+ 01, PODS 07]: For UCQs $P, Q$ we have $P \sqsubseteq_{PB} Q$ iff $P \sqsubseteq_{\mathbb{B}} Q$
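The minimization step ("throw out any member which contains another member") is short enough to sketch directly. The function name `minimize` and the frozenset encoding are assumptions for illustration; the input is the example from the slide.

```python
def minimize(witnesses):
    """Keep only the witness sets that do not strictly contain another witness set."""
    sets = {frozenset(w) for w in witnesses}
    # `other < w` is Python's strict-subset test on frozensets.
    return {w for w in sets if not any(other < w for other in sets)}

why = [{"p", "r", "s"}, {"p", "q"}, {"r", "s"}]
minimal = minimize(why)
print(sorted(sorted(w) for w in minimal))
```

Here `{p, r, s}` is discarded because it strictly contains `{r, s}`, leaving the two minimal witnesses.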
