
A COURSE ON PROBABILISTIC DATABASES. Dan Suciu, University of Washington, June 2014.


1 A COURSE ON PROBABILISTIC DATABASES
Dan Suciu, University of Washington, June 2014

2 Outline
1. Motivating Applications
2. The Probabilistic Data Model (Chapter 2)
3. Extensional Query Plans (Chapter 4.2)
4. The Complexity of Query Evaluation (Chapter 3)
5. Extensional Evaluation (Chapter 4.1)
6. Intensional Evaluation (Chapter 5)
7. Conclusions
(The course is organized in four parts, Part 1 to Part 4.)

3 Overview
Review: Unions of Conjunctive Queries (UCQ).
Four simple rules for evaluating a query Q.
Big Dichotomy Theorem:
1. If the rules succeed ⇒ Q is safe ⇒ in PTIME.
2. If the rules fail ⇒ Q is unsafe ⇒ #P-complete.
Compare to the Small Dichotomy Theorem, which applies only to conjunctive queries without self-joins: case 1 holds precisely when Q is hierarchical, and case 2 holds precisely when Q is not hierarchical.

4 Review: Unions of Conjunctive Queries
Owners of items in either "Office444" or "Hall7":
Q(z) = ∃x_1 ∃t_1 (Owner(z,x_1) ∧ Location(x_1,t_1,"Office444")) ∨ ∃x_2 ∃t_2 (Owner(z,x_2) ∧ Location(x_2,t_2,"Hall7"))
Same as (a union of conjunctive queries):
Q(z) = Owner(z,x_1),Location(x_1,t_1,"Office444") ∨ Owner(z,x_2),Location(x_2,t_2,"Hall7")
Same as:
Q(z) = ∃x (Owner(z,x) ∧ ∃t [Location(x,t,"Office444") ∨ Location(x,t,"Hall7")])
We will use these laws:
1. Distributivity of ∧ over ∨.
2. Commutativity of ∃ with ∨: (∃x P(x)) ∨ (∃y T(y)) = ∃z (P(z) ∨ T(z)).

8 Four Rules for Computing Query Probabilities
Independent join
Independent project
Independent union
Inclusion/exclusion
The rules apply to Boolean queries only.

9 Rule 1: Independent Join
P(Q1 ∧ Q2) = P(Q1) · P(Q2), if Q1 and Q2 are independent (meaning: they have no common atoms).
Rule 2: Independent Project
P(∃z Q) = 1 − Π_{a ∈ Domain} (1 − P(Q[a/z])), if z is a "separator variable" in Q, meaning that for any distinct constants a ≠ b, Q[a/z] and Q[b/z] are independent.
Rule 3: Independent Union
P(Q1 ∨ Q2) = 1 − (1 − P(Q1))(1 − P(Q2)), if Q1 and Q2 are independent (meaning: they have no common atoms).
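After the independence conditions are checked, Rules 1-3 are plain arithmetic. The Python sketch below assumes the caller has already verified those conditions (no common atoms, z is a separator variable). The function names are hypothetical and do not come from any particular system.

```python
from math import prod


def independent_join(p1: float, p2: float) -> float:
    """Rule 1: P(Q1 ∧ Q2) for independent Q1, Q2."""
    return p1 * p2


def independent_union(p1: float, p2: float) -> float:
    """Rule 3: P(Q1 ∨ Q2) for independent Q1, Q2."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)


def independent_project(probs_per_constant) -> float:
    """Rule 2: P(∃z Q) = 1 - Π_a (1 - P(Q[a/z])) for a separator variable z.

    probs_per_constant holds P(Q[a/z]) for every constant a in the domain.
    """
    return 1.0 - prod(1.0 - p for p in probs_per_constant)


# Example: P(∃y S(b, y)) when S holds three independent tuples (b, c1), (b, c2), (b, c3).
print(independent_project([0.5, 0.5, 0.5]))  # 0.875
```

An empty domain yields probability 0, because the empty product equals 1.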

12 Example
Q_U = R(x_1),S(x_1,y_1) ∨ T(x_2),S(x_2,y_2)
    = ∃x_1 ∃y_1 (R(x_1) ∧ S(x_1,y_1)) ∨ ∃x_2 ∃y_2 (T(x_2) ∧ S(x_2,y_2))
Commute ∃ with ∨:
Q_U = ∃z [R(z) ∧ S(z,y_1) ∨ T(z) ∧ S(z,y_2)]
Independent project: for a ≠ b, Q_U[a/z] and Q_U[b/z] are independent, because the atoms R(a), S(a,y_1), T(a), S(a,y_2) are distinct from R(b), S(b,y_1), T(b), S(b,y_2):
P(Q_U) = 1 − Π_{a ∈ Domain} (1 − P[R(a) ∧ S(a,y_1) ∨ T(a) ∧ S(a,y_2)])
Distribute ∧ over ∨:
P(Q_U) = 1 − Π_{a ∈ Domain} (1 − P[(R(a) ∨ T(a)) ∧ ∃y S(a,y)])
Independent join:
P(Q_U) = 1 − Π_{a ∈ Domain} (1 − P[R(a) ∨ T(a)] · P[∃y S(a,y)])
Independent union and independent project:
P(Q_U) = 1 − Π_{a ∈ Domain} (1 − (1 − (1 − P[R(a)])(1 − P[T(a)])) · (1 − Π_{b ∈ Domain} (1 − P[S(a,b)])))
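To sanity-check the closed form for P(Q_U), the following Python sketch evaluates it on a tiny tuple-independent database and compares the result with brute-force enumeration of all possible worlds. The relations R, T, S and their probabilities are made-up illustration data. The brute-force part is exponential and serves only as a check.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent database: each tuple maps to its marginal probability.
R = {1: 0.3, 2: 0.6}
T = {1: 0.5, 3: 0.2}
S = {(1, 1): 0.4, (1, 2): 0.7, (2, 1): 0.9, (3, 3): 0.5}


def p_qu_rules(R, T, S):
    """P(Q_U) via the closed form on the slide."""
    domain = set(R) | set(T) | {x for x, _ in S}
    no_witness = 1.0
    for a in domain:
        p_r_or_t = 1 - (1 - R.get(a, 0.0)) * (1 - T.get(a, 0.0))          # independent union
        p_exists_s = 1 - prod(1 - p for (x, _), p in S.items() if x == a)  # independent project
        no_witness *= 1 - p_r_or_t * p_exists_s                            # independent join
    return 1 - no_witness


def p_qu_brute_force(R, T, S):
    """P(Q_U) by summing the probabilities of all possible worlds where Q_U holds."""
    tuples = ([("R", k, p) for k, p in R.items()]
              + [("T", k, p) for k, p in T.items()]
              + [("S", k, p) for k, p in S.items()])
    total = 0.0
    for present in product([False, True], repeat=len(tuples)):
        weight = prod(p if keep else 1 - p for (_, _, p), keep in zip(tuples, present))
        chosen = [tup for tup, keep in zip(tuples, present) if keep]
        r_set = {k for rel, k, _ in chosen if rel == "R"}
        t_set = {k for rel, k, _ in chosen if rel == "T"}
        s_first = {k[0] for rel, k, _ in chosen if rel == "S"}
        # Q_U holds iff some x has R(x) or T(x), together with some S(x, y).
        if (r_set | t_set) & s_first:
            total += weight
    return total


print(p_qu_rules(R, T, S), p_qu_brute_force(R, T, S))
```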

18 Rule 4: Inclusion-Exclusion
P(Q1 ∧ Q2 ∧ Q3) = P(Q1) + P(Q2) + P(Q3) − P(Q1 ∨ Q2) − P(Q1 ∨ Q3) − P(Q2 ∨ Q3) + P(Q1 ∨ Q2 ∨ Q3)
Note: this is the dual of the more popular formula:
P(Q1 ∨ Q2 ∨ Q3) = P(Q1) + P(Q2) + P(Q3) − P(Q1 ∧ Q2) − P(Q1 ∧ Q3) − P(Q2 ∧ Q3) + P(Q1 ∧ Q2 ∧ Q3)
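The dual formula can be checked numerically. In the Python sketch below, events are subsets of a small, hypothetical sample space with made-up outcome weights. The function sums (−1)^(|S|+1) · P(∨_{i∈S} E_i) over all nonempty sets S of events; for three events this expands to exactly the seven terms of Rule 4.

```python
from itertools import combinations

# Hypothetical finite sample space: outcome -> probability (the weights sum to 1).
weights = {0: 0.1, 1: 0.2, 2: 0.05, 3: 0.15, 4: 0.1, 5: 0.1, 6: 0.2, 7: 0.1}


def prob(event, weights):
    """Probability of an event, given as a set of outcomes."""
    return sum(weights[w] for w in event)


def p_and_via_or(events, weights):
    """P(E_1 ∧ ... ∧ E_k) = Σ_{nonempty S} (-1)^(|S|+1) P(∨_{i in S} E_i)."""
    total = 0.0
    for k in range(1, len(events) + 1):
        for subset in combinations(events, k):
            total += (-1) ** (k + 1) * prob(set().union(*subset), weights)
    return total


Q1, Q2, Q3 = {0, 1, 2, 3}, {0, 1, 4, 5}, {0, 2, 4, 6}
print(p_and_via_or([Q1, Q2, Q3], weights), prob(Q1 & Q2 & Q3, weights))
```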

19 Example
Q_J = R(x_1),S(x_1,y_1), T(x_2),S(x_2,y_2)
    = [∃x_1 ∃y_1 R(x_1) ∧ S(x_1,y_1)] ∧ [∃x_2 ∃y_2 T(x_2) ∧ S(x_2,y_2)]
    = Q_1 ∧ Q_2, where Q_1 = R(x_1),S(x_1,y_1) and Q_2 = T(x_2),S(x_2,y_2)
Q_1 and Q_2 share the relation S, so Independent Join does not apply. Use inclusion/exclusion:
P(Q_J) = P(Q_1) + P(Q_2) − P(Q_1 ∨ Q_2)
Q_1 = a hierarchical conjunctive query without self-joins
Q_2 = similar
Q_1 ∨ Q_2 = Q_U, which we saw a couple of slides ago
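Here is a Python sketch of this computation on a tiny, hypothetical tuple-independent database. P(Q_1) and P(Q_2) come from the formula for a hierarchical query, and P(Q_1 ∨ Q_2) = P(Q_U) comes from the closed form derived for Q_U. The result is compared with brute-force enumeration of possible worlds. All relation contents are illustration data.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent relations: tuple -> probability.
R = {1: 0.3, 2: 0.6}
T = {1: 0.5, 3: 0.2}
S = {(1, 1): 0.4, (1, 2): 0.7, (2, 1): 0.9, (3, 3): 0.5}


def p_exists_s(a):
    """P(∃y S(a, y)), by independent project over y."""
    return 1 - prod(1 - p for (x, _), p in S.items() if x == a)


def p_unary_join_s(unary):
    """P(∃x U(x) ∧ ∃y S(x, y)) for a unary relation U (hierarchical, no self-joins)."""
    return 1 - prod(1 - p * p_exists_s(a) for a, p in unary.items())


def p_qu():
    """P(Q_1 ∨ Q_2) = P(Q_U), using the closed form derived for Q_U."""
    return 1 - prod(
        1 - (1 - (1 - R.get(a, 0.0)) * (1 - T.get(a, 0.0))) * p_exists_s(a)
        for a in set(R) | set(T))


def p_qj_inclusion_exclusion():
    return p_unary_join_s(R) + p_unary_join_s(T) - p_qu()


def p_qj_brute_force():
    """Sum over all possible worlds in which both Q_1 and Q_2 hold."""
    tuples = ([("R", k, p) for k, p in R.items()]
              + [("T", k, p) for k, p in T.items()]
              + [("S", k, p) for k, p in S.items()])
    total = 0.0
    for present in product([False, True], repeat=len(tuples)):
        weight = prod(p if keep else 1 - p for (_, _, p), keep in zip(tuples, present))
        chosen = [tup for tup, keep in zip(tuples, present) if keep]
        s_first = {k[0] for rel, k, _ in chosen if rel == "S"}
        q1 = any(rel == "R" and k in s_first for rel, k, _ in chosen)
        q2 = any(rel == "T" and k in s_first for rel, k, _ in chosen)
        if q1 and q2:
            total += weight
    return total


print(p_qj_inclusion_exclusion(), p_qj_brute_force())
```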

22 Lesson 3
We need unions in order to handle self-joins!
Conjunctive queries = not a "natural" class of queries for probabilistic DBs.
Unions of conjunctive queries = the "natural" class of queries.

23 Unsafe Queries – When the Rules Fail
H_0 = R(x),S(x,y),T(y)
H_1 = R(x_0),S(x_0,y_0) ∨ S(x_1,y_1),T(y_1)
    = ∃z [R(z) ∧ S(z,y_0) ∨ S(x_1,z) ∧ T(z)]
    Unlike Q_U, here z occurs in different positions of S, so we cannot apply Independent Project.
H_2 = R(x_0),S_1(x_0,y_0) ∨ S_1(x_1,y_1),S_2(x_1,y_1) ∨ S_2(x_2,y_2),T(y_2)
H_3 = R(x_0),S_1(x_0,y_0) ∨ S_1(x_1,y_1),S_2(x_1,y_1) ∨ S_2(x_2,y_2),S_3(x_2,y_2) ∨ S_3(x_3,y_3),T(y_3)
...
Theorem. Each query H_k is #P-hard.
The proof is in [Dalvi&S, JACM'2012].

28 The Amazing Queries H_k
H_3 = R(x_0),S_1(x_0,y_0) ∨ S_1(x_1,y_1),S_2(x_1,y_1) ∨ S_2(x_2,y_2),S_3(x_2,y_2) ∨ S_3(x_3,y_3),T(y_3)
H_k is #P-hard. But if we drop any one of its conjunctive queries, the result is in PTIME.
For example, drop S_1(x_1,y_1),S_2(x_1,y_1) from H_3. Then R(x_0),S_1(x_0,y_0) shares no relation with the rest, so Independent Union applies. The rest is
S_2(x_2,y_2),S_3(x_2,y_2) ∨ S_3(x_3,y_3),T(y_3)
  = ∃z [S_2(x_2,z),S_3(x_2,z) ∨ S_3(x_3,z),T(z)]
  = ∃z [∃x_3 S_3(x_3,z)] ∧ [(∃x_2 S_2(x_2,z) ∧ S_3(x_2,z)) ∨ T(z)]
  = etc.
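The last rewrite step is easy to get wrong, so here is a brute-force check in Python. It fixes one value of z and uses a two-element domain for x; that domain size is an assumption made only to keep the enumeration small. The check compares the disjunction with the CNF (∃x S_3(x,z)) ∧ ((∃x S_2(x,z) ∧ S_3(x,z)) ∨ T(z)). For contrast, it also tests a weaker CNF that drops S_3 from the second factor; that form is not equivalent.

```python
from itertools import product

XS = [1, 2]  # a small domain for x; z is held fixed


def dnf(s2, s3, t):
    # S_2(x_2, z), S_3(x_2, z)  ∨  S_3(x_3, z), T(z)
    return any(s2[x] and s3[x] for x in XS) or (any(s3[x] for x in XS) and t)


def cnf(s2, s3, t):
    # [∃x S_3(x, z)] ∧ [(∃x S_2(x, z) ∧ S_3(x, z)) ∨ T(z)]
    return any(s3[x] for x in XS) and (any(s2[x] and s3[x] for x in XS) or t)


def weakened(s2, s3, t):
    # [∃x S_3(x, z)] ∧ [(∃x S_2(x, z)) ∨ T(z)]  (S_3 dropped from the second factor)
    return any(s3[x] for x in XS) and (any(s2[x] for x in XS) or t)


def all_worlds():
    """Every truth assignment to S_2(x, z), S_3(x, z) for x in XS, and T(z)."""
    n = len(XS)
    for bits in product([False, True], repeat=2 * n + 1):
        yield dict(zip(XS, bits[:n])), dict(zip(XS, bits[n:2 * n])), bits[-1]


print(all(dnf(*w) == cnf(*w) for w in all_worlds()))        # True
print(any(dnf(*w) != weakened(*w) for w in all_worlds()))   # True
```

A counterexample for the weaker form is a world with S_2(1,z) and S_3(2,z) but no T(z). The disjunction is false there because no single x joins S_2 and S_3.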

29 Where We Are
We have seen examples of unsafe queries: H_k.
But a query Q that has H_k as a subquery is not necessarily unsafe.
When the four rules succeed, Q is safe.
But inclusion/exclusion is insufficient: we need to replace it with the Möbius inversion formula.
We will discuss these issues, then state the Big Dichotomy Theorem.

30 A Safe Query with H_1 as Subquery
Q_V = R(x_1),S(x_1,y_1) ∨ S(x_2,y_2),T(y_2) ∨ R(x_3),T(y_3)
The first two disjuncts form H_1 (unsafe!); the third, R(x_3),T(y_3), is a disconnected query.
Rewrite the DNF into CNF, using R(x_1),S(x_1,y_1) ⇒ R(x_3) and S(x_2,y_2),T(y_2) ⇒ T(y_3):
Q_V = [S(x_2,y_2),T(y_2) ∨ R(x_3)] ∧ [R(x_1),S(x_1,y_1) ∨ T(y_3)] = q_1 ∧ q_2
Inclusion/exclusion:
P(Q_V) = P(q_1 ∧ q_2) = P(q_1) + P(q_2) − P(q_1 ∨ q_2)
where q_1 ∨ q_2 = R(x_3) ∨ T(y_3), which is in PTIME! (q_1 and q_2 are also in PTIME: each is an independent union of a hierarchical query and a single atom.)
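Both logical steps behind this computation can be confirmed by enumerating every instance of R, S and T over a two-element domain: DNF = CNF, and q_1 ∨ q_2 = R(x_3) ∨ T(y_3). The Python sketch below does that. The domain size is an arbitrary choice, so this is a sanity check rather than a proof.

```python
from itertools import product

D = [1, 2]  # an arbitrary two-element domain (assumption, keeps the enumeration small)
S_KEYS = [(x, y) for x in D for y in D]


def all_instances():
    """Every instance (r, s, t) of R(x), S(x, y), T(y) over the domain D."""
    n = len(D)
    for bits in product([False, True], repeat=2 * n + len(S_KEYS)):
        r = {a for a, b in zip(D, bits[:n]) if b}
        t = {a for a, b in zip(D, bits[n:2 * n]) if b}
        s = {k for k, b in zip(S_KEYS, bits[2 * n:]) if b}
        yield r, s, t


def rs(r, s):  # ∃x ∃y R(x) ∧ S(x, y)
    return any(x in r for x, _ in s)


def st(s, t):  # ∃x ∃y S(x, y) ∧ T(y)
    return any(y in t for _, y in s)


def q_v(r, s, t):  # DNF: R,S ∨ S,T ∨ R(x_3),T(y_3)
    return rs(r, s) or st(s, t) or (bool(r) and bool(t))


def q1(r, s, t):  # S(x_2, y_2),T(y_2) ∨ R(x_3)
    return st(s, t) or bool(r)


def q2(r, s, t):  # R(x_1),S(x_1, y_1) ∨ T(y_3)
    return rs(r, s) or bool(t)


instances = list(all_instances())
dnf_equals_cnf = all(q_v(*i) == (q1(*i) and q2(*i)) for i in instances)
union_is_r_or_t = all((q1(*i) or q2(*i)) == (bool(i[0]) or bool(i[2])) for i in instances)
print(dnf_equals_cnf, union_is_r_or_t)  # True True
```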

34 Inclusion/Exclusion is Insufficient
Q_W = [R(x_0),S_1(x_0,y_0) ∨ S_2(x_2,y_2),S_3(x_2,y_2)] ∧   /* Q1 */
      [R(x_0),S_1(x_0,y_0) ∨ S_3(x_3,y_3),T(y_3)] ∧          /* Q2 */
      [S_1(x_1,y_1),S_2(x_1,y_1) ∨ S_3(x_3,y_3),T(y_3)]       /* Q3 */
P(Q_W) = P(Q_1) + P(Q_2) + P(Q_3) − P(Q_1 ∨ Q_2) − P(Q_2 ∨ Q_3) − P(Q_1 ∨ Q_3) + P(Q_1 ∨ Q_2 ∨ Q_3)
Q_1 ∨ Q_3 = H_3 (hard!), and Q_1 ∨ Q_2 ∨ Q_3 is also = H_3: these two terms are #P-hard.
They are the same query with opposite signs, so they cancel, and all remaining terms are in PTIME.
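The cancellation can be made mechanical by merging inclusion/exclusion terms that denote the same query. The Python sketch below represents each clause of Q_W by the set of conjunctive queries in its disjunction. The labels h0 to h3 for the four conjunctive queries of H_3 are hypothetical. The sketch assumes that two disjunctions are the same query exactly when these sets are equal, which holds here but not for arbitrary UCQs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical labels for the four conjunctive queries of H_3.
h0, h1, h2, h3 = "R,S1", "S1,S2", "S2,S3", "S3,T"
# Each clause of Q_W, represented by the set of conjunctive queries in its disjunction.
Q_W = {"Q1": frozenset({h0, h2}), "Q2": frozenset({h0, h3}), "Q3": frozenset({h1, h3})}


def grouped_inclusion_exclusion(clauses):
    """Inclusion/exclusion coefficients for P(Q1 ∧ ... ∧ Qk), merged per distinct disjunction.

    Assumption: two disjunctions are the same query iff they contain the same conjunctive
    queries; this holds for Q_W, where no conjunctive query of H_3 implies another.
    """
    coeff = defaultdict(int)
    names = sorted(clauses)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            union = frozenset().union(*(clauses[n] for n in subset))
            coeff[union] += (-1) ** (k + 1)
    return dict(coeff)


coeff = grouped_inclusion_exclusion(Q_W)
h3_query = frozenset({h0, h1, h2, h3})
print(coeff[h3_query])  # 0: -P(Q1 ∨ Q3) and +P(Q1 ∨ Q2 ∨ Q3) cancel
```

The five remaining nonzero terms are exactly the PTIME queries Q_1, Q_2, Q_3, Q_1 ∨ Q_2 and Q_2 ∨ Q_3.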

38 August Ferdinand Möbius, 1790-1868
Möbius strip
Möbius function μ in number theory
Generalized to lattices [Stanley'97, Rota'09]
And now to queries!

39 The CNF Lattice
Definition. The CNF lattice of Q = Q1 ∧ Q2 ∧ … is defined formally in the book.
Example: the CNF lattice of Q_W (clauses Q1, Q2, Q3 as defined above). Its elements are:
1̂ = max(L), the top
Q1, Q2, Q3
Q1 ∨ Q2, Q2 ∨ Q3
Q1 ∨ Q2 ∨ Q3 (= Q1 ∨ Q3), the bottom element min(L)
Every node is in PTIME except the bottom node Q1 ∨ Q2 ∨ Q3 = H_3, which is #P-hard.

43 The Möbius Function
Def. The Möbius function of the lattice:
μ(1̂, 1̂) = 1
μ(u, 1̂) = − Σ_{u < v ≤ 1̂} μ(v, 1̂)
Möbius' Inversion Formula:
P(Q) = − Σ_{Q_i < 1̂} μ(Q_i, 1̂) P(Q_i)
[Figure: the values μ(u, 1̂) computed step by step on the CNF lattice, starting from μ(1̂, 1̂) = 1 at the top.]
New rule: replace Inclusion/Exclusion with Möbius' Inversion Formula.
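Below is a small Python sketch of computing the Möbius function on the CNF lattice of Q_W. As holds for Q_W, it assumes that each lattice element can be identified by the set of conjunctive queries in its disjunction. The top 1̂ is encoded as the empty set, and u < v whenever v's set is a proper subset of u's. In general, deciding when two disjunctions are equivalent requires a UCQ containment test, which is not shown. The labels h0 to h3 for the conjunctive queries of H_3 are hypothetical.

```python
from itertools import combinations

# Hypothetical labels for the four conjunctive queries of H_3.
h0, h1, h2, h3 = "R,S1", "S1,S2", "S2,S3", "S3,T"
# The CNF clauses of Q_W, each as the set of conjunctive queries in its disjunction.
Q_W = {"Q1": frozenset({h0, h2}), "Q2": frozenset({h0, h3}), "Q3": frozenset({h1, h3})}


def cnf_lattice(clauses):
    """Elements: the top 1̂ (encoded as the empty set) and every distinct disjunction of clauses."""
    names = sorted(clauses)
    elems = {frozenset()}
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            elems.add(frozenset().union(*(clauses[n] for n in subset)))
    return elems


def mobius_to_top(elems):
    """μ(u, 1̂): μ(1̂, 1̂) = 1 and μ(u, 1̂) = -Σ_{u < v ≤ 1̂} μ(v, 1̂), where u < v iff v ⊊ u."""
    mu = {}
    for u in sorted(elems, key=len):  # smaller sets (higher in the lattice) first
        mu[u] = 1 if not u else -sum(m for v, m in mu.items() if v < u)
    return mu


mu = mobius_to_top(cnf_lattice(Q_W))
bottom = frozenset({h0, h1, h2, h3})  # Q1 ∨ Q2 ∨ Q3 = Q1 ∨ Q3 = H_3
print(mu[bottom])  # 0, so H_3 drops out of P(Q_W) = -Σ_{u < 1̂} μ(u, 1̂) P(u)
```

For a two-clause query such as Q_J, the same functions give μ = −1 for Q_1 and Q_2 and μ = 1 for Q_1 ∨ Q_2. The inversion formula then reduces to P(Q_1) + P(Q_2) − P(Q_1 ∨ Q_2), the ordinary inclusion/exclusion rule.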

51 The Big Dichotomy Theorem
Dichotomy into PTIME/#P-complete based on "syntax", where "syntax" includes the Möbius function!
Dichotomy Theorem. Fix a UCQ query Q.
1. If the rules succeed, then P(Q) is in PTIME.
2. If the rules fail, then P(Q) is #P-complete.
The proof is in [Dalvi&S, JACM'2012].

52 Lesson 5
Four simple rules are all we need to compute query probabilities in PTIME:
Independent join
Independent project
Independent union
Inclusion/exclusion, replaced by the Möbius inversion formula
Inclusion/exclusion is not used in modern model counting systems! It is specific to probabilistic databases.

53 Representation Theorem
Do we really need the lattice and the Möbius function? Yes! For every lattice one can construct a query Q such that:
Q is in PTIME if μ = 0
Q is #P-complete if μ ≠ 0
This suggests that the Möbius function is unavoidable in probabilistic databases.

54 Representation Theorem
THEOREM. Every lattice L is the CNF lattice of a query Q such that:
the query at 0̂ (= min(L)) is #P-hard;
all other queries are in PTIME.
Hence Q is in PTIME iff μ(0̂, 1̂) = 0!
Examples: Q_W, Q_WW and Q_9 all have μ(0̂, 1̂) = 0, so all three are in PTIME.

57 Landscape of Probabilistic Databases
[Figure: queries arranged by complexity. PTIME (have safe plans): Q_U, Q_J, Q_V, Q_W, Q_9. #P-hard: H_0, H_1, H_2, H_3. Other regions marked: hierarchical, non-hierarchical, have approximate plans.]

58 Extensional Plans for UCQ
Recall the extensional operators for conjunctive queries without self-joins:
Independent join: ⋈
Independent projection: Π
Selection: σ
Now we need two more operators:
Independent union: ∪^i
Möbius sum: Σ_{μ1,μ2,μ3}

59 Independent-Union and Möbius-Sum
Independent union R(A) ∪^i T(A):
R(A): (a1, p1), (a2, p2), (a3, p3)
T(A): (a2, q2), (a3, q3), (a4, q4)
Result: (a1, p1), (a2, 1−(1−p2)(1−q2)), (a3, 1−(1−p3)(1−q3)), (a4, q4)
In SQL:
  SELECT COALESCE(R.A, T.A) AS A,
         1.0 - (1.0 - COALESCE(R.p, 0)) * (1.0 - COALESCE(T.p, 0)) AS p
  FROM R FULL OUTER JOIN T ON R.A = T.A;
Möbius sum Σ_{μ1,μ2,μ3} of R(A), T(A), U(A), where U(A): (a1, s1), (a3, s3):
Result: (a1, μ1·p1 + μ3·s1), (a2, μ1·p2 + μ2·q2), (a3, μ1·p3 + μ2·q3 + μ3·s3), (a4, μ2·q4)
  SELECT … -- long query -- here
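The two operators can be sketched in Python over tables represented as dictionaries from the key A to a probability. The representation, function names and numbers are hypothetical. A key missing from a table contributes probability 0, which mirrors the NULL handling of the full outer join.

```python
def independent_union(*tables):
    """∪^i: p(a) = 1 - Π_i (1 - p_i(a)), where p_i(a) = 0 if a is absent from table i."""
    out = {}
    for table in tables:
        for key, p in table.items():
            out[key] = 1.0 - (1.0 - out.get(key, 0.0)) * (1.0 - p)
    return out


def mobius_sum(coefficients, tables):
    """Σ_{μ1,...,μk}: p(a) = Σ_i μ_i · p_i(a), where p_i(a) = 0 if a is absent from table i."""
    out = {}
    for mu, table in zip(coefficients, tables):
        for key, p in table.items():
            out[key] = out.get(key, 0.0) + mu * p
    return out


R = {"a1": 0.5, "a2": 0.5, "a3": 0.2}
T = {"a2": 0.4, "a3": 0.5, "a4": 0.1}
U = {"a1": 0.25, "a3": 0.5}
print(independent_union(R, T))
print(mobius_sum([1, -1, 1], [R, T, U]))
```

On its own, the Möbius sum of arbitrary tables need not be a probability. It is meaningful when its inputs are the lattice terms and its coefficients are −μ(u, 1̂), as in the inversion formula.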

61 Extensional Plans for UCQ
[Figure: an extensional plan with output variable z, built from Π_z, ⋈_x, Π_x over R(z,x), T(z,x) and S(x,y), an independent union ∪^i, and a Möbius-sum node Σ_{+1,−1,+1}.]
The query itself, in SQL:
  SELECT DISTINCT r.z
  FROM R r, S s1, T t, S s2
  WHERE r.x = s1.x AND t.z = r.z AND t.x = s2.x;
The plan can be written back in SQL… but it won't fit on one slide.

62 Summary: Extensional Query Evaluation
Four rules can evaluate all queries that are in PTIME. (Actually, a fifth rule, ranking, is needed; see the book.)
Big Dichotomy Theorem:
If the rules succeed ⇒ the query is safe ⇒ in PTIME
If the rules fail ⇒ the query is unsafe ⇒ #P-complete
Inclusion/exclusion is specific to probabilistic databases and is not used by modern model counters: we will discuss this next.

