Published by Samantha Estridge. Modified about 1 year ago.

1
Queries with Difference on Probabilistic Databases
Sanjeev Khanna, Sudeepa Roy, Val Tannen
University of Pennsylvania

2
Probabilistic Databases
To model and query uncertain data (sensor networks, information extraction, …)
Possible worlds model
– Each possible world W is a standard database instance and has a probability P[W]
– Compact representation D, assuming independence
[Figure: example database D with relations R (a1, a2, a3), S ((a1, b1), (a2, b1), (a3, b2), (a3, b3)) and T (b1, b2, b3)]

3
Query Semantics
Query semantics on probabilistic databases:
– Apply the query q to each possible world W
– Add up the probabilities of the worlds that give the same query answer A:
  P[q(D) = A] = ∑_{W : q(W) = A} P[W]
Goal: efficiently evaluate P[q(D) = A]
– Data complexity; we want time polynomial in n = |D|
Can we always compute P[q(D)] efficiently?
– No, in general it is #P-hard
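The possible-worlds semantics above can be sketched as a brute-force enumeration. This is only an illustration of the definition, not an efficient algorithm: the tiny database `D`, its probabilities, and the query `q` are hypothetical, and the loop is exponential in the number of tuples.

```python
from itertools import product

# Tuple-independent database: each tuple maps to its marginal probability.
# The relation contents and all probabilities are illustrative assumptions.
D = {
    ("R", "a1"): 0.5,
    ("R", "a2"): 0.5,
    ("S", "a1"): 0.5,
}

def possible_worlds(db):
    """Yield (world, probability) for every subset of the tuples in db."""
    tuples = list(db)
    for present in product([False, True], repeat=len(tuples)):
        world = frozenset(t for t, keep in zip(tuples, present) if keep)
        prob = 1.0
        for t, keep in zip(tuples, present):
            prob *= db[t] if keep else 1.0 - db[t]
        yield world, prob

def answer_distribution(db, query):
    """Return {A: P[q(D) = A]} by summing P[W] over worlds with q(W) = A."""
    dist = {}
    for world, prob in possible_worlds(db):
        answer = query(world)
        dist[answer] = dist.get(answer, 0.0) + prob
    return dist

def q(world):
    """Hypothetical query: the values x with both R(x) and S(x) present."""
    r = {t[1] for t in world if t[0] == "R"}
    s = {t[1] for t in world if t[0] == "S"}
    return frozenset(r & s)

dist = answer_distribution(D, q)
for answer, prob in dist.items():
    print(sorted(answer), prob)
```

The exponential number of worlds is exactly why the slides look for polynomial-time alternatives.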

4
Query Answering in Two Steps
Boolean query q() :- R(x), S(x, y), T(y)
Introduce event variables for tuples (P[w1] = 0.3, …)
Database D (tuple → event variable; each tuple also carries a probability):
R: a1 → w1, a2 → w2, a3 → w3
S: (a1, b1) → v1, (a2, b1) → v2, (a3, b2) → v3, (a3, b3) → v4
T: b1 → u1, b2 → u2, b3 → u3
Step 1 (easy): Boolean provenance for q(D) [FR ’97, Z ’97]
  f = w1 v1 u1 + w2 v2 u1 + w3 v3 u2 + w3 v4 u3
Step 2 (hard): Compute P[q(D)] = P[f] given P[w1] = 0.3, P[v1] = 0.4, …
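A minimal sketch of the two steps for this example: Step 1 builds the DNF provenance by joining the annotated relations, and Step 2 computes P[f] by brute force. Only P[w1] = 0.3 and P[v1] = 0.4 come from the slide; every other probability is an assumed placeholder, and the function names are hypothetical.

```python
from itertools import product

# Relations from the slide, each tuple annotated with its event variable.
R = {("a1",): "w1", ("a2",): "w2", ("a3",): "w3"}
S = {("a1", "b1"): "v1", ("a2", "b1"): "v2", ("a3", "b2"): "v3", ("a3", "b3"): "v4"}
T = {("b1",): "u1", ("b2",): "u2", ("b3",): "u3"}

def provenance():
    """Step 1: DNF provenance of q() :- R(x), S(x, y), T(y) as a list of minterms."""
    return [
        (R[(x,)], v, T[(y,)])
        for (x, y), v in S.items()
        if (x,) in R and (y,) in T
    ]

def probability(dnf, p):
    """Step 2 by brute force: add up the weights of satisfying assignments."""
    variables = sorted({var for term in dnf for var in term})
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        if any(all(world[var] for var in term) for term in dnf):
            weight = 1.0
            for var in variables:
                weight *= p[var] if world[var] else 1.0 - p[var]
            total += weight
    return total

f = provenance()
# P[w1] = 0.3 and P[v1] = 0.4 are from the slide; all other values are assumed.
p = {"w1": 0.3, "v1": 0.4, "w2": 0.5, "w3": 0.5,
     "v2": 0.5, "v3": 0.5, "v4": 0.5, "u1": 0.5, "u2": 0.5, "u3": 0.5}
print(f)
print(probability(f, p))
```

Step 1 is a polynomial-size join; Step 2 as written enumerates 2^10 assignments, which reflects why the second step is the hard one in general.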

5
Probability Computation for Positive Queries
Dichotomy result [DS ’04, ’07; DSS ’10]: given q as input, we can efficiently decide whether q is
– Safe: safe plans run in poly-time on all instances, or
– Unsafe: #P-hard, e.g. q() :- R(x), S(x, y), T(y)
Instance-by-instance approach [SDG ’10, RPT ’11]
– Both q and D are given as input
– Poly-time algorithm to compute P[q(D)] for special cases even if q is unsafe
What about queries with difference?

6
Boolean Provenances for Difference
Relations (tuple → event variable):
R: (b1, c1) → u1, (b2, c2) → u2, (b1, c3) → u3
S: (c1, a1) → v1, (c1, a2) → v2, (c2, a3) → v3, (c3, a2) → v4
T: a1 → w1, a2 → w2, a3 → w3
q1(x) :- R(x, y), S(y, z)
– b1: u1 (v1 + v2) + u3 v4
– b2: u2 v3
q2(x) :- R(x, y), S(y, z), T(z)
– b1: u1 v1 w1 + u1 v2 w2 + u3 v4 w2
– b2: u2 v3 w3
q = q1 - q2 (the provenance of q2 is negated)
– b1: (u1 (v1 + v2) + u3 v4) · ¬(u1 v1 w1 + u1 v2 w2 + u3 v4 w2)
– b2: (u2 v3) · ¬(u2 v3 w3)
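The difference provenance for this example can be reproduced with a short sketch: compute the DNF provenance of q1 and q2 per answer, then evaluate P[f1 AND NOT f2] by enumeration. The relation contents are read off the slide; the uniform probability 0.5 and all function names are assumptions made for illustration.

```python
from itertools import product

# Relations reconstructed from the slide: tuple -> event variable.
R = {("b1", "c1"): "u1", ("b2", "c2"): "u2", ("b1", "c3"): "u3"}
S = {("c1", "a1"): "v1", ("c1", "a2"): "v2", ("c2", "a3"): "v3", ("c3", "a2"): "v4"}
T = {"a1": "w1", "a2": "w2", "a3": "w3"}

def provenance(use_t):
    """DNF provenance per answer x of q1 (use_t=False) or q2 (use_t=True)."""
    out = {}
    for (x, y), u in R.items():
        for (y2, z), v in S.items():
            if y != y2 or (use_t and z not in T):
                continue
            term = (u, v, T[z]) if use_t else (u, v)
            out.setdefault(x, []).append(term)
    return out

def holds(dnf, world):
    """True if some minterm of the DNF is satisfied by the assignment."""
    return any(all(world[var] for var in term) for term in dnf)

def prob_difference(pos, neg, p):
    """P[pos AND NOT neg] by enumerating all assignments (exponential time)."""
    variables = sorted({var for term in pos + neg for var in term})
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        if holds(pos, world) and not holds(neg, world):
            weight = 1.0
            for var in variables:
                weight *= p[var] if world[var] else 1.0 - p[var]
            total += weight
    return total

f1, f2 = provenance(False), provenance(True)
# All probabilities are assumed to be 0.5 purely for illustration.
p = {var: 0.5 for rel in (R, S, T) for var in rel.values()}
for x in f1:
    print(x, prob_difference(f1[x], f2.get(x, []), p))
```

For answer b2 the formula u2 v3 · ¬(u2 v3 w3) simplifies to u2 v3 ¬w3, so under the assumed probabilities its value is 0.5 · 0.5 · 0.5.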

7
Previous Work on Difference
[FOR ’11]
– Framework for exact and approximate probability computation
– But no guarantee of polynomial running time
In fact, we show in this paper that, with difference, in some cases no approximation exists (unless NP = RP)
How far can we go with difference in poly-time?

8
A Quick Comparison
With difference
– DNF of the boolean provenance may be exponential in n
– P[q(D)] may not be approximable
Without difference
– DNF of the boolean provenance is poly-size (n^|q|)
– P[q(D)] is always approximable (FPRAS)
FPRAS: Fully Polynomial Randomized Approximation Scheme
– Computes p with probability ≥ 3/4, in time polynomial in n and 1/ε, such that p ∈ [(1 − ε) P[q(D)], (1 + ε) P[q(D)]]

9
Our Results
We study queries of the form q1 - q2 and their generalization
– FPRAS: if q1 is any UCQ and q2 is any safe CQ⁻
– #P-hardness: even if both q1 and q2 are safe CQ⁻
– Inapproximability: even if q1 is the trivial TRUE query and q2 is a UCQ
Our FPRAS result extends to a larger class of queries, of which q1 - q2 is a special case
[CQ⁻: conjunctive queries without self-joins]

10
Difference Rank
Define the difference rank δ(q) of a query q recursively
– δ(R) = 0
– δ(q1 - q2) = δ(q1) + δ(q2) + 1
  Example: R - S has rank 1
– δ(q1 ⋈ q2) = δ(q1) + δ(q2)
  Examples: (R - S1) ⋈ (R - S2) has rank 2; (R - T1) ⋈ T2 has rank 1
– δ(q1 ∪ q2) = max(δ(q1), δ(q2))
  Example: ((R - S1) ⋈ (R - S2)) ∪ ((R - T1) ⋈ T2) has rank 2
– Select, project: rank remains the same
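The recursive definition of difference rank translates directly into a small function over a query syntax tree. The class names below (`Rel`, `Diff`, `Join`, `Union`, `Unary`) are hypothetical and chosen only for this sketch; they are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    name: str

@dataclass(frozen=True)
class Diff:
    left: object
    right: object

@dataclass(frozen=True)
class Join:
    left: object
    right: object

@dataclass(frozen=True)
class Union:
    left: object
    right: object

@dataclass(frozen=True)
class Unary:
    """Select or project; neither changes the rank."""
    child: object

def rank(q):
    """Difference rank, following the recursive rules on the slide."""
    if isinstance(q, Rel):
        return 0
    if isinstance(q, Diff):
        return rank(q.left) + rank(q.right) + 1
    if isinstance(q, Join):
        return rank(q.left) + rank(q.right)
    if isinstance(q, Union):
        return max(rank(q.left), rank(q.right))
    if isinstance(q, Unary):
        return rank(q.child)
    raise TypeError(f"unknown query node: {q!r}")

# The examples from the slide.
r_minus_s = Diff(Rel("R"), Rel("S"))
join_two_diffs = Join(Diff(Rel("R"), Rel("S1")), Diff(Rel("R"), Rel("S2")))
diff_join_rel = Join(Diff(Rel("R"), Rel("T1")), Rel("T2"))
union_example = Union(join_two_diffs, diff_join_rel)

print(rank(r_minus_s), rank(join_two_diffs), rank(diff_join_rel), rank(union_example))
```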

11
FPRAS for queries q with δ(q) = 1, provided certain conditions hold
(inapproximable for δ(q) = 1 in general)

12
Steps in FPRAS
Step 1: Compute the boolean provenance of q(D) for any query q with δ(q) = 1
Step 2: Write the boolean provenance in a “Probability Friendly Form” (if possible)
Step 3: FPRAS inspired by the Karp-Luby framework

13
Boolean Provenance for Queries q s.t. δ(q) = 1
Lemma: For any q with δ(q) = 1, on any D, the provenance f of q(D) has the form
[formula missing from the transcript]
f is poly-size in n = |D| and poly-time computable

14
Probability Friendly Form (PFF)
f is in PFF if the negated DNFs can be written as poly-size d-DNNFs (next slide)
If f is in PFF, we can approximate P[f] using the Karp-Luby framework

15
d-DNNF [Darwiche ’01, ’02; DM ’02]
deterministic Decomposable Negation Normal Form
– Negation normal form: no internal node can have negation
– Deterministic: at most one child of a +-node is satisfied by any given assignment
– Decomposable: children of a ·-node do not share variables
In general, a d-DNNF can be a DAG
Probability can be computed in linear time
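The linear-time probability computation on a d-DNNF can be sketched as one memoized pass: multiply at ·-nodes (children are independent because they share no variables) and add at +-nodes (children are mutually exclusive). The node classes and the example formula ¬(v1 v2) = ¬v1 + v1 · ¬v2 are illustrative assumptions, as are the probabilities.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    var: str
    positive: bool = True

@dataclass(frozen=True)
class And:
    children: tuple

@dataclass(frozen=True)
class Or:
    children: tuple

def probability(node, p, cache=None):
    """P[node] for a d-DNNF; each DAG node is evaluated once via the cache."""
    cache = {} if cache is None else cache
    key = id(node)
    if key not in cache:
        if isinstance(node, Lit):
            value = p[node.var] if node.positive else 1.0 - p[node.var]
        elif isinstance(node, And):
            # Decomposable: children are over disjoint variables, so multiply.
            value = 1.0
            for child in node.children:
                value *= probability(child, p, cache)
        else:
            # Deterministic: children are mutually exclusive, so add.
            value = sum(probability(child, p, cache) for child in node.children)
        cache[key] = value
    return cache[key]

# d-DNNF for NOT (v1 AND v2), written as (NOT v1) + (v1 AND NOT v2).
not_v1_and_v2 = Or((
    Lit("v1", positive=False),
    And((Lit("v1"), Lit("v2", positive=False))),
))
p = {"v1": 0.3, "v2": 0.4}  # assumed probabilities
print(probability(not_v1_and_v2, p))
```

Under the assumed probabilities the result should agree with 1 − 0.3 · 0.4, which is a quick sanity check that the negated minterm was encoded correctly.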

16
Karp-Luby Framework [KL ’83]
Given boolean expression DAGs F1, …, Fm with f = F1 + F2 + … + Fm,
P[f] can be approximated in poly-time (in m, n) if, for all i, in poly-time
(1) P[Fi] can be computed
(2) it can be checked whether a given assignment satisfies Fi
(3) a random satisfying assignment of Fi can be sampled
Well-studied special case: DNF counting, where F1, …, Fm are DNF minterms, e.g. f = xyz + xyw + wuv
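For the DNF special case, the three conditions are easy to meet, and a Karp-Luby style estimator can be sketched as follows. This is a textbook coverage variant with a fixed sample count rather than the (ε, δ)-driven sample bound of a full FPRAS; the function name, the uniform probabilities, and the sample count are assumptions.

```python
import random

def karp_luby(terms, p, samples, rng):
    """Estimate P[F1 + ... + Fm] for positive minterms F1..Fm (tuples of variables)."""
    variables = sorted({v for term in terms for v in term})
    # Condition (1): P[Fi] is the product of its variables' probabilities.
    weights = []
    for term in terms:
        w = 1.0
        for v in term:
            w *= p[v]
        weights.append(w)
    total = sum(weights)
    hits = 0
    for _ in range(samples):
        # Pick a term proportionally to its probability.
        i = rng.choices(range(len(terms)), weights=weights)[0]
        # Condition (3): sample an assignment conditioned on Fi being true.
        world = {v: (v in terms[i]) or (rng.random() < p[v]) for v in variables}
        # Condition (2): find the first term the assignment satisfies.
        first = next(j for j, t in enumerate(terms) if all(world[v] for v in t))
        if first == i:
            hits += 1
    return total * hits / samples

# The DNF from the slide: f = xyz + xyw + wuv, with assumed probabilities 0.5.
terms = [("x", "y", "z"), ("x", "y", "w"), ("w", "u", "v")]
p = {v: 0.5 for v in "xyzwuv"}
estimate = karp_luby(terms, p, samples=20000, rng=random.Random(0))
print(estimate)
```

Counting a sample only when the chosen term is the first satisfied one makes each satisfying assignment count once, so the expected value of the estimate is P[f]; for this formula with all probabilities 0.5, inclusion-exclusion gives P[f] = 0.28125.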

17
Conditions (1) and (2) hold for PFF
Product of a minterm and a d-DNNF is another d-DNNF
[Figure: d-DNNF example; surviving labels: w2 = 1, z1 =]

18
Condition (3) also holds
Lemma: Generating a random satisfying assignment on a d-DNNF can be done in poly-time
1. Process in reverse topological order
2. Generate a random satisfying assignment bottom up
[Figure: d-DNNF over v1 and v2; node labels v2 = 1, v2 = 0, v1 = 0, v1 = 1, v2 = 0; partial assignments (v1 = 0, v2 = 0) and (v1 = 1, v2 = 0), one picked at random]
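One way to realize the lemma is a recursive sampler: at a +-node pick one child with probability proportional to its own probability, at a ·-node combine samples from all children, and fill in untouched variables from their marginals. The slide processes nodes in reverse topological order; the recursion below is an assumed, simpler variant that only visits the chosen branches, and it recomputes probabilities without memoization for brevity. Node classes and probabilities are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    var: str
    positive: bool = True

@dataclass(frozen=True)
class And:
    children: tuple

@dataclass(frozen=True)
class Or:
    children: tuple

def prob(node, p):
    """P[node] for a d-DNNF (not memoized, so not linear on shared DAG nodes)."""
    if isinstance(node, Lit):
        return p[node.var] if node.positive else 1.0 - p[node.var]
    values = [prob(child, p) for child in node.children]
    if isinstance(node, And):
        out = 1.0
        for value in values:
            out *= value
        return out
    return sum(values)

def sample_partial(node, p, rng):
    """Assignment to the variables that the chosen branches of node fix."""
    if isinstance(node, Lit):
        return {node.var: node.positive}
    if isinstance(node, And):
        out = {}
        for child in node.children:  # children use disjoint variables
            out.update(sample_partial(child, p, rng))
        return out
    # +-node: children are mutually exclusive, so choose one by its weight.
    weights = [prob(child, p) for child in node.children]
    chosen = rng.choices(node.children, weights=weights)[0]
    return sample_partial(chosen, p, rng)

def sample_satisfying(node, p, rng):
    """Full random assignment, conditioned on node being true."""
    world = sample_partial(node, p, rng)
    for var in p:
        if var not in world:
            world[var] = rng.random() < p[var]
    return world

# d-DNNF for NOT (v1 AND v2): (NOT v1) + (v1 AND NOT v2).
dnnf = Or((Lit("v1", positive=False), And((Lit("v1"), Lit("v2", positive=False)))))
p = {"v1": 0.5, "v2": 0.5}  # assumed probabilities
rng = random.Random(0)
worlds = [sample_satisfying(dnnf, p, rng) for _ in range(200)]
print(worlds[:3])
```

Every sampled world sets at least one of v1, v2 to false, as required for the formula to hold.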

19
Expressibility in PFF
So, if f is in PFF, we can approximate P[q(D)]
But can we decide in poly-time whether some sub-expressions of a boolean expression have poly-size d-DNNFs?
– Not known
But there are natural sufficient conditions that can be verified in poly-time
– If certain sub-queries are safe and hence generate read-once expressions [OH ’08]
– If sub-queries generate poly-size OBDDs [JS ’11]
– Extends to the instance-by-instance approach (both q and D given)

20
#P-hardness for q1 - q2
even when both q1 and q2 are safe CQ⁻

21
#P-hardness: Steps in the proof
“Hard” query q = q1 - q2
– q1() :- R1(x, y1), R2(x, y2), R3(x, y3), R4(x, y4)
– q2() :- R1(x1, y), R2(x2, y), R3(x3, y), R4(x4, y)
Counting independent sets in 3-regular bipartite graphs [XZ ’06]
Counting edge covers in bipartite graphs of degree ≤ 4, where the edge set can be partitioned into 4 disjoint matchings

22
Other Related Work
– Semantics of probabilistic query answering: Fuhr-Rollecke ’97, Zimanyi ’97
– Dichotomy of CQ⁻, CQ and UCQ queries: Dalvi-Suciu ’04, ’07; Dalvi-Schnaitter-Suciu ’10
– Knowledge compilation techniques: Olteanu-Huang ’08, Jha-Olteanu-Suciu ’10, Jha-Suciu ’11, Fink-Olteanu ’11
– Instance-by-instance approach: Sen-Deshpande-Getoor ’10, Roy-Perduca-Tannen ’11

23
Conclusions and Future Work
A step towards understanding the complexity of exact and approximate computation for queries with difference operations
Future work
– Dichotomy results that syntactically classify difference queries (similar to positive UCQ)?
– Extending the FPRAS to queries with difference rank > 1?
– Experimental evaluation of our algorithms

24
Thank you
Questions?
