Published by Dasia Burlin. Modified over 4 years ago.

1
Faster Query Answering in Probabilistic Databases using Read-Once Functions
Sudeepa Roy
Joint work with Vittorio Perduca and Val Tannen
University of Pennsylvania

2
Probabilistic Databases
Possible worlds model: each possible world w is a standard database instance and has a probability P[w].
Compact representation D, based on independence assumptions.
Query semantics in probabilistic databases (w.l.o.g., Boolean query q):
Traditional database: q(D) ∈ {true, false}
Probabilistic database: P[q(D)] = ∑_{w : q(w) = true} P[w]
Goal: evaluate P[q(D)] efficiently. We measure data complexity and want time polynomial in n = |D|.

3
Computation of P[q(D)]
Can we efficiently compute P[q(D)]? No: in general the problem is #P-hard.
Dalvi-Suciu ’04 ff.: positive queries can be partitioned into
Safe queries: safe plans run in polynomial time on all instances.
Unsafe queries: data complexity is #P-hard. This class includes very simple queries such as R(x) S(x, y) T(y).
Given q as input, we can efficiently decide whether q is safe.
But for unsafe queries, probabilities on some instances can still be computed efficiently.
Our approach: take both q and D as input.

4
Restrictions
Tuple-independent representation D: each tuple t is annotated with its probability P[t].

D =
R          S              T
a1  0.3    a1  b1  0.1    b1  0.7
a2  0.4    a2  b1  0.5    b2  0.8
a3  0.6    a3  b2  0.2    b3  0.4
           a3  b3  0.1

w = a possible world: R = {a1}, S = {(a1, b1)}, T = {b1}
P[w] = 0.3 (1 – 0.4) (1 – 0.6) · 0.1 (1 – 0.5) (1 – 0.2) (1 – 0.1) · 0.7 (1 – 0.8) (1 – 0.4)

Conjunctive query without self-join (CQ⁻): q() := R(x), S(x, y), T(y)
(This is the H0 query from Suciu’s keynote.)
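A brute-force sketch of the possible-worlds semantics on this instance. It enumerates all 2^10 worlds, so it only illustrates the definition of P[q(D)] and is not the method of the talk. The S rows are reconstructed from the lineage given on the next slide (an assumption), and all function names are illustrative.

```python
from itertools import product

# Tuple probabilities from the slide. The S rows are an assumption:
# they are reconstructed from the lineage shown on the next slide.
R = {"a1": 0.3, "a2": 0.4, "a3": 0.6}
S = {("a1", "b1"): 0.1, ("a2", "b1"): 0.5, ("a3", "b2"): 0.2, ("a3", "b3"): 0.1}
T = {"b1": 0.7, "b2": 0.8, "b3": 0.4}

# One entry per tuple: (table name, key, probability).
TUPLES = ([("R", k, p) for k, p in R.items()]
          + [("S", k, p) for k, p in S.items()]
          + [("T", k, p) for k, p in T.items()])


def world_probability(present):
    """P[w] for the world that contains exactly the tuples in `present`."""
    prob = 1.0
    for table, key, p in TUPLES:
        prob *= p if (table, key) in present else 1.0 - p
    return prob


def query_holds(present):
    """Evaluate q() := R(x), S(x, y), T(y) on one possible world."""
    for table, key in present:
        if table == "S":
            x, y = key
            if ("R", x) in present and ("T", y) in present:
                return True
    return False


def query_probability():
    """Sum P[w] over every world in which q is true (exponential time)."""
    total = 0.0
    for bits in product([False, True], repeat=len(TUPLES)):
        present = {(t, k) for (t, k, _), bit in zip(TUPLES, bits) if bit}
        if query_holds(present):
            total += world_probability(present)
    return total


# The world w from the slide: R = {a1}, S = {(a1, b1)}, T = {b1}.
slide_world = {("R", "a1"), ("S", ("a1", "b1")), ("T", "b1")}
print(world_probability(slide_world))
print(query_probability())
```

The cost grows as 2^n in the number of tuples, which is why the rest of the talk looks for structure in the lineage instead.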

5
Query Answering in Two Steps: Example
q() := R(x), S(x, y), T(y)

D, with an event variable and a probability for each tuple:
R              S                  T
a1  w1  0.3    a1  b1  v1  0.1    b1  u1  0.7
a2  w2  0.4    a2  b1  v2  0.5    b2  u2  0.8
a3  w3  0.6    a3  b2  v3  0.2    b3  u3  0.4
               a3  b3  v4  0.1

Step 1 (easy): compute the event expression for q(D), also called its “lineage”:
E = w1 v1 u1 + w2 v2 u1 + w3 v3 u2 + w3 v4 u3
The “form” of the expression depends on the query plan; here π() ((R ⋈ S) ⋈ T).
Step 2 (hard): compute P[q(D)] = P[E] given P[w1] = 0.3, P[v1] = 0.1, …
This work: take advantage of read-once expressions.
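Step 1 can be sketched as a join that multiplies event variables and adds up the joining combinations. This is a minimal illustration for this one query, with hypothetical function names; the S keys are an assumption reconstructed from the lineage E on the slide.

```python
# Event variable attached to each tuple. The S keys are an assumption
# reconstructed from the lineage E shown on the slide.
R = {"a1": "w1", "a2": "w2", "a3": "w3"}
S = {("a1", "b1"): "v1", ("a2", "b1"): "v2",
     ("a3", "b2"): "v3", ("a3", "b3"): "v4"}
T = {"b1": "u1", "b2": "u2", "b3": "u3"}


def lineage(r, s, t):
    """DNF lineage of q() := R(x), S(x, y), T(y): one conjunct per joining triple."""
    terms = []
    for (x, y), s_var in s.items():
        if x in r and y in t:
            terms.append((r[x], s_var, t[y]))
    return terms


def show(terms):
    """Render a DNF as text, with '+' for OR and juxtaposition for AND."""
    return " + ".join(" ".join(term) for term in terms)


print(show(lineage(R, S, T)))
```

On the slide's data this prints the same four conjuncts as E.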

6
Read-Once Boolean Expressions
Expression in read-once form: every variable occurs exactly once, e.g. ((x + y)z + w)(u + v).
Linear-time probability computation for independent variables:
P(x y) = P(x) P(y)
P(x + y) = 1 – (1 – P(x)) (1 – P(y))
Read-once expression: has an equivalent read-once form, e.g.
xzu + xzv + yzu + yzv + wu + wv [in DNF, can be as large as O(n^|q|)]
xzu + xzv + (yz + w)(u + v) [not in DNF, can be much smaller]
Non-read-once expressions: have no read-once form, e.g. xy + yz + zx and xy + yz + zw.
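The two rules above give a single bottom-up pass over a read-once form. The sketch below assumes independent variables and a nested-tuple encoding of the expression; the encoding and the probabilities of 0.5 are illustrative choices, not taken from the slides.

```python
def probability(expr, p):
    """Probability of a read-once form over independent variables.

    expr is ("var", name), ("and", child, ...) or ("or", child, ...).
    Each variable occurs once, so sibling subexpressions are independent
    and the product rules apply at every node.
    """
    kind = expr[0]
    if kind == "var":
        return p[expr[1]]
    child_probs = [probability(child, p) for child in expr[1:]]
    if kind == "and":
        result = 1.0
        for c in child_probs:
            result *= c
        return result
    if kind == "or":
        none_true = 1.0
        for c in child_probs:
            none_true *= 1.0 - c
        return 1.0 - none_true
    raise ValueError(f"unknown node kind: {kind}")


def var(name):
    return ("var", name)


# ((x + y) z + w)(u + v), the read-once form from the slide.
EXPR = ("and",
        ("or", ("and", ("or", var("x"), var("y")), var("z")), var("w")),
        ("or", var("u"), var("v")))

# Hypothetical probabilities: every variable is true with probability 0.5.
P = {name: 0.5 for name in "xyzwuv"}
print(probability(EXPR, P))
```

Each node is visited once, so the running time is linear in the size of the read-once form.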

7
Read-Once Event Expressions
Safe plans for safe queries directly produce expressions in read-once form (Olteanu-Huang ’08).
Unsafe queries can also produce read-once expressions. Our example is read-once:
E = w1 v1 u1 + w2 v2 u1 + w3 v3 u2 + w3 v4 u3
  = (w1 v1 + w2 v2) u1 + w3 (v3 u2 + v4 u3)
It corresponds to the unsafe query q() := R(x), S(x, y), T(y), and no query plan can produce the read-once form directly.

8
Problem Definition
Given a Boolean CQ⁻ query q and a tuple-independent database D:
Can we efficiently decide whether the event expression corresponding to q(D) is read-once?
If yes, can we compute the read-once form efficiently? (Then P[q(D)] can be computed efficiently.)

9
Read-once-ness is only a sufficient condition for computing P[q(D)] efficiently
E is read-once ⇒ the read-once form of E can be computed efficiently ⇒ P[E] can be computed efficiently.
The condition is not necessary: e.g., E = x1 x2 + x2 x3 + x3 x4 + … is not read-once, yet P[E] can be computed in polynomial time using dynamic programming.
For a detailed analysis using OBDD, FBDD and d-DNNF, see Jha-Suciu ’11.
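The slide does not spell out the dynamic program, so the following is one possible sketch under the assumption that the x_i are independent. It tracks the probability that no adjacent pair x_i x_{i+1} has been satisfied so far, split by the value of the last variable; the function name is illustrative.

```python
def path_probability(p):
    """P[x1 x2 + x2 x3 + ... + x_{n-1} x_n] for independent x_i, P[x_i] = p[i].

    ends_false / ends_true hold the probability that no adjacent pair is
    jointly true among the variables seen so far, with the last variable
    being false / true respectively.
    """
    ends_false, ends_true = 1.0, 0.0  # before any variable is read
    for pi in p:
        ends_false, ends_true = (
            (ends_false + ends_true) * (1.0 - pi),  # x_i false: always safe
            ends_false * pi,                        # x_i true: previous must be false
        )
    # E is true exactly when some adjacent pair is jointly true.
    return 1.0 - (ends_false + ends_true)


print(path_probability([0.5, 0.5, 0.5, 0.5]))
```

The loop does constant work per variable, so the whole computation is linear in n even though E has no read-once form for n ≥ 4.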

10
Outline
Background
Existing characterization of read-once expressions
Co-occurrence graphs
Our contributions
Co-table graph
Step 1. Computation of the co-table graph
Step 2. Computation of the read-once form
Related work, future work and conclusion

11
Outline
Background
Existing characterization of read-once expressions
Co-occurrence graphs
Our contributions
Co-table graph
Step 1. Computation of the co-table graph
Step 2. Computation of the read-once form
Related work, future work and conclusion

12
Characterization of Read-Once Expressions
A positive Boolean expression is read-once if and only if its “co-occurrence graph” is P4-free (it has no simple induced path on four vertices) and “normal” (Gurvich ’77, ’91).
This can be checked, and the read-once form computed, in polynomial time if the expression is given in DNF (Golumbic-Mintz-Rotics ’06).

13
Co-occurrence Graph G_CO
A graph whose vertices are the variables in the expression.
1. Express the Boolean expression in irredundant DNF, e.g. xy + xyz + zx becomes xy + zx.
2. Put an edge between two variables if they co-occur in a disjunct.
G_CO can be computed easily if the expression is in DNF.
[Figure: G_CO for xy + zx, with edges x-y and x-z]
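The two steps above can be sketched directly for an expression given in DNF. The list-of-lists representation and the function names are illustrative assumptions; absorption is done by a simple quadratic scan.

```python
from itertools import combinations


def irredundant(dnf):
    """Drop every disjunct that strictly contains another one (absorption)."""
    terms = {frozenset(term) for term in dnf}
    return [t for t in terms if not any(other < t for other in terms)]


def co_occurrence_edges(dnf):
    """Edges of G_CO: pairs of variables sharing a disjunct of the irredundant DNF."""
    edges = set()
    for term in irredundant(dnf):
        for a, b in combinations(sorted(term), 2):
            edges.add((a, b))
    return edges


# xy + xyz + zx from the slide; xyz is absorbed by xy.
print(co_occurrence_edges([["x", "y"], ["x", "y", "z"], ["z", "x"]]))
```

Without the absorption step the redundant disjunct xyz would add a spurious y-z edge.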

14
Outline
Background
Existing characterization of read-once expressions
Co-occurrence graphs
Our contributions
Co-table graph
Step 1. Computation of the co-table graph
Step 2. Computation of the read-once form
Related work, future work and conclusion

15
Our Contributions
1. The DNF of the event expression is not needed for CQ⁻: G_CO can be computed directly from “provenance DAGs”.
2. We do not even need to compute G_CO: a subgraph of G_CO suffices, the “co-table graph” G_CT.

Our Framework
(1) Compute G_CO, then use existing read-once testing algorithms. This uses Gurvich’s characterization.
(2) Compute G_CT, then use our read-once testing algorithm. This uses an alternative characterization.
(2) is faster than (1).

16
Provenance DAG
Event expressions, called “lineage” (Suciu keynote), are a form of provenance (Green-Karvounarakis-Tannen ’07). We use provenance DAGs (Green et al. ’07).
Query: q() := R(x), S(x, y), T(y)
Query plan: π() ((R ⋈ S) ⋈ T)
E = w1 v1 u1 + w2 v2 u1 + w3 v3 u2 + w3 v4 u3
[Figure: provenance DAG for E over the leaves w1, w2, w3, v1, v2, v3, v4, u1, u2, u3]

17
Co-Table Graph G_CT
A subgraph of G_CO: |G_CT| ≤ |G_CO|.
Put an edge between two variables only if their tables share variables in q.
E.g., for q() := R(x), S(y) where R and S have n tuples each, G_CO has n^2 edges and G_CT has none.
For q() := R(x), S(x, y), T(y) with E = w1 v1 u1 + w2 v2 u1 + w3 v3 u2 + w3 v4 u3:
[Figure: G_CO and G_CT on the vertices w1, w2, w3, v1, v2, v3, v4, u1, u2, u3]
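To make the definition concrete, the sketch below filters G_CO down to G_CT for the running example. This is for illustration only: the talk computes G_CT directly from the provenance DAG rather than by building G_CO first. The dictionaries and function names are assumptions, and the lineage is taken to be already irredundant.

```python
from itertools import combinations

# Query variables used by each table in q() := R(x), S(x, y), T(y).
QUERY = {"R": {"x"}, "S": {"x", "y"}, "T": {"y"}}

# Table that each event variable belongs to.
TABLE_OF = {"w1": "R", "w2": "R", "w3": "R",
            "v1": "S", "v2": "S", "v3": "S", "v4": "S",
            "u1": "T", "u2": "T", "u3": "T"}

# E = w1 v1 u1 + w2 v2 u1 + w3 v3 u2 + w3 v4 u3 (already irredundant).
LINEAGE = [("w1", "v1", "u1"), ("w2", "v2", "u1"),
           ("w3", "v3", "u2"), ("w3", "v4", "u3")]


def co_occurrence_edges(dnf):
    """Edges of G_CO for an irredundant DNF."""
    return {frozenset(pair) for term in dnf for pair in combinations(term, 2)}


def co_table_edges(dnf, table_of, query):
    """Keep only the G_CO edges whose two tables share a query variable."""
    kept = set()
    for edge in co_occurrence_edges(dnf):
        a, b = tuple(edge)
        if query[table_of[a]] & query[table_of[b]]:
            kept.add(edge)
    return kept


print(len(co_occurrence_edges(LINEAGE)), len(co_table_edges(LINEAGE, TABLE_OF, QUERY)))
```

On this instance G_CT drops the four edges between R-variables and T-variables, since R and T share no query variable.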

18
Our Algorithm
Input: provenance DAG H, obtained from the query plan.
Step 1: compute G_CT (the same procedure can compute G_CO as well).
Step 2: compute the read-once form if possible; otherwise report that the event expression is not read-once.

19
Step 1: Computing G_CT
Theorem: two variables are adjacent in G_CT if and only if their least common ancestor set in the provenance DAG contains a product node.
Example: E = xy + xz.
[Figure: provenance DAG for E over the leaves y, x, z]
The proof relies critically on the no-self-join assumption.

20
Step 2: Computing the Read-Once Form
Input: G_CT.
Recursively alternate between row decomposition and table decomposition.
Row decomposition: E = E1 + E2 + E3, where each Ei is an event expression for the same query q.
Table decomposition: E = E1 E2, where E1 and E2 are event expressions for subqueries q1 and q2.
Exactly one of the two can be done at each recursion level; otherwise the expression is not read-once.
The proof relies critically on the no-union assumption.
The algorithm is sound and complete.
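For comparison, here is a sketch of the classical route through G_CO, not the row/table decomposition on G_CT described above. It uses the standard cograph fact that a P4-free graph with at least two vertices is either disconnected (giving a sum) or has a disconnected complement (giving a product). It assumes the input DNF is irredundant and its graph is normal, which the related-work slide says need not be checked for CQ⁻ lineage. All names are illustrative.

```python
from itertools import combinations


def components(vertices, adjacent):
    """Connected components of the graph induced on `vertices`."""
    remaining, parts = set(vertices), []
    while remaining:
        stack = [remaining.pop()]
        part = set(stack)
        while stack:
            v = stack.pop()
            for u in list(remaining):
                if adjacent(u, v):
                    remaining.discard(u)
                    part.add(u)
                    stack.append(u)
        parts.append(part)
    return parts


def build(vertices, edges):
    """Return (text, kind) for a read-once form, or None if an induced P4 exists."""
    if len(vertices) == 1:
        return next(iter(vertices)), "var"

    def linked(a, b):
        return frozenset((a, b)) in edges

    parts, kind = components(vertices, linked), "or"
    if len(parts) == 1:
        # Connected graph: try splitting the complement graph instead.
        parts, kind = components(vertices, lambda a, b: not linked(a, b)), "and"
        if len(parts) == 1:
            return None  # graph and complement both connected: not P4-free
    children = [build(part, edges) for part in parts]
    if any(child is None for child in children):
        return None
    texts = sorted(f"({t})" if kind == "and" and k == "or" else t
                   for t, k in children)
    return (" " if kind == "and" else " + ").join(texts), kind


def read_once_form(dnf):
    """Read-once form of an irredundant DNF with a normal G_CO, else None."""
    variables = {v for term in dnf for v in term}
    edges = {frozenset(pair) for term in dnf for pair in combinations(term, 2)}
    result = build(variables, edges)
    return None if result is None else result[0]


E = [("w1", "v1", "u1"), ("w2", "v2", "u1"), ("w3", "v3", "u2"), ("w3", "v4", "u3")]
print(read_once_form(E))
print(read_once_form([("x", "y"), ("y", "z"), ("z", "w")]))
```

Because normality is assumed rather than tested, this sketch would wrongly accept xy + yz + zx; a full implementation of the classical test would add that check.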

21
Example: Row Decomposition
q() := R(x), S(x, y), T(y)
E = w1 v1 u1 + w2 v2 u1 + w3 v3 u2 + w3 v4 u3
Row decomposition splits E into a sum of two parts. The first part involves only the tuples
R1 = {a1: w1, a2: w2}, S1 = {(a1, b1): v1, (a2, b1): v2}, T1 = {b1: u1}
[Figure: G_CT on w1, w2, w3, v1, v2, v3, v4, u1, u2, u3, split into two parts joined by +]

22
Example: Table Decomposition
The first part from the row decomposition: R1 = {a1: w1, a2: w2}, S1 = {(a1, b1): v1, (a2, b1): v2}, T1 = {b1: u1}
q() := R(x), S(x, y), T(y) is decomposed into q1() := R(x), S(x, y1) and q2() := T(y2).
This gives (w1 v1 + w2 v2) u1.
Final expression: (w1 v1 + w2 v2) u1 + w3 (v3 u2 + v4 u3)

23
Overall Time Complexity
Input: provenance DAG H.
Step 1: compute G_CT or G_CO. Time complexity ≈ O(n m_H + W_H m_CO), where m_H = number of edges in H, W_H = width of H, m_CO = number of edges in G_CO, and m_CT = number of edges in G_CT.
Step 2: compute the read-once form (if possible).
Using our algorithm: O((m_CT + n) min(|q|, √n)); data complexity O(m_CT + n).
Using existing algorithms: O(m_CO + n), with m_CT ≤ m_CO.
The analysis uses a “charging argument”: bound the recursion depth and the total time at each recursion level.

Summary
Step 1 is the more expensive step.
Step 2 is linear: in |G_CO| for existing algorithms, and in |G_CT| for our algorithm, with |G_CT| ≤ |G_CO|.

24
Outline
Background
Co-occurrence graphs
Existing characterization of read-once expressions
Our contributions
Co-table graph
Step 1. Computation of the co-table graph
Step 2. Computation of the read-once form
Related work, future work and conclusion

25
Related Work
Sen-Deshpande-Getoor ’10: independent work that considers the same problem.
Shows that the “normality” check is not needed for CQ⁻.
Tests P4-freeness using “lineage trees”, without computing the co-occurrence graph.
Our work:
Computes the co-occurrence graph without computing the DNF, so existing algorithms can be used. This was an open question in Sen-Deshpande-Getoor ’10.
Obtains a faster and simpler algorithm (time complexity comparison in the paper).
Uses BFS/DFS, which is easier to implement.
Uses compact provenance DAGs instead of lineage trees.

26
Other Related Work
Semantics of probabilistic query answering: Fuhr-Rölleke ’97, Zimányi ’97
Dichotomy of CQ⁻, CQ and UCQ queries: Dalvi-Suciu ’04, ’07; Dalvi-Schnaitter-Suciu ’10
Knowledge compilation techniques: Olteanu-Huang ’08, Jha-Olteanu-Suciu ’10, Jha-Suciu ’11, Fink-Olteanu ’11

27
Conclusion and Future Work
Can the co-occurrence/co-table graph be computed as a pre-processing step?
This is the more expensive step.
It is akin to building indexes on databases, but depends on the query’s “join pattern”.
Cache the already computed G_CT together with the join pattern.
How to handle larger classes of queries (UCQ?) and database models (disjoint-independent?)
Other efficient knowledge-compilation forms.

28
Thank You. Questions?
