Scrubbing Query Results from Probabilistic Databases Jianwen Chen, Ling Feng, Wenwei Xue.


1 Scrubbing Query Results from Probabilistic Databases Jianwen Chen, Ling Feng, Wenwei Xue

2 A skeleton of scrubbing probabilistic database query results

3 Three probabilistic relation examples

4 Query 1: look for the year(s) where at least one movie was liked by people from northern regions
The user gets the following answer from the probabilistic database:
User: Where is the probability derived from?
System: It is based on two assumptions: Pr(x4) = 0.9 and Pr(x5) = 0.2.
User: I think the movie with MovieID = 4 is not actually liked by people from northern regions. Pr(x4) should be 0.1, not 0.9!
System: The new probability is 0.28!
How to identify the top-k uncertain assumptions for user clarification?
How to recompute the probability?

5 Top-k assumptions
EventID   Prob.   Rate
x4        0.9     0.8
x5        0.2     0.1
With the original assumptions:
Pr(ee) = Pr(x4 ∨ x5) = Pr(x4) + Pr(x5) - Pr(x4) * Pr(x5) = 0.9 + 0.2 - 0.9 * 0.2 = 0.92
With Pr(x4) changed from 0.9 to 0.1:
Pr(ee) = Pr(x4 ∨ x5) = Pr(x4) + Pr(x5) - Pr(x4) * Pr(x5) = 0.1 + 0.2 - 0.1 * 0.2 = 0.28
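The arithmetic on this slide can be checked with a minimal Python sketch. The helper name pr_or is hypothetical. Reading the Rate column as the change in Pr(ee) per unit change in one input probability is an assumption; the transcript does not define it, but the values 0.8 and 0.1 agree with that reading.

```python
def pr_or(p_a: float, p_b: float) -> float:
    """Pr(a ∨ b) for two independent events a and b."""
    return p_a + p_b - p_a * p_b

before = pr_or(0.9, 0.2)  # original assumptions Pr(x4)=0.9, Pr(x5)=0.2
after = pr_or(0.1, 0.2)   # after the user corrects Pr(x4) to 0.1

# Assumed meaning of "Rate": slope of Pr(ee) in one input probability,
# obtained by moving that input from 0 to 1 while the other stays fixed.
rate_x4 = pr_or(1.0, 0.2) - pr_or(0.0, 0.2)  # equals 1 - Pr(x5)
rate_x5 = pr_or(0.9, 1.0) - pr_or(0.9, 0.0)  # equals 1 - Pr(x4)

print(round(before, 2), round(after, 2), round(rate_x4, 2), round(rate_x5, 2))
```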

6 Basic algorithm to compute top-k assumptions
For an event expression ee, to compute its probability Pr(ee), one can first convert it into an equivalent disjunctive normal form, and then apply the inclusion-exclusion formula.
disjunctive normal form: ee = C1 ∨ C2 ∨ … ∨ Cm
where C1 = e11 ∧ e12 ∧ … ∧ e1s1, C2 = e21 ∧ e22 ∧ … ∧ e2s2, ..., Cm = em1 ∧ em2 ∧ … ∧ emsm, m ≥ 1, s1, s2, …, sm ≥ 1
inclusion-exclusion formula: Pr(C1 ∨ … ∨ Cm) = Σ_{k=1..m} (-1)^(k+1) Σ_{1 ≤ i1 < … < ik ≤ m} Pr(Ci1 ∧ … ∧ Cik)
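As an illustration of this basic algorithm, here is a small Python sketch of inclusion-exclusion over a disjunctive normal form. It assumes independent basic events; the literal encoding (event name, polarity) and the function names pr_conj and pr_dnf are hypothetical choices for this example, not taken from the slides.

```python
from itertools import combinations

def pr_conj(literals, prob):
    """Probability that all literals hold, assuming independent basic events.

    Each literal is (event_name, positive); positive=False means negation.
    """
    polarity = {}
    for event, positive in literals:
        if polarity.setdefault(event, positive) != positive:
            return 0.0  # both e and ~e are required: impossible
    result = 1.0
    for event, positive in polarity.items():
        result *= prob[event] if positive else 1.0 - prob[event]
    return result

def pr_dnf(conjuncts, prob):
    """Inclusion-exclusion over the conjuncts C1..Cm of a DNF expression."""
    total = 0.0
    for k in range(1, len(conjuncts) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(conjuncts, k):
            merged = set().union(*subset)  # literals of Ci1 ∧ … ∧ Cik
            total += sign * pr_conj(merged, prob)
    return total

# Query 1 from the earlier slides: ee = x4 ∨ x5
example = pr_dnf(
    [frozenset({("x4", True)}), frozenset({("x5", True)})],
    {"x4": 0.9, "x5": 0.2},
)
print(round(example, 2))
```

The loops visit every non-empty subset of the m conjuncts, which is where an exponential cost in m comes from.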

7 Basic algorithm to compute top-k assumptions
To compute the rate of a basic event e_i (the Rate column of the top-k assumptions table), one can rewrite Pr(ee) as Pr(ee) = α * Pr(e_i) + β, where α and β are two sub-expressions irrelevant to Pr(e_i).
The time complexity is O(2^m), where m is the number of conjuncts in the disjunctive normal form of ee.
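The rewriting Pr(ee) = α * Pr(e_i) + β can be sketched as follows, assuming independent basic events. Because Pr(ee) is then linear in each single Pr(e_i), β is the value of Pr(ee) with Pr(e_i) set to 0, and α is the difference between the values at 1 and at 0. The brute-force evaluator pr and the helper alpha_beta are hypothetical names for illustration; the slides do not show how α and β are obtained.

```python
from itertools import product

def pr(expr, prob):
    """Pr(expr) by enumerating all truth assignments of independent events."""
    events = sorted(prob)
    total = 0.0
    for values in product([False, True], repeat=len(events)):
        world = dict(zip(events, values))
        if expr(world):
            weight = 1.0
            for e in events:
                weight *= prob[e] if world[e] else 1.0 - prob[e]
            total += weight
    return total

def alpha_beta(expr, prob, event):
    """Coefficients with Pr(expr) = alpha * Pr(event) + beta."""
    beta = pr(expr, {**prob, event: 0.0})
    alpha = pr(expr, {**prob, event: 1.0}) - beta
    return alpha, beta

# Query 1: ee = x4 ∨ x5
ee = lambda w: w["x4"] or w["x5"]
alpha, beta = alpha_beta(ee, {"x4": 0.9, "x5": 0.2}, "x4")
original = alpha * 0.9 + beta   # Pr(ee) with Pr(x4) = 0.9
scrubbed = alpha * 0.1 + beta   # Pr(ee) after changing Pr(x4) to 0.1
print(round(alpha, 2), round(beta, 2), round(original, 2), round(scrubbed, 2))
```

Once α and β are known for an event, changing that one probability needs only a multiplication and an addition.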

8 Optimization
We restrict the event expression ee to the situation where the basic events e1, e2, …, en are independent and, moreover, do not occur repeatedly in ee. This can be obtained for most queries (80% of the TPC/H queries) by using the well-researched optimization technique adopted in:
Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB Journal 16(4) (2007) 523–544

9 Three probabilistic relation examples

10 Query 2: look for the year(s) where at least one movie was liked by people from northern regions but not by people from southern regions The user gets the following answer from the uncertain database:

11 ee = (e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)
Pr(e1) = 0.2, Pr(e2) = 0.7, Pr(e3) = 0.1, Pr(e4) = 0.9, Pr(e5) = 0.7, Pr(e6) = 0.2
Pr(ee)?
Pr(~ee) = 1 - Pr(ee)
Pr(ee1 ∧ ee2) = Pr(ee1) * Pr(ee2)
Pr(ee1 ∨ ee2) = Pr(ee1) + Pr(ee2) - Pr(ee1) * Pr(ee2)
Pr(ee) = f(Pr(e1), Pr(e2), …, Pr(e6))
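The three rules above can be applied recursively over an expression tree. The sketch below assumes, as the optimization slide requires, that basic events are independent and none occurs twice in ee. The nested-tuple encoding and the name pr_tree are hypothetical.

```python
def pr_tree(node, prob):
    """Evaluate Pr of an expression tree bottom-up with the three rules.

    A node is an event name, ("not", child), ("and", left, right),
    or ("or", left, right).
    """
    if isinstance(node, str):
        return prob[node]
    op = node[0]
    if op == "not":
        return 1.0 - pr_tree(node[1], prob)
    left = pr_tree(node[1], prob)
    right = pr_tree(node[2], prob)
    if op == "and":
        return left * right
    if op == "or":
        return left + right - left * right
    raise ValueError(f"unknown operator: {op}")

prob = {"e1": 0.2, "e2": 0.7, "e3": 0.1, "e4": 0.9, "e5": 0.7, "e6": 0.2}
ee = ("or",
      ("or", ("and", "e1", ("not", "e2")), ("and", "e3", ("not", "e4"))),
      ("and", "e5", ("not", "e6")))
result = pr_tree(ee, prob)
print(round(result, 3))
```

Each node is visited once, so the cost is linear in the size of the expression rather than exponential in the number of conjuncts.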

12

13 (e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)
Pr(e1) = 0.2, Pr(e2) = 0.7, Pr(e3) = 0.1, Pr(e4) = 0.9, Pr(e5) = 0.7, Pr(e6) = 0.2
Negation nodes: Pr(ee(N)) = 1 - Pr(ee(leftChild(N)))
~e2: 1 - 0.7 = 0.3
~e4: 1 - 0.9 = 0.1
~e6: 1 - 0.2 = 0.8
Conjunction nodes: Pr(ee(N)) = Pr(ee(leftChild(N))) * Pr(ee(rightChild(N)))
e1 ∧ ~e2: 0.2 * 0.3 = 0.06
e3 ∧ ~e4: 0.1 * 0.1 = 0.01
e5 ∧ ~e6: 0.7 * 0.8 = 0.56
Disjunction nodes: Pr(ee(N)) = Pr(ee(leftChild(N))) + Pr(ee(rightChild(N))) - Pr(ee(leftChild(N))) * Pr(ee(rightChild(N)))
(e1 ∧ ~e2) ∨ (e3 ∧ ~e4): 0.06 + 0.01 - 0.06 * 0.01 = 0.0694
root: 0.0694 + 0.56 - 0.0694 * 0.56 = 0.591

15 (e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)
Negation nodes: Pr(ee(N)) = 1 - Pr(ee(leftChild(N)))
~e2: 1 - 0.7 = 0.3
~e4: 1 - 0.9 = 0.1
~e6: 1 - 0.2 = 0.8
Conjunction nodes: Pr(ee(N)) = Pr(ee(leftChild(N))) * Pr(ee(rightChild(N)))
e1 ∧ ~e2: 0.2 * 0.3 = 0.06
e3 ∧ ~e4: 0.1 * 0.1 = 0.01
e5 ∧ ~e6: 0.7 * 0.8 = 0.56
Disjunction nodes: Pr(ee(N)) = Pr(ee(leftChild(N))) + Pr(ee(rightChild(N))) - Pr(ee(leftChild(N))) * Pr(ee(rightChild(N)))
(e1 ∧ ~e2) ∨ (e3 ∧ ~e4): 0.06 + 0.01 - 0.06 * 0.01 = 0.0694
root: 0.0694 + 0.56 - 0.0694 * 0.56 = 0.591

16 (e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)
Conjunction nodes: Pr(ee(N)) = Pr(ee(leftChild(N))) * Pr(ee(rightChild(N)))
e1 ∧ ~e2: 0.2 * 0.3 = 0.06
e3 ∧ ~e4: 0.1 * 0.1 = 0.01
e5 ∧ ~e6: 0.7 * 0.8 = 0.56
Disjunction nodes: Pr(ee(N)) = Pr(ee(leftChild(N))) + Pr(ee(rightChild(N))) - Pr(ee(leftChild(N))) * Pr(ee(rightChild(N)))
(e1 ∧ ~e2) ∨ (e3 ∧ ~e4): 0.06 + 0.01 - 0.06 * 0.01 = 0.0694
root: 0.0694 + 0.56 - 0.0694 * 0.56 = 0.591

17 (e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)
Disjunction nodes: Pr(ee(N)) = Pr(ee(leftChild(N))) + Pr(ee(rightChild(N))) - Pr(ee(leftChild(N))) * Pr(ee(rightChild(N)))
(e1 ∧ ~e2) ∨ (e3 ∧ ~e4): 0.06 + 0.01 - 0.06 * 0.01 = 0.0694
root: 0.0694 + 0.56 - 0.0694 * 0.56 = 0.591

18 (e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)
Root node: Pr(ee(N)) = Pr(ee(leftChild(N))) + Pr(ee(rightChild(N))) - Pr(ee(leftChild(N))) * Pr(ee(rightChild(N))) = 0.0694 + 0.56 - 0.0694 * 0.56 = 0.591

19 (e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)

20 Second Optimization

21 (e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)
top-2 assumptions

22 Scrub the query result
Recompute Pr((e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)) with modified Pr(e2) and Pr(e5)
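A minimal sketch of this recomputation step follows. The transcript does not give the modified values, so Pr(e2) = 0.1 and Pr(e5) = 0.3 below are purely hypothetical, as is the function name pr_query2. Because the three conjuncts share no basic event and the events are assumed independent, the disjunction rule applied twice is equivalent to 1 minus the product of the conjuncts' complements.

```python
def pr_query2(p):
    """Pr((e1 ∧ ~e2) ∨ (e3 ∧ ~e4) ∨ (e5 ∧ ~e6)) for independent events."""
    c1 = p["e1"] * (1.0 - p["e2"])
    c2 = p["e3"] * (1.0 - p["e4"])
    c3 = p["e5"] * (1.0 - p["e6"])
    # The disjunction holds unless all three conjuncts fail.
    return 1.0 - (1.0 - c1) * (1.0 - c2) * (1.0 - c3)

original = {"e1": 0.2, "e2": 0.7, "e3": 0.1, "e4": 0.9, "e5": 0.7, "e6": 0.2}

# Hypothetical user corrections; these values are not from the slides.
modified = {**original, "e2": 0.1, "e5": 0.3}

pr_before = pr_query2(original)
pr_after = pr_query2(modified)
print(round(pr_before, 3), round(pr_after, 3))
```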

23 Performance Study

24

25 Conclusion

