
Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. Presented by Qian Wan, HKUST. Based on [1][2].



2 Introduction: Uncertain Data Management. Modeling uncertain data: the possible worlds model. Uncertain data management: top-k, join, kNN, skyline, indexing, etc. Uncertain data mining: clustering, classification, frequent patterns, outlier detection.

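3 Introduction: Data Representation. A simple way to represent probabilistic data: each tuple $t_i$ has a confidence (membership probability) $p_i$. With independent tuples, the probability of a possible world (instance) $W$ is the product of the presence probabilities of the tuples in $W$ and the absence probabilities of all other tuples: $\Pr(W) = \prod_{t_i \in W} p_i \times \prod_{t_i \notin W} (1 - p_i)$. Mutual exclusion constraints may hold among tuples, and a scoring function assigns each tuple a score.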
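To make the possible-worlds semantics concrete, here is a minimal Python sketch that computes $\Pr(W)$ and enumerates all $2^n$ worlds. It assumes independent tuples with no mutual exclusion rules; the tuple names and probabilities are made up for illustration.

```python
from itertools import product
from math import prod

# Hypothetical toy data (not from the slides): tuple id -> membership probability.
probs = {"t1": 0.5, "t2": 0.8, "t3": 0.4}

def world_probability(world, probs):
    """Pr(W): p_i for every tuple in W times (1 - p_i) for every tuple not in W.
    Assumes independent tuples (no mutual exclusion rules)."""
    return prod(p if t in world else 1.0 - p for t, p in probs.items())

def possible_worlds(probs):
    """Enumerate all 2^n possible worlds together with their probabilities."""
    ids = list(probs)
    for mask in product([False, True], repeat=len(ids)):
        world = frozenset(t for t, present in zip(ids, mask) if present)
        yield world, world_probability(world, probs)

worlds = list(possible_worlds(probs))
total = sum(p for _, p in worlds)
print(len(worlds), round(total, 10))  # 8 1.0
```

The world probabilities sum to 1, which is a quick sanity check that the presence and absence factors are paired correctly.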

4 Introduction: Other Works. U-Topk returns the k tuples that co-exist in a possible world, i.e., the top-k vector with the highest probability. U-kRanks and PT-k return tuples according to the marginal distribution of top-k results.
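A brute-force Python sketch of the three semantics, assuming independent tuples and enumerating every possible world, so it only suits tiny inputs. The data and helper names are hypothetical. In this sketch U-kRanks picks, for each rank, the tuple most likely to hold it, and PT-k keeps tuples whose probability of being in the top-k is at least a threshold.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data (not from the slides): (id, score, membership probability).
tuples = [("t1", 100, 0.3), ("t2", 90, 0.9), ("t3", 80, 0.6), ("t4", 70, 0.5)]

def worlds(tuples):
    """Yield (present tuples by descending score, world probability); tuples independent."""
    for mask in product([False, True], repeat=len(tuples)):
        p, present = 1.0, []
        for t, m in zip(tuples, mask):
            p *= t[2] if m else 1.0 - t[2]
            if m:
                present.append(t)
        present.sort(key=lambda t: -t[1])
        yield present, p

def u_topk(tuples, k):
    """U-Topk: the k-tuple vector most likely to be the top-k of a possible world."""
    vec_prob = defaultdict(float)
    for present, p in worlds(tuples):
        if len(present) >= k:
            vec_prob[tuple(t[0] for t in present[:k])] += p
    return max(vec_prob.items(), key=lambda kv: kv[1])

def u_kranks(tuples, k):
    """U-kRanks: for each rank r = 1..k, the tuple most likely to appear at rank r."""
    rank_prob = [defaultdict(float) for _ in range(k)]
    for present, p in worlds(tuples):
        for r, t in enumerate(present[:k]):
            rank_prob[r][t[0]] += p
    return [max(d, key=d.get) if d else None for d in rank_prob]

def pt_k(tuples, k, threshold):
    """PT-k: tuples whose probability of being in the top-k is at least the threshold."""
    in_topk = defaultdict(float)
    for present, p in worlds(tuples):
        for t in present[:k]:
            in_topk[t[0]] += p
    return sorted(t for t, q in in_topk.items() if q >= threshold)

print(u_topk(tuples, 2), u_kranks(tuples, 2), pt_k(tuples, 2, 0.4))
```

On this toy data, U-Top2 is (t2, t3) with probability 0.378, U-2Ranks gives [t2, t3], and PT-2 with threshold 0.4 keeps t2 and t3.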

5 Introduction: Other Works (Example)

6 Introduction: Other Works (Drawbacks). The top-k result may be atypical, and the distribution of scores is not used.

7 Introduction: c-Typical-Top-k. In this example, the 3-Typical-Top-2 scores are {118, 183, 235}, with an expected distance of 6.6. The corresponding top-2 vectors are {(t2, t6), (t7, t6), (t7, t3)}.
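The slide does not spell out the objective, so the following reading is an assumption: the expected distance is $E[\min_l |S - s_l|]$, where $S$ is the random top-k score and $s_1, \ldots, s_c$ are the typical scores, chosen here from the support of $S$. Under that reading, a brute-force Python sketch tries every set of c candidate scores. The distribution below is hypothetical, not the slides' example.

```python
from itertools import combinations

def expected_distance(dist, typical):
    """E[min_t |S - t|] for S distributed as dist, a list of (score, probability)."""
    return sum(p * min(abs(v - t) for t in typical) for v, p in dist)

def c_typical_bruteforce(dist, c):
    """Choose c scores from the support of dist that minimize the expected distance."""
    support = sorted({v for v, _ in dist})
    best = min(combinations(support, c), key=lambda ts: expected_distance(dist, ts))
    return list(best), expected_distance(dist, best)

# Hypothetical top-2 score distribution (not from the slides).
dist = [(100, 0.1), (110, 0.2), (120, 0.1), (200, 0.4), (210, 0.2)]
print(c_typical_bruteforce(dist, 2))  # ([110, 200], 4.0)
```

Trying all $\binom{m}{c}$ subsets is only feasible for small supports; it is useful as a reference to check a faster method against.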

8 Algorithm: Distribution of the top-2 tuples' scores

9 Algorithm – Naïve approach. INPUT: tuples with membership probabilities. OUTPUT: the distribution of top-k scores. IDEA: recursively go through all possible worlds and accumulate their probabilities, until reaching a threshold.
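The slide only sketches the idea, so here is one possible Python realization. Assumptions not taken from the slide: tuples are independent and pre-sorted by descending score, a world with fewer than k tuples contributes nothing, and "threshold" is read as pruning any branch whose probability drops below a hypothetical `min_prob` parameter. The recursion stops once k tuples are chosen, because the remaining tuples cannot change the top-k score.

```python
from collections import defaultdict

def topk_score_dist_naive(tuples, k, min_prob=0.0):
    """Distribution {total top-k score: probability} by recursive enumeration.

    tuples: list of (score, membership probability), sorted by descending score.
    Assumed: independent tuples; worlds with fewer than k tuples are dropped;
    branches with probability below min_prob are pruned (0.0 keeps it exact).
    """
    dist = defaultdict(float)

    def recurse(i, remaining, score, prob):
        if prob < min_prob:
            return  # prune a low-probability branch
        if remaining == 0:
            dist[score] += prob  # top-k complete; lower tuples do not matter
            return
        if i == len(tuples):
            return  # this world has fewer than k tuples: dropped
        s, p = tuples[i]
        recurse(i + 1, remaining - 1, score + s, prob * p)  # t_i present
        recurse(i + 1, remaining, score, prob * (1 - p))    # t_i absent

    recurse(0, k, 0, 1.0)
    return dict(dist)

# Hypothetical toy data (not from the slides): (score, probability).
example = [(100, 0.3), (90, 0.9), (80, 0.6), (70, 0.5)]
dist = topk_score_dist_naive(example, 2)
```

The work is still exponential in the number of tuples in the worst case, which is what motivates a dynamic program.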

10 Algorithm – a DP approach. $D(i,j)$: the score distribution of the top-j tuples among the tuples starting at $t_i$. The goal is to compute $D(1,k)$.

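11 Algorithm – a DP approach. Transformation: $D(i,j) = TF[D(i+1,j), D(i+1,j-1)]$. From $D(i+1,j)$, each pair $(v,p)$ contributes $(v, p(1-p_i))$, the case where $t_i$ is absent; from $D(i+1,j-1)$, each pair $(v,p)$ contributes $(v+s_i, p \cdot p_i)$, the case where $t_i$ is present. Pairs with the same score are merged by adding their probabilities. Bottom-up DP; approximation.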
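A minimal bottom-up Python sketch of the $D(i,j)$ recurrence. Assumptions: independent tuples sorted by descending score, 0-based indices, the boundary $D(n+1, 0) = \{(0, 1)\}$ and $D(n+1, j) = \emptyset$ for $j > 0$ (worlds with too few tuples are dropped), and no approximation step. The function name is hypothetical.

```python
from collections import defaultdict

def topk_score_dist_dp(tuples, k):
    """Bottom-up DP returning D(1, k) as {total score: probability}.

    tuples: list of (score, membership probability), sorted by descending score.
    Assumed boundary past the last tuple: D(., 0) = {0: 1}, D(., j > 0) = {}.
    """
    n = len(tuples)
    # next_row[j] holds D(i+1, j); start with the row just past the last tuple.
    next_row = [{0: 1.0}] + [{} for _ in range(k)]
    for i in range(n - 1, -1, -1):
        s_i, p_i = tuples[i]
        row = [{0: 1.0}]  # D(i, 0): an empty top-0 has score 0 with certainty
        for j in range(1, k + 1):
            merged = defaultdict(float)
            for v, p in next_row[j].items():      # t_i absent: (v, p(1 - p_i))
                merged[v] += p * (1 - p_i)
            for v, p in next_row[j - 1].items():  # t_i present: (v + s_i, p * p_i)
                merged[v + s_i] += p * p_i
            row.append(dict(merged))  # equal scores were merged by summing
        next_row = row
    return next_row[k]

# Hypothetical toy data (not from the slides): (score, probability).
example = [(100, 0.3), (90, 0.9), (80, 0.6), (70, 0.5)]
dist = topk_score_dist_dp(example, 2)
```

Each row depends only on the next one, so memory stays at k + 1 dictionaries; the dictionaries themselves can grow with the number of distinct score sums.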

12 Handling More Realistic Scenarios. Handling mutually exclusive (ME) rules: compress each ME group; refine by the lead tuple region. Handling ties: when two tuples have the same score, rank them according to their probability.
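Tie handling reduces to a sort key. In this Python sketch the direction, higher probability first, is an assumption since the slide only says "according to probability"; the tuples and the function name are hypothetical.

```python
# Hypothetical tuples (not from the slides): (id, score, membership probability).
tuples = [("a", 90, 0.4), ("b", 95, 0.5), ("c", 90, 0.7), ("d", 80, 0.9)]

def rank_with_ties(tuples):
    """Sort by descending score; equal scores are ordered by descending probability
    (assumed direction)."""
    return sorted(tuples, key=lambda t: (-t[1], -t[2]))

print([t[0] for t in rank_with_ties(tuples)])  # ['b', 'c', 'a', 'd']
```

With a total order in place, score-ordered algorithms can process tied tuples one at a time.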

13 Algorithm: 3-Typical-Top-2 scores

14 c-Typical-Top-k

15 Computing c-Typical-Top-k. Define $F^a(j)$ to be the optimal objective value over $\{s_j, \ldots, s_n\}$, where $a$ is the number of typical scores. $G^a(j)$ is defined analogously.

16 Computing c-Typical-Top-k. Solve the optimization problems for the two functions $F$ and $G$ with dynamic programming, starting from their boundary conditions.
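The slides do not spell out the $F$/$G$ recurrences, so this Python sketch swaps in a generic technique. Under the expected-distance objective (our reading), picking c typical scores is a one-dimensional weighted c-median problem, which a DP over sorted scores solves exactly: each typical score serves a contiguous run of scores, and a run's best center can be taken at one of its own scores. `F[a][j]` plays a role similar to $F^a(j)$, the optimal cost over scores $j$ onward with $a$ typical scores; $G$ is not modeled. The data and names are hypothetical.

```python
def c_typical_dp(dist, c):
    """Choose c typical scores minimizing sum_v p(v) * min_t |v - t| (1-D c-median DP).

    dist: list of (score, probability) with distinct scores. A generic DP that may
    differ in detail from the F^a(j) / G^a(j) formulation on the slides.
    """
    pts = sorted(dist)
    n = len(pts)
    c = min(c, n)

    def run_cost(lo, hi):
        # Cheapest single center for pts[lo..hi] (inclusive): (cost, center score).
        best = None
        for m in range(lo, hi + 1):
            cost = sum(p * abs(v - pts[m][0]) for v, p in pts[lo:hi + 1])
            if best is None or cost < best[0]:
                best = (cost, pts[m][0])
        return best

    INF = float("inf")
    # F[a][j]: optimal cost for pts[j:] using exactly a centers; choice stores (end, center).
    F = [[INF] * (n + 1) for _ in range(c + 1)]
    choice = [[None] * (n + 1) for _ in range(c + 1)]
    F[0][n] = 0.0  # boundary: no scores and no centers left
    for a in range(1, c + 1):
        for j in range(n - 1, -1, -1):
            for end in range(j, n):  # the first center serves pts[j..end]
                rest = F[a - 1][end + 1]
                if rest == INF:
                    continue
                cost, center = run_cost(j, end)
                if cost + rest < F[a][j]:
                    F[a][j] = cost + rest
                    choice[a][j] = (end, center)
    # Follow the stored choices to recover the typical scores.
    typical, j = [], 0
    for a in range(c, 0, -1):
        end, center = choice[a][j]
        typical.append(center)
        j = end + 1
    return typical, F[c][0]

# Hypothetical top-2 score distribution (not from the slides).
dist = [(100, 0.1), (110, 0.2), (120, 0.1), (200, 0.4), (210, 0.2)]
print(c_typical_dp(dist, 2))  # ([110, 200], 4.0)
```

This sketch recomputes each run's cost naively, which is fine for illustration; prefix sums would speed it up without changing the recurrence.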

17 Empirical Study: 3-Typical vs. U-Topk

18 Empirical Study

19

20 Q&A

21 References
[1] Charu C. Aggarwal and Philip S. Yu. "A Survey of Uncertain Data Algorithms and Applications." IEEE Transactions on Knowledge and Data Engineering, 2009.
[2] Tingjian Ge, Stan Zdonik, and Samuel Madden. "Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers." SIGMOD, 2009.

