Advisor: Prof. 陳良弼. Presenter: 鄧雅文 (97753034). Introduction, Related Work, Problem Formulation, Future Work.


1 Advisor: Prof. 陳良弼  Presenter: 鄧雅文

2  • Introduction
   • Related Work
   • Problem Formulation
   • Future Work

3  Top-k query on certain data
   ◦ Rank results according to a user-defined score
   ◦ Important for exploring large databases
   ◦ E.g., top-2 = {T1, T2}

   TID  PID  Score
   T1   A    100
   T2   B    90
   T3   C    80
   T4   D    70
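As a minimal sketch of the certain-data case, the top-k answer is just the k rows with the highest score. The function name `top_k` and the row layout are illustrative assumptions; the data is the example table.

```python
import heapq

# Rows from the example table: (TID, PID, score).
ROWS = [("T1", "A", 100), ("T2", "B", 90), ("T3", "C", 80), ("T4", "D", 70)]

def top_k(rows, k):
    """Return the TIDs of the k highest-scoring rows, best first."""
    best = heapq.nlargest(k, rows, key=lambda row: row[2])
    return [tid for tid, _pid, _score in best]

print(top_k(ROWS, 2))  # ['T1', 'T2']
```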

4  Uncertain database
   ◦ How to define top-k on uncertain data?
   ◦ Mutually exclusive rules
       - E.g., T1 ⊕ T4

   TID  PID  Score  Pr.
   T1   A
   T2   B    90     0.9
   T3   C    80     0.6
   T4   A    70     0.8
   …    …    …      …

5  C. C. Aggarwal and P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. In TKDE,
   ◦ Causes of uncertainty:
       - Sensor networks, privacy, trajectory prediction, ...
   ◦ Main research areas on uncertain data:
       - Modeling of uncertain data
       - Uncertain data management: top-k query, range query, NN query, ...
       - Uncertain data mining: clustering, classification, frequent patterns, outliers, ...

6  M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE,
   ◦ Possible Worlds

7  ◦ U-Topk query
       - Return the k tuples that can co-exist in a possible world with the highest probability
       - E.g., {T1, T2} as U-Top2
   ◦ U-kRanks query
       - Return k tuples, each of which is the clear winner at its rank over all possible worlds
       - E.g., {T2, T6} as U-2Ranks
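The two definitions can be compared with a brute-force sketch that enumerates every possible world. It assumes independent tuples (no exclusion rules) and uses made-up existence probabilities; `TUPLES`, `possible_worlds`, `u_topk`, and `u_kranks` are illustrative names, not taken from the paper.

```python
from collections import defaultdict
from itertools import product

# Hypothetical independent tuples in descending score order: (id, existence probability).
TUPLES = [("a", 0.4), ("b", 0.7), ("c", 0.6), ("d", 0.9)]

def possible_worlds(tuples):
    """Yield (world, probability); world lists the existing tuple ids in score order."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        world = []
        for (tid, p), present in zip(tuples, mask):
            prob *= p if present else 1.0 - p
            if present:
                world.append(tid)
        yield world, prob

def u_topk(tuples, k):
    """Most probable top-k vector, aggregated over all worlds with at least k tuples."""
    vector_prob = defaultdict(float)
    for world, prob in possible_worlds(tuples):
        if len(world) >= k:
            vector_prob[tuple(world[:k])] += prob
    return max(vector_prob.items(), key=lambda item: item[1])

def u_kranks(tuples, k):
    """For each rank 1..k, the tuple most likely to appear at exactly that rank."""
    rank_prob = [defaultdict(float) for _ in range(k)]
    for world, prob in possible_worlds(tuples):
        for rank, tid in enumerate(world[:k]):
            rank_prob[rank][tid] += prob
    return [max(probs.items(), key=lambda item: item[1]) for probs in rank_prob]

print(u_topk(TUPLES, 2))    # vector ('a', 'b'), probability about 0.28
print(u_kranks(TUPLES, 2))  # 'b' at rank 1 (about 0.42), 'c' at rank 2 (about 0.324)
```

On this toy input the two queries disagree: U-Top2 picks the vector (a, b), while U-2Ranks picks b for rank 1 and c for rank 2. Enumeration is exponential in the number of tuples, so this is only useful for checking small examples.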

8  U-Topk
   [Figure: step-by-step search for the U-Top2 query answer. Tuples t1, t2, t5 arrive from the storage layer into a buffer, and search states are held in a probability priority queue. Recoverable states: s(0,0) = {} p = 1; s(1,1) = {t1} p = 0.4; s(0,1) = {} p = 0.6; s(1,2) = {t2} p = 0.42; s(0,2) = {} p = 0.18; s(2,2) = {t1, t2} p = 0.28; s(1,2) = {t1} p = 0.12. The search completes by returning {t1, t2} as the top-2.]
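The search traced in this figure can be sketched as a best-first expansion over a priority queue. This is a simplified illustration rather than the authors' implementation: it assumes independent tuples and uses the existence probabilities that can be read off the figures (t1: 0.4, t2: 0.7, t5: 0.6, t6: 1). The name `u_topk_search` and the state layout are assumptions.

```python
import heapq

# Tuples in descending score order; independence between them is an assumption.
STREAM = [("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)]

def u_topk_search(stream, k):
    """Best-first search for the most probable top-k vector."""
    # Heap entries: (-probability, tie_breaker, chosen_ids, number_of_tuples_seen).
    heap = [(-1.0, 0, (), 0)]
    counter = 1
    while heap:
        neg_prob, _, chosen, seen = heapq.heappop(heap)
        prob = -neg_prob
        if len(chosen) == k:
            # Extending a state never raises its probability, so the first
            # complete state popped from the queue is the most probable one.
            return chosen, prob
        if seen == len(stream):
            continue  # ran out of tuples before collecting k of them
        tid, p = stream[seen]
        extensions = ((chosen + (tid,), prob * p),   # next tuple exists
                      (chosen, prob * (1.0 - p)))    # next tuple is absent
        for new_chosen, new_prob in extensions:
            if new_prob > 0.0:
                heapq.heappush(heap, (-new_prob, counter, new_chosen, seen + 1))
                counter += 1
    return None

print(u_topk_search(STREAM, 2))  # vector ('t1', 't2'), probability about 0.28
```

With these inputs the states are popped in the order {} (1), {} (0.6), {t2} (0.42), {t1} (0.4), {t1, t2} (0.28), and the search stops without ever reading t6.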

9  U-kRanks
   [Figure: step-by-step computation of the U-2Ranks query answer. The storage layer reports t1: 0.4, t2: 0.7, t5: 0.6, t6: 1. Recoverable rank probabilities: P(t1, 1) = 0.4, P(t2, 1) = 0.42, P(t2, 2) = 0.28. Answer: top1 = t2 (0.42), top2 = t6 (0.324).]

10  M. Hua, J. Pei, W. Zhang, X. Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach. In SIGMOD,
   ◦ PT-k query
       - Return the set of all tuples whose top-k probability is at least p
       - E.g., {T1, T2, T5} as PT-2 (with p = 0.4)
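A PT-k answer can be sketched for the simplest setting, independent tuples with no exclusion rules. Under that assumption, a tuple's top-k probability is its own existence probability times the probability that fewer than k higher-scored tuples exist. The probabilities and the name `pt_k` are hypothetical.

```python
# Hypothetical independent tuples in descending score order: (id, existence probability).
TUPLES = [("a", 0.45), ("b", 0.7), ("c", 0.6), ("d", 0.9)]

def pt_k(tuples, k, threshold):
    """Return (id, top-k probability) for every tuple whose top-k probability >= threshold."""
    answer = []
    # count_dist[j] = probability that exactly j of the tuples scanned so far exist.
    # Only j < k matters, so mass for j >= k is dropped.
    count_dist = [1.0] + [0.0] * (k - 1)
    for tid, p in tuples:
        # The tuple is in the top-k if it exists and fewer than k better tuples exist.
        topk_prob = p * sum(count_dist)
        if topk_prob >= threshold:
            answer.append((tid, topk_prob))
        # Fold this tuple into the count distribution before moving to the next one.
        updated = [0.0] * k
        for j in range(k):
            updated[j] = count_dist[j] * (1.0 - p)
            if j > 0:
                updated[j] += count_dist[j - 1] * p
        count_dist = updated
    return answer

print([tid for tid, _ in pt_k(TUPLES, 2, 0.4)])  # ['a', 'b', 'c']
```

Unlike U-Topk and U-kRanks, the answer size is not fixed at k: here three tuples pass the threshold for k = 2, while d (top-2 probability about 0.336) does not.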

11  C. Jin, K. Yi, L. Chen, J. Yu, X. Lin. Sliding-Window Top-k Queries on Uncertain Streams. In VLDB,
   ◦ Applicable to the definitions of top-k above
   ◦ Maintain compact sets
       - A compact set of the window guarantees that tuples outside it cannot be in the top-k answer of this window
   ◦ Both time- and space-efficient

12  T. Ge, S. Zdonik, and S. Madden. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In SIGMOD,
   ◦ The tradeoff between reporting high-scoring tuples and tuples with a high probability of being in the top-k
   ◦ Return a number of typical vectors that efficiently sample the distribution of all potential top-k tuple vectors

13  Example:
   ◦ An International Tenpin Bowling Championship has three events: singles, doubles, and trios. Because of the budget, the coach can send only 3 players, so we want these 3 players to have a relatively high probability of performing well across all 3 types of events.

14  ◦ U-Top3 = {T2, T5, T6}
   ◦ But U-Top2 = {T1, T2}, U-Top1 = {T1}
   ◦ How about also considering {T1, T2, T5} as top-3?

   TID  Player  Pr.
   T1   A
   T2   D
   T3   B
   T4   C
   T5   C
   T6   B
   T7   D
   T8   A

   Possible world  Tuples           Pr.
   PW1             T1, T2, T3, T…
   PW2             T1, T2, T3, T…
   PW3             T1, T2, T4, T…
   PW4             T1, T2, T5, T…
   PW5             T1, T3, T4, T…
   PW6             T1, T3, T5, T…
   PW7             T1, T4, T6, T…
   PW8             T1, T5, T6, T…
   PW9             T2, T3, T4, T…
   PW10            T2, T3, T5, T…
   PW11            T2, T4, T6, T…
   PW12            T2, T5, T6, T…
   PW13            T3, T4, T7, T…
   PW14            T3, T5, T7, T…
   PW15            T4, T6, T7, T…
   PW16            T5, T6, T7, T…

15  We choose the answers of a top-k query based not only on the probability (P) but also on the confidence (C).
   ◦ Confidence: expresses the top-(k-1) probabilities of the sets formed by k-1 tuples of this possible top-k answer
       - E.g., k = 3: for {T1, T2, T3} as a possible top-k with probability P, C is composed in some way of:
           Pr({T1, T2} is the top-2) and its confidence,
           Pr({T1, T3} is the top-2) and its confidence,
           Pr({T2, T3} is the top-2) and its confidence

16  Since every possible top-k answer has two features, probability (P) and confidence (C), we return only the non-dominated answers as the result set.
   ◦ E.g.,
       {T1, T3, T5}: P = 0.8, C = 0.4
       {T1, T4, T7}: P = 0.5, C = 0.7
       {T2, T6, T7}: P = 0.3, C = 0.2 (this will not be returned)
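The non-dominated filter can be sketched on the three example answers. The slides do not spell out the dominance test, so this uses the usual skyline definition as an assumption: one answer dominates another if it is at least as good in both P and C and strictly better in at least one. The names `dominates` and `non_dominated` are illustrative.

```python
# Candidate answers from the example: answer -> (P, C).
CANDIDATES = {
    "{T1, T3, T5}": (0.8, 0.4),
    "{T1, T4, T7}": (0.5, 0.7),
    "{T2, T6, T7}": (0.3, 0.2),
}

def dominates(a, b):
    """True if a is at least as good as b in both features and strictly better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

def non_dominated(candidates):
    """Keep only the answers that no other answer dominates."""
    return [
        name
        for name, features in candidates.items()
        if not any(dominates(other, features) for other in candidates.values())
    ]

print(non_dominated(CANDIDATES))  # ['{T1, T3, T5}', '{T1, T4, T7}']
```

The first two answers survive because each beats the other in one feature; the third is worse than both in P and in C, so it is dropped.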

17  • Formulate the confidence function
   • Find an algorithm to generate the result set
   • Calculate the confidence efficiently
   • Carry out an empirical study on datasets
