Advisor: Prof. 陳良弼; Presenter: 鄧雅文 97753034

• Introduction
• Related Work
• Problem Formulation
• Future Work

• Top-k query on certain data
  ◦ Rank results according to a user-defined score
  ◦ Important for exploring large databases
  ◦ E.g., top-2 = {T1, T2}

  TID  PID  Score
  T1   A    100
  T2   B    90
  T3   C    80
  T4   D    70
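As a minimal sketch of the certain-data case, the top-2 answer above can be reproduced by ranking the table rows on their score. The function name `top_k` and the row layout are illustrative assumptions, not part of the slides.

```python
import heapq

# Rows from the example table: (TID, PID, score).
rows = [("T1", "A", 100), ("T2", "B", 90), ("T3", "C", 80), ("T4", "D", 70)]

def top_k(rows, k):
    """Return the k rows with the highest score, best first (hypothetical helper)."""
    return heapq.nlargest(k, rows, key=lambda row: row[2])

top2 = [tid for tid, _pid, _score in top_k(rows, 2)]
print(top2)  # ['T1', 'T2']
```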

• Uncertain database
  ◦ How to define top-k on uncertain data?
  ◦ Mutually exclusive rules
    ▪ E.g., T1 ⊕ T4

  TID  PID  Score  Pr.
  T1   A    100    0.2
  T2   B    90     0.9
  T3   C    80     0.6
  T4   A    70     0.8
  …    …    …      …

• C. C. Aggarwal and P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. In TKDE, 2009.
  ◦ Causes:
    ▪ Sensor networks, privacy, trajectory prediction…
  ◦ Main research areas for uncertain data:
    ▪ Modeling of uncertain data
    ▪ Uncertain data management
      ▪ Top-k query, range query, NN query…
    ▪ Uncertain data mining
      ▪ Clustering, classification, frequent pattern, outliers…

• M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE, 2007.
  ◦ Possible Worlds

  ◦ U-Topk query
    ▪ Return k tuples that can co-exist in a possible world with the highest probability
    ▪ E.g., {T1, T2} as U-Top2
  ◦ U-kRanks query
    ▪ Return k tuples, each of which is a clear winner at its rank (the most probable tuple at rank i, for i = 1..k) over all possible worlds
    ▪ E.g., {T2, T6} as U-2Ranks

• U-Topk: find the U-Top2 query answer (animation walk-through)
  ◦ Tuples read from the storage layer in score order: 1) t1: 0.4, 2) t2: 0.7, 3) t5: 0.6
  ◦ Buffer: probability priority queue
  ◦ States s l,i (l tuples kept after i tuples scanned):
      s 0,0 = {}        p = 1
      s 1,1 = {t1}      p = 0.4
      s 0,1 = {}        p = 0.6
      s 2,2 = {t1, t2}  p = 0.28
      s 1,2 = {t1}      p = 0.12
      s 1,2 = {t2}      p = 0.42
      s 0,2 = {}        p = 0.18
      s 2,3 = {t2, t5}  p = 0.252
      s 1,3 = {t2}      p = 0.168
  ◦ Complete! Return {t1, t2} as top-2
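The walk-through above can be sketched as a best-first search over states. This is a simplified illustration, not the ICDE 2007 algorithm itself: it assumes independent tuples already sorted by score, no mutual-exclusion rules, and everything held in memory. The name `u_topk` and the state layout are hypothetical.

```python
import heapq

def u_topk(tuples, k):
    """Most probable top-k vector for independent tuples sorted by score (descending).

    tuples: list of (tid, probability). Returns (vector, probability) or None.
    """
    # Heap entries are (-probability, tuples_scanned, vector); heapq is a min-heap,
    # so the probability is negated to pop the most probable state first.
    heap = [(-1.0, 0, ())]
    while heap:
        neg_p, scanned, vector = heapq.heappop(heap)
        p = -neg_p
        if len(vector) == k:
            # Extending a state can only lower its probability, so the first
            # complete state popped is the most probable one.
            return vector, p
        if scanned == len(tuples):
            continue  # ran out of tuples before reaching length k
        tid, pr = tuples[scanned]
        heapq.heappush(heap, (-(p * pr), scanned + 1, vector + (tid,)))  # tuple exists
        heapq.heappush(heap, (-(p * (1 - pr)), scanned + 1, vector))     # tuple absent
    return None

answer = u_topk([("t1", 0.4), ("t2", 0.7), ("t5", 0.6)], 2)
print(answer[0], round(answer[1], 4))
```

On the three tuples of the walk-through this visits the same states as the slide and stops at {t1, t2} with probability 0.28.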

• U-kRanks: find the U-2Ranks query answer (animation walk-through, ranks i=1 and i=2)
  ◦ Tuples reported by the storage layer in score order: t1: 0.4, t2: 0.7, t5: 0.6, t6: 1
  ◦ Start: {} 1
  ◦ After t1: {} 0.6, {t1} 0.4
    ▪ P t1,1 = 0.4
  ◦ After t2: {} 0.18, {t2} 0.42, {t1} 0.12, {t1, t2} 0.28
    ▪ P t2,1 = 0.42, P t2,2 = 0.28
    ▪ top1: t2 (0.42)
  ◦ After t5: {} 0.072, {t5} 0.108, {t1} 0.048, {t1, t5} 0.072, {t2} 0.168, {t2, t5} 0.252
    ▪ P t5,2 = 0.324
  ◦ After t6: {} 0, {t6} 0.072, {t1} 0, {t2} 0, {t5} 0, {t1, t6} 0.048, {t2, t6} 0.168, {t5, t6} 0.108
    ▪ P t6,2 = 0.324
    ▪ top2: t6 (0.324)
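The rank probabilities in this walk-through can be checked with a short dynamic program. This is a sketch under stated assumptions (independent tuples sorted by score, no mutual-exclusion rules); the function name `rank_probabilities` is hypothetical. A tuple sits at rank i exactly when it exists and i-1 higher-scored tuples exist.

```python
def rank_probabilities(tuples, k):
    """Return {tid: [P(tid at rank 1), ..., P(tid at rank k)]}.

    tuples: list of (tid, probability), sorted by score descending, independent.
    """
    # count[j] = probability that exactly j of the tuples scanned so far exist (j < k).
    count = [1.0] + [0.0] * (k - 1)
    result = {}
    for tid, pr in tuples:
        result[tid] = [pr * count[i] for i in range(k)]
        nxt = [0.0] * k
        for j, c in enumerate(count):
            nxt[j] += c * (1 - pr)      # current tuple absent
            if j + 1 < k:
                nxt[j + 1] += c * pr    # current tuple present
        count = nxt
    return result

probs = rank_probabilities([("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)], 2)
top1 = max(probs, key=lambda tid: probs[tid][0])
for tid, ranks in probs.items():
    print(tid, [round(p, 3) for p in ranks])
```

Under these assumptions t2 wins rank 1 with 0.42, and t5 and t6 both come out at 0.324 for rank 2, so the rank-2 winner depends on how ties are broken; the slide reports t6.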

• M. Hua, J. Pei, W. Zhang, X. Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach. In SIGMOD, 2008.
  ◦ PT-k query
    ▪ Return all tuples whose top-k probability is at least p
    ▪ E.g., {T1, T2, T5} as PT-2 (with p=0.4)
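A minimal sketch of a PT-k computation, assuming the same four independent tuples as in the earlier walk-throughs (t1: 0.4, t2: 0.7, t5: 0.6, t6: 1) and no mutual-exclusion rules. The names `topk_probabilities` and `pt_k` are hypothetical. A tuple is in the top-k when it exists and fewer than k higher-scored tuples exist.

```python
def topk_probabilities(tuples, k):
    """Return {tid: probability that tid is among the top-k}.

    tuples: list of (tid, probability), sorted by score descending, independent.
    """
    # count[j] = probability that exactly j earlier tuples exist, for j < k.
    count = [1.0] + [0.0] * (k - 1)
    result = {}
    for tid, pr in tuples:
        result[tid] = pr * sum(count)
        nxt = [0.0] * k
        for j, c in enumerate(count):
            nxt[j] += c * (1 - pr)
            if j + 1 < k:
                nxt[j + 1] += c * pr
        count = nxt
    return result

def pt_k(tuples, k, p):
    """Tuples whose top-k probability is at least the threshold p."""
    return [tid for tid, prob in topk_probabilities(tuples, k).items() if prob >= p]

data = [("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)]
answer = pt_k(data, 2, 0.4)
print(answer)  # ['t1', 't2', 't5']
```

Here t6 has top-2 probability 0.396, just under the threshold, which is why it is left out of the PT-2 answer.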

• C. Jin, K. Yi, L. Chen, J. Yu, X. Lin. Sliding-Window Top-k Queries on Uncertain Streams. In VLDB, 2008.
  ◦ Applicable to the top-k definitions above
  ◦ Maintain compact sets
    ▪ A compact set of the window guarantees that tuples outside it cannot be part of the top-k answer of this window
  ◦ Both time- and space-efficient

• T. Ge, S. Zdonik, and S. Madden. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In SIGMOD, 2009.
  ◦ The tradeoff between reporting high-scoring tuples and tuples with a high probability of being in the top-k
  ◦ Return a number of typical vectors that efficiently sample the distribution of all potential top-k tuple vectors

• Example:
  ◦ In an International Tenpin Bowling Championship, the events include singles, doubles, and trios. Because of budget limits, the coach can choose only 3 players to attend. We therefore want these 3 players to have a relatively high probability of performing well across all 3 event types.

  ◦ U-Top3 = {T2, T5, T6}
  ◦ But U-Top2 = {T1, T2}, U-Top1 = {T1}
  ◦ How about also considering {T1, T2, T5} as top-3?

  TID  Player  Pr.
  T1   A       0.41
  T2   D       0.62
  T3   B       0.14
  T4   C       0.34
  T5   C       0.66
  T6   B       0.86
  T7   D       0.38
  T8   A       0.59

  Possible world          Pr.
  PW1   T1, T2, T3, T4    0.0121
  PW2   T1, T2, T3, T5    0.0235
  PW3   T1, T2, T4, T6    0.0743
  PW4   T1, T2, T5, T6    0.1443
  PW5   T1, T3, T4, T7    0.0074
  PW6   T1, T3, T5, T7    0.0144
  PW7   T1, T4, T6, T7    0.0456
  PW8   T1, T5, T6, T7    0.0884
  PW9   T2, T3, T4, T8    0.0174
  PW10  T2, T3, T5, T8    0.0338
  PW11  T2, T4, T6, T8    0.1070
  PW12  T2, T5, T6, T8    0.2076
  PW13  T3, T4, T7, T8    0.0107
  PW14  T3, T5, T7, T8    0.0207
  PW15  T4, T6, T7, T8    0.0656
  PW16  T5, T6, T7, T8    0.1273
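The U-Topk answers in this example can be checked by brute-force enumeration of the 16 possible worlds. The sketch makes two assumptions that the tables suggest but do not state: tuples are ranked T1 (best) to T8, and the two tuples of the same player are mutually exclusive, with exactly one of them present in each world. The helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# (TID, player, probability), listed best-first as in the table.
tuples = [("T1", "A", 0.41), ("T2", "D", 0.62), ("T3", "B", 0.14), ("T4", "C", 0.34),
          ("T5", "C", 0.66), ("T6", "B", 0.86), ("T7", "D", 0.38), ("T8", "A", 0.59)]

def possible_worlds(tuples):
    """Yield (world, probability); a world picks exactly one tuple per player."""
    groups = defaultdict(list)
    for rank, (tid, player, pr) in enumerate(tuples):
        groups[player].append((rank, tid, pr))
    for choice in product(*groups.values()):
        prob = 1.0
        for _rank, _tid, pr in choice:
            prob *= pr
        world = tuple(tid for _rank, tid, _pr in sorted(choice))  # best-first order
        yield world, prob

def u_topk(tuples, k):
    """Top-k vector with the highest total probability over all possible worlds."""
    totals = defaultdict(float)
    for world, prob in possible_worlds(tuples):
        totals[world[:k]] += prob
    return max(totals.items(), key=lambda item: item[1])

for k in (1, 2, 3):
    vector, prob = u_topk(tuples, k)
    print(k, vector, f"{prob:.4f}")
```

Under these assumptions the enumeration gives {T1} for k=1, {T1, T2} for k=2, and {T2, T5, T6} for k=3, matching the slide. The enumeration is exponential in the number of players, so it only serves as a check on small examples.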

• We choose the answers of a top-k query based not only on the probability (P) but also on the confidence (C).
  ◦ Confidence: expresses the top-(k-1) probabilities of the sets formed by k-1 tuples of this possible top-k answer
    ▪ E.g., k=3, {T1, T2, T3} as a possible top-k with P=0.0356
    ▪ C is composed in some way of:
      ▪ Pr({T1, T2} being top-2) = 0.2542 and its confidence
      ▪ Pr({T1, T3} being top-2) = 0.0218 and its confidence
      ▪ Pr({T2, T3} being top-2) = 0.0512 and its confidence
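The slides leave the confidence function itself open, so the following sketch only computes its inputs: the top-(k-1) probabilities of the (k-1)-subsets of a candidate answer. It assumes the bowling example's data, with TIDs written as integers 1 to 8 in rank order and one tuple per player in each possible world. `players` and `top_prob` are hypothetical names.

```python
from itertools import combinations, product

# player -> [(TID as rank number, probability)]; one tuple per player exists per world.
players = {
    "A": [(1, 0.41), (8, 0.59)],
    "B": [(3, 0.14), (6, 0.86)],
    "C": [(4, 0.34), (5, 0.66)],
    "D": [(2, 0.62), (7, 0.38)],
}

def top_prob(target):
    """Probability that the top-len(target) tuples of a world are exactly `target`."""
    k = len(target)
    total = 0.0
    for choice in product(*players.values()):
        world = sorted(tid for tid, _pr in choice)
        if set(world[:k]) == set(target):
            prob = 1.0
            for _tid, pr in choice:
                prob *= pr
            total += prob
    return total

candidate = (1, 2, 3)  # the possible top-3 answer {T1, T2, T3}
p_candidate = top_prob(candidate)
sub_probs = {pair: top_prob(pair) for pair in combinations(candidate, 2)}
print(f"P = {p_candidate:.4f}")
for pair, prob in sub_probs.items():
    print(pair, f"{prob:.4f}")
```

How these three values (and, recursively, their own confidences) are combined into C is not defined here; that is the open question listed under future work.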

• Since every possible top-k answer has two features, probability (P) and confidence (C), we return only the non-dominated ones as the result set.
  ◦ E.g.,
    ▪ {T1, T3, T5}: P=0.8, C=0.4
    ▪ {T1, T4, T7}: P=0.5, C=0.7
    ▪ {T2, T6, T7}: P=0.3, C=0.2 (dominated, so this will not be returned)
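The non-dominated filter can be sketched as a small skyline computation over (P, C) pairs. It assumes the usual dominance definition (at least as good in both features and strictly better in one), which the slide does not spell out; `non_dominated` is a hypothetical name.

```python
def non_dominated(candidates):
    """Keep the candidates whose (P, C) pair is not dominated by any other.

    candidates: dict mapping an answer (tuple of TIDs) to its (P, C) pair.
    """
    def dominates(a, b):
        # a dominates b: no worse in both features and different in at least one.
        return a[0] >= b[0] and a[1] >= b[1] and a != b

    return {
        answer: pc
        for answer, pc in candidates.items()
        if not any(dominates(other, pc) for other in candidates.values())
    }

candidates = {
    ("T1", "T3", "T5"): (0.8, 0.4),
    ("T1", "T4", "T7"): (0.5, 0.7),
    ("T2", "T6", "T7"): (0.3, 0.2),
}
result = non_dominated(candidates)
print(sorted(result))  # [('T1', 'T3', 'T5'), ('T1', 'T4', 'T7')]
```

The pairwise comparison is quadratic in the number of candidates, which is fine for an illustration; the slides list an efficient result-set algorithm as future work.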

• Formulate the confidence function
• Find an algorithm to generate the result set
• Try to calculate the confidence in an efficient way
• Carry out an empirical study on datasets
