A Unified Approach to Ranking in Probabilistic Databases. Jian Li, Barna Saha, Amol Deshpande (University of Maryland, College Park, USA), VLDB 2009.

1 A Unified Approach to Ranking in Probabilistic Databases. Jian Li, Barna Saha, Amol Deshpande (University of Maryland, College Park, USA), VLDB 2009. Presented by JongHeum Yeon, 2010. 2. 19

2 Probabilistic Databases
• Motivation
  - Increasing amounts of uncertain data
  - Information retrieval, data integration and cleaning, text analytics, social network analysis
• Probabilistic Databases
  - Annotate tuples with existence probabilities, and/or attribute values with probability distributions
  - Interpretation according to the “possible worlds semantics”
Copyright© 2010 by CEBT Center for E-Business Technology

3 Possible World Semantics
A probabilistic table (assume tuple-independence):
ID   Score   Prob
t1   200     0.2
t2   150     0.8
t3   100     0.4
Possible worlds shown:
World   Tuples (ID: Score)            Probability
pw1     t1: 200, t2: 150, t3: 100     0.064
pw2     t1: 200, t2: 150              0.096
pw3     t2: 150, t3: 100              0.256

4 Possible World Semantics
A probabilistic table (assume tuple-independence):
ID   Score   Prob
t1   200     0.2
t2   150     0.8
t3   100     0.4
Score values are used to rank the tuples in every pw. The top-1 answer for each possible world:
World   Tuples (ID: Score)            Probability   Top-1
pw1     t1: 200, t2: 150, t3: 100     0.064         t1
pw2     t1: 200, t2: 150              0.096         t1
pw3     t2: 150, t3: 100              0.256         t2
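The world probabilities on this slide follow directly from tuple independence: each world's probability is the product of p for the tuples it contains and (1 - p) for those it omits. A minimal sketch that enumerates all eight worlds of the example table is shown here; the names `TABLE`, `possible_worlds`, `worlds`, and `top1` are illustrative and do not come from the slides.

```python
from itertools import product

# Probabilistic table from the slide: (id, score, existence probability).
TABLE = [("t1", 200, 0.2), ("t2", 150, 0.8), ("t3", 100, 0.4)]


def possible_worlds(table):
    """Yield (ranked tuple ids, probability) for every subset of the table.

    Assumes tuples exist independently of each other.
    """
    for mask in product([True, False], repeat=len(table)):
        prob = 1.0
        world = []
        for present, (tid, score, p) in zip(mask, table):
            prob *= p if present else 1.0 - p
            if present:
                world.append((tid, score))
        # Inside a world, tuples are ranked by descending score.
        world.sort(key=lambda pair: -pair[1])
        yield tuple(tid for tid, _ in world), prob


worlds = dict(possible_worlds(TABLE))
# Top-1 answer of every non-empty world.
top1 = {ids: ids[0] for ids in worlds if ids}

for ids, prob in worlds.items():
    print(ids, round(prob, 3))
```

The three worlds shown on the slide (pw1, pw2, pw3) are the entries keyed by `("t1", "t2", "t3")`, `("t1", "t2")`, and `("t2", "t3")`.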

5 Probabilistic Databases (cont’d)

6 Top-k Queries
• Multi-criteria optimization problem
  - [score = 100, Pr(t1) = 0.5] vs. [score = 50, Pr(t2) = 1.0]
• Many Prior Proposals
  - Return k tuples t with the highest score(t)·Pr(t) [exp. score]
  - Return the most probable top-k answer [U-top-k] [Soliman et al. ’07]
  - At rank i, return the tuple with max. prob. of being at rank i [U-rank-k] [Soliman et al. ’07]
  - Return k tuples t with the largest Pr(r(t) ≤ k) values [PT-k/GT-k] [Hua et al. ’08] [Zhang et al. ’08]
  - Return k tuples t with the smallest expected rank ∑_pw Pr(pw)·r_pw(t) [Cormode et al. ’09]
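Several of the proposals above can be compared by brute force on a small table. The sketch below reuses the three-tuple example from the possible-worlds slides and assumes tuple independence, distinct scores, and 1-based ranks; the function names (`rank_distribution`, `expected_score`, `pt_k`, `u_rank`) are illustrative, and the enumeration is exponential in the number of tuples, so it is only meant for checking small cases. Note that in the two-tuple example on the slide, both tuples have the same expected score (100 × 0.5 = 50 × 1.0 = 50), which is one reason a single criterion is not enough.

```python
from itertools import product

TABLE = [("t1", 200, 0.2), ("t2", 150, 0.8), ("t3", 100, 0.4)]


def rank_distribution(table):
    """Return {id: {rank: Pr(r(t) = rank)}} by enumerating possible worlds.

    A tuple that is absent from a world gets no rank in that world.
    """
    dist = {tid: {} for tid, _, _ in table}
    for mask in product([True, False], repeat=len(table)):
        prob = 1.0
        present = []
        for keep, (tid, score, p) in zip(mask, table):
            prob *= p if keep else 1.0 - p
            if keep:
                present.append((score, tid))
        ranked = sorted(present, reverse=True)  # highest score first
        for rank, (_, tid) in enumerate(ranked, start=1):
            dist[tid][rank] = dist[tid].get(rank, 0.0) + prob
    return dist


def expected_score(table):
    """score(t) * Pr(t) for every tuple."""
    return {tid: score * p for tid, score, p in table}


def pt_k(table, k):
    """Pr(r(t) <= k) for every tuple (the PT-k / GT-k criterion)."""
    dist = rank_distribution(table)
    return {tid: sum(pr for r, pr in ranks.items() if r <= k)
            for tid, ranks in dist.items()}


def u_rank(table, k):
    """For each rank i = 1..k, the tuple most likely to be at rank i."""
    dist = rank_distribution(table)
    return [max(dist, key=lambda tid: dist[tid].get(i, 0.0))
            for i in range(1, k + 1)]


print(expected_score(TABLE))
print(pt_k(TABLE, 2))
print(u_rank(TABLE, 2))
```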

7 Top-k Queries (cont’d)
• Which one should we use?
• Comparing different ranking functions
  - Normalized Kendall distance between two top-k answers
    - Penalizes #reversals and #mismatches
    - Lies in [0,1]; 0: same answers, 1: disjoint answers
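The slide does not give the formula for the normalized Kendall distance. One common formulation for top-k lists, assumed here, counts a penalty of 1 for every pair of items whose relative order in the two lists is provably opposite (an item missing from a list is treated as ranked below everything in it) and divides by k². Under that assumption identical lists score 0 and disjoint lists score 1, matching the properties stated above. The function name `normalized_kendall` is illustrative.

```python
from itertools import combinations
from math import inf


def normalized_kendall(top_a, top_b):
    """Normalized Kendall distance between two top-k lists of equal length.

    Assumed variant: pairs whose order cannot be inferred (both items
    missing from the same list) contribute no penalty.
    """
    k = len(top_a)
    if k == 0 or k != len(top_b):
        raise ValueError("both lists must have the same non-zero length")
    pos_a = {item: i for i, item in enumerate(top_a)}
    pos_b = {item: i for i, item in enumerate(top_b)}
    penalty = 0
    for x, y in combinations(set(top_a) | set(top_b), 2):
        # Items absent from a list are placed after all listed items.
        ax, ay = pos_a.get(x, inf), pos_a.get(y, inf)
        bx, by = pos_b.get(x, inf), pos_b.get(y, inf)
        if (ax < ay and bx > by) or (ax > ay and bx < by):
            penalty += 1
    return penalty / (k * k)


print(normalized_kendall(["a", "b"], ["a", "b"]))
print(normalized_kendall(["a", "b"], ["b", "a"]))
print(normalized_kendall(["a", "b"], ["c", "d"]))
```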

8 Unified Approach
• Define two parameterized ranking functions, PRF^ω and PRF^e, that can simulate or approximate a variety of ranking functions
  - PRF^e is much more efficient to evaluate than PRF^ω

9 Outline
• Parameterized Ranking Functions
• Computing PRF
  - Independent Tuples
  - Computing PRF^e(α)
  - x-Tuples
  - Summary of Other Results
• Approximating Ranking Functions
• Experiments

10 Parameterized Ranking Function
• Weight Function
• Parameterized Ranking Function (PRF)
• Return k tuples with the highest values.
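The formulas on this slide did not survive in the transcript. A reconstruction that is consistent with the special cases listed on the following slide is given here as an assumption: r(t) denotes the random rank of tuple t across possible worlds, T is the set of tuples, and the symbol Υ for the PRF value is notation chosen for this sketch.

```latex
% Weight function: assigns a (possibly complex) weight to tuple t at rank i.
\omega : T \times \mathbb{N} \to \mathbb{C}

% Parameterized ranking function: weighted sum over the rank distribution of t.
\Upsilon_{\omega}(t) = \sum_{i > 0} \omega(t, i) \cdot \Pr\bigl(r(t) = i\bigr)
```

With ω(t, i) = 1 the sum collapses to Pr(t), the existence probability, and with ω(t, i) = score(t) it becomes score(t)·Pr(t), the expected score.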

11 Parameterized Ranking Function (cont’d)
• ω(t,i) = 1: Rank the tuples by probabilities
• ω(t,i) = score(t): E-Score
• PRF^e(α): ω(i) = α^i, where α can be a real or a complex number
• PRF^ω(h): Generalizes PT/GT-k and U-Rank
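The special cases above can be checked numerically by plugging different weight functions into a weighted sum over each tuple's rank distribution. The sketch below assumes the PRF value of t is the sum over ranks i of ω(t, i)·Pr(r(t) = i), and hard-codes the rank distribution of the three-tuple example table (t1: 200/0.2, t2: 150/0.8, t3: 100/0.4) worked out by hand under tuple independence. The names `RANK_DIST`, `SCORE`, and `prf` are illustrative.

```python
# Pr(r(t) = i) for the example table, 1-based ranks, tuple independence assumed.
RANK_DIST = {
    "t1": {1: 0.2},
    "t2": {1: 0.64, 2: 0.16},
    "t3": {1: 0.064, 2: 0.272, 3: 0.064},
}
SCORE = {"t1": 200, "t2": 150, "t3": 100}


def prf(weight, rank_dist=RANK_DIST):
    """PRF value of every tuple for a weight function weight(t, i)."""
    return {tid: sum(weight(tid, i) * pr for i, pr in ranks.items())
            for tid, ranks in rank_dist.items()}


# w(t, i) = 1: existence probability of each tuple.
by_probability = prf(lambda t, i: 1.0)

# w(t, i) = score(t): expected score.
e_score = prf(lambda t, i: SCORE[t])

# w(i) = alpha**i: PRF^e(alpha), here with alpha = 0.5.
prf_e_half = prf(lambda t, i: 0.5 ** i)

# w(i) = 1 for i <= h and 0 otherwise: Pr(r(t) <= h), the PT-h criterion.
pt_2 = prf(lambda t, i: 1.0 if i <= 2 else 0.0)

print(by_probability, e_score, prf_e_half, pt_2, sep="\n")
```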

12 Computing PRF: Independent Tuples

13 Computing PRF: Independent Tuples (cont’d)

14 Computing PRF: Independent Tuples (cont’d)
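The bodies of these three slides are missing from the transcript, so the following is only a sketch of one standard way to get rank distributions for independent tuples, not necessarily the slides' exact presentation. It assumes distinct scores and uses a generating function: with tuples sorted by descending score, the coefficient of x^j in p_i·x·∏_{l<i}(1 − p_l + p_l·x) is Pr(r(t_i) = j). The names `rank_distributions` and `prf` are illustrative; the cost is O(n²) coefficient updates instead of enumerating 2^n worlds.

```python
TABLE = [("t1", 200, 0.2), ("t2", 150, 0.8), ("t3", 100, 0.4)]


def rank_distributions(table):
    """Return {id: coeffs} where coeffs[j] = Pr(r(t) = j), j = 0..n.

    Assumes independent tuples with distinct scores; coeffs[0] is always 0.
    """
    ordered = sorted(table, key=lambda row: -row[1])
    # Coefficients of the product over higher-scoring tuples of (1 - p + p*x).
    prefix = [1.0]
    result = {}
    for tid, _, p in ordered:
        # Multiplying by p*x shifts the coefficients by one and scales by p.
        result[tid] = [0.0] + [p * c for c in prefix]
        # Fold this tuple's factor (1 - p + p*x) into the running product.
        nxt = [0.0] * (len(prefix) + 1)
        for j, c in enumerate(prefix):
            nxt[j] += (1.0 - p) * c
            nxt[j + 1] += p * c
        prefix = nxt
    return result


def prf(table, weight):
    """PRF value of every tuple for a weight function weight(t, i)."""
    dist = rank_distributions(table)
    return {tid: sum(weight(tid, i) * pr
                     for i, pr in enumerate(coeffs) if i > 0)
            for tid, coeffs in dist.items()}


print(rank_distributions(TABLE))
print(prf(TABLE, lambda t, i: 1.0 if i <= 2 else 0.0))
```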

15 Computing PRF^e(α): Independent Tuples
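The body of this slide is also missing. A plausible reading, sketched here under the assumptions of tuple independence and distinct scores, is that the weight ω(i) = α^i lets the whole rank distribution be skipped: the PRF^e value of t_i is the generating function p_i·x·∏_{l<i}(1 − p_l + p_l·x) evaluated at x = α, which a single pass over the score-sorted tuples can maintain as a running product. The names `prf_e` and `top_k`, and the choice to rank by magnitude so that complex α still yields an ordering, are assumptions of this sketch.

```python
TABLE = [("t1", 200, 0.2), ("t2", 150, 0.8), ("t3", 100, 0.4)]


def prf_e(table, alpha):
    """PRF^e(alpha) value of every tuple; O(n log n) due to the sort.

    alpha may be a real or a complex number.
    """
    ordered = sorted(table, key=lambda row: -row[1])
    # Running value of the product over higher-scoring tuples
    # of (1 - p + p * alpha).
    prefix = 1.0
    values = {}
    for tid, _, p in ordered:
        values[tid] = p * alpha * prefix
        prefix *= 1.0 - p + p * alpha
    return values


def top_k(values, k):
    """Ids of the k tuples with the largest PRF magnitude."""
    return sorted(values, key=lambda tid: -abs(values[tid]))[:k]


values = prf_e(TABLE, 0.5)
print(values)
print(top_k(values, 2))
```

With α = 1 every factor equals 1 and the value reduces to the existence probability of each tuple, which matches the ω(t, i) = 1 case.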

16 Summary of Other Results
• PRF computation for probabilistic and/xor trees
  - Generalizes x-tuples by allowing both mutual exclusivity and coexistence
  - PRF computation: O(n^3)
  - PRF^e computation: O(n log n + nd) (d = height of the tree)
• PRF computation on graphical models
  - A polynomial-time algorithm when the junction tree has bounded treewidth
  - A nontrivial dynamic program combined with the generating function method

17 Approximating Ranking Functions

18 Experiments: Approximation Ability

19 Experiments: Execution Time

20 Conclusion
• Proposed a unifying framework for ranking over probabilistic databases through:
  - Parameterized ranking functions
  - Incorporation of user feedback
• Designed highly efficient algorithms for computing PRF and PRF^e
• Developed novel approximation techniques for approximating PRF^ω with PRF^e

