Probabilistic Skylines on Uncertain Data (VLDB2007) Jian Pei et al Supervisor: Dr Benjamin Kao Presenter: For Date: 22 Feb 2008 ??: the possible world.


1 Probabilistic Skylines on Uncertain Data (VLDB2007) Jian Pei et al Supervisor: Dr Benjamin Kao Presenter: For Date: 22 Feb 2008 ??: the possible world concept

2 Outline Motivation Traditional and Probabilistic Skyline Problem Definition Computation Problem and Algorithms (Top down and Bottom up) Experimental Results

3 Motivation: skyline analysis of NBA players' performance (axes: #Rebounds, #Assists). Uncertainty: each player has multiple records. “First read the topic and then the subtopic to let others know what you are doing.” “Instance e dominates b, d and c.” Define the skyline and explain the graph: the larger the value, the better.

4 Motivation Skyline Analysis on NBA players with multiple records

5 Easy Approach: Averaging. Arbor (x) is better in assists than Eddy, but Eddy (point b) dominates all games of Arbor (x). Bob's game at point a biases his aggregate value. “It is not fair to say Eddy is worse in assists than Arbor.” “It is not fair for Bob to be severely affected by only one game.” Complete-Miss: need a new graph.
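The averaging pitfall on this slide can be sketched in a few lines. This is only an illustration: the game records are made-up numbers rather than NBA data, the helper names `dominates_larger`, `average` and `aggregate_skyline` are hypothetical, and larger values are assumed to be better, as in the NBA example.

```python
def dominates_larger(u, v):
    """u dominates v: at least as large in every attribute, strictly larger in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def average(records):
    """Column-wise mean of one player's game records."""
    return tuple(sum(col) / len(col) for col in zip(*records))


def aggregate_skyline(players):
    """Skyline computed over per-player averages (the 'easy approach')."""
    means = {name: average(recs) for name, recs in players.items()}
    return sorted(
        name
        for name, m in means.items()
        if not any(dominates_larger(o, m) for other, o in means.items() if other != name)
    )


# Hypothetical (assists, rebounds) records: Bob has a single very bad game.
players = {
    "Bob": [(10, 10), (10, 10), (10, 10), (0, 0)],
    "Eddy": [(8, 8), (8, 8)],
}
print(aggregate_skyline(players))  # ['Eddy']
```

In this toy data Bob beats Eddy in three of his four games, yet one outlier game drags his average below Eddy's, so he drops out of the aggregate skyline.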

6 Motivation: motivating result using the probabilistic skyline. Olajuwon and Kobe Bryant are missing from the aggregate skyline but present in the probabilistic skyline. Their performance varies a lot over games. Details in the experiment analysis. Completed: (Miss: pictures of them)

7 Traditional and Probabilistic Skyline: semantic difference of dominance between objects. Dominance: in the certain model, an object dominates another object with probability 1; in the uncertain model, an object dominates another object with probability P. (Figure panels: Certain Data, Uncertain Data.) “Assume the smaller the value, the better.” Miss: a Flash animation showing the calculation would be better.

8 Traditional and Probabilistic Skyline: semantic difference of dominance between objects. Dominance: in the certain model, an object dominates another object with probability 1; in the uncertain model, an object dominates another object with probability P. (Figure panels: Certain Data, Uncertain Data.) “Assume the smaller the value, the better.” Miss: a Flash animation showing the calculation would be better.

9 Traditional and Probabilistic Skyline: semantic difference of dominance between objects. Dominance: in the certain model, an object dominates another object with probability 1; in the uncertain model, an object dominates another object with probability P. (Figure panels: Certain Data, Uncertain Data.) “Assume the smaller the value, the better.” “Consider object d.” Miss: a Flash animation showing the calculation would be better.

10 Traditional and Probabilistic Skyline: semantic difference of dominance between objects. Dominance: in the certain model, an object dominates another object with probability 1; in the uncertain model, an object dominates another object with probability P. (Figure panels: Certain Data, Uncertain Data.) “Assume the smaller the value, the better.” Miss: a Flash animation showing the calculation would be better.

11 Traditional and Probabilistic Skyline: semantic difference of dominance between objects. Dominance: in the certain model, an object dominates another object with probability 1; in the uncertain model, an object dominates another object with probability P. (Figure panels: Certain Data, Uncertain Data.) “Assume the smaller the value, the better.” Completed: Miss: a Flash animation showing the calculation would be better.

12 Probabilistic Skyline: calculating the probability that Object A dominates Object C. “For easier illustration, the discrete case is used.” “Explanation of symbols.” Miss: need a Flash animation to demonstrate the calculation of dominance. Pr[A ≺ C] = (1/4)(1/3)(4 + ..)

13 Probabilistic Skyline: calculating the probability that Object A dominates Object C. “For easier illustration, the discrete case is used.” “Explanation of symbols.” Miss: need a Flash animation to demonstrate the calculation of dominance. Pr[A ≺ C] = (1/4)(1/3)(4 + 4 + ..)

14 Probabilistic Skyline: calculating the probability that Object A dominates Object C. “For easier illustration, the discrete case is used.” “Explanation of symbols.” Completed: Miss: need a Flash animation to demonstrate the calculation of dominance. Pr[A ≺ C] = (1/4)(1/3)(4 + 4 + 0) = 2/3
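The counting behind Pr[A ≺ C] can be written out as a minimal sketch. It assumes equally likely instances and that smaller values are better. The coordinates are hypothetical, chosen only so that the three instances of C are dominated by 4, 4 and 0 instances of A, as on the slide. The function names are illustrative.

```python
from fractions import Fraction


def dominates(u, v):
    """u dominates v: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def dominance_probability(U, V):
    """Pr[U ≺ V] for discrete objects whose instances are equally likely."""
    dominating_pairs = sum(1 for u in U for v in V if dominates(u, v))
    return Fraction(dominating_pairs, len(U) * len(V))


# Hypothetical coordinates (not the slide's figure): |A| = 4, |C| = 3.
A = [(1, 1), (1, 2), (2, 1), (2, 2)]
C = [(3, 3), (4, 3), (0, 5)]

print(dominance_probability(A, C))  # 2/3
```

Using `Fraction` keeps the result exact, so it can be compared directly with the hand calculation (1/4)(1/3)(4 + 4 + 0).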

15 Probabilistic Skyline: From Dominance to Skyline. Intuition for finding the skyline: the probability that an object is not dominated by other objects. OK: Miss: use Flash to show the grouping of objects A, B, C. We need a new measure ... OK: Please change the equation “0 <> (1/3)(1/3)”. 0 (1/3)(1/3)

16 Probabilistic Skyline Probabilistic Skyline Idea Intuition  1) we know the dominance definition  2) skyline = not dominated by other objects Miss: not dominated demonstration of Object A,B “Consider Object A, instance by instance”

17 Probabilistic Skyline Probabilistic Skyline Idea Intuition  1) we know the dominance definition  2) skyline = not dominated by other objects Miss: not dominated demonstration of Object A,B “we see that instance of Object A is not dominated by instances of other objects”

18 Probabilistic Skyline Probabilistic Skyline Idea Intuition  1) we know the dominance definition  2) skyline = not dominated by other objects Miss: not dominated demonstration of Object A,B

19 Probabilistic Skyline Probabilistic Skyline Idea Intuition  1) we know the dominance definition  2) skyline = not dominated by other objects Miss: not dominated demonstration of Object A,B

20 Probabilistic Skyline: Probabilistic Skyline Idea. Intuition: no instance of object A is dominated by an instance of another object, so the probability of object A being dominated is 0. The skyline probability of object A is therefore 1. OK: Miss: "not dominated" demonstration for objects A, B.

21 Probabilistic Skyline: calculation of the probabilistic skyline. Miss: another Flash animation to show the calculation of Skyline Probability of an 7/12 ??: where to explain the consequence of an instance being dominated by an object? Pr(D) ?

22 Probabilistic Skyline: calculation of the probabilistic skyline. Miss: another Flash animation to show the calculation of Skyline Probability of an 7/12 ??: where to explain the consequence of an instance being dominated by an object? Pr(D) ? Pr(d1) = (1 - 1/4)

23 Probabilistic Skyline: calculation of the probabilistic skyline. Miss: another Flash animation to show the calculation of Skyline Probability of an 7/12 ??: where to explain the consequence of an instance being dominated by an object? Pr(D) ? Pr(d1) = (1 - 1/4); Pr(d2) = (1 - 1/4)(1 - 2/3)

24 Probabilistic Skyline: calculation of the probabilistic skyline. OK-Miss: another Flash animation to show the calculation of Skyline Probability of an 7/12 ??: where to explain the consequence of an instance being dominated by an object? Pr(D) ? Pr(d1) = (1 - 1/4); Pr(d2) = (1 - 1/4)(1 - 2/3); Pr(d3) = (1 - 1/4); Pr(D) = (1/3)(3/4 + 1/4 + 3/4) = 7/12
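The per-instance products and the final average for object D can be reproduced with a short sketch. It assumes equally likely instances and that smaller values are better. The coordinates, and the choice of three instances for B, are hypothetical. They were picked so that one of A's four instances dominates every instance of D and two of C's three instances dominate d2, which yields the slide's factors (1 - 1/4) and (1 - 2/3).

```python
from fractions import Fraction


def dominates(u, v):
    """u dominates v: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def instance_skyline_probability(u, owner, objects):
    """Pr(u): product over the other objects V of (1 - share of V's instances dominating u)."""
    prob = Fraction(1)
    for name, instances in objects.items():
        if name == owner:
            continue  # instances of the same object never compete with each other
        dominating = sum(1 for v in instances if dominates(v, u))
        prob *= 1 - Fraction(dominating, len(instances))
    return prob


def object_skyline_probability(owner, objects):
    """Pr(U): average of Pr(u) over the instances u of U."""
    instances = objects[owner]
    total = sum(instance_skyline_probability(u, owner, objects) for u in instances)
    return total / len(instances)


# Hypothetical data set (not the slide's figure).
objects = {
    "A": [(2, 2), (1, 8), (1, 9), (0, 10)],
    "B": [(7, 1), (8, 0), (9, 1)],
    "C": [(5, 5), (5, 6), (8, 8)],
    "D": [(4, 6), (6, 6), (6, 4)],  # d1, d2, d3
}

for d in objects["D"]:
    print(d, instance_skyline_probability(d, "D", objects))
print(object_skyline_probability("D", objects))  # 7/12
```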

25 Probabilistic Skyline: the p-skyline. 1-skyline = {A, B}; 7/12-skyline = {A, B, D}. “If you have time, use the formula to find the probability of Object C as well.”

26 Problem Definition: Given a set of uncertain objects S and a probability threshold p (0 ≤ p ≤ 1), the problem of probabilistic skyline computation is to compute the p-skyline on S. Example: 1-skyline = {A, B}; 7/12-skyline = {A, B, D}.
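A brute-force reading of this definition, as a minimal sketch rather than either algorithm from the paper: compute every object's skyline probability exactly and keep the objects that reach the threshold p. The data set is hypothetical, with equally likely instances and smaller values assumed to be better. It was chosen so that the two example thresholds give the same answers as on the slide.

```python
from fractions import Fraction


def dominates(u, v):
    """u dominates v: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_probability(owner, objects):
    """Pr(U): mean over instances u of the product over other objects V of (1 - |{v ≺ u}| / |V|)."""
    total = Fraction(0)
    for u in objects[owner]:
        prob = Fraction(1)
        for name, instances in objects.items():
            if name != owner:
                k = sum(1 for v in instances if dominates(v, u))
                prob *= 1 - Fraction(k, len(instances))
        total += prob
    return total / len(objects[owner])


def p_skyline(objects, p):
    """All objects whose skyline probability is at least p (exhaustive computation)."""
    return sorted(name for name in objects if skyline_probability(name, objects) >= p)


# Hypothetical data set (not the slide's figure).
objects = {
    "A": [(2, 2), (1, 8), (1, 9), (0, 10)],
    "B": [(7, 1), (8, 0), (9, 1)],
    "C": [(5, 5), (5, 6), (8, 8)],
    "D": [(4, 6), (6, 6), (6, 4)],
}

print(p_skyline(objects, 1))                 # ['A', 'B']
print(p_skyline(objects, Fraction(7, 12)))   # ['A', 'B', 'D']
```

The exhaustive version compares every instance with every instance of every other object, which is the cost the bounding and pruning techniques on the following slides try to avoid.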

27 Computation Problem of p-skyline First, each uncertain object may have many instances. We have to process a large number of instances. Second, we have to consider many probabilities in deriving the probabilistic skylines.

28 Algorithms (Top-down and Bottom-up). Data: multiple records of objects, in the hope of approximating the probability density function. Techniques: bounding, pruning, refining. “The full algorithms are very detailed; the techniques the authors use for efficient pruning will be discussed.” “Assumption: the smaller the value, the better.” “Please tell the audience clearly what data is being processed.”

29 Bottom-up Algorithm: technique: Minimum Bounding Box (MBB). OK: Miss: Flash drawing the bounding box of object D and demonstrating the two properties.

30 Bottom-up Algorithm - Pruning Techniques (1/3) using Umin, Umax to decide membership of p-skyline For an uncertain object U and probability threshold p, if Pr(Umin) < p, then U is not in the p-skyline. If Pr(Umax) ≥ p, then U is in the p-skyline OK:Miss: Flash use figure 3 to illustrate
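The membership test with Umin and Umax can be sketched as follows. It assumes equally likely instances and that smaller values are better, so Umin is the best corner of the MBB and Umax the worst. The data and the function names (`corners`, `point_skyline_probability`, `decide_by_corners`) are hypothetical.

```python
from fractions import Fraction


def dominates(u, v):
    """u dominates v: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def corners(instances):
    """Umin and Umax: per-dimension minimum and maximum of the object's MBB."""
    umin = tuple(min(col) for col in zip(*instances))
    umax = tuple(max(col) for col in zip(*instances))
    return umin, umax


def point_skyline_probability(point, owner, objects):
    """Skyline probability of a (possibly virtual) point of `owner` against all other objects."""
    prob = Fraction(1)
    for name, instances in objects.items():
        if name != owner:
            k = sum(1 for v in instances if dominates(v, point))
            prob *= 1 - Fraction(k, len(instances))
    return prob


def decide_by_corners(owner, objects, p):
    """Apply the two corner tests; fall back to 'undecided' when neither fires."""
    umin, umax = corners(objects[owner])
    if point_skyline_probability(umin, owner, objects) < p:
        return "pruned"          # Pr(Umin) < p, so U is not in the p-skyline
    if point_skyline_probability(umax, owner, objects) >= p:
        return "in p-skyline"    # Pr(Umax) >= p, so U is in the p-skyline
    return "undecided"           # exact or tighter computation still needed


# Hypothetical data set (not the slide's figure).
objects = {
    "A": [(2, 2), (1, 8), (1, 9), (0, 10)],
    "B": [(7, 1), (8, 0), (9, 1)],
    "C": [(5, 5), (5, 6), (8, 8)],
    "D": [(4, 6), (6, 6), (6, 4)],
}

for p in (Fraction(4, 5), Fraction(1, 2), Fraction(1, 5)):
    print(p, decide_by_corners("D", objects, p))
```

For object D in this toy data, Pr(Dmin) = 3/4 and Pr(Dmax) = 1/4, so the corner tests settle membership for p = 4/5 and p = 1/5 but not for p = 1/2.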

31 Bottom-up Algorithm - Pruning Techniques (2/3): using Vmax to prune instances of objects. Let U and V be uncertain objects such that U ≠ V. If u is an instance of U and Vmax ≺ u, then Pr(u) = 0. OK: Miss: Flash using equation ()()() to illustrate. Example: c2 is dominated by Dmax, and therefore by all instances of object D, so Pr(c2) = (1 – 3/3)(..)(..) = 0.
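This rule needs only one comparison per instance, as the following minimal sketch shows. It assumes smaller values are better, and the coordinates and function names are hypothetical. If the worst corner of V's bounding box dominates u, then every instance of V dominates u, so the factor for V in Pr(u) is (1 - |V|/|V|) = 0.

```python
def dominates(u, v):
    """u dominates v: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def max_corner(instances):
    """Vmax: per-dimension maximum over the instances of V (worst corner of the MBB)."""
    return tuple(max(col) for col in zip(*instances))


def zero_probability_instances(U, V):
    """Instances u of U with Vmax ≺ u; each has skyline probability 0."""
    vmax = max_corner(V)
    return [u for u in U if dominates(vmax, u)]


# Hypothetical instances (not the slide's figure).
C = [(5, 5), (5, 6), (8, 8)]
D = [(4, 6), (6, 6), (6, 4)]

print(zero_probability_instances(C, D))  # [(8, 8)]
```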

32 Bottom-up Algorithm - Pruning Techniques (3/3): using a subset of instances to prune objects. OK: better to use a Flash illustration. Estimate the upper bound of Pr(Vmin) by Pr(Umax’)? (How to say this better?) Pr(Vmin) = (1 – |U’|/|U|)(..)(..) “You can take min c{Pr(u)} for easy understanding.” To estimate the upper bound of Pr(Vmin) using U’max, assume all points of U appear only in U’ and the green region, so that Vmin is dominated by fewer objects. If |U’| is large, more instances dominate Vmin, so Pr(Vmin) is low.

33 Bottom-up Algorithm - Pruning Techniques (3/3) using subset of instance to prune objects Special Case  As a special case, if there exists an instance u ∈ U such that Pr(u) < p and u ≺ Vmin, then Pr(V ) < p and V can be pruned. Very useful: an uncertain object partially computed can be used to prune other objects

34 Bottom-up Algorithm: simplified version of the bottom-up algorithm
Input: instances of objects and their Umin (“all instances of uncertain objects are put into a list, as well as the Umin”)
  if (u is dominated by another object)
    prune u  // c2 is dominated by D
  end if
  if (u is Umin)
    compute Pr(Umin)
    if (Pr(Umin) < p)
      prune u  // Pr(Umin) < p
    end if
    use Pr(u) to update the upper and lower bounds of Pr(U)
    decide the p-skyline membership of U
    prune other objects  // check against the other Umins
  end if
Miss: pictures for illustration

35 Top-down Algorithm Difference between top down and bottom up algorithm Bottom up:  Start with single instance of an uncertain object Top down:  Start with the whole sets of instances of an uncertain object

36 Top-down Algorithm: idea of bounding. The skyline probability of each subset of an uncertain object can be bounded using its MBB. The skyline probability of the uncertain object can be bounded as the weighted mean of the bounds of its subsets. Miss: if possible, draw a graph with 4 squares inside it to replace the upper one.
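The weighted-mean bound can be illustrated with a small sketch. It makes several assumptions: instances are equally likely, smaller values are better, and each subset's lower and upper bounds are taken at its MBB corners (worst corner for the lower bound, best corner for the upper bound), evaluated exactly against the other objects. The data, the partition of D into two subsets, and the function names are hypothetical. The paper's algorithm obtains these bounds through partition trees rather than by exhaustive comparison.

```python
from fractions import Fraction


def dominates(u, v):
    """u dominates v: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def point_probability(point, others):
    """Skyline probability of a point against a list of other objects."""
    prob = Fraction(1)
    for instances in others:
        k = sum(1 for v in instances if dominates(v, point))
        prob *= 1 - Fraction(k, len(instances))
    return prob


def bound_object(partitions, others):
    """Bounds on Pr(U) as the size-weighted mean of per-subset MBB corner bounds."""
    total = sum(len(part) for part in partitions)
    lower = upper = Fraction(0)
    for part in partitions:
        nmin = tuple(min(col) for col in zip(*part))  # best corner of the subset's MBB
        nmax = tuple(max(col) for col in zip(*part))  # worst corner of the subset's MBB
        weight = Fraction(len(part), total)
        lower += weight * point_probability(nmax, others)
        upper += weight * point_probability(nmin, others)
    return lower, upper


# Hypothetical data: object D split into two subsets, bounded against A, B and C.
A = [(2, 2), (1, 8), (1, 9), (0, 10)]
B = [(7, 1), (8, 0), (9, 1)]
C = [(5, 5), (5, 6), (8, 8)]
partitions_of_D = [[(4, 6)], [(6, 6), (6, 4)]]

lower, upper = bound_object(partitions_of_D, [A, B, C])
print(lower, upper)  # 5/12 3/4
```

For this data the exact skyline probability of D is 7/12, which lies between the two bounds. Splitting the subsets further tightens the interval.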

37 Top-down Algorithm: supporting data structure: partition tree. “For simplicity, a 2-d tree is used to illustrate the concept.” Miss: the look of a partition tree in 2 dimensions. Miss: mark the levels of the partition tree (0, 1, 2, etc.). [Figure: partition tree with nodes A, B, C, D]

38 Top-down Algorithm: partition tree for bounding. Compare the partition of U with the other partition trees as follows: traverse the partition tree of each other uncertain object V in a depth-first manner. ??: Add "possible dominating object" before discussing the algorithms. **: the wording needs to change if "possible dominating object" is mentioned. [Figure: partition trees with nodes A, B, C, D and A’, B’, C’, D’]

39 Top-down Algorithm: all possible situations during partition tree traversal. [Figure: partition trees with nodes A, B, C, D and A’, B’, C’, D’]

40 Top-down Algorithm: situation 1/3 during partition tree traversal for the bounding calculation. [Figure: partition trees with nodes A, B, C, D and A’, B’, C’, D’]

41 Top-down Algorithm: situation 2/3 during partition tree traversal for the bounding calculation. (Place the two trees here; it is better to use subtrees starting at level 1.) [Figure: partition trees with nodes A, B, C, D and A’, B’, C’, D’]

42 Top-down Algorithm: situation 3/3 during partition tree traversal for the bounding calculation. (Place the two trees here; it is better to use subtrees starting at level 1.) [Figure: partition trees with nodes A, B, C, D and A’, B’, C’, D’; labels: Estimate upper bound, Estimate lower bound]

43 Top-down Algorithm: pruning the partition tree 1/3. “Compare ABCD with B’.” (Better to put a tree here.) [Figure: partition trees with nodes A, B, C, D and A’, B’, C’, D’]

44 Top-down Algorithm: pruning the partition tree 2/3. (Better to put a tree here.) [Figure: partition trees with nodes A, B, C, D and A’, B’, C’, D’]

45 Top-down Algorithm: pruning the partition tree 3/3. [Figure: partition tree with nodes A, B, C, D]

46 Experiment: data and experiment. Experiment: aggregate skyline and probabilistic skyline (0.1-skyline). Data set: NBA players' performance records (339,721). Attributes: #points, #assists, #rebounds.

47 Experiment Results: 1) The top 12 players in the probabilistic skyline also appear in the aggregate skyline. 2) Players like Olajuwon and Kobe Bryant appear only in the probabilistic skyline, not in the aggregate skyline. 3) Disagreement between the probabilistic skyline and the aggregate skyline: player A dominates player B in the aggregate data, but the order is reversed in the probabilistic skyline.

48 Experiment

49 Experiment Results Analysis: 2) Players like Olajuwon and Kobe Bryant appear only in the probabilistic skyline, not in the aggregate skyline. Finding: compared to the aggregate skyline, the probabilistic skyline finds not only players who consistently perform well, but also outstanding players with large variances in performance.

50 Experiment Results Analysis: 3) Disagreement between the probabilistic skyline and the aggregate skyline. Ewing (0.13577) has a higher skyline probability than Brand (0.10966), though Ewing is dominated by Brand in the aggregate data set. Findings: Ewing played very well in a few games. Probabilistic skylines disclose interesting knowledge about uncertain data that cannot be captured by traditional skyline analysis. Objects can be ranked by skyline probability, which cannot be done with the aggregate skyline.

51 Experiment Results Analysis

52 Other Experiments: synthetic data sets. Data: synthetic data sets where instances of objects are generated in anti-correlated, independent, and correlated distributions.

53 Other Experiment Results: effect of the probability threshold on the size of the skyline

54 Other Experiment Results: effect of dimensionality on the size of the skyline

55 Other Experiment Results: effect of cardinality (#instances) on the size of the skyline

56 Other Experiment Results: scalability with respect to the probability threshold

57 Other Experiment Results: comparison of Top-Down and Bottom-Up with respect to dimensionality and cardinality

58 The End

