VLDB’2007 review Denis Mindolin. VLDB’07 program.


1 VLDB’2007 review Denis Mindolin

2 VLDB’07 program

3

4 Outline
- Probabilistic Skylines on Uncertain Data, Jian Pei et al.
- Lazy Maintenance of Materialized Views, Jingren Zhou et al.

5 Probabilistic Skylines on Uncertain Data
Based on the VLDB’07 paper of Jian Pei et al.

6 Skyline. General picture
For a dataset D = {p_1, …, p_n}, the skyline S is the set of all p_i such that no other p_j dominates p_i.
p_i dominates p_j if p_i is
- better than p_j in at least one dimension, and
- not worse than p_j in all other dimensions.
Single game results: S = {Eddie, Carl}
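The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names are hypothetical, points are plain tuples, and "better" is assumed to mean "smaller" in every dimension.

```python
def dominates(p, q):
    """True if p is not worse than q in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    not_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return not_worse and strictly_better


def skyline(points):
    """Return the points that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    data = [(1, 4), (2, 2), (3, 3), (4, 1)]
    print(skyline(data))  # (3, 3) is dominated by (2, 2) and drops out
```

The quadratic scan is chosen for readability only; it makes no attempt at the indexing used by real skyline algorithms.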

7 Uncertain data
Multiple game results: S = ?
Use some aggregate function?
- Can’t capture distribution!
- Can be biased by outliers!

8 Probabilistic dominance relation
Uncertain data
- Uncertain object U = {u_1, …, u_l}
- Uncertain objects are independent
- Pr(u_i) = Pr(u_j)
Probabilistic dominance relation
- Given two uncertain objects U = {u_1, …, u_l1}, V = {v_1, …, v_l2}
- The probability that V dominates U is given by [formula not preserved in the transcript]
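The formula on this slide did not survive extraction. Under the two assumptions the slide does state (objects are independent, and all instances of an object are equally likely), one natural reading is that the probability is the fraction of instance pairs (v, u) in which v dominates u. The sketch below implements that reading; the function names are hypothetical and smaller values are assumed to be better.

```python
def dominates(p, q):
    """Assumption: smaller is better in every dimension."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def prob_dominates(V, U):
    """Fraction of (v, u) instance pairs in which v dominates u.

    Assumes V and U are independent and that each instance of an object
    is equally likely, as stated on the slide.
    """
    dominating_pairs = sum(1 for v in V for u in U if dominates(v, u))
    return dominating_pairs / (len(V) * len(U))


if __name__ == "__main__":
    V = [(1, 1), (3, 3)]
    U = [(2, 2), (4, 4)]
    print(prob_dominates(V, U))
```

In the example, (1, 1) dominates both instances of U and (3, 3) dominates only (4, 4), so three of the four pairs count.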

9 Probabilistic dominance relation. Example
Smaller values of X and Y are better.

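10 p-Skyline
Let U = {u_1, …, u_l}.
- For each u ∈ U, the probability of u being in the skyline := the probability that u is not dominated by any other object
- Skyline probability of U: [formula not preserved in the transcript]
- p-Skyline: [formula not preserved in the transcript]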
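The two formulas on this slide are missing from the transcript. A hedged sketch of one consistent reading: the skyline probability of an instance is the product, over all other objects, of the probability that the object does not dominate it; the skyline probability of an object is the mean over its instances; and the p-skyline keeps the objects whose probability is at least p (the threshold direction matches Rule 1 of the pruning slide). All names are hypothetical and smaller values are assumed to be better.

```python
def dominates(p, q):
    """Assumption: smaller is better in every dimension."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def instance_skyline_prob(u, own_index, objects):
    """Probability that instance u is not dominated by any other object."""
    prob = 1.0
    for k, V in enumerate(objects):
        if k == own_index:
            continue  # an object does not compete with its own instances
        dominated_by = sum(1 for v in V if dominates(v, u))
        prob *= 1.0 - dominated_by / len(V)
    return prob


def object_skyline_prob(index, objects):
    """Mean skyline probability over the (equally likely) instances."""
    U = objects[index]
    return sum(instance_skyline_prob(u, index, objects) for u in U) / len(U)


def p_skyline(objects, p):
    """Indices of objects whose skyline probability is at least p."""
    return [i for i in range(len(objects))
            if object_skyline_prob(i, objects) >= p]


if __name__ == "__main__":
    A = [(1, 1), (5, 5)]
    B = [(2, 2), (6, 6)]
    print(object_skyline_prob(0, [A, B]), object_skyline_prob(1, [A, B]))
    print(p_skyline([A, B], 0.5))
```

This brute-force version touches every instance of every object, which is exactly the cost the bounding and pruning steps are meant to avoid.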

11 The bottom-up skyline algorithm
Bounding
- Compute upper and lower bounds of the skyline probability for objects
Pruning
- If the lower bound of Pr(U) is larger than p, then U is in the skyline.
- If the upper bound of Pr(U) is smaller than p, then U is not in the skyline.
Refining
- If p is between the lower and the upper bounds, then tighter bounds of the skyline probabilities are needed and are computed in the next iteration of the algorithm.

12 Bounding
U_min = (min_i {u_i.D_1}, …, min_i {u_i.D_l})
U_max = (max_i {u_i.D_1}, …, max_i {u_i.D_l})
Lemma
- If u_i1 < u_i2, then Pr(u_i1) ≥ Pr(u_i2)
- Pr(U_min) ≥ Pr(U) ≥ Pr(U_max)
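As a rough illustration of the bounding step, the sketch below builds the two corner points of an object's bounding box and evaluates their skyline probabilities as if they were instances, which by the lemma brackets Pr(U). It assumes smaller values are better and that the skyline probability of a point is the product, over the other objects, of the probability of not being dominated. All names are hypothetical.

```python
def dominates(p, q):
    """Assumption: smaller is better in every dimension."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def corners(U):
    """Per-dimension minimum and maximum corners of U's instances."""
    dims = range(len(U[0]))
    u_min = tuple(min(u[d] for u in U) for d in dims)
    u_max = tuple(max(u[d] for u in U) for d in dims)
    return u_min, u_max


def point_skyline_prob(point, own_index, objects):
    """Probability that `point` is not dominated by any other object."""
    prob = 1.0
    for k, V in enumerate(objects):
        if k != own_index:
            prob *= 1.0 - sum(1 for v in V if dominates(v, point)) / len(V)
    return prob


def skyline_prob_bounds(index, objects):
    """(lower, upper) bounds on Pr(U): Pr(U_max) <= Pr(U) <= Pr(U_min)."""
    u_min, u_max = corners(objects[index])
    lower = point_skyline_prob(u_max, index, objects)
    upper = point_skyline_prob(u_min, index, objects)
    return lower, upper


if __name__ == "__main__":
    A = [(1, 1), (5, 5)]
    B = [(2, 2), (6, 6)]
    print(skyline_prob_bounds(0, [A, B]))
    print(skyline_prob_bounds(1, [A, B]))
```

Only two probability evaluations are needed per object here, regardless of how many instances it has.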

13 Pruning
Rule 1. For an uncertain object U and probability threshold p,
- if Pr(U_min) < p, then U is not in the p-skyline;
- if Pr(U_max) ≥ p, then U is in the p-skyline.
Rule 2. For each instance u ∈ U, let Pr+(u) and Pr-(u) be the upper and lower bounds of Pr(u).
- If [condition not preserved in the transcript], then U is not in the p-skyline.
- If [condition not preserved in the transcript], then U is in the p-skyline.
Rule 3. Let U and V be two different uncertain objects. If u ∈ U and V_max < u, then Pr(u) = 0.
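The conditions of Rule 2 are missing from the transcript. If Pr(U) is the mean of its instances' skyline probabilities (an assumption, since that formula is also missing), then the averages of the per-instance upper and lower bounds bracket Pr(U), and a decision function in the spirit of Rules 1 and 2 might look like the sketch below. The function name and the string labels are hypothetical.

```python
def classify(instance_bounds, p):
    """Decide an object's p-skyline status from per-instance bounds.

    instance_bounds: list of (lower, upper) pairs, one per instance,
    standing in for Pr-(u) and Pr+(u).
    Assumption: Pr(U) is the mean of Pr(u), so averaging the bounds
    gives valid bounds on Pr(U).
    """
    n = len(instance_bounds)
    lower = sum(lo for lo, _ in instance_bounds) / n
    upper = sum(hi for _, hi in instance_bounds) / n
    if upper < p:
        return "out"      # even the optimistic estimate misses the threshold
    if lower >= p:
        return "in"       # even the pessimistic estimate meets the threshold
    return "refine"       # bounds straddle p: tighter bounds are needed


if __name__ == "__main__":
    print(classify([(0.1, 0.2), (0.0, 0.3)], 0.5))
    print(classify([(0.6, 0.9), (0.7, 0.8)], 0.5))
    print(classify([(0.2, 0.9), (0.3, 0.8)], 0.5))
```

The "refine" outcome corresponds to the refining step of the bottom-up algorithm, where the bounds are tightened in the next iteration.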

14 Pruning
Rule 4. Let U and V be two uncertain objects and U' ⊆ U be a subset of instances of U such that U'_max ⪯ V_min. If [condition not preserved in the transcript], then Pr(V) < p and thus V is not in the p-skyline.
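The condition of Rule 4 is not recoverable from the transcript, but the reasoning behind a rule of this shape can be sketched. The derivation below is an illustration under stated assumptions, not the slide's own formula: the relation between U'_max and V_min is read as dominance, ties are ignored, instances are equally likely, and Pr(V) is taken to be the mean of the skyline probabilities of its instances.

```latex
% Assumption: U'_max dominates V_min, so every instance of U' dominates
% every instance v of V, and U alone already limits Pr(v).
\[
U'_{\max} \preceq V_{\min}
\;\Longrightarrow\;
\forall v \in V:\ \Pr(v) \le 1 - \frac{|U'|}{|U|}
\;\Longrightarrow\;
\Pr(V) = \frac{1}{|V|} \sum_{v \in V} \Pr(v) \le 1 - \frac{|U'|}{|U|}.
\]
```

Under this reading, any condition guaranteeing 1 - |U'|/|U| < p, that is |U'|/|U| > 1 - p, would be enough to conclude Pr(V) < p.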

15 Refinement
Partition instances into layers

16 Algorithm summary
Complexity: O(W_total * R)
- W_total: number of instances whose skyline probabilities are computed by the algorithm
- R: average cost of querying the local R-tree of possible dominating objects
- W_total is much smaller than the total number of instances
Top-down algorithm: see the paper

17 Lazy Maintenance of Materialized Views
Based on the VLDB’07 paper of Jingren Zhou et al.

18 Eager and Deferred Materialized View Maintenance
(Figure: base tables T1 and T2, materialized view V)
Eager:
- User tran: {upd(T1), upd(T2)}
  Executed: {upd(T1), upd(T2), recomp(V)}
Deferred:
- User tran: {upd(T1), upd(T2)}
  Executed: {upd(T1), upd(T2)}
- …
- User tran: {recomp(V)}
- …
- User tran: {Q(V)}
  Executed: {Q(V)}

19 Lazy Materialized View Maintenance
(Figure: base tables T1 and T2, materialized view V)
Lazy:
- User tran: {upd(T1), upd(T2)}
  Executed: {upd(T1), upd(T2)}
- …
- Executed: {recomp(V)}
- …
- User tran: {Q(V)}
  Executed: {Q(V)}
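The difference between the strategies is who pays for view maintenance and when. The toy sketch below illustrates the lazy variant only: updates touch the base table and record a delta, while the view is brought up to date later by a separate maintenance step that no user transaction requests. The class and method names are hypothetical, the view is a simple SUM for brevity, and the catch-up inside `query` is an assumption, not something the slide states.

```python
class LazySumView:
    """Toy materialized view holding SUM(value) over one base table."""

    def __init__(self):
        self.table = {}        # base table: row_id -> value
        self.view_total = 0    # materialized view contents
        self.pending = []      # recorded deltas: (action, value)

    def insert(self, row_id, value):
        # The user transaction only updates the base table and logs a delta.
        self.table[row_id] = value
        self.pending.append(("ins", value))

    def delete(self, row_id):
        value = self.table.pop(row_id)
        self.pending.append(("del", value))

    def maintain(self):
        # System-initiated step: apply the deltas incrementally.
        for action, value in self.pending:
            self.view_total += value if action == "ins" else -value
        self.pending.clear()

    def query(self):
        # Assumption: a query first applies any outstanding maintenance.
        self.maintain()
        return self.view_total


if __name__ == "__main__":
    view = LazySumView()
    view.insert(1, 10)
    view.insert(2, 5)
    print(view.view_total, len(view.pending))  # view untouched so far
    view.delete(1)
    print(view.query())
```

An eager version would call `maintain()` inside `insert` and `delete`, which is the extra work the user transaction pays for on the previous slide.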

20 System architecture
Based on MS SQL Server 2005

21 How it works

22 Delta tables
Table_1: {(transID_i, stmtID_i, rowID_i, action_i)}
…
Table_n: {(transID_i, stmtID_i, rowID_i, action_i)}
- transID: transaction id
- stmtID: statement id
- rowID: updated row id
- action = (ins|del)
- All “update” actions are converted into pairs of del/ins actions
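A delta table row as described above can be modelled as a small record, with an update logged as a del/ins pair. This is a minimal sketch: the field names mirror the slide, while `DeltaRow`, `log_update`, and the in-memory list standing in for a delta table are hypothetical.

```python
from typing import List, NamedTuple


class DeltaRow(NamedTuple):
    trans_id: int   # transaction id
    stmt_id: int    # statement id within the transaction
    row_id: int     # id of the affected base-table row
    action: str     # "ins" or "del"


def log_update(delta: List[DeltaRow], trans_id: int, stmt_id: int,
               row_id: int) -> None:
    """Record an update as a delete of the old row version followed by
    an insert of the new one, as the slide describes."""
    delta.append(DeltaRow(trans_id, stmt_id, row_id, "del"))
    delta.append(DeltaRow(trans_id, stmt_id, row_id, "ins"))


if __name__ == "__main__":
    delta_table: List[DeltaRow] = []
    log_update(delta_table, trans_id=1, stmt_id=2, row_id=42)
    for row in delta_table:
        print(row)
```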

23 Maintenance and its optimization
A maintenance task is created for each view affected by a transaction
Views are updated incrementally using Delta tables
“Smart” maintenance task scheduler
- Maintenance tasks are scheduled as low-priority jobs
- Maintenance tasks are combined using the Condense operator
- A proper time slot is allocated for each task

24 Delta stream Condense operator
Intuition: Tran: {A:=1, …, A:=2, …, A:=3} => {…, A:=3}
Operator definition
- INS/INS condense: {ins_1(row_a), …, ins_k(row_a)} => {…, ins_k(row_a)}
- INS/DEL condense: {ins_1(row_a), …, del_k(row_a)} => {…}
- DEL/DEL condense: {del_1(row_a), …, del_k(row_a)} => {…, del_k(row_a)}
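The three rules can be read as a pairwise reduction applied per row: each new action is compared with the last surviving action for the same row. The sketch below follows that reading. It is an illustration, not the operator from the paper: the tuple layout `(kind, row_id, payload)` and the function name are hypothetical, and a DEL followed by an INS is kept unchanged because the slide gives no rule for that pair.

```python
def condense(stream):
    """Condense a delta stream of (kind, row_id, payload) actions.

    INS/INS and DEL/DEL keep only the later action; INS/DEL cancels out.
    DEL followed by INS is left as is (assumption: no rule is given).
    """
    survivors = {}  # row_id -> list of (position, action) still alive
    for pos, action in enumerate(stream):
        kind, row_id = action[0], action[1]
        kept = survivors.setdefault(row_id, [])
        if kept and kept[-1][1][0] == "ins" and kind == "del":
            kept.pop()                    # INS/DEL: the pair disappears
        elif kept and kept[-1][1][0] == kind:
            kept[-1] = (pos, action)      # INS/INS or DEL/DEL: keep the last
        else:
            kept.append((pos, action))
    # Restore stream order; positions are unique, so actions are never compared.
    merged = sorted(item for kept in survivors.values() for item in kept)
    return [action for _, action in merged]


if __name__ == "__main__":
    stream = [
        ("ins", "a", 1), ("ins", "b", 7), ("ins", "a", 2),
        ("del", "b", 7), ("del", "c", 0), ("del", "c", 0),
    ]
    print(condense(stream))
```

With updates logged as del/ins pairs, the slide's intuition follows from the same rules: the stream del(old), ins(1), del(1), ins(2), del(2), ins(3) for one row reduces to del(old), ins(3).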

25 Performance results
- Response time is low
- Query response time is low
- Maintenance cost [relation symbol not preserved in the transcript] eager view update cost
- Overhead is low

