Tuning the top-k view update process Eftychia Baikousi Panos Vassiliadis University of Ioannina Dept. of Computer Science.

1 Tuning the top-k view update process Eftychia Baikousi Panos Vassiliadis University of Ioannina Dept. of Computer Science

2 M-Pref 2007, Vienna 23/9/2007
Forecast
The problem: maintaining materialized top-k views when updates occur in the base relation.
Extra difficulty: addressing the problem in the presence of high deletion rates.
The crux of the approach is to materialize kcomp tuples, i.e., the top k plus an appropriate number of extra tuples, to sustain deletion rates that are drastically higher than average.
The correct estimation and fine tuning of kcomp is not obvious.
We use appropriate probabilistic methods.

3 Contents
Motivation & Problem Definition
Overview of our Method
Computation of rates affecting the view
Computation of kcomp
Fine tuning kcomp
Experiments
Conclusions

5 Top-k query
Given a relation R(id, x1, x2, x3) and a query Q = sum(x1, x2, x3), find the k tuples with the highest grades according to Q.
Relation R, with the score of each tuple under Q:
  id   x1    x2    x3    sum
  a    0.3   0.6   0.7   1.6
  b    0.2   0.3   0.4   0.9
  c    0.4   0.5   0.9   1.8
  d    0.7   0.6   0.1   1.4
Top-2 tuples: c (1.8) and a (1.6).

6 Motivating Example
Shopping center: customers sign in with a palmtop (PDA), and there is a need to send them advertisements and special offers.
Given the relation Customers(id, name, age, salary, …), keep a materialized view V of the top-2 customers (younger and highly paid) according to the query Q = -age + 2*salary.
The view V must be maintained as customers sign in and out (e.g., train departures, working hours).
Customers:
  id   name    age   salary   Q
  1    John    18    20       22
  2    Mary    42    25       8
  3    Bill    26    35       44
  4    Peter   57    37       17
V:
  name   Q
  Bill   44
  John   22
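The materialization of V on this slide can be sketched in a few lines of Python. This is only an illustration of the example data: the function names `q` and `materialize_top_k` are hypothetical, and a real view would live in a database rather than in a list.

```python
import heapq

# Customers(id, name, age, salary), as listed on the slide.
customers = [
    (1, "John", 18, 20),
    (2, "Mary", 42, 25),
    (3, "Bill", 26, 35),
    (4, "Peter", 57, 37),
]

def q(row):
    """Scoring function from the slide: Q = -age + 2*salary."""
    _, _, age, salary = row
    return -age + 2 * salary

def materialize_top_k(rows, k):
    """Return the view V as (name, Q) pairs, highest score first."""
    best = heapq.nlargest(k, rows, key=q)
    return [(row[1], q(row)) for row in best]

view = materialize_top_k(customers, 2)
print(view)  # [('Bill', 44), ('John', 22)]
```

The maintenance problem starts after this point: once customers sign out, rows of `view` disappear and the view may hold fewer than k tuples.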

7 Problem definition
Given
  a base relation R(ID, X, Y) that originally contains N tuples,
  a materialized view V that contains the top-k tuples in the form (id, val), where val is the score according to a function Q(x, y) = ax + by and a, b are constant parameters,
  the update ratios ins, del and upd for insertions, deletions and updates, respectively, over the base relation R,
compute kcomp, of the form kcomp = k + Δk, such that the view will contain at least k tuples (k ≤ kcomp) with probability p after a period T.
[Figure: view V with columns (id, Q); the top k tuples followed by Δk extra tuples, kcomp in total]

8 Related Work
Ke Yi, Hai Yu, Jun Yang, Gangqiang Xia, Yuguo Chen: Efficient Maintenance of Materialized Top-k Views, ICDE 03
Maintains a materialized top-k view when updates occur in the base table.
Computes a k_max (instead of the necessary k), adjusted at runtime, so that a refill query is rarely needed.
Formulates the problem through a random walk model.
The method is theoretically guaranteed to work well only when insertions and deletions are equally probable (p_ins = p_del) or insertions are more frequent than deletions (p_ins > p_del).
There is no quality-of-service guarantee when deletions are more probable than insertions (p_ins < p_del).

9 Motivating Example
Customers sign in and out, due to train departures and working hours.
At certain time periods, deletions are more probable than insertions (p_ins < p_del).
The view will then not contain at least k tuples.
Customers and V are as on slide 6.

11 Overview of the method
1. Compute the ratios of the incoming source updates that affect the view
2. Compute kcomp
3. Fine tune kcomp

12 Empirical Cumulative Distribution Function (ECDF)
The ECDF is a non-parametric cumulative distribution function that adapts itself to the data.
Definition: F_n(x)
  represents the proportion of observations in a sample that are less than or equal to x,
  assigns probability 1/n to each of the n observations in the sample,
  estimates the true population proportion F(x).
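The definition of F_n can be made concrete with a short Python sketch. The helper name `make_ecdf` is hypothetical, and the sample is the list of Q scores from the Customers example; nothing here is taken from the authors' implementation.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Build F_n for a sample: F_n(x) = (number of observations <= x) / n."""
    data = sorted(sample)
    n = len(data)

    def ecdf(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(data, x) / n

    return ecdf

# Q scores of the four customers in the running example.
F = make_ecdf([22, 8, 44, 17])
print(F(22))  # 0.75: three of the four scores are <= 22
```

Each observation contributes exactly 1/n, so F jumps by 0.25 at each of the four scores and reaches 1.0 at the maximum.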

13 Computation of update rates that affect V
Given
  a relation Customers(id, name, age, salary, …) having N = 4 tuples,
  a materialized view V containing the top-2 tuples (k = 2) in the form (id, Q), where Q = -age + 2*salary is the score,
  update ratios ins = 1, del = 2, upd = 0,
find ins_aff and del_aff (the insertions and deletions affecting the view).
Customers and V are as on slide 6.

14 Computation of update rates that affect V
Given N = 4, ins = 1, del = 2, upd = 0, we compute the following:
  updates are treated as a combination of deletions and insertions,
  from the ECDF, the probability of a new tuple affecting the view,
  the ratios affecting the view.
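A rough Python sketch of these three steps follows. It rests on assumptions that are not stated on the slide: each update is counted as one deletion plus one insertion, the probability of a new tuple affecting the view is estimated as 1 - F_n(val_k) with val_k the lowest score in the view, and deletions are assumed to hit base tuples uniformly at random, so a fraction k/N of them touch the view. The function name `affecting_rates` is hypothetical.

```python
from bisect import bisect_right

def affecting_rates(scores, k, ins, dele, upd):
    """Estimate (p_aff, ins_aff, del_aff) for a top-k view over `scores`."""
    n = len(scores)
    ordered = sorted(scores)
    val_k = ordered[-k]                          # lowest score inside the top-k view
    f_val_k = bisect_right(ordered, val_k) / n   # ECDF evaluated at val_k
    p_aff = 1.0 - f_val_k                        # estimate of p(z > val_k)

    # ASSUMPTION: an update behaves like one deletion plus one insertion.
    ins_total = ins + upd
    del_total = dele + upd

    ins_aff = ins_total * p_aff
    # ASSUMPTION: deletions are uniform over the N base tuples.
    del_aff = del_total * k / n
    return p_aff, ins_aff, del_aff

# Running example: N = 4, k = 2, ins = 1, del = 2, upd = 0.
p_aff, ins_aff, del_aff = affecting_rates([22, 8, 44, 17], 2, 1, 2, 0)
print(p_aff, ins_aff, del_aff)  # 0.25 0.25 1.0
```

These are expected values under the stated assumptions; the worked example on the later slide instead shows a case where the one insertion and both deletions all affect the view.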

16 Computation of kcomp
Compute kcomp, of the form kcomp = k + Δk, such that the view is guaranteed to contain at least k tuples (k ≤ kcomp) with probability p after a period of operation T.
Customers:
  id   name    age   salary   Q
  1    John    18    20       22
  2    Mary    42    25       8
  3    Bill    26    35       44
  4    Peter   57    37       17
V (k tuples plus Δk extra tuples, kcomp in total):
  name    Q
  Bill    44
  John    22
  Peter   17

17 Computation of kcomp
There is 1 insertion and there are 2 deletions affecting the view:
  tuple (5, Kate, 25, 30) is inserted,
  tuples (3, Bill, 26, 35) and (4, Peter, 57, 37) are deleted from the view.
The view will contain 2 tuples, as initially needed.
Customers:
  id   name    age   salary   Q
  1    John    18    20       22
  2    Mary    42    25       8
  3    Bill    26    35       44
  4    Peter   57    37       17
  5    Kate    25    30       35
V after the insertion:
  name    Q
  Bill    44
  Kate    35
  John    22
  Peter   17
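The example on this slide can be replayed with a small Python sketch. It assumes the view starts with the three tuples Bill, John and Peter, that a new tuple enters the view when its score exceeds the lowest score currently stored, and that nothing is evicted on insertion, which matches the four-row view shown on the slide. Kate's score is computed from the stated Q, which gives 35.

```python
K = 2  # number of tuples the view must be able to answer

def q(age, salary):
    """Scoring function Q = -age + 2*salary."""
    return -age + 2 * salary

# View materialized with extra tuples: id -> (name, Q).
view = {
    3: ("Bill", q(26, 35)),
    1: ("John", q(18, 20)),
    4: ("Peter", q(57, 37)),
}

# Insertion of (5, Kate, 25, 30): it affects the view only if it beats
# the lowest score currently stored there.
threshold = min(score for _, score in view.values())
kate_score = q(25, 30)
if kate_score > threshold:
    view[5] = ("Kate", kate_score)

# Deletions of Bill (id 3) and Peter (id 4) from the base relation
# remove them from the view as well.
for deleted_id in (3, 4):
    view.pop(deleted_id, None)

remaining = sorted(view.values(), key=lambda t: t[1], reverse=True)
print(remaining)  # [('Kate', 35), ('John', 22)]
```

With only k = 2 tuples materialized, the same update stream would have left a single tuple in the view, which is the situation the extra Δk tuples are meant to prevent.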

19 Fine tune kcomp
kcomp is expressed as a formula depending on ins_aff and del_aff, the ratios of insertions and deletions affecting the view.
The probability of a tuple affecting the view may vary according to probabilistic properties.
Fine tune kcomp by adding the appropriate variance.

20 Fine tune kcomp
The probability of a new tuple z affecting the view is p(z > val_k).
This is a Bernoulli experiment with 2 possible events:
  the new tuple z affects the view, with probability p(z),
  the new tuple z does not affect the view, with probability 1 - p(z).
ins insertions in the base relation correspond to ins Bernoulli experiments.
The number of successes of ins Bernoulli experiments follows a Binomial distribution with variance VAR = ins * p(z) * (1 - p(z)).
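The Binomial variance n * p * (1 - p) for n Bernoulli trials can be checked numerically. The sketch below compares the closed form with a simulation; the numbers (100 insertions, p = 0.25) and the function names are illustrative assumptions, not values from the presentation.

```python
import random

def binomial_variance(n_trials, p):
    """Variance of the number of successes in n_trials Bernoulli(p) trials."""
    return n_trials * p * (1.0 - p)

def simulate_affecting_insertions(n_trials, p, runs, seed=0):
    """Empirical mean and variance of how many of n_trials insertions
    affect the view, when each does so independently with probability p."""
    rng = random.Random(seed)
    counts = [
        sum(1 for _ in range(n_trials) if rng.random() < p)
        for _ in range(runs)
    ]
    mean = sum(counts) / runs
    var = sum((c - mean) ** 2 for c in counts) / runs
    return mean, var

print(binomial_variance(100, 0.25))  # 18.75
mean, var = simulate_affecting_insertions(100, 0.25, runs=5000)
print(mean, var)  # close to 25 and 18.75
```

The simulated variance only approximates the closed form, and the gap shrinks as the number of runs grows.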

21 Fine tune kcomp
In the worst case, in order to guarantee with 95% confidence that the view will contain at least k tuples, kcomp is computed as:
[Formula not preserved in the transcript]
VAR_ins denotes the variance of the insertions.
VAR_del denotes the variance of the deletions.
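To give a feel for how a variance-based safety margin can enter kcomp, here is a hedged Python sketch. It is not the formula of the presentation. It assumes that affecting insertions and deletions are independent, that the net loss of view tuples is approximately normal with mean del_aff - ins_aff and variance VAR_ins + VAR_del, and that a one-sided 95% bound (z = 1.645) is wanted. The function name `kcomp_with_margin` and the sample numbers are hypothetical.

```python
import math

Z_95_ONE_SIDED = 1.645  # standard normal quantile for a one-sided 95% bound

def kcomp_with_margin(k, ins_aff, del_aff, var_ins, var_del, z=Z_95_ONE_SIDED):
    """Illustrative kcomp = k + dk under a normal approximation.

    ASSUMPTION: dk covers the expected net loss (del_aff - ins_aff, never
    negative) plus z standard deviations of that loss.
    """
    expected_loss = max(0.0, del_aff - ins_aff)
    margin = z * math.sqrt(var_ins + var_del)
    return k + math.ceil(expected_loss + margin)

# Without variance the extension is just the expected net loss.
print(kcomp_with_margin(10, 5, 20, 0, 0))   # 25
# With variance, extra tuples are added as a safety margin.
print(kcomp_with_margin(10, 5, 20, 4, 12))  # 32
```

The design point the sketch illustrates is the one made on the slides: the mean rates alone size Δk for the average case, and the variance terms widen it so that the view still holds k tuples with high probability.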

23 Experimental methodology
Test the following methods:
  kcomp without fine tuning
  kcomp with fine tuning
  Yi et al. @ ICDE 03
For the following measures:
  Number of tuples (# tuples) deleted from the view that fall below the threshold value of k
  Memory overhead for kcomp with and without fine tuning, as the number of extra tuples that need to be kept in the view
  Number of extra tuples for kcomp with and without fine tuning, compared to the number of extra tuples of the related work

24 Experimental methodology
Synthetic data sets:
  Gaussian distribution with mean μ=50 and variance σ=10
  Negative exponential distribution with parameters a=1.0 for X and a=2.0 for Y
  Zipf distribution with parameter a=2.1
Experimental parameters:
  Parameter                                    Symbol   Values
  Size of source table R (tuples)              |R|      1x10^5, 5x10^5, 1x10^6, 2x10^6
  Size of mat. view (tuples)                   k        5, 10, 100, 1000
  Size of update stream (pct over |R|)                  1/1000, 1/100
  Deletion rate over insertion rate (ratio)    D/I      1.0, 1.5, 2.0
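A base relation R(ID, X, Y) along these lines could be generated as sketched below. Several choices are assumptions: σ=10 is read as the standard deviation, the parameter a of the negative exponential is read as the rate, and both X and Y are drawn from the same Gaussian. The Zipf data set is left out because the Python standard library has no sampler for it. The function name `generate_base_relation` is hypothetical.

```python
import random

def generate_base_relation(n, distribution, seed=0):
    """Return n tuples (id, x, y) drawn from the named distribution."""
    rng = random.Random(seed)
    if distribution == "gaussian":
        # ASSUMPTION: mean 50, standard deviation 10, for both attributes.
        def draw_x():
            return rng.gauss(50.0, 10.0)
        draw_y = draw_x
    elif distribution == "negexp":
        # ASSUMPTION: a is the rate parameter (mean 1/a).
        def draw_x():
            return rng.expovariate(1.0)

        def draw_y():
            return rng.expovariate(2.0)
    else:
        raise ValueError("unsupported distribution: " + distribution)
    return [(i, draw_x(), draw_y()) for i in range(n)]

rows = generate_base_relation(100_000, "gaussian")
print(len(rows))  # 100000
```

The smallest table size from the parameter list (1x10^5) is used here; larger sizes only change the first argument.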

25 Max & average misses
kcomp without fine tuning, Gaussian distribution
[Plots: as a function of R, and as a function of k and D/I]

26 Memory overhead
[Plot: number of extra tuples as a function of R and D/I]

27 Comparison with related work
[Plot: number of extra tuples of kcomp with fine tuning compared with k_max of the related work, as a function of R]

28 Comparison with related work
[Plot: number of extra tuples of kcomp with fine tuning compared with k_max of the related work, as a function of k]

30 Conclusions
We handled the problem of maintaining materialized top-k views in the presence of high deletion rates.
The method comprises the following steps:
  a computation of the rate that actually affects the materialized view,
  a computation of the necessary extension to k in order to handle the augmented number of deletions that occur, and
  a fine tuning part that adjusts this value to account for the fluctuation of its statistical properties.

31 Thank you for your attention!
… many thanks to our hosts!
This research was co-funded by the European Union in the framework of the program Pythagoras II of the Operational Program for Education and Initial Vocational Training of the 3rd Community Support Framework of the Hellenic Ministry of Education, funded by 25% from national sources and by 75% from the European Social Fund (ESF).

32 Auxiliary slides
Formulas for kcomp

33 Time to build top-k view in microseconds
  N      K      Gauss     Negative exponential   Zipf
  100K   5      328000    348500                 242000
  100K   10     333000    345667                 239667
  100K   100    335500    343000                 239667
  100K   1000   395333    406000                 299500
  500K   5      1650667   1715500                1216333
  500K   10     1650667   1713000                1208333
  500K   100    1653167   1710500                1205667
  500K   1000   1736667   1796167                1291833
  1M     5      3298667   3429000                2427167
  1M     10     3301333   3426667                2429667
  1M     100    3304000   3439500                2422167
  1M     1000   3403167   3520500                2606667
  2M     5      6650667   6900500                5406333
  2M     10     6653167   6900833                4909000
  2M     100    6747167   6906000                4906500
  2M     1000   6895500   7082833                4992167

