
1 Wavelet Synopses with Error Guarantees
Minos Garofalakis and Phillip B. Gibbons, Information Sciences Research Center, Bell Labs, Lucent Technologies, Murray Hill, NJ 07974. ACM SIGMOD 2002.

2 Outline
- Introduction
- Wavelet basics
- Probabilistic wavelet synopses
- Experimental study
- Conclusions

3 Introduction
The wavelet decomposition has proven effective at reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate answers to queries. Due to the exploratory nature of many Decision Support Systems applications, there are a number of scenarios in which the user may prefer a fast, approximate answer.

4 Introduction
A major criticism of wavelet-based techniques is that conventional wavelet synopses cannot provide guarantees on the error of individual approximate query answers.

5 Introduction
Conventional wavelet synopses are problematic for approximate query processing because of their deterministic approach to selecting coefficients and their lack of error guarantees. We propose an approach to building wavelet synopses that enables unbiased approximate query answers, with error guarantees on the accuracy of individual answers.

6 Introduction
The technique is based on a probabilistic thresholding scheme that assigns each coefficient a probability of being retained, based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis.

7 Wavelet basics
Given the data vector A, the Haar wavelet transform of A is computed by recursively averaging and differencing adjacent pairs of values: each pass replaces pairs (a, b) by their average (a+b)/2, keeping the half-differences (a-b)/2 as detail coefficients, and recurses on the averages. In order to equalize the importance of all wavelet coefficients, each coefficient c_i is normalized by its level of resolution: c_i^* = c_i / \sqrt{2^{level(i)}}, where level(i) is the resolution level of c_i in the decomposition.
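A minimal Python sketch of the transform and the level-based normalization (my own illustration, not the paper's code; the sample vector is likewise just an example):

```python
import math

def haar_transform(data):
    """Unnormalized 1-D Haar wavelet transform: repeatedly replace adjacent
    pairs by their average and keep the pairwise half-differences
    ("details") as wavelet coefficients. Assumes len(data) is a power of 2."""
    n = len(data)
    values = list(data)
    coeffs = [0.0] * n
    while n > 1:
        averages = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(n // 2)]
        details = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(n // 2)]
        coeffs[n // 2 : n] = details   # details of the current resolution level
        values = averages
        n //= 2
    coeffs[0] = values[0]              # c_0: the overall average
    return coeffs

def normalize(coeffs):
    """Divide each coefficient by sqrt(2^level), so that coefficient
    magnitudes are comparable across resolution levels."""
    out = [coeffs[0]]
    for i in range(1, len(coeffs)):
        level = int(math.log2(i))      # resolution level of c_i (c_1 is level 0)
        out.append(coeffs[i] / math.sqrt(2 ** level))
    return out

print(haar_transform([2, 2, 0, 2, 3, 5, 4, 4]))
# -> [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```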

8 Wavelet basics
A helpful tool for exploring and understanding the key properties of the wavelet decomposition is the error-tree structure, in which each internal node corresponds to a wavelet coefficient (with c_0 at the root) and each leaf corresponds to a data value d_i.

9 Wavelet basics
The important reconstruction properties:
(P1) The reconstruction of any data value d_i depends only on the coefficients on path(d_i), the nodes on the path from the root to leaf d_i: d_i = \sum_{c_j \in path(d_i)} \delta_{ij} \cdot c_j, where \delta_{ij} = +1 if d_i is in the left subtree of c_j (or c_j is the root), and \delta_{ij} = -1 otherwise.
(P2) The range sum d(l:h) = \sum_{i=l}^{h} d_i depends only on the coefficients in path(d_l) \cup path(d_h): d(l:h) = \sum_{c_j \in path(d_l) \cup path(d_h)} x_j, where x_j = (h-l+1) \cdot c_j if c_j is the root, and x_j = (|leftleaves(c_j) \cap [l,h]| - |rightleaves(c_j) \cap [l,h]|) \cdot c_j otherwise.

10 Wavelet basics
Example: d_5 = c_0 - c_2 + c_5 - c_{10} = 65 - 14 + (-20) - 28 = 3, and d(3:5) = 3c_0 + (1-2)c_2 - c_4 + 2c_5 - c_9 + (1-1)c_{10} = 93.
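A small sketch of property (P1), reusing haar_transform from the snippet above (the helper names are my own): leaf d_i occupies position N + i under the error tree, its chain of parents is path(d_i), and left/right children contribute with signs + and -.

```python
def reconstruct(coeffs, i):
    """Property (P1): d_i equals the signed sum of the coefficients on
    path(d_i). Walking parent links from leaf node N + i visits exactly
    the path coefficients."""
    n = len(coeffs)
    value = coeffs[0]                      # the root c_0 always adds with +1
    node = n + i
    while node > 1:
        parent = node // 2
        sign = 1 if node % 2 == 0 else -1  # left child: +1, right child: -1
        value += sign * coeffs[parent]
        node = parent
    return value

coeffs = haar_transform([2, 2, 0, 2, 3, 5, 4, 4])
print([reconstruct(coeffs, i) for i in range(8)])  # recovers the original data
```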

11 Probabilistic wavelet synopses: A. The problem with conventional wavelets
Conventional coefficient thresholding is a completely deterministic process: it typically retains the B wavelet coefficients with the largest absolute value after normalization. This deterministic process minimizes the overall L_2 error.

12 Probabilistic wavelet synopses: A. The problem with conventional wavelets
When thresholding drops every path coefficient except c_0 = 65, we get d_5 = 65 - 0 + 0 - 0 = 65 and d(3:5) = 3·65 - 0 - 0 + 0 - 0 = 195, far from the exact values d_5 = 3 and d(3:5) = 93.

13 Probabilistic wavelet synopses: A. The problem with conventional wavelets
Root causes:
(1) strict deterministic thresholding;
(2) independent thresholding;
(3) the bias resulting from dropping coefficients without compensating for their loss.

14 Probabilistic wavelet synopses: B. General Approach
Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value (the rounding value) or down to zero. By carefully selecting the rounding values, we ensure that:
(1) we expect a total of B coefficients to be retained;
(2) we minimize a desired error metric in the reconstruction of the data.

15 Probabilistic wavelet synopses: B. General Approach
The key idea in the thresholding scheme is to associate with each non-zero coefficient c_i a random variable C_i such that:
(1) C_i = 0 with some probability, and
(2) E[C_i] = c_i,
where we select a rounding value \lambda_i for each non-zero c_i such that 0 < c_i/\lambda_i \le 1, and set C_i = \lambda_i with probability c_i/\lambda_i and C_i = 0 otherwise.

16 Probabilistic wavelet synopses: B. General Approach
Our thresholding scheme essentially "rounds" each non-zero wavelet coefficient c_i independently to either \lambda_i or zero by flipping a biased coin with success probability c_i/\lambda_i. Its variance is simply Var(C_i) = \lambda_i^2 \cdot (c_i/\lambda_i) - c_i^2 = (\lambda_i - c_i) \cdot c_i.
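Here is a minimal sketch of this biased coin flip (the helper name is assumed, not from the paper); the example numbers reuse c_{10} = 28 from slide 10 with \lambda_{10} = 2c_{10} as in the example on the next slide:

```python
import random

def probabilistic_round(c, lam, rng=random):
    """Return lam with probability c/lam (so E[C] = c) and 0 otherwise.
    Requires 0 < c/lam <= 1, i.e. lam has c's sign and |lam| >= |c|."""
    p = c / lam
    assert 0 < p <= 1
    return lam if rng.random() < p else 0.0

c, lam = 28.0, 56.0   # c_10 from the running example, lambda_10 = 2 * c_10
trials = [probabilistic_round(c, lam) for _ in range(100_000)]
print(sum(trials) / len(trials))  # ~28: unbiased, though each stored value is 0 or 56
# Var(C) = (lam - c) * c = 784 here; a larger lam means fewer retained
# coefficients in expectation, at the cost of a higher variance.
```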

17 Probabilistic wavelet synopses: B. General Approach
For example, \lambda_0 = c_0, \lambda_{10} = 2c_{10}, and \lambda_i = 3c_i/2 for the remaining coefficients, giving retention probabilities c_i/\lambda_i of 1, 1/2, and 2/3, respectively.

18 Probabilistic wavelet synopses: B. General Approach
The impact of the \lambda_i's:
- choosing \lambda_i closer to c_i reduces the variance (but raises the retention probability c_i/\lambda_i, and hence the expected space);
- choosing \lambda_i further from c_i reduces the expected number of retained coefficients (but increases the variance).

19 Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error
A reasonable approach is to select the \lambda_i values in a way that minimizes some overall error metric (e.g., the expected L_2 error).

20 Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error
Letting y_i = c_i/\lambda_i denote the retention probability of coefficient c_i, we have Var(C_i) = c_i^2 (1/y_i - 1). The expected L_2 error minimization problem is therefore equivalent to: minimize \sum_i c_i^2 / y_i subject to \sum_i y_i \le B and 0 < y_i \le 1. Based on the Cauchy-Schwarz inequality, the minimum value of the objective is reached when the y_i are proportional to |c_i|.
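A sketch of the resulting "water-filling" allocation (function and variable names are my own): probabilities proportional to |c_i|, with any value that would exceed 1 capped at 1 and its leftover budget redistributed. The coefficient vector keeps only the values recoverable from slide 10 (c_0 = 65, c_2 = 14, c_5 = -20, c_{10} = 28) and assumes zeros elsewhere.

```python
def l2_optimal_probabilities(coeffs, budget):
    """Retention probabilities y_i minimizing sum(c_i^2 / y_i) subject to
    sum(y_i) <= budget and 0 < y_i <= 1: by Cauchy-Schwarz, y_i is
    proportional to |c_i|, with values that would exceed 1 capped and
    their budget redistributed over the remaining coefficients."""
    active = [i for i, c in enumerate(coeffs) if c != 0]
    y, remaining = {}, float(budget)
    while active:
        total = sum(abs(coeffs[i]) for i in active)
        over = [i for i in active if remaining * abs(coeffs[i]) / total >= 1.0]
        if not over:
            break
        for i in over:                # retain these surely, free their budget
            y[i] = 1.0
            remaining -= 1.0
        active = [i for i in active if i not in over]
    total = sum(abs(coeffs[i]) for i in active)
    for i in active:
        y[i] = remaining * abs(coeffs[i]) / total
    return y

coeffs = [65, 0, 14, 0, 0, -20, 0, 0, 0, 0, 28]
print(l2_optimal_probabilities(coeffs, budget=3))
# -> c_0 is kept surely (y = 1); the rest get y proportional to |c_i|
```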

21 Probabilistic wavelet synopses: C. Rounding to minimize the expected mean-square error
Let y_i denote the resulting optimal retention probabilities. (The closed-form expression on this slide did not survive transcription; per the previous slide, it sets y_i proportional to |c_i|, capped at 1.)

22 Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
We focus on minimizing the maximum reconstruction error for individual data values (relative error). The goal is to produce an estimate \hat{d}_i for each value d_i such that the relative error |\hat{d}_i - d_i| / max{|d_i|, S} is small, where S is a sanity bound that keeps very small data values from dominating the metric.

23 Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
The expected value of \hat{d}_i is d_i (the estimate is unbiased), so we would like to minimize its variance. More precisely, we seek to minimize the normalized standard error for a reconstructed data value: NSE(\hat{d}_i) = \sqrt{Var(\hat{d}_i)} / max{|d_i|, S}.

24 Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
Note that by applying Chebyshev's inequality, we obtain (for all \alpha > 1): Pr( |\hat{d}_i - d_i| / max{|d_i|, S} \ge \alpha \cdot NSE(\hat{d}_i) ) \le 1/\alpha^2, so minimizing the NSE will indeed minimize the probabilistic bounds on the relative error metric.
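A quick simulation of this bound for d_5 from the running example, reusing probabilistic_round from the earlier sketch. Assumptions: the \lambda's from slide 17, and an arbitrary sanity bound S = 10 (the slides give no concrete S here).

```python
import math, random

# Path of d_5 from slide 10: (coefficient, rounding value, sign in d_5).
# Lambdas follow slide 17: lambda_0 = c_0, lambda_10 = 2*c_10, else 3*c_i/2.
path = [(65.0, 65.0, +1), (14.0, 21.0, -1), (-20.0, -30.0, +1), (28.0, 56.0, -1)]
d5, S, alpha = 3.0, 10.0, 2.0          # S = 10 is an assumed sanity bound

nse = math.sqrt(sum((lam - c) * c for c, lam, _ in path)) / max(abs(d5), S)
rng, hits, n = random.Random(0), 0, 100_000
for _ in range(n):
    est = sum(sign * probabilistic_round(c, lam, rng) for c, lam, sign in path)
    if abs(est - d5) / max(abs(d5), S) >= alpha * nse:
        hits += 1
print(hits / n, "<=", 1 / alpha ** 2)  # empirical tail vs. the Chebyshev bound
```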

25 Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
Since the coin flips are independent, Var(\hat{d}_i) = \sum_{c_j \in path(d_i)} Var(C_j), and the optimization problem is: choose the y_i's to minimize max_i NSE(\hat{d}_i), subject to the expected-space constraint \sum_i y_i \le B. (The slide's formal statement did not survive transcription.)

26 Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
We would like to formulate a dynamic-programming recurrence for this problem. Let PATHS_j denote the set of all root-to-leaf paths in the subtree T_j, and let M[j, B] denote the optimal (minimum) value of the maximum NSE among all data values d_k in T_j, assuming a space budget of B.

27 Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
The recurrence for M[j, B] is depicted in Equation (11): at each node j we choose a retention probability y_j for c_j and a portion b_L of the remaining budget for the left subtree (the rest going to the right subtree), taking the choice that minimizes the maximum of the two child subproblems combined with node j's variance contribution.

28 Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
(This slide showed the recurrence itself, Equation (11); the formula did not survive transcription.)

29 Probabilistic wavelet synopses: D. Rounding to minimize the maximum relative error
The problem with (11) is that y_i and b_L each range over a continuous interval, making the recurrence infeasible to use directly. The key technical idea is to quantize the solution space: we modify the constraint on each y_i to y_i \in {1/q, 2/q, ..., 1}, where q is an input integer controlling the quantization granularity, so that candidate budget allotments also become multiples of 1/q.
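The following is a simplified, self-contained sketch of such a quantized dynamic program, not the paper's Equation (11): it minimizes the maximum summed variance along any root-to-leaf path (dropping the sanity-bound normalization inside the NSE), with retention probabilities and budget splits both restricted to multiples of 1/q. Coefficients reuse the running example, padded with assumed zeros.

```python
import math
from functools import lru_cache

def min_max_path_variance(coeffs, budget_units, q):
    """Quantized DP sketch: pick y_j in {1/q, ..., q/q} for every non-zero
    coefficient, spending at most budget_units quanta in total (q quanta =
    one expected retained coefficient), so as to minimize the maximum of
    sum of Var(C_j) over any root-to-leaf path of the error tree.
    Simplified relative to the paper: plain variance, no sanity bound."""
    n = len(coeffs)

    def var(j, k):                    # Var(C_j) = c_j^2 (1/y_j - 1), y_j = k/q
        return coeffs[j] ** 2 * (q / k - 1.0)

    @lru_cache(maxsize=None)
    def M(j, b):                      # best value for subtree T_j with b quanta
        if j >= n:                    # below the finest coefficients: data leaves
            return 0.0
        if coeffs[j] == 0:
            options = [(0, 0.0)]      # a zero coefficient costs nothing
        else:
            options = [(k, var(j, k)) for k in range(1, min(q, b) + 1)]
        best = math.inf
        for k, v in options:          # quanta spent on c_j itself
            for b_left in range(b - k + 1):   # split the rest: left vs. right
                best = min(best, v + max(M(2 * j, b_left),
                                         M(2 * j + 1, b - k - b_left)))
        return best

    return M(1, budget_units - q)     # c_0 is always kept, costing q quanta

coeffs = [65, 0, 14, 0, 0, -20, 0, 0, 0, 0, 28, 0, 0, 0, 0, 0]
print(min_max_path_variance(coeffs, budget_units=30, q=10))   # budget B = 3
```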

30 Probabilistic wavelet synopses: E. Low-bias probabilistic wavelet synopses
Each coefficient is either retained or discarded according to the probabilities y_i, where, as before, the y_i's are selected to minimize a desired error metric.

31 Probabilistic wavelet synopses: F. Summary of the approach

32 Experimental study
A Zipfian data generator was used to produce Zipfian frequencies for various levels of skew (z parameter between 0.3 and 2.0). We also use a real-world data set downloaded from the National Forest Service. Parameters: q = 10, sanity bound S set to the 10th percentile of the data, and perturbation \Delta = min{0.01, S/100}.
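A sketch of a typical Zipfian frequency generator of this kind (the paper's exact generator and scaling are not specified in the transcript):

```python
def zipfian_frequencies(n, z, total=1_000_000):
    """Frequencies for n distinct values with skew z: freq(rank r) is
    proportional to 1 / r^z, scaled so the counts sum to `total`."""
    weights = [1.0 / r ** z for r in range(1, n + 1)]
    scale = total / sum(weights)
    return [w * scale for w in weights]

for z in (0.3, 1.0, 2.0):   # skew levels inside the studied 0.3-2.0 range
    print(z, [round(f) for f in zipfian_frequencies(256, z)[:4]])
```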

33-35 Experimental study
(Results figures; the plots did not survive transcription.)

36 Conclusions
We have introduced probabilistic wavelet synopses, the first wavelet-based data reduction technique that provably enables unbiased data reconstruction, with error guarantees on individual approximate answers. We have described a number of novel techniques for tuning our scheme to minimize desired error metrics. Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach.

