1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray.

Presentation transcript:

1 Wavelet Synopses with Error Guarantees Minos Garofalakis, Phillip B. Gibbons. Information Sciences Research Center, Bell Labs, Lucent Technologies, Murray Hill, NJ. ACM SIGMOD 2002

2 Outline Introduction Wavelet basics Probabilistic wavelet synopses Experimental study Conclusions

3 Introduction The wavelet decomposition has proven effective at reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate answers to queries. Because many Decision Support Systems applications are exploratory in nature, there are a number of scenarios in which the user may prefer a fast, approximate answer.

4 Introduction A major criticism of wavelet-based techniques is that conventional wavelet synopses cannot provide guarantees on the error of individual approximate query answers.

5 Introduction The problems for approximate query processing with wavelet synopses stem from their deterministic approach to selecting coefficients and their lack of error guarantees. We propose an approach to building wavelet synopses that enables unbiased approximate query answers with error guarantees on the accuracy of individual answers.

6 Introduction The technique is based on a probabilistic thresholding scheme that assigns each coefficient a probability of being retained, based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis.

7 Wavelet basics Given the data vector A, we compute the wavelet transform of A. To equalize the importance of all wavelet coefficients, we then normalize each coefficient.
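The transform and normalization mentioned on this slide can be sketched as follows. This is a minimal sketch, assuming the Haar decomposition by pairwise averaging and differencing, and assuming the common normalization that divides a coefficient at resolution level l by sqrt(2^l). The function names and the example vector are illustrative and do not come from the slides.

```python
import math

def haar_decompose(data):
    """Unnormalized Haar transform by pairwise averaging and differencing.

    Returns [c_0, c_1, ..., c_{N-1}] in error-tree order: c_0 is the overall
    average and the detail coefficients follow from coarsest to finest.
    Assumes len(data) is a power of two.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(x) for x in data]
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = diffs + details      # coarser levels end up in front
        current = averages
    return current + details

def normalize(coeffs):
    """Divide each coefficient by sqrt(2^level) (assumed normalization).

    c_0 and c_1 are taken to be at level 0; c_j for j >= 1 is at level
    floor(log2(j)).
    """
    result = []
    for j, c in enumerate(coeffs):
        level = 0 if j == 0 else j.bit_length() - 1
        result.append(c / math.sqrt(2 ** level))
    return result
```

For the illustrative vector [2, 2, 0, 2, 3, 5, 4, 4], `haar_decompose` returns [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0].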

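8 Wavelet basics A helpful tool for exploring and understanding the key properties of the wavelet decomposition is the error tree structure.

9 Wavelet basics The important reconstruction properties: (P1) The reconstruction of any data value $d_i$ depends only on the values of the nodes in path($d_i$). (P2) The range sum $d(l:h) = \sum_{i=l}^{h} d_i$ depends only on the nodes in path($d_l$) $\cup$ path($d_h$): $d(l:h) = \sum_{c_j \in \mathrm{path}(d_l) \cup \mathrm{path}(d_h)} x_j$, where $x_j = (h-l+1)\,c_j$ if $j = 0$ and $x_j = (|\mathrm{leftleaves}(c_j, l:h)| - |\mathrm{rightleaves}(c_j, l:h)|)\,c_j$ otherwise. Here leftleaves($c_j$, $l:h$) and rightleaves($c_j$, $l:h$) are the leaves in the left and right subtrees of $c_j$ that fall in the range $[l, h]$.

10 Wavelet basics $d_5 = c_0 - c_2 + c_5 - c_{10} = 65 - 14 + (-20) - 28 = 3$ and $d(3:5) = 3c_0 + (1-2)c_2 - c_4 + 2c_5 - c_9 + (1-1)c_{10} = 93$.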
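Properties (P1) and (P2) can be checked on a small error tree. The sketch below assumes unnormalized Haar coefficients stored in error-tree order, where node j has children 2j and 2j+1, a left descendant adds the coefficient and a right descendant subtracts it. The coefficient values belong to an illustrative 8-value vector, not to the 16-value example on the slide, whose full coefficient set is not given in the transcript.

```python
def reconstruct_value(coeffs, k):
    """(P1) Rebuild data value d_k from the coefficients on path(d_k)."""
    n = len(coeffs)
    value = coeffs[0]
    node, lo, hi = 1, 0, n           # node covers leaves lo .. hi-1
    while node < n:
        mid = (lo + hi) // 2
        if k < mid:                   # d_k lies in the left subtree
            value += coeffs[node]
            node, hi = 2 * node, mid
        else:                         # d_k lies in the right subtree
            value -= coeffs[node]
            node, lo = 2 * node + 1, mid
    return value

def range_sum(coeffs, l, h):
    """(P2) Compute d(l:h) = d_l + ... + d_h directly from coefficients."""
    n = len(coeffs)

    def overlap(a, b):
        # Number of leaves in [a, b] that also fall in [l, h].
        return max(0, min(b, h) - max(a, l) + 1)

    total = (h - l + 1) * coeffs[0]
    for j in range(1, n):
        level = j.bit_length() - 1
        width = n >> level
        lo = (j - (1 << level)) * width
        mid = lo + width // 2
        left = overlap(lo, mid - 1)
        right = overlap(mid, lo + width - 1)
        # Nodes lying entirely inside or outside the range contribute zero,
        # so only nodes on path(d_l) or path(d_h) matter.
        total += (left - right) * coeffs[j]
    return total

# Illustrative data: Haar coefficients of [2, 2, 0, 2, 3, 5, 4, 4].
example_data = [2, 2, 0, 2, 3, 5, 4, 4]
example_coeffs = [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

With these coefficients, `reconstruct_value(example_coeffs, 5)` gives 5.0 and `range_sum(example_coeffs, 3, 5)` gives 10.0, matching 2 + 3 + 5.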

11 Probabilistic wavelet synopses A. The problem with conventional wavelets Conventional coefficient thresholding is a completely deterministic process that typically retains the B wavelet coefficients with the largest absolute value after normalization. This deterministic process minimizes the overall $L_2$ error.
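Conventional thresholding as described here can be sketched in a few lines. The sketch assumes coefficients in error-tree order and assumes the normalization divides a level-l coefficient by sqrt(2^l); the function name is illustrative.

```python
import math

def deterministic_threshold(coeffs, budget):
    """Keep the `budget` coefficients with the largest normalized magnitude.

    Returns a list of the same length in which dropped coefficients are
    replaced by zero. Ties are broken in favor of the lower index.
    """
    def normalized_magnitude(j):
        level = 0 if j == 0 else j.bit_length() - 1
        return abs(coeffs[j]) / math.sqrt(2 ** level)

    ranked = sorted(range(len(coeffs)), key=normalized_magnitude, reverse=True)
    keep = set(ranked[:budget])
    return [c if j in keep else 0.0 for j, c in enumerate(coeffs)]
```

Because the selection is deterministic and made independently of which data values a coefficient helps reconstruct, every coefficient on the path of some data value may be dropped at once, which is the failure mode the following slides illustrate.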

12 Probabilistic wavelet synopses A. The problem with conventional wavelets $\hat{d}_5 = c_0 = 65$ and $\hat{d}(3:5) = 3c_0 = 195$, against exact values of 3 and 93.

13 Probabilistic wavelet synopses A. The problem with conventional wavelets Root causes: (1) strict deterministic thresholding; (2) independent thresholding; (3) the bias resulting from dropping coefficients without compensating for their loss.

14 Probabilistic wavelet synopses B. General approach Our scheme deterministically retains the most important coefficients while randomly rounding the other coefficients either up to a larger value (the rounding value) or down to zero. By carefully selecting the rounding values, we ensure that (1) we expect a total of B coefficients to be retained and (2) we minimize a desired error metric in the reconstruction of the data.

15 Probabilistic wavelet synopses B. General approach The key idea in the thresholding scheme is to associate with each coefficient $c_i$ a random variable $C_i$ such that (1) $C_i = 0$ with some probability and (2) $E[C_i] = c_i$. We select a rounding value $\lambda_i$ for each non-zero $c_i$ such that $0 < c_i/\lambda_i \le 1$, and set $C_i = \lambda_i$ with probability $c_i/\lambda_i$ and $C_i = 0$ with probability $1 - c_i/\lambda_i$.

16 Probabilistic wavelet synopses B. General approach Our thresholding scheme essentially "rounds" each non-zero wavelet coefficient $c_i$ independently to either $\lambda_i$ or zero by flipping a biased coin with success probability $c_i/\lambda_i$. Its variance is simply $\mathrm{Var}(C_i) = (\lambda_i - c_i)\,c_i$.
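The coin flip described on this slide can be sketched as below. Requiring E[C_i] = c_i for a variable that takes only the values lambda_i and 0 forces the success probability to be c_i/lambda_i, which in turn gives the variance (lambda_i - c_i) c_i. The function names are illustrative, and the random generator is passed in so that runs are repeatable.

```python
import random

def round_coefficient(c, lam, rng):
    """Randomly round c to lam (probability c/lam) or to 0 (otherwise)."""
    if c == 0:
        return 0.0
    p = c / lam
    if not 0 < p <= 1:
        raise ValueError("lam must have the sign of c and |lam| >= |c|")
    # rng.random() is in [0, 1), so p == 1 always retains the coefficient.
    return lam if rng.random() < p else 0.0

def rounding_variance(c, lam):
    """Variance of the rounded coefficient: (lam - c) * c."""
    return (lam - c) * c

# Empirical check of unbiasedness for one illustrative coefficient.
rng = random.Random(0)
samples = [round_coefficient(-4.0, -8.0, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)   # should be close to -4.0
```

Averaged over many flips the rounded value stays close to the original coefficient, which is what makes the reconstructed data values unbiased.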

17 Probabilistic wavelet synopses B. General approach For example, $\lambda_0 = c_0$, $\lambda_{10} = 2c_{10}$, $\lambda_i = 3c_i/2$.
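As a worked illustration of these rounding values, and assuming the retention probability $c_i/\lambda_i$ and the variance $(\lambda_i - c_i)\,c_i$ that follow from requiring $E[C_i] = c_i$, the three choices give:

```latex
\begin{align*}
\lambda_0 = c_0:          &\quad \Pr[C_0 = \lambda_0] = 1,
                          &\mathrm{Var}(C_0) &= (c_0 - c_0)\,c_0 = 0 \\
\lambda_{10} = 2c_{10}:   &\quad \Pr[C_{10} = \lambda_{10}] = \tfrac{1}{2},
                          &\mathrm{Var}(C_{10}) &= (2c_{10} - c_{10})\,c_{10} = c_{10}^2 \\
\lambda_i = \tfrac{3}{2}c_i: &\quad \Pr[C_i = \lambda_i] = \tfrac{2}{3},
                          &\mathrm{Var}(C_i) &= (\tfrac{3}{2}c_i - c_i)\,c_i = \tfrac{1}{2}c_i^2
\end{align*}
```

So $c_0$ is always kept exactly, while the other coefficients trade expected space (1/2 or 2/3 of a slot) for added variance.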

18 Probabilistic wavelet synopses B. General approach The impact of the $\lambda_i$'s: choosing $\lambda_i$ closer to $c_i$ reduces the variance; choosing $\lambda_i$ further from $c_i$ reduces the expected number of retained coefficients.

19 Probabilistic wavelet synopses C. Rounding to minimize the expected mean-square error A reasonable approach is to select the $\lambda_i$ values in a way that minimizes some overall error metric (e.g., $L_2$).

20 Probabilistic wavelet synopses C. Rounding to minimize the expected mean-square error Letting $y_i = c_i/\lambda_i$ and $k_i = c_i^2 / 2^{\mathrm{level}(c_i)}$, the expected $L_2$ error minimization problem is equivalent to minimizing $\sum_i k_i / y_i$ subject to $\sum_i y_i \le B$ and $y_i \in (0, 1]$. Based on the Cauchy-Schwarz inequality, the minimum value of the objective is reached when $y_i = B\sqrt{k_i} / \sum_j \sqrt{k_j}$.
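The optimization on this slide can be sketched numerically. The sketch assumes the problem has the form: minimize the sum of k_i / y_i subject to the y_i summing to at most B with each y_i in (0, 1], where y_i = c_i/lambda_i is the retention probability and k_i is a non-negative weight per coefficient. The Cauchy-Schwarz solution makes y_i proportional to sqrt(k_i); the clamping loop that handles values above 1 is an illustrative choice and not necessarily the procedure on the slides.

```python
import math

def optimal_retention_probabilities(k, budget):
    """Minimize sum(k_i / y_i) subject to sum(y_i) <= budget, 0 < y_i <= 1.

    Coefficients with k_i == 0 get probability 0 (nothing to retain).
    Unclamped probabilities are proportional to sqrt(k_i); any value that
    would exceed 1 is fixed at 1 and the rest of the budget is re-split.
    """
    y = [0.0] * len(k)
    free = [i for i, ki in enumerate(k) if ki > 0]
    remaining = float(budget)
    while free and remaining > 0:
        total = sum(math.sqrt(k[i]) for i in free)
        scale = remaining / total
        over = [i for i in free if scale * math.sqrt(k[i]) > 1.0]
        if not over:
            for i in free:
                y[i] = scale * math.sqrt(k[i])
            break
        for i in over:
            y[i] = 1.0            # retain these coefficients outright
        remaining -= len(over)
        free = [i for i in free if y[i] < 1.0]
    return y
```

For weights [100, 1, 1] and a budget of 2, the first coefficient is retained with probability 1 and the other two share the remaining slot equally.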

21 Probabilistic wavelet synopses C. Rounding to minimize the expected mean-square error Let

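22 Probabilistic wavelet synopses D. Rounding to minimize the maximum relative error We focus on minimizing the maximum reconstruction error for individual data values (relative error). The goal is to produce an estimate $\hat{d}_i$ for each value $d_i$ such that $|\hat{d}_i - d_i| \le \epsilon \cdot \max\{|d_i|, S\}$ for a given sanity bound $S > 0$, with the error bound $\epsilon$ as small as possible.

23 Probabilistic wavelet synopses D. Rounding to minimize the maximum relative error Since the expected value of $\hat{d}_i$ is $d_i$, we would like to minimize the variance. More precisely, we seek to minimize the normalized standard error for a reconstructed data value, $\mathrm{NSE}(\hat{d}_i) = \sqrt{\mathrm{Var}(\hat{d}_i)} / \max\{|d_i|, S\}$, where $S$ is the sanity bound.

24 Probabilistic wavelet synopses D. Rounding to minimize the maximum relative error Note that by applying Chebyshev's inequality we obtain, for all $\alpha > 1$, $\Pr\big(|\hat{d}_i - d_i| \ge \alpha \sqrt{\mathrm{Var}(\hat{d}_i)}\big) \le 1/\alpha^2$. The relative error is therefore at most $\alpha$ times the NSE with probability at least $1 - 1/\alpha^2$, so minimizing the NSE will indeed minimize the probabilistic bounds on the relative error metric.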
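The normalized standard error and the Chebyshev bound can be sketched as below. This assumes the variance of a reconstructed value is the sum of the per-coefficient rounding variances (lambda_j - c_j) c_j along its path, and that the error is normalized by max{|d|, S} for a sanity bound S. The function names and the sample numbers are illustrative.

```python
import math

def normalized_standard_error(path_coeffs, path_lambdas, d, sanity):
    """NSE of a reconstructed value with true value d.

    path_coeffs / path_lambdas hold c_j and lambda_j for the coefficients
    on the error-tree path of the value; roundings are independent, so
    their variances add.
    """
    variance = sum((lam - c) * c for c, lam in zip(path_coeffs, path_lambdas))
    return math.sqrt(variance) / max(abs(d), sanity)

def chebyshev_tail(alpha):
    """Upper bound on Pr[relative error >= alpha * NSE], for alpha > 1."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    return 1.0 / alpha ** 2

# Illustrative path: one coefficient kept exactly, one rounded to twice
# its value, for a data value d = 3 with sanity bound S = 1.
nse = normalized_standard_error([65.0, 14.0], [65.0, 28.0], 3.0, 1.0)
```

With alpha = 2, the relative error of this value stays below 2 * nse with probability at least 0.75.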

25 Probabilistic wavelet synopses D.Rounding to minimize the maximum relative error

26 Probabilistic wavelet synopses D. Rounding to minimize the maximum relative error We would like to formulate a dynamic programming recurrence for this problem. Let $\mathrm{PATHS}_j$ denote the set of all root-to-leaf paths in the subtree $T_j$, and let $M[j, B]$ denote the optimal value of the maximum squared NSE among all data values $d_k$ in $T_j$, assuming a space budget of $B$.

27 Probabilistic wavelet synopses D. Rounding to minimize the maximum relative error The recurrence for $M[j, B]$ is given in (11).

28 Probabilistic wavelet synopses D.Rounding to minimize the maximum relative error

29 Probabilistic wavelet synopses D. Rounding to minimize the maximum relative error The problem with (11) is that $y_i$ and $b_L$ each range over a continuous interval, making the recurrence infeasible to evaluate directly. The key technical idea is to quantize the solution space: we modify the constraint on $y_i$ to $y_i \in \{1/q, 2/q, \ldots, 1\}$, where $q$ is an input integer.

30 Probabilistic wavelet synopses E. Low-bias probabilistic wavelet synopses Each coefficient is either retained or discarded according to the probabilities $y_i$, where, as before, the $y_i$'s are selected to minimize a desired error metric.
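To show what quantizing the solution space buys, the sketch below searches every assignment of retention probabilities drawn from {1/q, 2/q, ..., 1} and keeps the one that minimizes the maximum squared normalized error. It is a brute-force illustration that is exponential in the number of non-zero coefficients, not the dynamic program of the slides. The variance c^2 (1 - y) / y (obtained from lambda = c / y), the normalization by max{d^2, S^2}, and all names are assumptions made for the example.

```python
from itertools import product

def path_nodes(k, n):
    """Indices of the coefficients on the error-tree path of data value k."""
    nodes = [0]
    j = (k + n) // 2              # parent of leaf k in a heap-style layout
    while j >= 1:
        nodes.append(j)
        j //= 2
    return nodes

def best_quantized_probabilities(coeffs, data, budget, q, sanity):
    """Exhaustive search over y_j in {1/q, ..., 1} for non-zero c_j.

    Returns (max squared normalized error, {j: y_j}) for the best
    assignment whose probabilities sum to at most `budget`, or
    (inf, None) if no assignment fits.
    """
    n = len(coeffs)
    nonzero = [j for j in range(n) if coeffs[j] != 0]
    choices = [m / q for m in range(1, q + 1)]
    best_err, best_y = float("inf"), None
    for combo in product(choices, repeat=len(nonzero)):
        if sum(combo) > budget + 1e-9:
            continue
        y = dict(zip(nonzero, combo))
        var = {j: coeffs[j] ** 2 * (1 - y[j]) / y[j] for j in nonzero}
        err = max(
            sum(var.get(j, 0.0) for j in path_nodes(k, n))
            / max(data[k] ** 2, sanity ** 2)
            for k in range(n)
        )
        if err < best_err:
            best_err, best_y = err, y
    return best_err, best_y
```

With a finite candidate set the search space is bounded, which is what allows a recurrence such as (11) to be tabulated over budgets that are multiples of 1/q instead of continuous values.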

31 Probabilistic wavelet synopses F. Summary of the approach

32 Experimental study A Zipfian data generator was used to produce Zipfian frequencies for various levels of skew (z parameter between 0.3 and 2.0). We also use a real-world data set downloaded from the National Forest Service. We set q = 10, the sanity bound S to the 10th percentile of the data, and the perturbation Δ = min{0.01, S/100}.
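A synthetic setup of this kind can be sketched as follows. The sketch assumes the usual Zipfian form in which the frequency of the value of rank r is proportional to 1 / r^z; the generator used in the study is not described in the transcript, so the function names and parameters are illustrative.

```python
def zipf_frequencies(num_values, total_count, z):
    """Frequencies proportional to 1 / rank**z, scaled to sum to total_count."""
    weights = [1.0 / (rank ** z) for rank in range(1, num_values + 1)]
    norm = sum(weights)
    return [total_count * w / norm for w in weights]

def perturbation(sanity_bound):
    """Perturbation parameter: min{0.01, S / 100}."""
    return min(0.01, sanity_bound / 100)

# Low skew (z = 0.3) spreads the mass widely; high skew (z = 2.0)
# concentrates it on the first few values.
low_skew = zipf_frequencies(8, 1000, 0.3)
high_skew = zipf_frequencies(8, 1000, 2.0)
```

Comparing `low_skew[0]` with `high_skew[0]` shows how a larger z shifts frequency mass toward the top-ranked value.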

33 Experimental study

34 Experimental study

35 Experimental study

36 Conclusions We have introduced probabilistic wavelet synopses, the first wavelet-based data reduction technique that provably enables unbiased data reconstruction, with error guarantees on individual approximate answers. We have described a number of novel techniques for tuning our scheme to minimize desired error metrics. Experimental results on real-world and synthetic data sets demonstrate that probabilistic wavelet synopses significantly reduce relative error compared with the deterministic approach.