Kaushik Chakrabarti (Univ. of Illinois), Minos Garofalakis (Bell Labs), Rajeev Rastogi (Bell Labs), Kyuseok Shim (KAIST and AITrc). Presented at the 26th VLDB Conference.

Presentation transcript:

Approximate Query Processing Using Wavelets
Kaushik Chakrabarti (Univ. of Illinois), Minos Garofalakis (Bell Labs), Rajeev Rastogi (Bell Labs), Kyuseok Shim (KAIST and AITrc)
Presented at the 26th VLDB Conference, Cairo, Egypt
Presented by Supriya Sudheendra

Outline

Introduction
o Approximate query processing is a viable solution for:
   - Huge amounts of data
   - High query complexity
   - Stringent response-time requirements
o Decision Support Systems (DSS):
   - Support business and organizational decision-making activities
   - Help decision makers compile useful information from raw data, solve problems, and make decisions

Introduction…
o DSS users pose very complex queries to the DBMS
   - These queries require complex operations over gigabytes or terabytes of disk-resident data
   - Exact answers take a very long time to compute
   - In a number of scenarios, users prefer a fast, approximate answer

Prior Work
o Previous approximate query processing techniques
   - Focused on specific forms of aggregate queries
   - Data reduction mechanism: how to obtain the synopses of the data
o Sampling-based techniques
   - A join operator applied to two uniform random samples yields a non-uniform sample with very few tuples
   - For non-aggregate queries, sampling produces a small subset of the exact answer, which may be empty when joins are involved

Prior Work…
o Histogram-based techniques
   - Problematic for high-dimensional data
   - Storage overhead
   - High construction cost
o Wavelet-based techniques
   - Wavelets are a mathematical tool for the hierarchical decomposition of functions
   - Applying wavelet decomposition to the input data collection yields the data synopsis
   - Avoid the high construction costs and storage overhead

Contribution of the Paper
o Viability and effectiveness of wavelets as a generic tool for high-dimensional DSS
o New, I/O-efficient wavelet decomposition algorithm for relational tables
o Novel query processing algebra for wavelet-coefficient data synopses
o Extensive experiments

Background
o Wavelets are a mathematical tool for hierarchically decomposing functions
o A coarse overall approximation is stored together with detail coefficients that influence the function at various scales
o Haar wavelets are conceptually simple and fast to compute
o Variety of applications, such as image editing and querying

One-Dimensional Haar Wavelets
o How to compute, given a data array:
   - Average the values together pairwise to get a "lower-resolution" representation of the data
   - Detail coefficients: the differences of the second value in each pair from the computed pairwise average
   - Why detail coefficients? Averaging alone loses information; with the detail coefficients, reconstruction of the data array is possible

One-Dimensional Haar Wavelets
o Wavelet transform: the overall average followed by the detail coefficients in increasing order of resolution. Each entry is a wavelet coefficient.
o W_A = [4, -2, 0, -1]
o For vectors containing similar values:
   - Most detail coefficients have small values and can be eliminated
   - Eliminating them introduces only small errors
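The averaging-and-differencing procedure can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names `haar_decompose` and `haar_reconstruct` are made up here, and the input array [2, 2, 5, 7] is an assumption, chosen only because it reproduces the W_A = [4, -2, 0, -1] shown on the slide (the slide itself does not state the original data).

```python
def haar_decompose(data):
    """Unnormalized 1-D Haar transform; len(data) must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(data)
    details = []  # detail coefficients, coarsest resolution first
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        # Detail = pairwise average minus the second value of the pair.
        level = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = level + details
        current = averages
    return current + details  # overall average, then details


def haar_reconstruct(coeffs):
    """Invert haar_decompose by undoing one resolution level at a time."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(current)]
        finer = []
        for avg, det in zip(current, level):
            finer.extend([avg + det, avg - det])
        pos += len(current)
        current = finer
    return current


# Assumed input; not taken from the slides.
print(haar_decompose([2, 2, 5, 7]))        # [4.0, -2.0, 0.0, -1.0]
print(haar_reconstruct([4, -2, 0, -1]))    # [2, 2, 5, 7]
```

Reconstruction works because each pair is recovered as (average + detail, average - detail), which is why no information is lost as long as every detail coefficient is kept.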

One-Dimensional Haar Wavelets
o The overall average is more important than any detail coefficient
o To normalize the final entries of W_A, each wavelet coefficient is divided by sqrt(2^l)
   - l: level of resolution
   - W_A = [4, -2, 0, -1/sqrt(2)]
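The normalization step, and the way it can drive the choice of which coefficients to drop, can be sketched as follows. This is an illustration under assumptions: the helper names `normalize` and `keep_largest` are hypothetical, the level of a coefficient is inferred from its position in W_A (index 0 and 1 at level 0, indices 2 to 3 at level 1, and so on), and keeping the k coefficients that are largest after normalization is one common thresholding rule rather than something the slide spells out.

```python
import math


def normalize(coeffs):
    """Divide each coefficient by sqrt(2**l), where l is its resolution level."""
    out = []
    for i, c in enumerate(coeffs):
        # Assumed layout: [average, level-0 detail, two level-1 details, ...].
        level = max(i.bit_length() - 1, 0)
        out.append(c / math.sqrt(2 ** level))
    return out


def keep_largest(coeffs, k):
    """Zero out all but the k coefficients with largest normalized magnitude."""
    norm = normalize(coeffs)
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(norm[i]), reverse=True)
    kept = set(ranked[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


print(normalize([4, -2, 0, -1]))     # last entry is -1/sqrt(2)
print(keep_largest([4, -2, 0, -1], 2))
```

With k = 2 only the overall average and the coarsest detail survive, so the synopsis stores two numbers instead of four.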

Multi-dimensional Haar Wavelets
o Haar wavelets can be extended to multi-dimensional arrays
   - Standard decomposition
      - Fix an ordering for the data dimensions (1, 2, ..., d)
      - Apply the complete 1-D wavelet transform to each 1-D row of array cells along dimension k, for each dimension k in that order
   - Nonstandard decomposition
      - Alternates between dimensions during successive steps of pairwise averaging and differencing: one step is applied to each 1-D row of array cells along dimension k, for each k
      - The process is repeated recursively on the quadrant containing the averages across all dimensions

Non-standard Decomposition
   - Pairwise averaging and differencing is performed for one positioning of the 2x2 box with root [2i_1, 2i_2]
   - The results are distributed in the wavelet transform array
   - The process recurses on the lower-left quadrant of W_A
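For the two-dimensional case, the box-by-box procedure might look like the sketch below. Several details are assumptions rather than facts from the slides: the function name `nonstandard_haar_2d`, the sign convention of the three detail coefficients, and the exact placement of details in the output array (averages go to the low-index quadrant, which is then decomposed again).

```python
def nonstandard_haar_2d(a):
    """Nonstandard Haar decomposition of a square list of lists (side 2**m)."""
    n = len(a)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in a):
        raise ValueError("input must be square with a power-of-two side")
    w = [[0.0] * n for _ in range(n)]
    current = [[float(x) for x in row] for row in a]
    while len(current) > 1:
        half = len(current) // 2
        averages = [[0.0] * half for _ in range(half)]
        for i in range(half):
            for j in range(half):
                # The 2x2 box rooted at [2i, 2j].
                p = current[2 * i][2 * j]
                q = current[2 * i + 1][2 * j]
                r = current[2 * i][2 * j + 1]
                s = current[2 * i + 1][2 * j + 1]
                averages[i][j] = (p + q + r + s) / 4
                w[half + i][j] = (p - q + r - s) / 4         # detail along dimension 1
                w[i][half + j] = (p + q - r - s) / 4         # detail along dimension 2
                w[half + i][half + j] = (p - q - r + s) / 4  # diagonal detail
        current = averages  # recurse on the quadrant of averages
    w[0][0] = current[0][0]  # overall average
    return w


print(nonstandard_haar_2d([[1, 2], [3, 4]]))  # [[2.5, -0.5], [-1.0, 0.0]]
```

Each pass writes three quarters of the remaining output cells, so a 4 x 4 input needs two passes plus the final overall average.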

Example Decomposition of a 4 X 4 Array

Multi-dimensional Haar Coefficients: Semantics and Representation
o The d-dimensional Haar basis function corresponding to a coefficient W is defined by:
   - A d-dimensional rectangular support region
   - Quadrant sign information

Support Regions for the 16 Nonstandard 2-D Haar Basis Functions
   - Blank areas: regions of A whose reconstruction is independent of the coefficient
   - W_A[0,0]: overall average
   - W_A[3,3]: contributes only to the upper-right quadrant

Haar Coefficients: Semantics and Representation
o W = <R, S, v>
   - W.R: the d-dimensional support hyper-rectangle of W; it encloses all cells in A to which W contributes
   - The hyper-rectangle is represented by its low and high boundaries across each dimension j, 1 <= j <= d: W.R.boundary[j].lo and W.R.boundary[j].hi
   - W contributes to each data cell A[i_1, ..., i_d] where W.R.boundary[j].lo <= i_j <= W.R.boundary[j].hi for all j

o W.S: sign information for all d-dimensional quadrants of W.R
   - Denoted by W.S.sign[j].lo and W.S.sign[j].hi, corresponding to the lower and upper half of W.R's extent along dimension j
   - The sign of a quadrant is computed as the product of the d sign-vector entries that map to that quadrant
o W.v: scalar magnitude of W
   - The quantity that W contributes (up to the quadrant sign) to all data array cells enclosed in W.R
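One way to picture the W = <R, S, v> representation is as a small record with a method that returns a coefficient's signed contribution to a given cell. The sketch below is an assumption-laden illustration: the class name `Coefficient`, the field names, and the rule that the lower half of the support along dimension j runs from lo to the integer midpoint are choices made here, not details given on the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Coefficient:
    lo: List[int]                 # W.R.boundary[j].lo for each dimension j
    hi: List[int]                 # W.R.boundary[j].hi for each dimension j
    sign: List[Tuple[int, int]]   # (W.S.sign[j].lo, W.S.sign[j].hi)
    v: float                      # W.v, the scalar magnitude

    def contribution(self, cell):
        """Signed amount this coefficient adds to the data cell `cell`."""
        total_sign = 1
        for j, i_j in enumerate(cell):
            if not (self.lo[j] <= i_j <= self.hi[j]):
                return 0.0  # cell lies outside the support hyper-rectangle
            mid = (self.lo[j] + self.hi[j]) // 2  # assumed split point
            s_lo, s_hi = self.sign[j]
            total_sign *= s_lo if i_j <= mid else s_hi
        return total_sign * self.v


# 1-D coefficients matching W_A = [4, -2, 0, -1] on a 4-cell array.
coeffs = [
    Coefficient([0], [3], [(1, 1)], 4.0),    # overall average
    Coefficient([0], [3], [(1, -1)], -2.0),
    Coefficient([0], [1], [(1, -1)], 0.0),
    Coefficient([2], [3], [(1, -1)], -1.0),
]
cells = [sum(c.contribution((i,)) for c in coeffs) for i in range(4)]
print(cells)  # [2.0, 2.0, 5.0, 7.0]
```

Summing the contributions of every coefficient whose support encloses a cell reconstructs that cell's value, which is the property the query operators rely on.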

Building Wavelet Coefficient Synopses
o Relation R with d attributes X_1, X_2, ..., X_d
o R can be represented as a d-dimensional array A_R
o The j-th dimension is indexed by the values of attribute X_j
o Cells contain the count of tuples in R having the corresponding combination of attribute values
o A_R: the joint frequency distribution of all attributes of R
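Building A_R from a relation amounts to counting tuples per combination of attribute values. A minimal sketch, assuming attribute values are already small non-negative integers that can be used directly as array indices (the function names `joint_frequency` and `dense_2d` are hypothetical):

```python
from collections import Counter


def joint_frequency(tuples):
    """Sparse A_R: maps each attribute-value combination to its tuple count."""
    return Counter(tuple(t) for t in tuples)


def dense_2d(freq, size1, size2):
    """Dense view of a two-attribute A_R, indexed as array[x1][x2]."""
    return [[freq[(x1, x2)] for x2 in range(size2)] for x1 in range(size1)]


# A toy relation with d = 2 attributes; the data is invented for illustration.
relation = [(0, 1), (0, 1), (1, 0), (1, 1)]
freq = joint_frequency(relation)
print(dense_2d(freq, 2, 2))  # [[0, 2], [1, 1]]
```

The sparse form is usually the practical one, since most cells of a high-dimensional A_R are empty.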

Chunk-based organization of relational tables
   - The joint frequency array A_R is split into d-dimensional chunks
   - Tuples of R that belong to the same chunk are stored contiguously on disk
   - If R is not chunked, one extra pre-processing step is needed to reorganize R on disk

ComputeWavelet Algorithm
   - When a chunk is loaded for the first time, ComputeWavelet can perform the entire computation for decomposing that chunk
   - Pairwise averaging and differencing is performed as soon as 2^d averages are accumulated
   - Memory efficient: no more than one active sub-array at a time for each level of resolution
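The idea of combining averages as soon as enough of them have accumulated can be shown with a one-dimensional analogue (d = 1, so 2^d = 2). This is not the ComputeWavelet algorithm itself, only a simplified sketch of the same memory pattern: the name `streaming_haar` is hypothetical, values arrive one at a time instead of in chunks, and levels here are counted from the finest resolution upward.

```python
def streaming_haar(values):
    """One-pass 1-D Haar transform holding at most one pending average per level."""
    pending = []   # pending[l]: an average at level l waiting for its partner
    details = []   # (level, detail) pairs in the order they are produced
    count = 0
    for x in values:
        count += 1
        carry, level = float(x), 0
        while True:
            if level == len(pending):
                pending.append(None)
            if pending[level] is None:
                pending[level] = carry  # wait for the partner value
                break
            # Two averages are available: difference them and move up a level.
            left = pending[level]
            pending[level] = None
            details.append((level, (left - carry) / 2))
            carry = (left + carry) / 2
            level += 1
    if count == 0 or count & (count - 1):
        raise ValueError("number of values must be a power of two")
    return pending[-1], details


average, details = streaming_haar([2, 2, 5, 7])
print(average, details)  # 4.0 [(0, 0.0), (0, -1.0), (1, -2.0)]
```

Working memory grows with the number of resolution levels, not with the size of the input, which mirrors the "one active sub-array per level" property stated above.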

Processing Relational Queries in Wavelet Coefficient Domain
[Diagram: two equivalent paths from the wavelet-coefficient synopses W_T1, W_T2, ..., W_Tk to the approximate result relation S.
 (1) Op(W_T1, ..., W_Tk) produces the set of wavelet coefficients W_S for the result; Render(W_S) then produces S.
 (2) Render(W_T1, ..., W_Tk) produces the approximate relations T1, T2, ..., Tk; Op(T1, T2, ..., Tk) then produces S.]

Selection Operator
Our selection operator has the general form select_pred(W_T), where pred represents a generic conjunctive predicate on a subset of the d attributes in T; that is, pred = (l_i1 ≤ X_i1 ≤ h_i1) ∧ ... ∧ (l_ik ≤ X_ik ≤ h_ik), where l_ij and h_ij denote the low and high boundaries of the selected range along each selection dimension D_ij, j = 1, 2, ..., k, k ≤ d.

Selection - Relational Domain
o In the relational domain, we are interested only in those cells inside the query range
o In the wavelet domain, we are interested only in the coefficients that contribute to those cells
[Figure: a query range drawn on the joint data distribution array of the relation, with the data dimensions as axes.]
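The filtering part of selection in the wavelet domain reduces to a rectangle-overlap test between each coefficient's support region and the query range. The sketch below covers only that test; the names `overlaps` and `select` are hypothetical, supports and ranges are assumed to be lists of inclusive (lo, hi) pairs, and any further adjustment of the retained coefficients that a full operator may need is not attempted here.

```python
def overlaps(support, query):
    """True if the support hyper-rectangle intersects the query range.

    Both arguments hold one inclusive (lo, hi) pair per dimension; a query
    entry of None means that dimension is not restricted by the predicate.
    """
    for (s_lo, s_hi), q in zip(support, query):
        if q is None:
            continue
        q_lo, q_hi = q
        if s_hi < q_lo or s_lo > q_hi:
            return False
    return True


def select(coefficients, query):
    """Keep only coefficients that contribute to at least one selected cell."""
    return [c for c in coefficients if overlaps(c["support"], query)]


# 1-D coefficients for W_A = [4, -2, 0, -1]; supports are illustrative.
synopsis = [
    {"support": [(0, 3)], "v": 4.0},
    {"support": [(0, 3)], "v": -2.0},
    {"support": [(0, 1)], "v": 0.0},
    {"support": [(2, 3)], "v": -1.0},
]
print([c["v"] for c in select(synopsis, [(2, 3)])])  # [4.0, -2.0, -1.0]
```

A coefficient whose support misses the query range cannot influence any selected cell, so dropping it does not change the approximate answer.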

Projection Operator

Projection - Wavelet Domain

Join Operator

Join Operator - Wavelet Domain

Experimental Study
o Improved answer quality
o Low synopsis construction costs
o Fast query execution

Query Execution Times

SELECT-JOIN-SUM

SELECT Query errors on real-life data

Conclusion
o Multidimensional wavelets are an effective tool for general-purpose approximate query processing in modern, high-dimensional applications
o The query processing algorithms operate directly on the wavelet-coefficient synopses of relational data, allowing very fast processing of arbitrarily complex queries entirely in the wavelet-coefficient domain
o An extensive experimental study with synthetic as well as real-life data sets verifies the effectiveness of the wavelet-based approach compared to both sampling and histograms

Thank you