Fast Incremental Maintenance of Approximate Histograms
Phillip B. Gibbons (Intel Research Pittsburgh), Yossi Matias (Tel Aviv University), Viswanath Poosala (Bell Laboratories)
Presented by: Amrita Tamrakar, CSE 6392, 09-Feb-2006

Introduction
- What is a histogram?
- Issues in histogram maintenance
- Novel concept of "backing sample"
- Types of approximate histograms
- Incremental maintenance of approximate histograms
- Challenges and solutions
- Conclusion

What is a histogram?
- Maintained to approximate the distribution of data in the attributes of a relation
- Constructed by partitioning the data into mutually disjoint subsets (buckets)
- Frequency on the y-axis, data value intervals on the x-axis
- Used by Oracle, DB2, SQL Server, Sybase, Informix…
- Example: http://www.shodor.org/interactivate/activities/histogram/

[Figure: histogram with data value interval on the x-axis and frequency on the y-axis]

Commercial vendor   Histogram type
IBM DB2             Compressed (V,F)
Oracle              Equi-depth
SQL Server          Equi-depth
Sybase              Equi-depth

History of Histogram
- Equi-width histogram
- Compressed histogram
- Learn more on Histogram

Issues in Histogram Maintenance
- Histograms are precomputed on the underlying data
- They are stored in main memory, so they add little overhead
- What about maintenance?
  - The database is modified
  - The query is changed(?)
  - The histogram becomes outdated
- Do periodic updates solve the problem?
  - They require recomputing from scratch
  - Estimates are poor in the period between updates
- What is the solution?

The solution to outdated histograms
- Maintain an approximate histogram in the presence of database updates
- Split and merge technique for quick adjustment
- "Backing sample" stored in secondary memory (disk)

Backing Sample
- Stores only the row id and the necessary attributes
- At any time, the backing sample is a random sample of the relation
- Kept up to date without scanning the entire table
- Records are kept in consecutive disk blocks

[Figure: relation (20 GB), backing sample (100 KB), histogram (2 KB) in main memory]

How to maintain a backing sample?
During insertions
- Reservoir sampling technique
- Obtains a sample of the data in a single scan, without a priori knowledge of the number of tuples
- The length of each random skip is chosen so that every tuple is equally likely to be in the reservoir

[Figure: MaintainBackingSample. The first n records fill the reservoir (slots 1, 2, ..., n); from record n+1 on, skip a random number of records and replace a reservoir entry]
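A minimal sketch of reservoir sampling, for illustration only. It uses the simple per-record variant, which flips a weighted coin for every arriving record, rather than the skip-based variant named on the slide, which draws the length of the next skip directly; both leave every record equally likely to be in the reservoir. The function name, its arguments, and the in-memory `stream` are assumptions made for this example.

```python
import random

def maintain_backing_sample(stream, n, rng=None):
    """Return a uniform random sample of up to n records from one pass over stream."""
    rng = rng or random.Random()
    reservoir = []
    for seen, record in enumerate(stream, start=1):
        if seen <= n:
            # The first n records fill the reservoir.
            reservoir.append(record)
        else:
            # Keep the new record with probability n / seen by drawing a
            # slot in [0, seen) and replacing only if it falls in the reservoir.
            slot = rng.randrange(seen)
            if slot < n:
                reservoir[slot] = record
    return reservoir
```

The total number of records is never needed in advance, which is what lets the sample follow a growing relation.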

How to maintain a backing sample?
During modification
- Modify the tuple if it is present in the sample
During deletion
- Remove the tuple from the sample
- If the sample size decreases below the lower bound L, recompute the sample from disk
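The modification and deletion rules can be sketched as follows. This is an illustration under stated assumptions: the class and method names are hypothetical, and a plain dictionary `relation` (row id to attribute value) stands in for the relation on disk, so `rebuild` plays the role of the rescan.

```python
import random

class BackingSample:
    """Hypothetical in-memory stand-in for a backing sample kept on disk."""

    def __init__(self, capacity, lower_bound, rng=None):
        self.capacity = capacity        # target sample size
        self.lower_bound = lower_bound  # the lower bound L from the slide
        self.rows = {}                  # row id -> attribute value
        self.rng = rng or random.Random()

    def rebuild(self, relation):
        # Rescan the relation and draw a fresh uniform sample.
        size = min(self.capacity, len(relation))
        ids = self.rng.sample(sorted(relation), size)
        self.rows = {rid: relation[rid] for rid in ids}

    def modify(self, row_id, new_value):
        # Only tuples that are present in the sample need attention.
        if row_id in self.rows:
            self.rows[row_id] = new_value

    def delete(self, row_id, relation):
        # relation is expected to reflect the deletion already.
        self.rows.pop(row_id, None)
        if len(self.rows) < min(self.lower_bound, len(relation)):
            self.rebuild(relation)
```

Deletions of rows that are not in the sample cost nothing here, so the expensive rescan happens only after enough sampled rows have been removed.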

Maintain Approximate Histograms: Different Classes of Histograms
Equi-depth histograms
- The number of tuples in each bucket is the same
- Each bucket covers a contiguous range of attribute values

[Figure: equi-depth histogram with data value on the x-axis and frequency of occurrence on the y-axis]

Different Classes of Histograms
Compressed (V,F) histogram
- The N highest frequencies are stored in singleton buckets
- The other values use an equi-depth histogram
Both histograms need to store, for each bucket B
- The largest value in the bucket, B.maxval
- The count, B.count
Approximate histograms are calculated from the random sample of the relation
How to maintain these histograms?
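As a rough illustration of computing an approximate equi-depth histogram from a random sample, the sketch below takes bucket boundaries from sample quantiles and gives every bucket an equal share of the relation. The `Bucket` fields mirror B.maxval and B.count from the slide; the function name, the argument names, and the exact quantile indexing are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    maxval: float  # largest attribute value covered by the bucket
    count: float   # estimated number of tuples in the bucket

def equi_depth_from_sample(sample, num_buckets, relation_size):
    """Build num_buckets equi-depth buckets from a non-empty random sample."""
    values = sorted(sample)
    k = len(values)
    buckets = []
    for i in range(1, num_buckets + 1):
        # The i-th boundary is the i/num_buckets quantile of the sample.
        idx = max(0, (i * k) // num_buckets - 1)
        buckets.append(Bucket(maxval=values[idx],
                              count=relation_size / num_buckets))
    return buckets
```

For a sample holding the values 1 to 100 and four buckets, the boundaries fall at 25, 50, 75 and 100.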

Fast incremental maintenance of approximate equi-depth histograms
During insertion
- Maintain a threshold T, an upper bound on the bucket count
- If the bucket's count stays below T, the insertion only increments the bucket count
- Otherwise, recompute the histogram
Split and merge algorithm
- Reduces the number of recomputations from the sample
- When a bucket count reaches T, split the bucket in half instead of recomputing
- Keep the number of buckets fixed by merging two adjacent buckets whose total count is less than T
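The insertion rule and the split and merge step can be sketched as below. This is one possible reading, not a definitive implementation: the function name, the list-of-pairs bucket layout, the use of the backing sample's median as the split point, and the choice of the cheapest adjacent pair to merge are all assumptions. When no pair qualifies, the function only reports that a recomputation from the backing sample is needed.

```python
import bisect
import statistics

def insert_value(buckets, value, threshold, sample):
    """Record one inserted value in an equi-depth histogram.

    buckets: list of [maxval, count] pairs sorted by maxval (mutated in place).
    sample:  values currently in the backing sample (hypothetical argument).
    Returns "counted", "split_merge", or "recompute".
    """
    maxvals = [b[0] for b in buckets]
    i = min(bisect.bisect_left(maxvals, value), len(buckets) - 1)
    buckets[i][0] = max(buckets[i][0], value)  # the last bucket may need to grow
    buckets[i][1] += 1
    if buckets[i][1] < threshold:
        return "counted"

    # Candidate merges: adjacent pairs that do not involve the full bucket.
    pairs = [(buckets[j][1] + buckets[j + 1][1], j)
             for j in range(len(buckets) - 1) if i not in (j, j + 1)]
    if not pairs or min(pairs)[0] >= threshold:
        return "recompute"

    # Split point: median of the sampled values that fall in bucket i.
    lo = buckets[i - 1][0] if i > 0 else float("-inf")
    inside = [v for v in sample if lo < v <= buckets[i][0]]
    if not inside:
        return "recompute"
    median = statistics.median_low(inside)
    if median >= buckets[i][0]:
        return "recompute"  # one value dominates, so there is no useful split

    half = buckets[i][1] / 2
    _, j = min(pairs)
    buckets[i:i + 1] = [[median, half], [buckets[i][0], half]]
    if j > i:
        j += 1  # the split shifted every later bucket by one position
    buckets[j:j + 2] = [[buckets[j + 1][0], buckets[j][1] + buckets[j + 1][1]]]
    return "split_merge"
```

One split paired with one merge leaves the number of buckets unchanged, which is the invariant the slide asks for.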

Split and merge algorithm

[Figure: split and merge at the insert threshold]

To handle modify and delete
- Deletion can lower the bucket count
- Maintain a lower threshold T_l
- Merge a bucket if its count falls below the threshold
- Split the bucket with the largest count

[Figure: delete threshold]
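A sketch of the deletion case under the same kind of assumptions: the function name and bucket layout are hypothetical, the underfull bucket is merged with its smaller adjacent neighbour, and the split point of the largest bucket is the median of the backing sample values inside it. The guard that refuses a split whose halves would themselves be underfull is an added assumption, not something stated on the slide.

```python
import bisect
import statistics

def delete_value(buckets, value, lower_threshold, sample):
    """Record one deleted value in an equi-depth histogram.

    buckets: list of [maxval, count] pairs sorted by maxval (mutated in place).
    Returns "counted", "merge_split", or "recompute".
    """
    maxvals = [b[0] for b in buckets]
    i = min(bisect.bisect_left(maxvals, value), len(buckets) - 1)
    buckets[i][1] -= 1
    if buckets[i][1] >= lower_threshold:
        return "counted"

    # Merge the underfull bucket with its smaller adjacent neighbour.
    neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(buckets)]
    if not neighbours:
        return "recompute"
    n = min(neighbours, key=lambda j: buckets[j][1])
    left, right = sorted((i, n))
    merged = [buckets[right][0], buckets[left][1] + buckets[right][1]]
    rest = buckets[:left] + [merged] + buckets[right + 1:]

    # Split the bucket with the largest count to restore the bucket total.
    s = max(range(len(rest)), key=lambda j: rest[j][1])
    half = rest[s][1] / 2
    if half < lower_threshold:
        return "recompute"  # assumed guard: halves would be underfull
    lo = rest[s - 1][0] if s > 0 else float("-inf")
    inside = [v for v in sample if lo < v <= rest[s][0]]
    if not inside:
        return "recompute"
    median = statistics.median_low(inside)
    if median >= rest[s][0]:
        return "recompute"
    rest[s:s + 1] = [[median, half], [rest[s][0], half]]
    buckets[:] = rest
    return "merge_split"
```

A "recompute" result leaves only the decremented count behind, on the assumption that the caller then rebuilds the whole histogram from the backing sample.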

Fast incremental maintenance of approximate compressed histograms
- A value with a high frequency can span more than one bucket; replace those buckets with a single bucket holding a single count (a singleton bucket)
- Construct the compressed histogram on the sample and scale it by a factor of N/k
During insertions
- If the count does not exceed the threshold, add to the bucket; otherwise update the bucket boundaries
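The construction step can be illustrated as follows. This sketch reads N as the number of tuples in the relation and k as the sample size, which the slide does not spell out. The rule used to pick singleton values (a value gets its own bucket while its sample frequency is at least the average share of the remaining buckets) is an assumption for illustration, as are all names.

```python
from collections import Counter

def compressed_from_sample(sample, num_buckets, relation_size):
    """Return (singletons, equi_depth) built on the sample and scaled by N/k.

    singletons: dict mapping value -> estimated count in the relation.
    equi_depth: list of (maxval, estimated count) for the remaining values.
    """
    k = len(sample)
    scale = relation_size / k  # the N/k factor
    remaining, slots = k, num_buckets
    singletons = {}
    for value, freq in Counter(sample).most_common():
        # Assumed rule: promote a value while it fills at least one
        # average-sized bucket among the slots that are still free.
        if slots > 1 and freq >= remaining / slots:
            singletons[value] = freq * scale
            remaining -= freq
            slots -= 1
        else:
            break

    rest = sorted(v for v in sample if v not in singletons)
    equi_depth = []
    if rest:
        for i in range(1, slots + 1):
            idx = max(0, (i * len(rest)) // slots - 1)
            equi_depth.append((rest[idx], len(rest) / slots * scale))
    return singletons, equi_depth
```

With a sample of 100 values in which one value occurs 50 times, five buckets and N = 1000, that value gets a singleton bucket with an estimated count of 500 and the other four buckets share the remaining 500 tuples equally.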

Challenges in maintaining compressed histograms
- New values may introduce data skew, which may require new singleton buckets
- A value may no longer belong in a singleton bucket as the number of tuples in the equi-depth buckets grows
- The number of equi-depth buckets needs adjustment
- The number of tuples in the equi-depth buckets needs adjustment

Solutions to the challenges
- A large number of tuples with the same value causes an equi-depth bucket to split, but the adjacent boundaries then have the same value; this is a hint to create a singleton bucket for that value
- Allow singleton buckets with small counts to be merged back into the equi-depth buckets
- Use the split and merge technique to control the imbalance between equi-depth buckets and their tuple counts without recomputation

To handle deletion and modification
- A deletion can decrease the number of tuples in a bucket relative to the other buckets
- Creating a singleton bucket can also drop a bucket count to the lower threshold T_l
What to do?
- Merge the pair with the smallest combined count and split the bucket with the largest count
- Otherwise, recompute from the backing sample

Conclusion
- Backing sample
- Incremental maintenance of equi-depth and compressed histograms
- Split and merge technique to reduce accesses to the backing sample

Use of histograms in commercial databases

IBM DB2: Compressed (V,F)
- SASH (Self Adaptive Set of Histograms), research at Watson
- Two phases of automatically building/maintaining histograms based on query feedback
- http://www.research.ibm.com/scalabledb/projects.html
- http://www.cs.uwaterloo.ca/~ashraf/pubs/vldb04autostats.pdf
- http://www.research.ibm.com/people/l/lipyeow/publications/phdthesis.pdf (chap-3)

Oracle: Equi-depth
- The Oracle optimizer decides whether to use an index or a full-table scan
- Use of dbms_stats, ANALYZE
- Oracle 10g claims to generate histograms automatically when appropriate
- http://www.dba-oracle.com/t_histograms.htm
- http://www.mcse.ms/archive26-2005-5-1624465.html

SQL Server: Equi-depth
- A query processor can make more accurate cardinality estimates
- http://www.sql-server-performance.com/nb_execution_plan_statistics.asp
- http://windowssdk.msdn.microsoft.com/library/default.asp?url=/library/en-us/oledb/htm/oledbrowsets_special_purpose_rowsets.asp
