Bitmap Indices for Data Warehouse Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.


1 Bitmap Indices for Data Warehouse Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY

2 Star Schema Vs. Multi-dimensional Range Queries SUM (qty * amt) WHERE ProdId in [p1.. p10] AND custId < 200

3 Characteristics of Multi-Dimensional Range Queries in Data Warehouse Ad-Hoc  Given N dimensions (attributes), every combination is possible: 2^N combinations  A Data Cube equals 2^N GROUP BYs High Dimensions (> 20) Large Number of Records
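The 2^N count on this slide can be checked with a short sketch that enumerates every subset of a dimension list, one subset per possible GROUP BY. The dimension names and the helper `group_by_sets` are hypothetical and only illustrate the counting argument.

```python
from itertools import combinations

def group_by_sets(dimensions):
    """Return every subset of the dimensions, one per possible GROUP BY."""
    subsets = []
    for k in range(len(dimensions) + 1):
        subsets.extend(combinations(dimensions, k))
    return subsets

# Hypothetical dimension names, chosen only for illustration.
dims = ["product", "customer", "date"]
cuboids = group_by_sets(dims)
print(len(cuboids))  # 8, i.e. 2**3
```

With more than 20 dimensions the same enumeration yields over a million combinations, which is why an index per combination is not practical.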

4 Multi-Dimensional Index Fails! R-Trees or KD-Trees  Effective only for a moderate number of dimensions  Efficient only for queries involving all indexed dimensions. For Ad-hoc Range Queries, a Projection Index is usually better, and a Bitmap Index is even better.

5 Projection Index Fix the order of the records in the base table  Store the records projected along some dimension  i.e., a single column  Keeping the record order  Keeping the duplicates Like an “array” in the C language [Figure: base table and its Projection Index]
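A minimal sketch of a projection index, assuming the base table is a list of rows and the record ID (RID) is simply the row position. The table contents and the helpers `build_projection_index` and `select_rids` are hypothetical.

```python
# Hypothetical base table; the row position serves as the RID.
base_table = [
    {"prod_id": 3, "cust_id": 120},
    {"prod_id": 7, "cust_id": 250},
    {"prod_id": 3, "cust_id": 90},
    {"prod_id": 12, "cust_id": 199},
]

def build_projection_index(table, column):
    """Copy one column, keeping record order and duplicates."""
    return [row[column] for row in table]

def select_rids(projection, predicate):
    """Scan the projected column and return the RIDs that satisfy the predicate."""
    return [rid for rid, value in enumerate(projection) if predicate(value)]

prod_index = build_projection_index(base_table, "prod_id")
print(prod_index)                                   # [3, 7, 3, 12]
print(select_rids(prod_index, lambda v: v <= 10))   # [0, 1, 2]
```

The scan touches only the one column, which is the advantage over reading whole records from the base table.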

6 Multi-dimensional Range Queries : A General Idea Build an index for each dimension (attribute);  A Projection Index  A B-Tree 1 Primary B-Tree, N -1 Secondary B-Trees For each involved dimension, use the index on that dimension to select records; “AND” the records to get the final answer set.

7 How to make the “AND” operation fast? Projection Index (B-Tree is similar)  Scan each involved dimension,  And return a set of RIDs.  Intersect the RID sets Sets have different lengths We can use Sort and Merge to do the Intersection  Life is easier when all the sets have the same length and are in the same order Use 1/0 to record the membership of each record
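The two ways of intersecting described on this slide can be compared in a small sketch: a merge pass over sorted RID lists of different lengths, and an element-wise AND over fixed-length 1/0 membership vectors. The function names and the sample RID lists are illustrative assumptions.

```python
def merge_intersect(a, b):
    """Intersect two sorted RID lists with a single merge pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def to_membership(rids, m):
    """Turn a RID list into a 1/0 vector with one slot per record."""
    bits = [0] * m
    for rid in rids:
        bits[rid] = 1
    return bits

def and_membership(x, y):
    """AND two membership vectors of the same length, position by position."""
    return [p & q for p, q in zip(x, y)]

rids_a = [0, 2, 3, 5]   # hypothetical result of scanning one dimension
rids_b = [2, 4, 5]      # hypothetical result of scanning another dimension
print(merge_intersect(rids_a, rids_b))                                 # [2, 5]
print(and_membership(to_membership(rids_a, 6), to_membership(rids_b, 6)))  # [0, 0, 1, 0, 0, 1]
```

The membership form needs no comparisons or branching on values, which is what makes it a good fit for bitwise hardware operations.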

8 General Ideas of Bitmap Index Fix the order of records in the base table Suppose the base table has m records For each dimension  For each distinct dimension value (as the KEY)  Build a bitmap with m bits (as the POSITIONS)  A bitmap is like an Inverted Index “AND”, “OR” operations  realized by bitwise logical operations  Well supported by hardware
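A minimal sketch of the basic bitmap index, assuming each bitmap is held in a Python integer whose bit `rid` is set when record `rid` has the key value. The column contents and the helpers `build_bitmap_index` and `rids_of` are hypothetical.

```python
def build_bitmap_index(column):
    """Map each distinct value (the key) to an m-bit bitmap of record positions."""
    index = {}
    for rid, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << rid)
    return index

def rids_of(bitmap):
    """List the positions of the set bits, lowest RID first."""
    rids, rid = [], 0
    while bitmap:
        if bitmap & 1:
            rids.append(rid)
        bitmap >>= 1
        rid += 1
    return rids

# Two hypothetical dimensions over the same four records, in the same order.
product = ["p1", "p2", "p1", "p3"]
region = ["east", "east", "west", "east"]
by_product = build_bitmap_index(product)
by_region = build_bitmap_index(region)

# (product = p1 OR product = p3) AND region = east
answer = (by_product["p1"] | by_product["p3"]) & by_region["east"]
print(rids_of(answer))  # [0, 3]
```

Because every bitmap has the same length and record order, the query reduces to one OR and one AND on machine words.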

9 Basic Bitmap Index P. O’Neil, Model 204, 1987

10 Size of Bitmap Indices Number of Bitmap (Indices)  How to build bitmap indices for dimensions with large distinct values  Temperature dimension Size (i.e., Length) of a Single Bitmap

11 Three Solutions Encoding  Reduce the Number of Bitmaps Binning  Reduce the Number of Bitmaps Compression  Reduce the Size of a Single Bitmap

12 Encoding Strategies Equality-encoded  Good for equality queries, such as “temperature == 100”  Basic Bitmap Index Bit-sliced index  Assume dimension A has c distinct values; use ⌈log_2 c⌉ bitmaps to encode each record's value Range-encoded  Good for one-sided range queries, such as “Pressure < 56.7” Interval-encoded  Good for two-sided range queries, such as “35.8 < Pressure < 56.7”
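The sketch below contrasts three of these encodings on a tiny column, assuming the dimension values are the integers 0..c-1 and each bitmap is a plain list of 0/1. The function names are hypothetical, and interval encoding is not covered.

```python
def equality_encode(column, c):
    """bitmaps[v][rid] == 1 iff column[rid] == v (one bitmap per value)."""
    return [[1 if x == v else 0 for x in column] for v in range(c)]

def range_encode(column, c):
    """bitmaps[v][rid] == 1 iff column[rid] <= v.
    The last bitmap is all 1s, so a real index can omit it."""
    return [[1 if x <= v else 0 for x in column] for v in range(c)]

def bit_slice_encode(column, c):
    """slices[k][rid] is bit k of column[rid]; ceil(log2 c) slices suffice."""
    n_slices = max(1, (c - 1).bit_length())
    return [[(x >> k) & 1 for x in column] for k in range(n_slices)]

column = [2, 0, 3, 1, 2]   # hypothetical values of dimension A, c = 4
eq = equality_encode(column, 4)
rng = range_encode(column, 4)
bsl = bit_slice_encode(column, 4)

print(eq[2])     # A == 2 reads one bitmap:      [1, 0, 0, 0, 1]
print(rng[1])    # A < 2 (A <= 1) reads one:     [0, 1, 0, 1, 0]
print(len(bsl))  # only 2 slices for 4 values
```

With equality encoding, the query A < 2 would need to OR the bitmaps for 0 and 1; range encoding answers it from a single bitmap, and the bit-sliced form trades fewer bitmaps for more work per query.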

13

14 Binning Encoding mainly considers discrete dimension values  Usually integers Basic Ideas of Binning  Build a bitmap for each bin instead of for each distinct value  The number of bitmaps has nothing to do with the number of distinct values in a dimension. Pros and Cons  Pros: control the number of bitmaps by controlling the number of bins.  Cons: need to check the original dimension values to decide whether the records really satisfy the query conditions.

15 A Binning Example: Values of Dimension A lie in [0, 100]
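A sketch of binning for values of A in [0, 100], assuming five equal-width bins of width 20 (the bin layout on the original slide is not recoverable, so this choice is an assumption). Records in bins that lie entirely below the query bound qualify directly; records in the bin that straddles the bound need the candidate check against the base values. All names are hypothetical.

```python
def bin_of(value, width=20, n_bins=5):
    """Bin number of a value; the top edge (100) falls into the last bin."""
    return min(int(value // width), n_bins - 1)

def build_binned_index(column, width=20, n_bins=5):
    """One 0/1 bitmap per bin, regardless of how many distinct values exist."""
    bitmaps = [[0] * len(column) for _ in range(n_bins)]
    for rid, value in enumerate(column):
        bitmaps[bin_of(value, width, n_bins)][rid] = 1
    return bitmaps

def query_less_than(column, bitmaps, bound, width=20):
    """Answer A < bound using the bin bitmaps plus a candidate check."""
    edge_bin = bin_of(bound, width, len(bitmaps))
    hits = []
    for rid in range(len(column)):
        if any(bitmaps[b][rid] for b in range(edge_bin)):
            hits.append(rid)            # bin lies fully below the bound
        elif bitmaps[edge_bin][rid] and column[rid] < bound:
            hits.append(rid)            # candidate check on the base value
    return hits

column = [5.0, 37.5, 22.0, 80.0, 34.9, 100.0]   # hypothetical values of A
bitmaps = build_binned_index(column)
print(query_less_than(column, bitmaps, 35))  # [0, 2, 4]
```

Only the three records in bin [20, 40) require a look at the base data here; the rest are decided from the bitmaps alone.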

16 Compression Strategies General-purpose compression methods  Software packages are widely available  Tradeoff between query processing and compression ratio De-compress data first Specific methods  BBC (Byte-aligned Bitmap Code), Antoshenkov, 1994, 1996. Adopted since Oracle 7.3  WAH (Word-aligned Hybrid Bitmap code), Wu et al., 2004, 2006. Used in Lawrence Berkeley Lab for high-energy physics

17 WAH (Word-aligned Hybrid Bitmap code) Based on run-length encoding  For consecutive 0s or 1s in a bit sequence (part of a bitmap) Use machine WORD as the unit for compression  Instead of BYTE in BBC Design Goal:  reduce the overhead of de-compression, in order to speed up query response.

18 Run-length encoding Bit sequence B : 11111111110001110000111111110001001 fill : a set of consecutive identical bits (all 0s or all 1s)  The first 10 bits in B  fill = count “ + ” bit value  1111111111=10 “ + ” 1 tail: a set of mixed 0s and 1s  The last 8 bits in B Run :  Run = fill + tail Basic Ideas of WAH  Define fill and tail appropriately so that they can be stored in WORDs.
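Plain run-length encoding of the bit sequence B can be sketched as follows: each maximal run of identical bits becomes a (bit value, count) pair, which is the "count + bit value" form of a fill. This is generic run-length encoding, not yet the word-aligned fill and tail definition used by WAH; the function names are hypothetical.

```python
from itertools import groupby

def run_length_encode(bits):
    """Collapse each maximal run of identical bits into (bit, count)."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def run_length_decode(runs):
    """Expand (bit, count) pairs back into the original bit string."""
    return "".join(bit * count for bit, count in runs)

B = "11111111110001110000111111110001001"
runs = run_length_encode(B)
print(runs[0])  # ('1', 10): the first 10 bits of B as one fill
```

Long fills compress well, while stretches of mixed bits produce many short runs, which is why WAH stores such stretches verbatim as tails.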

19 Word-aligned Hybrid Bitmap code: 32-bit WORD
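A simplified sketch of 32-bit WAH encoding, following the word layout described by Wu et al.: the bitmap is cut into 31-bit groups; a literal word has its top bit 0 and carries 31 bits verbatim; a fill word has its top bit 1, the next bit holds the fill value, and the low 30 bits count how many consecutive 31-bit groups it covers. One assumption is made for brevity: the final partial group is zero-padded into a literal word rather than kept as a separate active word, so the original length must be passed to the decoder.

```python
GROUP = 31  # payload bits per 32-bit word

def wah_encode(bits):
    """Encode a '0'/'1' string into a list of 32-bit WAH words."""
    padded = bits + "0" * (-len(bits) % GROUP)   # simplification: zero-pad the tail
    words = []
    for i in range(0, len(padded), GROUP):
        group = padded[i:i + GROUP]
        if group in ("0" * GROUP, "1" * GROUP):
            header = 0x80000000 | (int(group[0]) << 30)
            same_fill = words and (words[-1] & 0xC0000000) == header
            if same_fill and (words[-1] & 0x3FFFFFFF) < 0x3FFFFFFF:
                words[-1] += 1                   # extend the previous fill word
            else:
                words.append(header | 1)         # start a new fill of one group
        else:
            words.append(int(group, 2))          # literal word, top bit is 0
    return words

def wah_decode(words, n_bits):
    """Expand WAH words back into the first n_bits of the bit string."""
    out = []
    for w in words:
        if w & 0x80000000:
            bit = "1" if w & 0x40000000 else "0"
            out.append(bit * (GROUP * (w & 0x3FFFFFFF)))
        else:
            out.append(format(w, "031b"))
    return "".join(out)[:n_bits]

# 128-bit example: 1, then 20 zeros, 3 ones, 79 zeros, 25 ones.
bits = "1" + "0" * 20 + "111" + "0" * 79 + "1" * 25
words = wah_encode(bits)
print([format(w, "08X") for w in words[:3]])  # ['40000380', '80000002', '001FFFFF']
```

Because fills and literals both occupy whole words, logical operations can work directly on the compressed form word by word, which matches the stated design goal of keeping decompression overhead low.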

20 Characteristics of Industrial Products Model 204 (Pat O’Neil, 1987)  The first product to adopt a bitmap index  Basic Bitmap Index, No binning, No compression  Now owned by Computer Corporation of America Oracle (1995)  Adopted a compressed bitmap index since 7.3  Probably uses BBC for compression, Equality-encoded, No binning. Sybase IQ  Bit-sliced index (Pat O’Neil et al., 1997)  No binning, No compression  For dimensions with a small number of distinct values, uses the Basic Bitmap Index.

21 References Kurt Stockinger, Kesheng Wu. Bitmap Indices for Data Warehouses. In Wrembel R., Koncilia Ch.: Data Warehouses and OLAP: Concepts, Architectures and Solutions. Idea Group, Inc., 2006.

