ITCS 6163 Lecture 5. Indexing datacubes Objective: speed queries up. Traditional databases (OLTP): B-Trees Time and space logarithmic to the amount of.


1 ITCS 6163 Lecture 5

2 Indexing datacubes
Objective: speed up queries.
Traditional databases (OLTP): B-trees
- Lookup time is logarithmic in the number of indexed keys (space is linear).
- Dynamic, stable, and perform well under updates. (But OLAP is not about updates.)
Bitmaps:
- Space efficient.
- Difficult to update (but we don't care in a DW).
- Can effectively prune searches before looking at the data.

3 Bitmaps
R = (…, A, …, M)

R(A) | B8 B7 B6 B5 B4 B3 B2 B1 B0
   3 |  0  0  0  0  0  1  0  0  0
   2 |  0  0  0  0  0  0  1  0  0
   1 |  0  0  0  0  0  0  0  1  0
   2 |  0  0  0  0  0  0  1  0  0
   8 |  1  0  0  0  0  0  0  0  0
   2 |  0  0  0  0  0  0  1  0  0
   2 |  0  0  0  0  0  0  1  0  0
   0 |  0  0  0  0  0  0  0  0  1
   7 |  0  1  0  0  0  0  0  0  0
   5 |  0  0  0  1  0  0  0  0  0
   6 |  0  0  1  0  0  0  0  0  0
   4 |  0  0  0  0  1  0  0  0  0
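As a minimal sketch of how the table above could be produced, the following Python builds one bitmap per attribute value from the column A shown on the slide. The function name `build_value_list_index` and the list-of-lists representation are assumptions made for illustration, not part of the lecture.

```python
# Column A from the slide, one value per row of R.
A = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]


def build_value_list_index(column, cardinality):
    """Return one bitmap per value: bitmaps[v][r] is 1 iff column[r] == v."""
    bitmaps = [[0] * len(column) for _ in range(cardinality)]
    for row, value in enumerate(column):
        bitmaps[value][row] = 1
    return bitmaps


index = build_value_list_index(A, cardinality=9)

# Print each row the way the slide lays it out: value, then bits B8 ... B0.
for row, value in enumerate(A):
    bits = [index[v][row] for v in range(8, -1, -1)]
    print(value, *bits)
```

Each row has exactly one bit set, which is what makes this the plain value-list design.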

4 Query optimization
Consider a high-selectivity-factor query with predicates on two attributes. The query optimizer builds plans:
(P1) Full relation scan (filter as you go).
(P2) Index scan on the predicate with the lower selectivity factor, followed by a scan of the temporary relation to filter out non-qualifying tuples using the other predicate. (Works well if the data is clustered on the first index key.)
(P3) Index scan for each predicate (separately), followed by a merge of the RID lists.

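5 Query optimization (continued)
[Figure. (P2): the index on Pred1 selects tuples t1 … tn from the blocks of data; Pred. 2 is then applied to them to produce the answer. (P3): the index on Pred1 yields tuple list1 and the index on Pred2 yields tuple list2; the two lists are combined into the merged list.]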

6 Query optimization (continued)
When using bitmap indexes, (P3) can be an easy winner!
CPU operations on bitmaps (AND, OR, XOR, etc.) are more efficient than regular RID merges: just apply the bitwise operations to the bitmaps. (With B-trees, you would have to scan the two RID lists and select the tuples that appear in both: the merge operation.)
Of course, you can build B-trees on the compound key, but you would need one for every compound predicate (an exponential number of trees).

7 Bitmaps and predicates
A = a1 AND B = b2
(Bitmap for a1) AND (Bitmap for b2) = Bitmap for a1 and b2
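To make the comparison between plan (P3) on RID lists and on bitmaps concrete, here is a small Python sketch. The row ids are made-up sample data, and packing a bitmap into a Python integer (bit r set when row r qualifies) is an illustrative choice, not something the lecture prescribes.

```python
def to_bitmap(rids):
    """Pack row ids into an int: bit r is set iff row r qualifies."""
    bitmap = 0
    for rid in rids:
        bitmap |= 1 << rid
    return bitmap


def to_rids(bitmap):
    """Unpack an int bitmap back into a sorted list of row ids."""
    return [r for r in range(bitmap.bit_length()) if (bitmap >> r) & 1]


def merge_rid_lists(list1, list2):
    """RID-list version of (P3): two-pointer merge of two sorted lists."""
    i = j = 0
    merged = []
    while i < len(list1) and j < len(list2):
        if list1[i] == list2[j]:
            merged.append(list1[i])
            i += 1
            j += 1
        elif list1[i] < list2[j]:
            i += 1
        else:
            j += 1
    return merged


# Hypothetical qualifying rows for A = a1 and for B = b2.
rows_a1 = [0, 2, 3, 7, 9]
rows_b2 = [2, 4, 7, 8, 9]

# Bitmap version of (P3): a single bitwise AND.
both = to_bitmap(rows_a1) & to_bitmap(rows_b2)
print(to_rids(both))
print(merge_rid_lists(rows_a1, rows_b2))
```

Both approaches return the same rows; the bitmap version replaces the element-by-element merge with one word-parallel operation.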

8 Tradeoffs
Dimension cardinality small: dense bitmaps.
Dimension cardinality large: sparse bitmaps.
Compression (and decompression).

9 Query strategy for Star joins
Maintain join indexes between the fact table and the dimension tables.
[Figure: the fact table linked to the Prod. dimension table (types a … k), with one bitmap per product, one bitmap per type (type a … type k), and one bitmap per location.]

10 Strategy example
Aggregate all sales for products of location …, … or ….
(Bitmap for …) OR (Bitmap for …) OR (Bitmap for …) = Bitmap for predicate

11 Star-Joins
  SELECT F.S, D1.A1, D2.A2, …, Dn.An
  FROM F, D1, D2, …, Dn
  WHERE F.A1 = D1.A1 AND F.A2 = D2.A2 AND … AND F.An = Dn.An
    AND D1.B1 = 'c1' AND D2.B2 = 'p2' AND …
Likely strategy:
- For each Di, find the suitable values of Ai such that Di.Bi = 'xi' (unless you have a bitmap index for Bi).
- Use the bitmap index on those Ai values to form a bitmap for the related rows of F (OR-ing the bitmaps).
- At this stage you have n such bitmaps; the result can be found by AND-ing them.
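The strategy above can be sketched in Python for n = 2 dimensions. All data here is hypothetical: the fact rows, the dimension contents, and the helper names `join_index` and `dimension_bitmap` are assumptions introduced only to illustrate the OR-then-AND steps.

```python
from collections import defaultdict

# Hypothetical fact table F: (A1 key, A2 key, sales S).
fact = [(1, 10, 5.0), (2, 10, 7.0), (1, 20, 3.0),
        (3, 30, 4.0), (2, 20, 6.0), (3, 10, 1.0)]
# Hypothetical dimension tables: key Ai -> attribute Bi.
d1 = {1: "c1", 2: "c2", 3: "c1"}
d2 = {10: "p2", 20: "p1", 30: "p2"}


def join_index(fact_rows, column):
    """Bitmap join index: key value -> int with bit r set iff fact row r has that key."""
    index = defaultdict(int)
    for r, row in enumerate(fact_rows):
        index[row[column]] |= 1 << r
    return index


def dimension_bitmap(dim, wanted, index):
    """OR the fact bitmaps of every key Ai whose attribute Bi equals `wanted`."""
    bitmap = 0
    for key, attr in dim.items():
        if attr == wanted:
            bitmap |= index[key]
    return bitmap


idx1 = join_index(fact, 0)
idx2 = join_index(fact, 1)

# One bitmap per dimension predicate, then AND them together.
answer = dimension_bitmap(d1, "c1", idx1) & dimension_bitmap(d2, "p2", idx2)
rows = [r for r in range(len(fact)) if (answer >> r) & 1]
total = sum(fact[r][2] for r in rows)
print(rows, total)
```

Only the fact rows whose bit survives the final AND are fetched, which is the pruning effect the slide describes.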

12 Example
Selectivity per predicate = 0.01 (predicates on the dimension tables).
n predicates (statistically independent).
Total selectivity = 10^(-2n).
Fact table = 10^8 rows, n = 3: tuples in answer = 10^8 / 10^6 = 100 rows.
In the worst case that is 100 blocks. Still better than reading all the blocks in the relation (e.g., assuming 100 tuples/block, this would be 10^6 blocks!).
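The arithmetic on this slide can be checked with a few lines of Python. Exact fractions are used to avoid floating-point noise; the variable names are illustrative only.

```python
from fractions import Fraction

selectivity = Fraction(1, 100)   # 0.01 per predicate
n = 3                            # independent predicates
fact_rows = 10**8
tuples_per_block = 100

total_selectivity = selectivity**n            # 10^(-2n)
answer_rows = fact_rows * total_selectivity   # tuples in the answer
worst_case_blocks = answer_rows               # each tuple in a different block
full_scan_blocks = fact_rows // tuples_per_block

print(total_selectivity, answer_rows, worst_case_blocks, full_scan_blocks)
```

Under these assumptions the bitmap plan touches at most 100 blocks against 10^6 for the full scan.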

13 Design Space of Bitmap Indexes
The basic bitmap design is called the value-list index. The focus there is on the columns.
If we change the focus to the rows, the index becomes a set of attribute values (integers), one per tuple (row), each of which can be represented in a particular way.
5 | 0 0 0 1 0 0 0 0 0
We can encode this row in many ways...

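14 Attribute value decomposition
C = attribute cardinality.
Consider a value of the attribute, v, and a sequence of n-1 numbers $\langle b_{n-1}, b_{n-2}, \ldots, b_1 \rangle$. Also, define $b_n = \lceil C / \prod_{i=1}^{n-1} b_i \rceil$. Then v can be decomposed into a sequence of n digits $\langle v_n, v_{n-1}, \ldots, v_1 \rangle$ as follows:

\[
\begin{aligned}
v &= V_1 \\
  &= V_2\, b_1 + v_1 \\
  &= V_3\, (b_2 b_1) + v_2 b_1 + v_1 \\
  &\;\;\vdots \\
  &= v_n \Bigl(\prod_{j=1}^{n-1} b_j\Bigr) + \cdots + v_i \Bigl(\prod_{j=1}^{i-1} b_j\Bigr) + \cdots + v_2 b_1 + v_1
\end{aligned}
\]

where $v_i = V_i \bmod b_i$ and $V_i = \lfloor V_{i-1} / b_{i-1} \rfloor$.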

15 Number systems
Decimal system:
576 = 5 x 10 x 10 + 7 x 10 + 6
576/100 = 5 | 76, 76/10 = 7 | 6, 6

How do you write 576 in:

Base 2:
576 = 1 x 2^9 + 0 x 2^8 + 0 x 2^7 + 1 x 2^6 + 0 x 2^5 + 0 x 2^4 + 0 x 2^3 + 0 x 2^2 + 0 x 2^1 + 0 x 2^0
576/2^9 = 1 | 64, 64/2^8 = 0 | 64, 64/2^7 = 0 | 64, 64/2^6 = 1 | 0, 0/2^5 = 0 | 0, 0/2^4 = 0 | 0, 0/2^3 = 0 | 0, 0/2^2 = 0 | 0, 0/2^1 = 0 | 0, 0/2^0 = 0 | 0

Bases 7, 7, 5, 3:
576/(7x7x5x3) = 576/735 = 0 | 576
576/(7x5x3) = 576/105 = 5 | 51      576 = 5 x (7x5x3) + 51
51/(5x3) = 51/15 = 3 | 6            576 = 5 x (7x5x3) + 3 x (5x3) + 6
6/3 = 2 | 0                         576 = 5 x (7x5x3) + 3 x (5x3) + 2 x 3
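The decomposition rule from slide 14 (v_i = V_i mod b_i, then integer-divide by b_i) and the worked examples here can be sketched as a short Python routine. The names `decompose` and `compose` are hypothetical, and bases are passed most-significant first to match the slides' notation.

```python
def decompose(v, bases):
    """Decompose v into digits <v_n, ..., v_1> for bases <b_n, ..., b_1>."""
    digits = []
    for b in reversed(bases):   # start with b_1, the least significant base
        digits.append(v % b)    # v_i = V_i mod b_i
        v //= b                 # V_{i+1} = floor(V_i / b_i)
    if v != 0:
        raise ValueError("value does not fit in the given bases")
    return digits[::-1]


def compose(digits, bases):
    """Inverse of decompose: rebuild the value from its digits."""
    value = 0
    for d, b in zip(digits, bases):
        value = value * b + d
    return value


print(decompose(576, [10, 10, 10]))   # decimal digits
print(decompose(576, [2] * 10))       # ten binary digits
print(decompose(576, [7, 7, 5, 3]))   # mixed bases from the slide
```

The mixed-base call yields the digits 5, 3, 2, 0, matching 576 = 5 x (7x5x3) + 3 x (5x3) + 2 x 3 + 0.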

16 Bitmaps
R = (…, A, …, M)
Value-list index:

R(A) | B8 B7 B6 B5 B4 B3 B2 B1 B0
   3 |  0  0  0  0  0  1  0  0  0
   2 |  0  0  0  0  0  0  1  0  0
   1 |  0  0  0  0  0  0  0  1  0
   2 |  0  0  0  0  0  0  1  0  0
   8 |  1  0  0  0  0  0  0  0  0
   2 |  0  0  0  0  0  0  1  0  0
   2 |  0  0  0  0  0  0  1  0  0
   0 |  0  0  0  0  0  0  0  0  1
   7 |  0  1  0  0  0  0  0  0  0
   5 |  0  0  0  1  0  0  0  0  0
   6 |  0  0  1  0  0  0  0  0  0
   4 |  0  0  0  0  1  0  0  0  0

17 Example: sequence <3,3>, value-list index (equality)
(Bi^j denotes bitmap i of component j.)

R(A)        | B2^2 B1^2 B0^2 | B2^1 B1^1 B0^1
3 (1x3+0)   |   0    1    0  |   0    0    1
2           |   0    0    1  |   1    0    0
1           |   0    0    1  |   0    1    0
2           |   0    0    1  |   1    0    0
8           |   1    0    0  |   1    0    0
2           |   0    0    1  |   1    0    0
2           |   0    0    1  |   1    0    0
0           |   0    0    1  |   0    0    1
7           |   1    0    0  |   0    1    0
5           |   0    1    0  |   1    0    0
6           |   1    0    0  |   0    0    1
4           |   0    1    0  |   0    1    0

18 Encoding scheme
Equality encoding: all bits set to 0 except the one that corresponds to the value.
Range encoding: the v_i rightmost bits set to 0, the remaining bits set to 1.

19 Range encoding: single component, base-9

R(A) | B8 B7 B6 B5 B4 B3 B2 B1 B0
   3 |  1  1  1  1  1  1  0  0  0
   2 |  1  1  1  1  1  1  1  0  0
   1 |  1  1  1  1  1  1  1  1  0
   8 |  1  0  0  0  0  0  0  0  0
   0 |  1  1  1  1  1  1  1  1  1
   7 |  1  1  0  0  0  0  0  0  0
   5 |  1  1  1  1  0  0  0  0  0
   6 |  1  1  1  0  0  0  0  0  0
   4 |  1  1  1  1  1  0  0  0  0
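The two encoding schemes from slide 18 can be written as a pair of small Python functions that reproduce the rows of the single-component tables. The function names are illustrative; bits are listed from B8 down to B0, as on the slides.

```python
def equality_encode(v, base):
    """Bits B_{base-1} ... B_0: all 0 except the bit for value v."""
    return [1 if i == v else 0 for i in range(base - 1, -1, -1)]


def range_encode(v, base):
    """Bits B_{base-1} ... B_0: the v rightmost bits are 0, the rest are 1.

    Equivalently, B_i is 1 exactly when v <= i.
    """
    return [1 if v <= i else 0 for i in range(base - 1, -1, -1)]


for v in [3, 2, 1, 8, 0, 7, 5, 6, 4]:
    print(v, *range_encode(v, 9))
```

In the range-encoded form, bitmap B_i directly answers the predicate A <= i, and the top bitmap is 1 for every row.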

20 Example (revisited): sequence <3,3>, value-list index (equality)
(Bi^j denotes bitmap i of component j.)

R(A)        | B2^2 B1^2 B0^2 | B2^1 B1^1 B0^1
3 (1x3+0)   |   0    1    0  |   0    0    1
2           |   0    0    1  |   1    0    0
1           |   0    0    1  |   0    1    0
2           |   0    0    1  |   1    0    0
8           |   1    0    0  |   1    0    0
2           |   0    0    1  |   1    0    0
2           |   0    0    1  |   1    0    0
0           |   0    0    1  |   0    0    1
7           |   1    0    0  |   0    1    0
5           |   0    1    0  |   1    0    0
6           |   1    0    0  |   0    0    1
4           |   0    1    0  |   0    1    0

21 Example: sequence <3,3>, range-encoded index
(Bi^j denotes bitmap i of component j.)

R(A) | B1^2 B0^2 | B1^1 B0^1
   3 |   1    0  |   1    1
   2 |   1    1  |   0    0
   1 |   1    1  |   1    0
   2 |   1    1  |   0    0
   8 |   0    0  |   0    0
   2 |   1    1  |   0    0
   2 |   1    1  |   0    0
   0 |   1    1  |   1    1
   7 |   0    0  |   1    0
   5 |   1    0  |   0    0
   6 |   0    0  |   1    1
   4 |   1    0  |   1    0
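A short Python sketch can regenerate this table by combining the digit decomposition with range encoding per component. It assumes two components of base 3 each (consistent with the "1x3+0" annotation on slide 17) and, as in the table, omits the highest bitmap of each component because it would be 1 in every row. Function names are hypothetical.

```python
A = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
BASES = [3, 3]  # <b_2, b_1>; assumed from the slide's 1x3+0 annotation


def digits_of(v, bases):
    """Digits <v_n, ..., v_1> of v for bases given most-significant first."""
    out = []
    for b in reversed(bases):
        out.append(v % b)
        v //= b
    return out[::-1]


def range_bits(digit, base):
    """Range-encode one digit as B_{base-2} ... B_0 (top bitmap dropped)."""
    return [1 if digit <= i else 0 for i in range(base - 2, -1, -1)]


def encode_row(v, bases):
    """Concatenate the range-encoded bits of every component of v."""
    bits = []
    for d, b in zip(digits_of(v, bases), bases):
        bits.extend(range_bits(d, b))
    return bits


table = [encode_row(v, BASES) for v in A]
for v, bits in zip(A, table):
    print(v, *bits)
```

With two base-3 components the index needs 4 bitmaps instead of the 8 that a single range-encoded base-9 component needs once its always-1 top bitmap is dropped.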

22 Design Space
[Figure: the design space of bitmap indexes, with the equality and range encodings.]

23 RangeEval
Evaluates each range predicate by computing two bitmaps: the BEQ bitmap and either BGT or BLT.
RangeEval-Opt uses only <=:
A < v is the same as A <= v-1
A > v is the same as NOT (A <= v)
A >= v is the same as NOT (A <= v-1)
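The three rewrite rules can be illustrated on a single-component range-encoded index, where bitmap v already holds the rows with A <= v. This is only a sketch of the rewriting idea, not the full multi-component RangeEval-Opt algorithm; the column data is taken from the earlier slides and the function names are assumptions.

```python
A = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
C = 9                      # attribute cardinality
N = len(A)
ALL = (1 << N) - 1         # bitmap with every row set

# Range-encoded bitmaps as ints: bit r of LE[v] is set iff A[r] <= v.
LE = [sum(1 << r for r, a in enumerate(A) if a <= v) for v in range(C)]


def le(v):
    """Rows with A <= v, read directly from the index."""
    if v < 0:
        return 0
    if v >= C - 1:
        return ALL
    return LE[v]


def lt(v):
    return le(v - 1)          # A < v   is  A <= v-1


def gt(v):
    return ALL & ~le(v)       # A > v   is  NOT (A <= v)


def ge(v):
    return ALL & ~le(v - 1)   # A >= v  is  NOT (A <= v-1)


def rows(bitmap):
    """Row ids whose bit is set in the bitmap."""
    return [r for r in range(N) if (bitmap >> r) & 1]


print(rows(lt(2)), rows(gt(6)), rows(ge(7)))
```

Every comparison reduces to at most one bitmap lookup plus one complement, which is why a single <= primitive is enough.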

24 RangeEval-OPT


