Improved Query Performance With Variant Indexes Patrick O’Neil, Dallan Quass Presented by Bo Han.


1 Improved Query Performance With Variant Indexes Patrick O’Neil, Dallan Quass Presented by Bo Han

2 Motivation 1. Speed up queries on data warehouses
• Data warehouses are large
• They are read-mostly
• Queries routinely aggregate, filter, and group the data

3 Motivation 2. The first rigorous examination of variant indexes in the literature
• Advantages over traditional Value-List indexes for certain classes of queries
• More than one type of index can be available on a column

4 Motivation 3. Introducing a new indexing approach to support OLAP-type queries
• Datacube
• Multi-dimensional query
• Depends on summary tables

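5 Value-List Index (B+ tree)
[Figure: B+ tree with keys Brighton, Downtown, Mianus; each leaf-node entry holds the RIDs of the matching rows, e.g. (A212, Brighton, 750), (A101, Downtown, 500), (A110, Downtown, 600)]
Problem: a single key value can have a large number of associated RIDs!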

6 Bitmap Indexes
• A bitmap for a value: an array of bits. The i-th bit is set to 1 if the i-th record has the value
• A bitmap index: consists of one bitmap for each value that the attribute can take
• A bitmap is an alternate method of representing the RID-lists in a Value-List index (suited to low-cardinality columns)

Pid      Brand  P_type  Size
0000001  Dell   P1      14
0000002  HP     P2      13
0000003  Sony   P1      15
0000004  Dell   P1      14
0000005  HP     P2      14
0000006  IBM    P3      12
0000007  HP     P2      12

Bitmap for Size
12  0000011
13  0100000
14  1001100
15  0010000

Bitmap for Brand
Dell  1001000
HP    0100101
Sony  0010000
IBM   0000010
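As a rough illustration of the definition above, the following Python sketch rebuilds the two bitmap indexes from the Product rows on this slide. The function name `build_bitmap_index` and the use of bit strings are assumptions made for readability; they are not taken from the paper or from any particular DBMS.

```python
# Product rows from the slide: (Pid, Brand, P_type, Size).
PRODUCT = [
    ("0000001", "Dell", "P1", 14),
    ("0000002", "HP", "P2", 13),
    ("0000003", "Sony", "P1", 15),
    ("0000004", "Dell", "P1", 14),
    ("0000005", "HP", "P2", 14),
    ("0000006", "IBM", "P3", 12),
    ("0000007", "HP", "P2", 12),
]


def build_bitmap_index(rows, column):
    """Return {value: bit string}; character i is '1' if row i has that value.

    Hypothetical helper for illustration only.
    """
    index = {}
    for i, row in enumerate(rows):
        bits = index.setdefault(row[column], ["0"] * len(rows))
        bits[i] = "1"
    return {value: "".join(bits) for value, bits in index.items()}


brand_index = build_bitmap_index(PRODUCT, 1)
size_index = build_bitmap_index(PRODUCT, 3)

for value in sorted(size_index):
    print(value, size_index[value])
```

Each distinct value gets exactly one bitmap, so the index holds as many bitmaps as the column has distinct values.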

7 Bitmap Indexes
1. More space efficient than RID-lists in a Value-List index
• No compression: with |RID| = 32 bits, n rows, and m distinct values, the bitmaps take m*n bits and the RID-lists take 32*n bits, so the bitmaps are smaller whenever m < 32
• Compression: run-length encoding
2. More CPU efficient for many functions
• Boolean operations
ex1 (AND): Select Brand From Product Where Brand='HP' and Size=13
ex2 (OR of the bitmaps for Size 13 and Size 14): Select Pid From Product Where Size>12 and Size<15
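The two example queries and the space comparison can be sketched in Python as follows. The bitmaps are copied from the slide 6 tables; the helper names (`to_int`, `to_bits`, `rows_of`, `space_in_bits`) are hypothetical, and Python integers stand in for the packed bit arrays a real system would use.

```python
N_ROWS = 7
BRAND = {"Dell": "1001000", "HP": "0100101", "Sony": "0010000", "IBM": "0000010"}
SIZE = {12: "0000011", 13: "0100000", 14: "1001100", 15: "0010000"}


def to_int(bits):
    """Parse a bit string into an integer so that & and | act on whole bitmaps."""
    return int(bits, 2)


def to_bits(value):
    """Format an integer back into an N_ROWS-character bit string."""
    return format(value, f"0{N_ROWS}b")


def rows_of(bits):
    """Return the 1-based positions of the rows whose bit is set."""
    return [i + 1 for i, b in enumerate(bits) if b == "1"]


def space_in_bits(n_rows, n_values, rid_bits=32):
    """Uncompressed sizes: (bitmap index bits, RID-list bits)."""
    return n_values * n_rows, rid_bits * n_rows


# ex1: Brand='HP' AND Size=13
ex1 = to_bits(to_int(BRAND["HP"]) & to_int(SIZE[13]))
# ex2: Size>12 AND Size<15, evaluated as Size=13 OR Size=14
ex2 = to_bits(to_int(SIZE[13]) | to_int(SIZE[14]))

print(ex1, rows_of(ex1))
print(ex2, rows_of(ex2))
print(space_in_bits(n_rows=7, n_values=4))
```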

8 Bitmap Indexes
2. More CPU efficient for many functions (continued)
• Count
Select count(*) From Product Where Brand='Dell' and Size>14
3. Each individual bitmap is small, and frequently used ones can be cached in memory
4. Available in most major commercial DBMS
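A minimal sketch of the COUNT case, assuming the bitmaps from slide 6: the range predicate becomes an OR over the qualifying Size bitmaps, and the count is the number of 1 bits left after the AND. The names `size_greater_than` and `count_bits` are illustrative only.

```python
BRAND = {"Dell": "1001000", "HP": "0100101", "Sony": "0010000", "IBM": "0000010"}
SIZE = {12: "0000011", 13: "0100000", 14: "1001100", 15: "0010000"}


def size_greater_than(threshold):
    """OR together the bitmaps of every Size value above the threshold."""
    result = 0
    for size, bits in SIZE.items():
        if size > threshold:
            result |= int(bits, 2)
    return result


def count_bits(bitmap):
    """COUNT(*) is the number of 1 bits; no table rows are touched."""
    return bin(bitmap).count("1")


# Select count(*) From Product Where Brand='Dell' and Size>14
dell_over_14 = count_bits(int(BRAND["Dell"], 2) & size_greater_than(14))
# A second, non-empty case for comparison: Brand='HP' and Size>12
hp_over_12 = count_bits(int(BRAND["HP"], 2) & size_greater_than(12))

print(dell_over_14, hp_over_12)
```

With the sample data no Dell product is larger than 14, so the slide's query counts zero rows.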

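9 Projection Index
A projection index on a column stores a copy of all values of that column, in row order, for lookup by ordinal number.
[Figure: a table with columns Col1 to Col4, next to the projection index for Col2, which holds only the Col2 values v1, v2, ..., vk in the same order]
• Easy to locate: N = 1000*p + s (p: page number, s: slot number)
• Few disk I/Os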
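The lookup formula can be sketched as below. The sketch assumes zero-based ordinals, pages, and slots, and models pages as Python lists; `locate`, `build_pages`, and `lookup` are hypothetical names.

```python
VALUES_PER_PAGE = 1000  # from the slide: N = 1000*p + s


def locate(n):
    """Invert N = 1000*p + s: return (page number p, slot number s)."""
    return divmod(n, VALUES_PER_PAGE)


def build_pages(values):
    """Store the column values in row order, VALUES_PER_PAGE per page."""
    return [values[i:i + VALUES_PER_PAGE]
            for i in range(0, len(values), VALUES_PER_PAGE)]


def lookup(pages, n):
    """Fetch the value of row n by reading exactly one page."""
    page, slot = locate(n)
    return pages[page][slot]


pages = build_pages(list(range(2500)))  # toy column: value equals its ordinal
print(locate(2345), lookup(pages, 2345))
```

Because the position is computed rather than searched, each value costs at most one page read, and many values share a page.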

10 Bit-Sliced Index
A set of bitmap slices that are orthogonal to the data held in a projection index (i.e. a bitwise vertical partition).
Bit-slice Bi holds bit i of every value. Bnn: bitmap representing the set of non-null values in the indexed column.

Col2  B5  B4  B3  B2  B1  B0
20    0   1   0   1   0   0
52    1   1   0   1   0   0
20    0   1   0   1   0   0
62    1   1   1   1   1   0
10    0   0   1   0   1   0
34    1   0   0   0   1   0
1     0   0   0   0   0   1
49    1   1   0   0   0   1
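To make the vertical partition concrete, this sketch derives the six bit-slices from the Col2 values on the slide and then rebuilds the values from the slices. `bit_slices` and `reconstruct` are hypothetical helper names, and the sample column is assumed to contain no nulls (so Bnn would be all ones).

```python
COL2 = [20, 52, 20, 62, 10, 34, 1, 49]


def bit_slices(values, n_bits):
    """Slice i is a bit string whose k-th character is bit i of values[k]."""
    return ["".join(str((v >> i) & 1) for v in values) for i in range(n_bits)]


def reconstruct(slices):
    """Rebuild each value by weighting bit i with 2**i."""
    n_rows = len(slices[0])
    return [sum(int(s[k]) << i for i, s in enumerate(slices))
            for k in range(n_rows)]


slices = bit_slices(COL2, 6)  # 6 bits are enough because every value is below 64
for i, s in enumerate(slices):
    print(f"B{i}", s)
```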

11 Comparison of Indexes (evaluating Single-Column Sum Aggregates)
Select SUM(dollar_sales) From Sales Where condition
Analyze the disk page I/O cost.
Plan 1: Direct access to the rows to calculate the Sum
  100 million rows, Len(row) = 200 B, |page| = 4 KB, 20 rows/page, |Foundset| = 2 million rows
Plan 2: Calculating the Sum through a Projection Index
  Len(dollar_sales) = 4 B, 1000 values/page, 100,000 pages

12 Comparison of Indexes (evaluating Single-Column Sum Aggregates)
Plan 3: Calculating the Sum through a Value-List (Bitmap) Index
  if (COUNT(Bf AND Bnn) == 0)
      Return null;
  SUM = 0.0;
  for each non-null value v in the index for C {
      Designate the set of rows with value v as Bv
      SUM += v * COUNT(Bf AND Bv);
  }
  Return SUM;
Bf: 100,000,000 bits = 12,500,000 B → 3,125 pages
Bv: 100,000,000 RIDs of 4 bytes each → 100,000 pages
Total: 103,125 pages
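The Plan 3 pseudocode can be tried out on a toy column with the Python sketch below. Bitmaps are modelled as integers in which bit k stands for row k; the sample column, the Foundset, and the names `build_value_bitmaps` and `sum_value_list` are assumptions for illustration.

```python
def popcount(bitmap):
    """COUNT of a bitmap: the number of 1 bits."""
    return bin(bitmap).count("1")


def build_value_bitmaps(column):
    """One bitmap Bv per distinct non-null value v (bit k set if row k has v)."""
    bitmaps = {}
    for row, value in enumerate(column):
        if value is not None:
            bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def sum_value_list(bitmaps, bf):
    """Plan 3: SUM over the Foundset bf using only the value bitmaps."""
    bnn = 0
    for bv in bitmaps.values():
        bnn |= bv  # Bnn is the union of all value bitmaps
    if popcount(bf & bnn) == 0:
        return None  # no non-null rows in the Foundset
    total = 0
    for value, bv in bitmaps.items():
        total += value * popcount(bf & bv)
    return total


column = [20, 52, 20, 62, None, 34, 1, 49]  # toy data; row 4 is null
bf = sum(1 << row for row in (0, 2, 4, 5))  # Foundset: rows 0, 2, 4, 5
print(sum_value_list(build_value_bitmaps(column), bf))
```

The loop runs once per distinct value, which is why this plan becomes expensive for columns with many distinct values.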

13 Comparison of Indexes (evaluating Single-Column Sum Aggregates)
Plan 4: Calculating the SUM through a Bit-Sliced Index
  if (COUNT(Bf AND Bnn) == 0)
      Return null;
  SUM = 0.0;
  for i = 0 to N
      SUM += 2^i * COUNT(Bi AND Bf);
  Return SUM;
Bf: 100,000,000 bits = 12,500,000 B → 3,125 pages
21 bit-slice bitmaps Bi (21 bits cover values up to 2^21, about 2 million), plus Bf: 22 bitmaps
Total: 22 * 3,125 = 68,750 pages
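A minimal Python sketch of Plan 4, under the same modelling assumption as before (integers as bitmaps, bit k for row k). The toy column and the names `build_bit_slices` and `sum_bit_sliced` are hypothetical; each slice count is weighted by 2^i.

```python
def popcount(bitmap):
    """COUNT of a bitmap: the number of 1 bits."""
    return bin(bitmap).count("1")


def build_bit_slices(column, n_bits):
    """Return (slices, bnn): slices[i] has bit k set if bit i of row k's value is 1."""
    slices = [0] * n_bits
    bnn = 0
    for row, value in enumerate(column):
        if value is None:
            continue  # null rows are absent from Bnn and from every slice
        bnn |= 1 << row
        for i in range(n_bits):
            if (value >> i) & 1:
                slices[i] |= 1 << row
    return slices, bnn


def sum_bit_sliced(slices, bnn, bf):
    """Plan 4: SUM = sum over i of 2**i * COUNT(Bi AND Bf)."""
    if popcount(bf & bnn) == 0:
        return None
    return sum((1 << i) * popcount(bi & bf) for i, bi in enumerate(slices))


column = [20, 52, 20, 62, None, 34, 1, 49]  # toy data; row 4 is null
bf = sum(1 << row for row in (0, 2, 4, 5))  # Foundset: rows 0, 2, 4, 5
slices, bnn = build_bit_slices(column, 6)
print(sum_bit_sliced(slices, bnn, bf))
```

The number of bitmap operations depends on the number of bit positions, not on the number of distinct values.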

14 Comparison of Indexes (evaluating Single-Column Sum Aggregates)
Method            I/O     CPU contribution
Add from Rows     1,341K  I/O + 2M * (25 ins)
Projection index  100K    I/O + 2M * (10 ins)
Value-List index  103K    I/O + 100M * (10 ins)
Bit-Sliced index  69K     I/O + 197M * (1 ins)
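The page counts for Plans 2 to 4 can be reproduced with a few lines of arithmetic. The sketch assumes 4,000 usable bytes per page, which is an inference from the slides' figures (1000 four-byte values per page, and 12,500,000 B mapping to 3,125 pages); the Plan 1 figure is not recomputed here.

```python
ROWS = 100_000_000
PAGE_BYTES = 4000   # assumption: usable bytes per 4K page, inferred from the slides
VALUE_BYTES = 4     # Len(dollar_sales) and RID size, both 4 bytes
BIT_SLICES = 21     # number of bit-slice bitmaps given on slide 13

# Plan 2: projection index, one 4-byte value per row.
projection_pages = ROWS * VALUE_BYTES // PAGE_BYTES

# One uncompressed bitmap over all rows (used for Bf and for each bit-slice).
bitmap_pages = ROWS // 8 // PAGE_BYTES

# Plan 3: Bf plus RID-lists covering every row once.
value_list_pages = bitmap_pages + ROWS * VALUE_BYTES // PAGE_BYTES

# Plan 4: the bit-slices plus Bf.
bit_sliced_pages = (BIT_SLICES + 1) * bitmap_pages

print(projection_pages, value_list_pages, bit_sliced_pages)
```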

15 Evaluating Aggregate Functions
Aggregate       Value-List Index  Projection Index  Bit-Sliced Index
COUNT           Not needed        Not needed        Not needed
SUM             Not bad           Good              Best
AVG             Not bad           Good              Best
MAX/MIN         Best              Slow              Slow
MEDIAN, N-Tile  Usually Best      Not Useful        Sometimes Best
Column-Product  Very Slow         Best              Very Slow

16 Range Evaluation Performance
Range Evaluation  Value-List Index  Projection Index  Bit-Sliced Index
Narrow Range      Best              Good              Good
Wide Range        Not Bad           Good              Best

17 Evaluating OLAP-style Queries
The OLAP approach precalculates the results of some grouped queries and stores them in summary tables.
• Is the expected set of queries known in advance?
• The size of the data in summary tables grows as the product of the number of values in the independent dimensions (space requirement?)
How to speed up Join and Group By? Join Indexes and Bitmap Join Indexes

18 Join Indexes
A join index: an index on one table that involves a column value from a different table through a commonly encountered join.

Product: same table as on slide 6.

Sales
Cid   Pid      Dollar_sales  Unit
0100  0000001  1200          1
0101  0000003  2600          2
0110  0000002  1600          1
0111  0000001  1200          1

Join index (Sales rows by Product Size)
Cid   Size
0100  14
0111  14
0110  13
0101  15
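One way to picture the join index is as a map from a Product column value to the Sales rows that join to it. The sketch below builds such a map from the slide's data; the dictionary layout and the name `build_join_index` are assumptions, not the paper's storage format.

```python
# Pid -> Size, taken from the Product table on slide 6.
PRODUCT_SIZE = {
    "0000001": 14, "0000002": 13, "0000003": 15, "0000004": 14,
    "0000005": 14, "0000006": 12, "0000007": 12,
}

# Sales rows from the slide: (Cid, Pid, Dollar_sales, Unit).
SALES = [
    ("0100", "0000001", 1200, 1),
    ("0101", "0000003", 2600, 2),
    ("0110", "0000002", 1600, 1),
    ("0111", "0000001", 1200, 1),
]


def build_join_index(sales, product_size):
    """Map each Product Size to the Cids of the Sales rows that join to it."""
    index = {}
    for cid, pid, _dollars, _unit in sales:
        index.setdefault(product_size[pid], []).append(cid)
    return index


join_index = build_join_index(SALES, PRODUCT_SIZE)
print(join_index)
```

The join on Pid is done once, at index build time; a query on Size can then reach the Sales rows without touching Product.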

19 Bitmap Join Index
A Bitmap Join Index spans multiple tables and improves query performance between the joined tables.

Sales and Product: same tables as on slide 18.

Bitmap join index on Sales by Product Brand (one row per Sales row)
Cid   Dell  HP  Sony  IBM
0100  1     0   0     0
0101  0     0   1     0
0110  0     1   0     0
0111  1     0   0     0

20 Bitmap Join Index
Sales and Product: same tables as on slide 18.

Customer
Cid   State
0100  CA
0101  NY
0110  CA
0111  PA

Select Sum(Dollar_sales) From Sales S Natural Join Product P Natural Join Customer C Where P.Brand='Dell' AND C.State='PA'

Brand='Dell'  1 0 0 1
and
State='PA'    0 0 0 1
=
              0 0 0 1
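The query on this slide can be evaluated with two bitmap join indexes over the Sales rows, as in the following sketch. Lists of 0/1 stand in for bitmaps, the Brand and State lookups are copied from the slides' Product and Customer tables, and `bitmap_join_index` is a hypothetical helper.

```python
# Sales rows from the slides: (Cid, Pid, Dollar_sales, Unit).
SALES = [
    ("0100", "0000001", 1200, 1),
    ("0101", "0000003", 2600, 2),
    ("0110", "0000002", 1600, 1),
    ("0111", "0000001", 1200, 1),
]
PRODUCT_BRAND = {
    "0000001": "Dell", "0000002": "HP", "0000003": "Sony", "0000004": "Dell",
    "0000005": "HP", "0000006": "IBM", "0000007": "HP",
}
CUSTOMER_STATE = {"0100": "CA", "0101": "NY", "0110": "CA", "0111": "PA"}


def bitmap_join_index(sales, key_position, lookup):
    """For each joined value, a 0/1 list with one entry per Sales row."""
    index = {}
    for row_no, sale in enumerate(sales):
        bits = index.setdefault(lookup[sale[key_position]], [0] * len(sales))
        bits[row_no] = 1
    return index


brand_index = bitmap_join_index(SALES, 1, PRODUCT_BRAND)   # joined on Pid
state_index = bitmap_join_index(SALES, 0, CUSTOMER_STATE)  # joined on Cid

# Where P.Brand='Dell' AND C.State='PA'
found = [b & s for b, s in zip(brand_index["Dell"], state_index["PA"])]
total = sum(sale[2] for sale, bit in zip(SALES, found) if bit)
print(found, total)
```

Both predicates are resolved on Sales row positions, so neither dimension table is read at query time.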

21 Calculating Groupset Aggregates
• Select Sum(F.A) From F, D1, D2, D3 Where condition Group by D1.d1, D2.d2, D3.d3
• Use Value-List indexes to determine each Groupset (F.di = Di.di, without a join!)
• Use a Projection index on F.A to get SUM(F.A)
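The two steps above can be sketched on a toy fact table with two dimension columns. Everything in the example (the data, the Foundset, and the names `value_bitmaps`, `rows_in`, `groupset_sums`) is hypothetical; integers model bitmaps with bit k standing for row k, and a plain list models the projection index on F.A.

```python
from itertools import product


def value_bitmaps(keys):
    """Value-List (bitmap) index on one dimension column of F."""
    index = {}
    for row, key in enumerate(keys):
        index[key] = index.get(key, 0) | (1 << row)
    return index


def rows_in(bitmap):
    """Yield the row numbers whose bit is set."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield row
        bitmap >>= 1
        row += 1


def groupset_sums(dim_columns, measure, bf):
    """AND the per-dimension bitmaps with the Foundset, then sum F.A per Groupset."""
    indexes = [value_bitmaps(col) for col in dim_columns]
    result = {}
    for combo in product(*(sorted(ix) for ix in indexes)):
        groupset = bf
        for ix, key in zip(indexes, combo):
            groupset &= ix[key]
        if groupset:  # skip empty Groupsets
            result[combo] = sum(measure[r] for r in rows_in(groupset))
    return result


f_d1 = ["x", "x", "y", "x", "y"]  # F.d1
f_d2 = ["p", "q", "p", "p", "q"]  # F.d2
f_a = [10, 20, 30, 40, 50]        # projection index on F.A
bf = 0b01111                      # Foundset: rows 0 to 3 satisfy the condition
print(groupset_sums([f_d1, f_d2], f_a, bf))
```

Trying every combination of dimension values is the simplest approach and grows with the product of the dimension cardinalities, which is the cost the next slide sets out to reduce.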

22 Improved Grouping Efficiency
Problem: Groupsets and rows are randomly placed on disk.
Segmentation: partition the rows in F into segments. Query evaluation: one segment at a time.
Clustering: cluster the fact table F

D1  =d1-1   111111111111111111111110000000000000000000…
    =d1-2   000000000000000000000001111111111111111000…
    ……
D2  =d2-1   111111000000000000000001111110000000000000…
    =d2-2   000000111111100000000000000001111111100000…
    ……
D3  =d3-1   110000110000000000000001111110000000000000…
    =d3-2   001100001100000000000000000001111111100000…
    ……
    =d3-n3  000011000001100000000000000001111111100000…

Groupset Indexes: key values are a concatenation of the dimensional primary-key values
(d1-1, d2-1, d3-1)  11000000000000000000000000000000000000…
(d1-1, d2-1, d3-2)  00110000000000000000000000000000000000…

23 Conclusion
• Analyzed the Value-List index, Bitmap index, Projection index, and Bit-Sliced index
• Combined Bitmap indexing and physical row clustering to evaluate OLAP queries involving aggregation and grouping

24 Reference
1. Improved Query Performance With Variant Indexes – Patrick O’Neil and Dallan Quass, Proc. ACM SIGMOD Conf. 1997, Pages 38-49.
2. Bitmap Index Design and Evaluation – C.Y. Chan and Y.E. Ioannidis 1998. 6
3. Database System Implementation – Hector Garcia-Molina, Jeffrey D. Ullman and Jennifer Widom, Prentice Hall, 2000
4. Encoded Bitmap Indexing for Data Warehouses – M.C. Wu and A.P. Buchmann 1998. 2
5. An Efficient Bitmap Encoding Scheme for Selection Queries – C.Y. Chan and Y.E. Ioannidis 1998. 6
6. Multidimensional Indexing and Query Coordination for Tertiary Storage Management – A. Shoshani, L.M. Bernardo, et al. 1999. 10
7. Multi-Table Joins Through Bitmapped Join Indices – P. O’Neil and G. Graefe 1995. 9

