Indexing OLAP Data Sunita Sarawagi Monowar Hossain York University.


1 Indexing OLAP Data Sunita Sarawagi Monowar Hossain York University

2 Agenda
Requirements on indexing methods
Existing indexing methods
Optimization of R-Tree for OLAP data
R-Tree vs. bit-mapped indices
Conclusion

3 Requirements on Indexing methods
Symmetric partial match queries
  – Continuous, e.g. “time between Jan and July 94”
  – Discontinuous, e.g. “first month of each year”
Indexing at multiple levels of aggregation
  – Pre-computed group-bys
  – Indexing summary data
Handling multiple traversal orders
Efficient batch update
Handling sparse data efficiently

4 Existing methods
Multidimensional array-based methods
  – Work efficiently when data is dense
  – Essbase’s schema, e.g. a four-dimensional cube: product and store (sparse), time and scenario (dense)
    – B-tree on product and store
    – Two-dimensional array on time and scenario
  – Evaluation of Essbase’s schema
    – May cause multiple searches, e.g. searching store = “something” on the product-store index
    – Performance depends on the ability to find enough dense dimensions
    – Efficient batch update
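
The sparse/dense split on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Essbase's actual implementation: a sorted key list stands in for the B-tree, and the class name `SparseDenseCube`, the method names, and the `TIMES` and `SCENARIOS` values are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical members of the two dense dimensions.
TIMES = ["Q1", "Q2", "Q3", "Q4"]
SCENARIOS = ["actual", "budget"]


class SparseDenseCube:
    """Sparse (product, store) keys point to dense time x scenario blocks."""

    def __init__(self):
        self.keys = []    # sorted (product, store) pairs; stand-in for a B-tree
        self.blocks = {}  # (product, store) -> 2-D list indexed [time][scenario]

    def insert(self, product, store, time, scenario, value):
        key = (product, store)
        if key not in self.blocks:
            # Allocate a full dense block only for combinations that occur.
            self.keys.insert(bisect_left(self.keys, key), key)
            self.blocks[key] = [[0.0] * len(SCENARIOS) for _ in TIMES]
        self.blocks[key][TIMES.index(time)][SCENARIOS.index(scenario)] = value

    def lookup(self, product, store, time, scenario):
        block = self.blocks[(product, store)]
        return block[TIMES.index(time)][SCENARIOS.index(scenario)]

    def keys_for_product(self, product):
        # Product is the leading key component: one contiguous range scan.
        i = bisect_left(self.keys, (product,))
        found = []
        while i < len(self.keys) and self.keys[i][0] == product:
            found.append(self.keys[i])
            i += 1
        return found

    def keys_for_store(self, store):
        # Store is not the leading component, so every key must be examined.
        return [key for key in self.keys if key[1] == store]


cube = SparseDenseCube()
cube.insert("tea", "s1", "Q1", "actual", 10.0)
cube.insert("tea", "s2", "Q2", "budget", 7.5)
cube.insert("coffee", "s1", "Q4", "actual", 3.0)
```

The asymmetry between `keys_for_product` and `keys_for_store` mirrors the slide's remark that a query on store alone against a product-store index may require multiple searches.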

5 Existing methods (cont.)
Bit-mapped indices
  – Pros:
    – For low-cardinality data, bit-maps are both space- and retrieval-efficient
    – Support bitwise operations
    – Data access is clustered
    – All dimensions are handled symmetrically
  – Cons:
    – Range queries
    – Increased space overhead of storing the bit-maps, especially for high-cardinality data
    – Expensive batch update, as all bit-mapped indices have to be modified even for a single row insertion
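
A minimal sketch of a bit-mapped index may help make the trade-offs concrete. It assumes one uncompressed bitmap per distinct value, held in a Python integer; the class name `BitmapIndex`, the helper `rows`, and the sample columns are hypothetical and not taken from the presentation.

```python
class BitmapIndex:
    """One bitmap per distinct column value; bit i is set if row i has it."""

    def __init__(self):
        self.bitmaps = {}  # value -> int used as an arbitrary-length bit vector
        self.n_rows = 0

    def append(self, value):
        # Record the value of the next row by setting that row's bit.
        self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << self.n_rows)
        self.n_rows += 1

    def eq(self, value):
        return self.bitmaps.get(value, 0)

    def any_of(self, values):
        # A range or IN-list predicate must OR one bitmap per matching value.
        result = 0
        for value in values:
            result |= self.eq(value)
        return result


def rows(bitmap):
    """Decode a bitmap into the list of row numbers whose bit is set."""
    found, i = [], 0
    while bitmap:
        if bitmap & 1:
            found.append(i)
        bitmap >>= 1
        i += 1
    return found


region, product = BitmapIndex(), BitmapIndex()
for r, p in [("east", "tea"), ("west", "tea"), ("east", "coffee"), ("north", "tea")]:
    region.append(r)
    product.append(p)

# Conjunction of two dimensions is a single bitwise AND.
east_tea = rows(region.eq("east") & product.eq("tea"))
east_or_west = rows(region.any_of(["east", "west"]))
```

Both dimensions are treated identically, which is the symmetry listed under the pros. The `any_of` loop hints at the range-query cost: the work grows with the number of distinct values in the range. Inserting one row touches every index, which is the batch-update cost listed under the cons.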

6 Existing methods (cont.)
Bit-mapped index variants
  – Compression
  – Hybrid
  – Dynamic bit-maps

7 Existing methods (cont.)
Hierarchical indices
  – Example: Product - Store
    – Index on product first, and also store summaries at the product level
    – For each product value, create an index on store and store summaries at the product-store level
  – Pros:
    – Allow faster access to higher-level data
    – Dimensions are symmetrically handled
  – Cons:
    – Widely used index storage overhead
    – Average retrieval efficiency can suffer because of the large indexing structure
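
The Product - Store example can be sketched as a two-level structure that keeps a summary at each level. This is an illustrative sketch only: nested dictionaries stand in for the real index structures, a running sum stands in for whatever summary is stored, and the names `HierarchicalIndex`, `add`, `product_total`, and `product_store_total` are hypothetical.

```python
class HierarchicalIndex:
    """Index on product first; under each product, index on store."""

    def __init__(self):
        # product -> {"total": product-level summary,
        #             "stores": {store: product-store-level summary}}
        self.products = {}

    def add(self, product, store, sales):
        node = self.products.setdefault(product, {"total": 0.0, "stores": {}})
        node["total"] += sales  # summary kept at the product level
        node["stores"][store] = node["stores"].get(store, 0.0) + sales

    def product_total(self, product):
        # Higher-level summary is answered without visiting any store entry.
        return self.products[product]["total"]

    def product_store_total(self, product, store):
        return self.products[product]["stores"][store]


index = HierarchicalIndex()
index.add("tea", "s1", 10.0)
index.add("tea", "s2", 5.0)
index.add("tea", "s1", 2.5)
index.add("coffee", "s1", 4.0)
```

Reading `product_total("tea")` touches only the top level, which illustrates the faster access to higher-level data. Every level stores its own entries and summaries, which is where the storage overhead comes from.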

8 Existing methods (cont.)
Multidimensional indices
  – Use of the indexing methods designed for spatial data, e.g. R-Tree, grid files, etc.

9 Optimized R-Tree for OLAP data
Rectangular dense regions (only the boundaries of regions that contain more than a threshold number of points)
  – Each contains a pointer to a variable-length array of TIDs or of the tuples themselves
  – Points in sparse regions
Finding dense regions
  – Ask an expert?
  – Use a clustering algorithm (similar algorithms: image analysis)
Needs evaluation!
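
The dense/sparse split can be illustrated with a small two-dimensional sketch. It assumes the candidate rectangles are supplied by the caller (from an expert or a clustering step, neither of which is shown), and it uses flat lists where a real implementation would place the entries in an R-Tree. The function names and the sample points are hypothetical.

```python
def contains(rect, point):
    """rect is (x0, y0, x1, y1) with inclusive bounds."""
    x0, y0, x1, y1 = rect
    return x0 <= point[0] <= x1 and y0 <= point[1] <= y1


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build(points, candidates, threshold):
    """Split points into dense regions (bounds + TID array) and sparse TIDs."""
    dense, assigned = [], set()
    for rect in candidates:
        tids = [tid for tid, p in enumerate(points)
                if tid not in assigned and contains(rect, p)]
        # Only rectangles holding more than `threshold` points count as dense.
        if len(tids) > threshold:
            dense.append((rect, tids))
            assigned.update(tids)
    sparse = [tid for tid in range(len(points)) if tid not in assigned]
    return dense, sparse


def range_query(points, dense, sparse, query):
    """Return the sorted TIDs of all points inside the query rectangle."""
    hits = []
    for rect, tids in dense:
        # One boundary test can skip a whole dense region.
        if intersects(rect, query):
            hits.extend(t for t in tids if contains(query, points[t]))
    hits.extend(t for t in sparse if contains(query, points[t]))
    return sorted(hits)


points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9), (5, 0)]
dense, sparse = build(points, [(1, 1, 2, 2), (8, 8, 10, 10)], threshold=2)
hits = range_query(points, dense, sparse, (2, 0, 9, 9))
```

Here the first candidate holds four points and becomes a dense region, while the second holds only one point, so that point stays in the sparse set.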

10 R-Tree vs. bit-mapped indices
R-Tree pros:
  – Allows range queries
  – Smaller space overhead
  – Update is more efficient
Bit-mapped pros:
  – Faster bitwise operations
  – Efficient for low cardinality, few restricted dimensions, and sparse data

11 Conclusion
High-level overview
Recommended readings
  – MOLAP VS OLAP
  – R-Tree and variants
  – R-Tree alternatives
  – Computation of multidimensional aggregates
  – And more…

