Indexing OLAP Data Sunita Sarawagi Monowar Hossain York University.

Agenda
– Requirements on indexing methods
– Existing indexing methods
– Optimization of R-tree for OLAP data
– R-tree vs. bitmap indices
– Conclusion

Requirements on Indexing Methods
– Symmetric partial-match queries
  – Continuous, e.g. "time between Jan and July 94"
  – Discontinuous, e.g. "first month of each year"
– Indexing at multiple levels of aggregation
  – Precomputed group-bys
  – Indexing summary data
– Handling multiple traversal orders
– Efficient batch update
– Handling sparse data efficiently
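
To see why discontinuous predicates are the harder case, map each month to a position along a time-ordered key. The sketch below (hypothetical helper names; months are assumed to be numbered from January 1990) turns each example predicate into the contiguous key ranges an ordered index would have to probe: the continuous predicate needs one range, while the discontinuous one needs one per year.

```python
from typing import List, Tuple

BASE_YEAR = 1990  # assumption: time keys count months from January 1990

def month_key(year: int, month: int) -> int:
    """Hypothetical helper: map (year, month) to a position on a time-ordered key."""
    return (year - BASE_YEAR) * 12 + (month - 1)

def to_ranges(keys: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: collapse keys into inclusive runs of consecutive keys."""
    ranges: List[Tuple[int, int]] = []
    for k in sorted(keys):
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], k)  # extend the current run
        else:
            ranges.append((k, k))            # start a new run
    return ranges

# Continuous: "time between Jan and July 94" -> a single range probe.
continuous = to_ranges([month_key(1994, m) for m in range(1, 8)])

# Discontinuous: "first month of each year" (1990-1999) -> one probe per year.
discontinuous = to_ranges([month_key(y, 1) for y in range(1990, 2000)])

print(len(continuous), len(discontinuous))  # 1 10
```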

Existing Methods: Multidimensional Array-Based Methods
– Work efficiently when data is dense
– Essbase's scheme, e.g. a four-dimensional cube with product and store as sparse dimensions and time and scenario as dense dimensions:
  – A B-tree on product and store
  – A two-dimensional array on time and scenario for each product-store entry
– Evaluation of Essbase's scheme:
  – May cause multiple searches, e.g. searching for store = "something" on the product-store index needs one search per product
  – Performance depends on the ability to find enough dense dimensions
  – Efficient batch update
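
A minimal sketch of this two-level layout, assuming a sorted list of composite (product, store) keys stands in for the B-tree and each key points to a dense time × scenario array (all names and sizes are hypothetical). A predicate on product is one contiguous range scan, but a predicate on store alone needs one separate search per product, which is the multiple-search problem listed in the evaluation.

```python
import bisect
from typing import Dict, List, Tuple

N_TIME, N_SCENARIO = 12, 2  # hypothetical sizes of the dense dimensions

# Hypothetical stand-in: sorted composite (product, store) keys play the role of
# the B-tree on the sparse dimensions; each maps to an array [time][scenario].
keys: List[Tuple[str, str]] = []
arrays: Dict[Tuple[str, str], List[List[float]]] = {}

def insert(product: str, store: str, time: int, scenario: int, value: float) -> None:
    key = (product, store)
    if key not in arrays:
        bisect.insort(keys, key)
        arrays[key] = [[0.0] * N_SCENARIO for _ in range(N_TIME)]
    arrays[key][time][scenario] = value

def cells_for_product(product: str) -> List[Tuple[str, str]]:
    """Predicate on the leading key: one contiguous range scan."""
    i = bisect.bisect_left(keys, (product, ""))
    out: List[Tuple[str, str]] = []
    while i < len(keys) and keys[i][0] == product:
        out.append(keys[i])
        i += 1
    return out

def cells_for_store(store: str) -> Tuple[List[Tuple[str, str]], int]:
    """Predicate on the second key only: one separate search per product."""
    found: List[Tuple[str, str]] = []
    searches = 0
    for product in sorted({p for p, _ in keys}):
        searches += 1
        i = bisect.bisect_left(keys, (product, store))
        if i < len(keys) and keys[i] == (product, store):
            found.append(keys[i])
    return found, searches

insert("p1", "s1", 0, 0, 10.0)
insert("p1", "s2", 0, 0, 5.0)
insert("p2", "s1", 3, 1, 7.0)
insert("p3", "s2", 1, 0, 2.0)
print(cells_for_product("p1"))  # [('p1', 's1'), ('p1', 's2')]
print(cells_for_store("s1"))    # ([('p1', 's1'), ('p2', 's1')], 3)
```

The store-only query touched every product even though only two cells matched; with few dense dimensions, the sparse B-tree grows and this cost grows with it.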

Existing Methods (cont.): Bitmap Indices
– Pros:
  – For low-cardinality data, bitmaps are both space- and retrieval-efficient
  – Support bitwise operations (AND, OR) for combining predicates
  – Data access is clustered
  – All dimensions are handled symmetrically
– Cons:
  – Range queries require OR-ing the bitmaps of every value in the range
  – Increased space overhead of storing the bitmaps, especially for high-cardinality data
  – Expensive batch update, since all bitmap indices have to be modified even for a single-row insertion
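
A minimal bitmap-index sketch, assuming Python integers serve as bitsets (bit i set means row i holds the value); the fact table and helper names are hypothetical. It shows a two-dimension point predicate answered by a single AND, a range predicate that needs one OR per value in the range, and row IDs decoded in ascending order.

```python
from collections import defaultdict
from typing import Dict, List

def build_bitmap_index(column: List[str]) -> Dict[str, int]:
    """One bitmap per distinct value; bit i is set when row i holds that value."""
    index: Dict[str, int] = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)

def rows(bitmap: int) -> List[int]:
    """Decode a bitmap into row IDs, in ascending order."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

# Hypothetical fact table with two low-cardinality dimensions.
region = ["east", "west", "east", "north", "west", "east"]
month = ["1", "1", "2", "3", "2", "3"]

region_idx = build_bitmap_index(region)
month_idx = build_bitmap_index(month)

# Point predicates on two dimensions: a single bitwise AND.
east_in_march = region_idx["east"] & month_idx["3"]
print(rows(east_in_march))  # [5]

# Range predicate "month between 1 and 2": OR one bitmap per value in the range.
jan_to_feb = 0
for m in ("1", "2"):
    jan_to_feb |= month_idx.get(m, 0)
print(rows(jan_to_feb & region_idx["east"]))  # [0, 2]
```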

Existing Methods (cont.): Bitmap Index Variants
– Compression
– Hybrid
– Dynamic bitmaps
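
The slide does not say which compression scheme it means. As one illustrative possibility, the sketch below run-length encodes a bitmap into alternating run lengths, which shrinks the long zero runs typical of bitmaps on high-cardinality columns. It is a toy encoding assumed for illustration (hypothetical function names), not the format of any particular system.

```python
from typing import List

def rle_encode(bits: str) -> List[int]:
    """Encode a '0'/'1' string as alternating run lengths, starting with a 0-run."""
    runs: List[int] = []
    current, length = "0", 0
    for b in bits:
        if b == current:
            length += 1
        else:
            runs.append(length)  # close the current run (may be 0 at the start)
            current, length = b, 1
    runs.append(length)
    return runs

def rle_decode(runs: List[int]) -> str:
    """Inverse of rle_encode: even positions are 0-runs, odd positions are 1-runs."""
    return "".join(("0" if i % 2 == 0 else "1") * n for i, n in enumerate(runs))

bitmap = "0" * 12 + "11" + "0" * 19 + "1"
runs = rle_encode(bitmap)
print(runs)  # [12, 2, 19, 1]
assert rle_decode(runs) == bitmap
```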

Existing Methods (cont.): Hierarchical Indices
– Example: a product-store index
  – Index product first, and also store summaries at the product level
  – For each product value, create an index on store and store summaries at the product-store level
– Pros:
  – Allows faster access to higher-level data
  – Dimensions are handled symmetrically
– Cons:
  – High index storage overhead
  – Average retrieval efficiency can suffer because of the large index structure
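
A minimal sketch of the product-then-store example, assuming plain dictionaries in place of the per-level index structures and a single additive measure (sales); the class and method names are hypothetical. A product-level total comes straight from the stored summary without visiting the store level, at the cost of updating one summary per level on every insert.

```python
from typing import Dict

class HierarchicalIndex:
    """Hypothetical product index -> per-product store index, with summaries."""

    def __init__(self) -> None:
        self.product_total: Dict[str, float] = {}      # product-level summaries
        self.stores: Dict[str, Dict[str, float]] = {}  # product-store summaries

    def insert(self, product: str, store: str, sales: float) -> None:
        # Every insert updates one summary per level of the hierarchy.
        self.product_total[product] = self.product_total.get(product, 0.0) + sales
        by_store = self.stores.setdefault(product, {})
        by_store[store] = by_store.get(store, 0.0) + sales

    def product_sales(self, product: str) -> float:
        # Higher-level query: one lookup in the top-level index.
        return self.product_total.get(product, 0.0)

    def product_store_sales(self, product: str, store: str) -> float:
        # Lower-level query: descend into that product's store index.
        return self.stores.get(product, {}).get(store, 0.0)

idx = HierarchicalIndex()
idx.insert("tv", "toronto", 100.0)
idx.insert("tv", "ottawa", 40.0)
idx.insert("radio", "toronto", 15.0)
print(idx.product_sales("tv"))                      # 140.0
print(idx.product_store_sales("radio", "toronto"))  # 15.0
```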

Existing Methods (cont.): Multidimensional Indices
– Use of indexing methods designed for spatial data, e.g. R-trees, grid files, etc.

Optimized R-Tree for OLAP Data
– Two kinds of entries:
  – Rectangular dense regions (only regions that contain more than a threshold number of points); each contains a pointer to a variable-length array of TIDs (or of the tuples themselves)
  – Individual points in sparse regions
– Finding dense regions:
  – Ask an expert?
  – Use a clustering algorithm (similar algorithms are used in image analysis)
– Needs evaluation!
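
The slide leaves the dense-region finder open (expert input or clustering). The sketch below substitutes a deliberately simple stand-in, assumed for illustration only: bucket 2-D points into fixed grid cells, turn each cell holding more than a threshold number of points into a dense rectangle with its own variable-length array of (TID, point) tuples, and keep the rest as individual points. A range query then tests rectangles by overlap and points by containment, as the leaf level of the R-tree would; the tree above the leaves is omitted, and all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Entry = Tuple[int, Point]  # (TID, coordinates)

@dataclass
class DenseRegion:
    lo: Point  # lower-left corner of the region's bounding rectangle
    hi: Point  # upper-right corner of the region's bounding rectangle
    tuples: List[Entry] = field(default_factory=list)  # variable-length array

def build_leaf_entries(points: List[Point], cell: int,
                       threshold: int) -> Tuple[List[DenseRegion], List[Entry]]:
    """Hypothetical grid-bucketing stand-in for the dense-region finder."""
    buckets: Dict[Point, List[int]] = defaultdict(list)
    for tid, (x, y) in enumerate(points):
        buckets[(x // cell, y // cell)].append(tid)
    regions: List[DenseRegion] = []
    sparse: List[Entry] = []
    for tids in buckets.values():
        entries = [(t, points[t]) for t in tids]
        if len(tids) > threshold:
            # Dense cell: one rectangle, the bounding box of its points.
            xs = [p[0] for _, p in entries]
            ys = [p[1] for _, p in entries]
            regions.append(DenseRegion((min(xs), min(ys)), (max(xs), max(ys)), entries))
        else:
            sparse.extend(entries)  # sparse cell: keep individual points
    return regions, sparse

def range_query(regions: List[DenseRegion], sparse: List[Entry],
                qlo: Point, qhi: Point) -> List[int]:
    """TIDs of all points inside the inclusive query box [qlo, qhi]."""
    def inside(p: Point) -> bool:
        return qlo[0] <= p[0] <= qhi[0] and qlo[1] <= p[1] <= qhi[1]

    result: List[int] = []
    for r in regions:
        if inside(r.lo) and inside(r.hi):
            result.extend(t for t, _ in r.tuples)  # fully covered: no per-point test
        elif (r.lo[0] <= qhi[0] and qlo[0] <= r.hi[0]
              and r.lo[1] <= qhi[1] and qlo[1] <= r.hi[1]):
            result.extend(t for t, p in r.tuples if inside(p))  # partial overlap
    result.extend(t for t, p in sparse if inside(p))
    return sorted(result)

points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9), (15, 3)]
regions, sparse = build_leaf_entries(points, cell=4, threshold=3)
print(len(regions), len(sparse))                       # 1 2
print(range_query(regions, sparse, (0, 0), (10, 10)))  # [0, 1, 2, 3, 4]
print(range_query(regions, sparse, (2, 0), (20, 5)))   # [2, 3, 5]
```

Storing the tuples inside the region lets a partially overlapping rectangle be filtered in place; storing only TIDs would keep the region smaller but would need the coordinates fetched before filtering.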

R-Tree vs. Bitmap Indices
– R-tree pros:
  – Supports range queries
  – Smaller space overhead
  – More efficient updates
– Bitmap pros:
  – Fast bitwise operations
  – Efficient for low-cardinality data, few restricted dimensions, and sparse data

Conclusion
– High-level overview
– Recommended readings:
  – MOLAP vs. ROLAP
  – R-tree and variants
  – R-tree alternatives
  – Computation of multidimensional aggregates
  – And more