Dwarf: A High Performance OLAP Engine Nick Roussopoulos ACT Inc. & UMD.

Presentation transcript:

Dwarf: A High Performance OLAP Engine Nick Roussopoulos ACT Inc. & UMD

2 Features
- Complete OLAP engine
  - Computes, indexes, and stores highly compressed data cubes
  - Queries, incremental updates
- Overcomes the “dimensionality curse”
  - Independent of the number of dimensions and hierarchical levels within
- Scalable

3 Revolutionary Technology
- Highly compressed storage
- Full cubes: ALL views answerable
- 100% precision answers on all views, including the fact table
- Stores a subset of the views in very tight space
- Tremendous savings
  - Storage
  - Construction time
- Efficient query retrieval
  - Sub-second response

4 APB-1 Benchmark
- Density 1 (1.3M)
  - Dwarf (Thinkpad): 18 s, 57 MB
- Density 5 (65M)
  - Oracle’s best benchmark: 4.5 hrs, GB (4 CPU, RAID)
  - Dwarf: 65 min, 2.4 GB (single-CPU Pentium 4)
- Density 40 (496M)
  - Dwarf: 10.3 hrs, 8.2 GB (single-CPU Pentium 4)
NOTE: The fact table that is intrinsically fused into the Dwarf is bigger than the Dwarf itself: the fact table is 32 GB in ASCII or 11.8 GB in binary.

5 Real Data Set
- Fact table: 13,449,327
- Dimensions: 8
- Views: 11,200
- Challenged by a leading OLAP vendor
  - Took 48 hrs for a “wizard” to decide what to materialize
  - Several more hrs to create and index summary tables
  - Huge storage
- Dwarf’s response
  - Creation time: 100 min
  - Size: 6.7 GB
  - 1000 queries*: 15.8 sec
* Each query asks for 10 different values for each of 3 randomly selected dimensions (e.g. v1 | v2 | … | v10) and “all” for a 4th dimension, i.e. a 10*10*10 point query.
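
The footnote's query shape can be made concrete with a small sketch. It only illustrates how one such query expands into 10*10*10 = 1000 point lookups; the dimension names, the `ALL` marker, and the function `expand_to_point_queries` are hypothetical and not part of Dwarf's interface.

```python
from itertools import product

ALL = "*"  # hypothetical marker meaning "all values of this dimension"

def expand_to_point_queries(selection):
    """Expand {dimension: [values] or ALL} into a list of point queries.

    Each point query fixes exactly one value (or ALL) per dimension.
    """
    dims = list(selection)
    choices = [[ALL] if selection[d] == ALL else list(selection[d]) for d in dims]
    return [dict(zip(dims, combo)) for combo in product(*choices)]

# Hypothetical query: 10 values on each of 3 dimensions, "all" on a 4th.
ten_values = [f"v{i}" for i in range(1, 11)]
query = {"d1": ten_values, "d2": ten_values, "d3": ten_values, "d4": ALL}

points = expand_to_point_queries(query)
print(len(points))  # 1000
```

At the reported 15.8 sec for 1000 such queries, the average works out to about 15.8 ms per query.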

6 Dream DataCube¹
- Fact table (5,000,000)
- Dimensions: 10 (3x9L, 4x4L, 3x2L)
- Views: 16,875,000
- Dwarf’s response
  - Creation time: 123 min
  - Size: 6.3 GB
  - 1000 queries: 325 sec
¹ Challenged by a leading OLAP vendor: “A full cube for this data set can never be built!”
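
The view count on this slide can be reproduced with a short calculation. This sketch assumes that "3x9L, 4x4L, 3x2L" means three dimensions with 9 hierarchy levels, four with 4 levels, and three with 2 levels, and that each dimension can additionally be rolled up to ALL; under that reading the product matches the 16,875,000 views quoted. The helper name `count_views` is illustrative.

```python
from math import prod

# Assumed reading of the slide: levels per dimension, 10 dimensions in total.
levels_per_dimension = [9] * 3 + [4] * 4 + [2] * 3

def count_views(levels):
    # A view picks, for every dimension, one of its levels or the ALL roll-up.
    return prod(n + 1 for n in levels)

total_views = count_views(levels_per_dimension)
print(total_views)  # 16875000
```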

7 What Makes Dwarf Tick
- Two breakthrough discoveries
  - Suffix redundancy
  - Fusion of prefix and suffix redundancy
- Identifies and factors out these redundancies before computing any aggregates for them
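
The slide names the two redundancies without defining them. The sketch below shows one common reading on a made-up fact table: prefix redundancy as many cube cells sharing the same leading dimension value, and suffix redundancy as different cells being backed by exactly the same fact rows (and therefore holding the same aggregate). It is a brute-force illustration, not Dwarf's algorithm, which according to the slide detects the redundancy before computing the aggregates. The table contents and all names are hypothetical.

```python
from itertools import product

ALL = "*"

# Hypothetical toy fact table: (store, customer, product, sales).
facts = [
    ("S1", "C2", "P2", 70),
    ("S1", "C3", "P1", 40),
    ("S2", "C1", "P1", 90),
    ("S2", "C1", "P2", 50),
]

def cube_cells(rows):
    """Brute-force cube: map each cell key to the indices of the rows it aggregates."""
    cells = {}
    for idx, row in enumerate(rows):
        dims = row[:-1]
        # Each row feeds 2^d cells: every dimension is either kept or replaced by ALL.
        for mask in product([False, True], repeat=len(dims)):
            key = tuple(ALL if hide else v for v, hide in zip(dims, mask))
            cells.setdefault(key, set()).add(idx)
    return cells

cells = cube_cells(facts)

# Prefix redundancy (illustration): cells that share the leading value "S1".
cells_with_prefix_s1 = sum(1 for key in cells if key[0] == "S1")

# Suffix redundancy (illustration): cells backed by the same set of fact rows
# necessarily carry the same aggregate, so it needs to be stored only once.
by_rowset = {}
for key, rowset in cells.items():
    by_rowset.setdefault(frozenset(rowset), []).append(key)

total_cells = len(cells)
distinct_aggregates = len(by_rowset)
print(total_cells, distinct_aggregates, cells_with_prefix_s1)
```

In this toy table the 23 cube cells reduce to 9 distinct row sets, which gives a feel for why factoring out shared aggregates can shrink a full cube.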

8 Dwarf Technology
- Complete solution
  - Extends to high dimensionality
  - Deep hierarchies
  - Queries the full cube: any dimension and level
  - Incremental updates
  - Indexing is inherent: all in one structure
  - Dwarf holds the fact table too!
- No gotchas
  - No expensive preprocessing (just a single sort)
  - No TEMP space required for construction
  - No hidden post-construction costs
  - No information loss (100% precision)

9 Dwarf Software
- Lean optimized code
- Tools for discovery
  - Data correlation
  - Optimizing dwarfs
- A dozen tuning knobs, including
  - GMin
  - The Knob

10 Data Driven Tuning
- GMin: min # of records per aggregate group to be explicitly stored
  - Reduces storage in the lower levels of the hierarchies
- The Knob: max # of aggregations to be computed on the fly (not stored)
  - Reduces storage in the higher levels of the hierarchies
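
The GMin idea can be sketched as a storage threshold: groups backed by fewer than GMin fact records are not stored and are instead aggregated when queried. The slide does not describe how Dwarf applies GMin internally, so the functions `materialize` and `query`, the row layout, and the SUM aggregate below are assumptions for illustration only; "The Knob" is not modeled here.

```python
from collections import defaultdict

def materialize(rows, group_dims, gmin):
    """Store SUM(sales) only for groups backed by at least gmin fact rows."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in group_dims)
        groups[key].append(row["sales"])
    return {key: sum(vals) for key, vals in groups.items() if len(vals) >= gmin}

def query(rows, stored, group_dims, key):
    """Answer from the stored aggregates, or fall back to scanning the fact rows."""
    if key in stored:
        return stored[key]
    # Small group that was not materialized: aggregate on the fly.
    return sum(r["sales"] for r in rows if tuple(r[d] for d in group_dims) == key)

# Hypothetical fact rows.
rows = [
    {"store": "S1", "sales": 70},
    {"store": "S1", "sales": 40},
    {"store": "S2", "sales": 90},
]

stored = materialize(rows, ["store"], gmin=2)
print(stored)                                   # only the S1 group is stored
print(query(rows, stored, ["store"], ("S2",)))  # computed on the fly
```

Raising `gmin` trades storage for query-time work on small groups, while the answers themselves stay exact.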

11 Data Driven Tuning Effects
[Charts: effect of GMin; effect of “The Knob”]

12 Dwarf Technology
- Math behind the scenes
  - Exploits data dependencies & correlations
  - Probabilistic counting
- Dimension scalability
  - Savings/performance increase exponentially with sparseness (and dimensions)
  - Independence of # of dimensions

13 Target Markets
- Business intelligence
- Security
- Telecom
- High-dimensional data: scientific and sensor data
- Bioinformatics

14 Dwarf’s Value
- Puts any OLAP engine on “steroids” and delivers substantial performance improvement
- Dwarf is a fast and effective substitute for indexing in ROLAP products (supports SQL API)
- Can expand the customer base for IBM- Hyperion

15 Product Status
- Patents pending
- Metadata management
  - Mapping between external values and internal binaries
  - Can deal with partial cubes
- Implementation
  - Cross-platform (Unix, MS)
  - Connects with all RDBMSs
  - Dwarf Browser

16 UMD & ACT’s Experience
- UMD group established materialized views and incremental access methods (over 50 publications since 1982)
- Data warehouse Cubetree Storage Organization started in 1997 (over 12 publications, ACM Best Paper Award)
- Dwarf in

17 Summary of Dwarf
- Practical all-in-one structure
- Remarkable full cube size reduction
- Unprecedented performance (construction and query retrieval)
- Scalable (number of dimensions, hierarchy depth, data size)