Hierarchical Dwarfs for the Rollup-Cube
Yannis Sismanis, Antonios Deligiannakis, Yannis Kotidis, Nick Roussopoulos



Yannis Sismanis DOLAP

Motivation
  Dimensional values are annotated with hierarchies
    Examples
      Time: sec → min → hour
      Store: code → retailer
    Flat/Lattice
  Profound effect on cube complexity
    L hierarchy levels/dimension & d dimensions: 2^d → (L+1)^d
  Traditionally handled externally
    Mapped into queries on raw data
    Requires aggregation of the results

Importance
  On-Line Analytical Processing (OLAP)
  Decision Support Systems (DSS)
  Data Mining
  Queries
    Rollup/Drilldown
    Ad-hoc
  It’s not just about fast queries
    Computation/storage/indexing/…

Related Work
  View materialization
    NP-complete
    Even greedy algorithms are not practical for high-dimensional/hierarchical cubes
  Compute the cube
    Various techniques that suffer from the dimensionality curse
  Store/Index
    ROLAP/MOLAP/…
  Compressed Cubes
    Condensed, Dwarf, Quotient

Our Contribution
  Extend the Dwarf architecture [SDRK02]
  Implemented two approaches
    Partial view-covering
      Breaks the problem into sub-problems and solves each one separately
    Hierarchical
      Treats the problem as a whole
  Maximize the effects of compression
    Important for all aspects of cube management
  Address partial/full materialization
  Extensive experimentation with real OLAP data

Dwarf Overview
  Complete system (100% accuracy)
    Compute/Store/Index/Query/Update [SDRK02]
  Structural Redundancies
    Prefix Elimination
      Very high on dense areas
    Suffix Coalescing (!)
      Orders of magnitude more important on sparse areas
  Partial materialization
    Minimum granularity
    Resembles iceberg cubes
  Optimizations
    Clustering
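
The two structural redundancies can be illustrated with a small sketch. It is not the construction algorithm of [SDRK02]: the function name build_dwarf, the dict-based nodes, and the three-row toy fact table are assumptions made for illustration. Prefix elimination appears as a trie over the dimension values, and suffix coalescing is approximated by memoizing on the set of fact tuples that feed a sub-structure, so that identical sub-structures are stored only once.

```python
from collections import defaultdict

# Toy fact table: (Store, Code, Name, Sales). Values are illustrative.
ROWS = [
    ("S1", "C2", "N1", 10),
    ("S2", "C3", "N2", 30),
    ("S3", "C1", "N1", 60),
]

def build_dwarf(rows, n_dims, share=True):
    """Build a simplified dwarf-like DAG; return (root, nodes_created)."""
    cache = {}   # (level, tuple ids) -> node, consulted only when share=True
    created = 0

    def build(ids, level):
        nonlocal created
        key = (level, ids)
        if share and key in cache:
            return cache[key]          # suffix coalescing: reuse the node
        created += 1
        groups = defaultdict(list)
        for i in ids:
            groups[rows[i][level]].append(i)
        is_leaf = level == n_dims - 1
        node = {}
        for value in sorted(groups):   # prefix elimination: one cell per value
            members = frozenset(groups[value])
            node[value] = (sum(rows[i][-1] for i in members) if is_leaf
                           else build(members, level + 1))
        # The ALL cell aggregates over every value of this dimension.
        node["ALL"] = (sum(rows[i][-1] for i in ids) if is_leaf
                       else build(ids, level + 1))
        if share:
            cache[key] = node
        return node

    root = build(frozenset(range(len(rows))), 0)
    return root, created

root, shared = build_dwarf(ROWS, 3)
_, unshared = build_dwarf(ROWS, 3, share=False)
print(shared, unshared)                       # 9 15
print(root["ALL"]["ALL"]["N1"])               # 70
print(root["S1"]["C2"] is root["S1"]["ALL"])  # True
```

On this sparse toy input, sharing reduces 15 nodes to 9, because a store with a single fact tuple has the same sub-structure under its only Code value and under ALL. Every group-by aggregate stays reachable by following one cell per dimension from the root.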

View-covering Partial Dwarfs
  Use a forest of Dwarfs
    Encapsulates all views in the hierarchical cube
    “base dwarf” contains the lowest hierarchical views
    “partial dwarfs” cover the higher hierarchical views
  Partial Dwarf: does not store every possible combination of views
    Avoids duplication in the final forest of partial dwarfs
  Fast view-covering enumeration process
    Single traversal over the view-space
    Keep track of just the last enumerated partial dwarf

Partial Dwarfs (example)
  Store: StoreId → Retailer → ALL
  Product: Code → Group → ALL
  Customer: Name → ALL
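
To make the size of this view space concrete, the sketch below enumerates every combination of hierarchy levels for the three dimensions in the example. The names HIERARCHIES, enumerate_views and in_base_dwarf are hypothetical. The split assumes that the base dwarf holds exactly the views that use only the lowest level or ALL in each dimension; that is one reading of the slides, not a statement of the authors' enumeration procedure.

```python
from itertools import product

# Hierarchy levels per dimension, lowest level first, as in the example.
HIERARCHIES = {
    "Store": ["StoreId", "Retailer", "ALL"],
    "Product": ["Code", "Group", "ALL"],
    "Customer": ["Name", "ALL"],
}

def enumerate_views(hierarchies):
    """Return every view as one chosen level per dimension."""
    return list(product(*hierarchies.values()))

def in_base_dwarf(view, hierarchies):
    """Assumption: the base dwarf covers views built only from the
    lowest level or ALL in every dimension."""
    return all(level in (levels[0], "ALL")
               for level, levels in zip(view, hierarchies.values()))

views = enumerate_views(HIERARCHIES)
base = [v for v in views if in_base_dwarf(v, HIERARCHIES)]
higher = [v for v in views if not in_base_dwarf(v, HIERARCHIES)]
print(len(views), len(base), len(higher))  # 18 8 10
```

Under that assumption, 8 of the 18 views fall in the base dwarf and the remaining 10, each involving Retailer or Group, are the ones partial dwarfs would need to cover.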

Hierarchical Dwarf
  Extend the Dwarf model
    Incorporate hierarchies inside the Dwarf DAG
    Even higher-level aggregates can be reached through a path from the root
  Nature of prefix redundancies changes
    Common prefixes between partial dwarfs
  Most importantly, suffix redundancies are now exploited in a “global” way

Hierarchical Dwarf (example)

  Fact table:
    Store  Code  Name  Sales
    S1     C2    N1    $10
    S2     C3    N2    $30
    S3     C1    N1    $60

  Hierarchy mappings:
    Store  Retailer      Code  Group
    S1     R1            C1    G2
    S2     R1            C2    G1
    S3     R2            C3    G2
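
The higher-level aggregates for this example can be computed by brute force, which is roughly what an external rollup over raw data has to do. The sketch assumes the second table maps each Store to a Retailer and each Code to a Group; the names FACTS, STORE_TO_RETAILER, CODE_TO_GROUP and rollup are hypothetical. A hierarchical Dwarf would instead store these values so that they are reachable by a path from the root.

```python
from collections import defaultdict

# Fact table from the example: (Store, Code, Name, Sales in dollars).
FACTS = [
    ("S1", "C2", "N1", 10),
    ("S2", "C3", "N2", 30),
    ("S3", "C1", "N1", 60),
]

# Assumed reading of the hierarchy table: Store -> Retailer, Code -> Group.
STORE_TO_RETAILER = {"S1": "R1", "S2": "R1", "S3": "R2"}
CODE_TO_GROUP = {"C1": "G2", "C2": "G1", "C3": "G2"}

def rollup(facts, key):
    """Sum Sales over the grouping key computed from each fact tuple."""
    totals = defaultdict(int)
    for store, code, name, sales in facts:
        totals[key(store, code, name)] += sales
    return dict(totals)

by_retailer = rollup(FACTS, lambda s, c, n: STORE_TO_RETAILER[s])
by_group = rollup(FACTS, lambda s, c, n: CODE_TO_GROUP[c])
by_retailer_group = rollup(
    FACTS, lambda s, c, n: (STORE_TO_RETAILER[s], CODE_TO_GROUP[c]))

print(by_retailer)        # {'R1': 40, 'R2': 60}
print(by_group)           # {'G1': 10, 'G2': 90}
print(by_retailer_group)  # {('R1', 'G1'): 10, ('R1', 'G2'): 30, ('R2', 'G2'): 60}
```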

Non-linear Hierarchies

Experiments
  Real-world data
    8 dimensions (7458, 2765, 3857, 3247, 213, 660, 4, 4)
    4 hierarchies (1x6, 2x4, 1x3)
    256 views vs 11,200 views
  Comparison with
    Base Dwarf
      I.e., all hierarchical queries are mapped to the raw data and then further aggregated
    Full uncompressed cube (BSF)
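
The two view counts can be reproduced from the (L+1)^d style of counting used in the motivation. The sketch reads "1x6, 2x4, 1x3" as one dimension with 6 hierarchy levels, two with 4 and one with 3, and assumes the remaining four dimensions are flat (a single level). Which dimension carries which hierarchy is not stated and does not affect the product. The function names are hypothetical.

```python
from math import prod

def flat_views(num_dims):
    """Each flat dimension is either grouped on or aggregated to ALL."""
    return 2 ** num_dims

def hierarchical_views(levels_per_dim):
    """Each dimension contributes its hierarchy levels plus ALL."""
    return prod(levels + 1 for levels in levels_per_dim)

# Assumed assignment: 1x6, 2x4, 1x3, and four flat dimensions.
LEVELS = [6, 4, 4, 3, 1, 1, 1, 1]

print(flat_views(len(LEVELS)))     # 256
print(hierarchical_views(LEVELS))  # 11200
```

The product is 7 x 5 x 5 x 4 x 2 x 2 x 2 x 2 = 11,200, which matches the figure on the slide.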

Computation

Storage

Full Cube Statistics

Query Evaluation
  Simulated Queries
    Point/Range
    Children
  Effect of Rollup/Drill-Down

Queries – Gmin=1

Queries – Gmin=1000

Conclusions
  Presented two extensions to the Dwarf architecture
    Decompose the problem into simpler sub-problems
    Embed hierarchies in the structure
  Suffix redundancies are more apparent in hierarchical cubes
    Compression ratio of more than 70 times
    Query response performance increase of ~10 times
  Sparsity exploitation: a minimum granularity of 1,000 reduces computation time (by about 3 times) and increases performance

Questions?