Frank Dehne, www.dehne.net
Parallel Data Cube
– Data Mining
– OLAP (On-line analytical processing)
– cube / group-by operator in SQL

Data Warehousing for Decision Support
– Operational data collected into the DW
– DW used to support multi-dimensional views
– Views form the basis of OLAP processing
– Our focus: the OLAP server

Multi-dimensional views
– Defined by a collection of feature (dimension) attributes
– Aggregate one or more measure attributes, grouped by those features
– Reduce the granularity by collapsing dimensions
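A minimal sketch of collapsing a dimension, assuming a view is held as a dict from a tuple of feature values to a summed measure. The `sales` data and the `collapse` helper are hypothetical. Summing finer cells into coarser ones is only valid for distributive aggregates such as SUM or COUNT.

```python
from collections import defaultdict


def collapse(view, keep):
    """Roll a view up to the feature attributes at positions `keep`.

    view: dict mapping a tuple of feature values to a summed measure.
    keep: indices (into the key tuple) of the feature attributes to retain;
          every other dimension is collapsed by summing over it.
    """
    coarser = defaultdict(int)
    for key, measure in view.items():
        coarser[tuple(key[i] for i in keep)] += measure
    return dict(coarser)


# hypothetical (product, store, quarter) -> sales view
sales = {
    ("p1", "s1", "Q1"): 10,
    ("p1", "s1", "Q2"): 5,
    ("p1", "s2", "Q1"): 7,
    ("p2", "s1", "Q1"): 3,
}
# collapse the quarter dimension to obtain the (product, store) view
print(collapse(sales, keep=(0, 1)))
```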

Data Cube Generation
– Proposed by Gray et al. (Microsoft) in 1995
– Exploits the relationships between cuboids to compute all 2^d cuboids (one per subset of the d dimensions)
– In OLAP, views are typically pre-computed to improve query response time
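To make the 2^d count concrete, here is a naive baseline sketch that enumerates every subset of the d dimensions and computes each cuboid directly from the input rows with SUM. It deliberately does not exploit the relationships between cuboids; that is exactly what the optimized sequential and parallel algorithms add. The row format and the `cube` helper are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims):
    """Compute every cuboid (group-by) of `rows` with SUM as the aggregate.

    rows: list of (feature_tuple, measure) pairs, one feature value per dimension.
    dims: names of the d dimensions.
    Returns {cuboid_dims: {group_key: sum}} with 2**d entries.
    """
    result = {}
    d = len(dims)
    for r in range(d, -1, -1):                 # from the full view down to ALL
        for idx in combinations(range(d), r):  # each subset of r dimensions
            agg = defaultdict(int)
            for features, measure in rows:     # naive: rescan the raw data each time
                agg[tuple(features[i] for i in idx)] += measure
            result[tuple(dims[i] for i in idx)] = dict(agg)
    return result


rows = [
    (("p1", "s1", "Q1"), 10),
    (("p1", "s1", "Q2"), 5),
    (("p1", "s2", "Q1"), 7),
    (("p2", "s1", "Q1"), 3),
]
c = cube(rows, ("product", "store", "quarter"))
print(len(c))  # 2**3 cuboids
```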

Sequential Solutions
– Top Down Cube
  – Compute high dimension views first
  – Exploit shared dimensions
  – Pipesort
  – PipeHash
– Bottom Up Cube
  – Minimizes external memory sorting by partitioning first on single attributes
– ArrayCube

Sequential Solutions
ROLAP
– relational data representation
– harder to build and query
– smaller storage
– no translation from/to relational model
MOLAP
– array representation
– easy to build and query
– large storage
– needs translation from/to relational model
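The storage comparison holds for sparse data. A back-of-the-envelope sketch, assuming a dense MOLAP array with one measure slot per combination of dimension values and a ROLAP table with one tuple (d keys plus one measure) per non-empty cell; compression schemes are ignored and the cardinalities are hypothetical.

```python
from math import prod


def molap_values(cardinalities):
    # dense array: one measure slot per combination of dimension values,
    # whether or not that combination occurs in the data
    return prod(cardinalities)


def rolap_values(non_empty_cells, d):
    # relational table: one tuple per non-empty cell, holding d keys + 1 measure
    return non_empty_cells * (d + 1)


# hypothetical sparse cube: 1000 products x 1000 stores x 100 days,
# of which only 1,000,000 combinations actually occur (1% density)
cards = (1000, 1000, 100)
print(molap_values(cards))
print(rolap_values(1_000_000, len(cards)))
```

Under these assumptions the relational form is smaller whenever the fraction of non-empty cells is below 1/(d+1); denser cubes favor the array.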

Top Down Cube (Pipesort)
– Construct the data cube lattice
– Estimate the edge costs
– Find a least-cost spanning tree
– Compute the views by following the “pipes”
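The last step, following a pipe, can be sketched as follows: sort once on an attribute order such as A, B, C, then produce the group-bys on every prefix (ABC, AB, A) in a single scan. Because the input is sorted, each group is contiguous, so only one running total per prefix length is kept. This sketch covers only executing one pipe; the lattice construction, cost estimation and spanning-tree selection are not shown, and the row format is an assumption.

```python
def run_pipe(rows, order, measure):
    """Sort `rows` (dicts) on `order` once, then emit SUM(measure) for every
    prefix of `order` in one scan. Returns {prefix_tuple: [(key, total), ...]}."""
    rows = sorted(rows, key=lambda r: tuple(r[a] for a in order))
    d = len(order)
    views = {tuple(order[:k]): [] for k in range(1, d + 1)}
    current = {k: None for k in range(1, d + 1)}  # open group key per prefix length
    totals = {k: 0 for k in range(1, d + 1)}

    def flush(k):
        # the open group on prefix length k is complete; append it to its view
        if current[k] is not None:
            views[tuple(order[:k])].append((current[k], totals[k]))

    for r in rows:
        key = tuple(r[a] for a in order)
        for k in range(1, d + 1):
            if current[k] != key[:k]:  # sorted input: a new prefix means the old group ended
                flush(k)
                current[k], totals[k] = key[:k], 0
            totals[k] += r[measure]
    for k in range(1, d + 1):
        flush(k)
    return views


rows = [
    {"A": 1, "B": 1, "C": 1, "m": 2},
    {"A": 1, "B": 2, "C": 1, "m": 3},
    {"A": 1, "B": 1, "C": 2, "m": 4},
    {"A": 2, "B": 1, "C": 1, "m": 5},
]
views = run_pipe(rows, ["A", "B", "C"], "m")
print(views[("A",)])
```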

Optimizations
– Share-sorts: sharing sorting cost across multiple group-bys.
– Smallest parent: computing a cuboid from the smallest previously computed parent.
– Cache results: reduce I/O by caching (in memory) parent views from which other cuboids are computed.
– Amortize disk scans: compute as many child views as possible when scanning each parent.
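A sketch of the smallest-parent rule: for every cuboid, choose among the cuboids with exactly one more dimension the one with the smallest estimated size. The size estimate used here, min(number of input rows, product of the cuboid's cardinalities), is a common upper bound and an assumption of this sketch; the cardinalities and row count are hypothetical.

```python
from itertools import combinations
from math import prod


def estimated_size(cuboid, cardinality, n_rows):
    # a group-by cannot have more rows than the input, nor more than
    # the product of its dimensions' cardinalities
    return min(n_rows, prod(cardinality[d] for d in cuboid))


def smallest_parent_plan(cardinality, n_rows):
    """Map each cuboid (frozenset of dims) to its smallest one-step parent."""
    dims = sorted(cardinality)
    full = frozenset(dims)
    plan = {}
    for r in range(len(dims) - 1, -1, -1):
        for combo in combinations(dims, r):
            child = frozenset(combo)
            parents = [child | {x} for x in sorted(full - child)]  # sorted: deterministic ties
            plan[child] = min(parents, key=lambda p: estimated_size(p, cardinality, n_rows))
    return plan


plan = smallest_parent_plan({"A": 100, "B": 10, "C": 2}, n_rows=1000)
for child, parent in plan.items():
    print("".join(sorted(child)) or "ALL", "<-", "".join(sorted(parent)))
```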

Bottom Up Cube

Bottom Up Cube
– Partition large view into memory-sized units
– Perform sorting operations in memory
– May significantly reduce external memory processing
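A sketch of the bottom-up recursion: aggregate the current partition, then partition it on each remaining dimension in turn and recurse into every partition, so each cell of each cuboid is produced exactly once. Partitions smaller than `minsup` are pruned (the iceberg condition); `minsup=1` yields the full cube. This version partitions with an in-memory dict rather than an in-memory sort, and everything stays in memory; the row format and parameter names are assumptions.

```python
def buc(rows, dims, measure, minsup=1, prefix=(), out=None):
    """Bottom-up cube sketch.

    rows: list of dicts; dims: remaining dimension names, in order;
    prefix: ((dim, value), ...) fixed so far, used as the output cell key.
    Returns {cell_key: SUM(measure)}.
    """
    if out is None:
        out = {}
    out[prefix] = sum(r[measure] for r in rows)  # aggregate the current partition
    for i, d in enumerate(dims):
        partitions = {}
        for r in rows:                            # partition on dimension d
            partitions.setdefault(r[d], []).append(r)
        for value in sorted(partitions):
            part = partitions[value]
            if len(part) >= minsup:               # prune small partitions
                # recurse only on later dimensions so no cuboid is produced twice
                buc(part, dims[i + 1:], measure, minsup, prefix + ((d, value),), out)
    return out


rows = [
    {"A": "a1", "B": "b1", "m": 1},
    {"A": "a1", "B": "b2", "m": 2},
    {"A": "a2", "B": "b1", "m": 4},
]
full = buc(rows, ["A", "B"], "m")
iceberg = buc(rows, ["A", "B"], "m", minsup=2)
print(len(full), len(iceberg))
```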

Our Results
– Parallel top-down ROLAP cube construction for shared disks (Distributed and Parallel Databases, 2002)
– Parallel top-down ROLAP cube construction for distributed disks (IPDPS 2002)
– Parallel bottom-up ROLAP cube construction for shared and distributed disks (Distributed and Parallel Databases, 2002)
– Parallel ROLAP cube indexing for distributed disks (CCGrid 2003)

Parallel top-down ROLAP cube
Our approach:
– Partition the load in advance and assign cuboids to individual processors
– Local computation exploits existing optimized sequential algorithms (ROLAP)
– Communication is reduced to a single phase in which work lists are distributed

Parallel top-down ROLAP cube
– Cut the process tree into p “equal weight” sub-trees
– Each processor independently generates cuboids from its own sub-tree
– Load balance/stripe the output
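Once the cut edges are chosen, forming the per-processor work lists is a single traversal of the process tree. A minimal sketch, assuming the tree is a dict of children and the cut is given as the set of vertices whose edge to their parent is deleted; the cuboid names, the cut, and the one-sub-tree-per-processor assignment are hypothetical, and the striping of output is not shown.

```python
def subtrees_from_cuts(children, root, cut):
    """Split a rooted tree into connected sub-trees.

    children: {vertex: [child, ...]}; cut: vertices whose edge to their parent
    is deleted. Each sub-tree is rooted at `root` or at a cut vertex.
    Returns {subtree_root: [vertices in that sub-tree]}.
    """
    parts = {}
    stack = [(root, root)]
    while stack:
        v, owner = stack.pop()
        if v in cut:
            owner = v  # a deleted edge starts a new sub-tree here
        parts.setdefault(owner, []).append(v)
        for c in children.get(v, []):
            stack.append((c, owner))
    return parts


# hypothetical process tree over cuboids of dimensions A, B, C
children = {"ABC": ["AB", "AC", "BC"], "AB": ["A"], "BC": ["B", "C"], "C": ["ALL"]}
parts = subtrees_from_cuts(children, "ABC", cut={"AB", "BC"})
# one sub-tree per processor: these are the work lists sent out in one phase
work_lists = {proc: sorted(vs) for proc, (_, vs) in enumerate(sorted(parts.items()))}
print(work_lists)
```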

Tree Partitioning
– Optimal tree partitioning is NP-complete
– Min-max tree k-partitioning: Given a tree T with n vertices and a positive weight assigned to each vertex, delete k edges in the tree to obtain k+1 connected components T_1, T_2, ..., T_{k+1} such that the largest total weight of a resulting sub-tree is minimized.
  – O(n) time: Frederickson, 1990
  – O(Rk(k + log d) + n) time: Becker, Perl and Schach, 1982
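For intuition, min-max k-partitioning with integer weights can be solved by a simpler method than the linear-time algorithm cited above: binary search on the bound B, where each probe uses a bottom-up greedy that, while a component is too heavy, cuts off its heaviest child component. This gives the fewest cuts for a given B. It runs in roughly O(n log n log W) time, so it is a stand-in for illustration, not Frederickson's algorithm; the tree and weights below are hypothetical.

```python
def min_cuts(children, weight, root, bound):
    """Fewest edge deletions so every component weighs at most `bound`.
    Returns the vertices whose edge to their parent is deleted.
    Assumes every single vertex weight is <= bound."""
    order, stack = [], [root]
    while stack:  # pre-order; reversed, every child comes before its parent
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    residual, cuts = {}, []
    for v in reversed(order):
        kids = sorted(children.get(v, []), key=lambda c: residual[c], reverse=True)
        total = weight[v] + sum(residual[c] for c in kids)
        for c in kids:
            if total <= bound:
                break
            total -= residual[c]  # delete edge (v, c): c's component is closed off
            cuts.append(c)
        residual[v] = total       # weight still attached to v's parent
    return cuts


def min_max_partition(children, weight, root, k):
    """Smallest B such that deleting at most k edges leaves components of
    weight <= B (binary search over integer B), plus the edges to delete."""
    lo, hi = max(weight.values()), sum(weight.values())
    while lo < hi:
        mid = (lo + hi) // 2
        if len(min_cuts(children, weight, root, mid)) <= k:
            hi = mid
        else:
            lo = mid + 1
    return lo, min_cuts(children, weight, root, lo)


children = {0: [1, 2], 1: [3, 4]}
weight = {0: 1, 1: 2, 2: 4, 3: 3, 4: 3}
bound, cuts = min_max_partition(children, weight, 0, k=2)
print(bound, cuts)
```

If fewer than k cuts suffice, deleting further edges cannot raise the maximum, so the same bound is achievable with exactly k deletions whenever the tree has at least k edges.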

Over Sampling

Time vs. #Proc

For more information...