Frank Dehne, www.dehne.net

Parallel Data Cube
– Data mining
– OLAP (on-line analytical processing)
– The cube / group-by operator in SQL


Data Warehousing for Decision Support
– Operational data is collected into the data warehouse (DW)
– The DW is used to support multi-dimensional views
– These views form the basis of OLAP processing
– Our focus: the OLAP server

Multi-dimensional Views
– A view is defined by a collection of feature (dimension) attributes
– It aggregates one or more measure attributes
– Granularity is reduced by collapsing dimensions
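A single multi-dimensional view amounts to one group-by. The minimal sketch below assumes rows are Python dicts and SUM is the aggregate; it keeps the chosen feature attributes and collapses every other dimension. The table, its attribute names (`product`, `store`, `month`, `sales`) and the helper `group_by` are hypothetical.

```python
def group_by(rows, feature_attrs, measure):
    """SUM the measure attribute per combination of the chosen feature
    attributes; every attribute not listed is collapsed."""
    view = {}
    for r in rows:
        key = tuple(r[a] for a in feature_attrs)
        view[key] = view.get(key, 0) + r[measure]
    return view

# Hypothetical fact table: three dimensions and one measure.
rows = [
    {"product": "p1", "store": "s1", "month": 1, "sales": 10},
    {"product": "p1", "store": "s2", "month": 1, "sales": 5},
    {"product": "p2", "store": "s1", "month": 2, "sales": 7},
]

# Roughly: SELECT product, SUM(sales) FROM rows GROUP BY product
print(group_by(rows, ("product",), "sales"))  # {('p1',): 15, ('p2',): 7}
```

Collapsing `store` and `month` reduces three rows to two groups; keeping all three attributes would return the original granularity.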

Data Cube Generation
– Proposed by Gray et al. (Microsoft) in 1995
– Exploits the relationships between cuboids to compute all 2^d cuboids of a d-dimensional data set
– In OLAP, views are typically pre-computed to improve query response time
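To make the 2^d count concrete, here is a deliberately naive sketch (hypothetical helpers `all_cuboids` and `naive_cube`, SUM as the aggregate, made-up sample rows) that enumerates every cuboid and computes each one from the raw data independently. This is the baseline that cube algorithms improve on by exploiting the relationships between cuboids.

```python
from itertools import combinations

def all_cuboids(dims):
    """Every subset of the d dimensions, i.e. 2**d group-bys."""
    return [c for k in range(len(dims) + 1) for c in combinations(dims, k)]

def naive_cube(rows, dims, measure):
    """Compute each cuboid from the raw data, sharing no work between them."""
    cube = {}
    for c in all_cuboids(dims):
        view = {}
        for r in rows:
            key = tuple(r[d] for d in c)
            view[key] = view.get(key, 0) + r[measure]
        cube[c] = view
    return cube

rows = [
    {"A": 1, "B": 1, "C": 1, "m": 5},
    {"A": 1, "B": 2, "C": 1, "m": 3},
    {"A": 1, "B": 1, "C": 2, "m": 2},
    {"A": 2, "B": 1, "C": 1, "m": 4},
]
cube = naive_cube(rows, ("A", "B", "C"), "m")
print(len(cube))  # 8
```

Each of the 2^d cuboids costs a full pass over the input here, which is exactly the redundancy that sharing sorts and computing from smaller parents removes.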

Sequential Solutions
– Top-down cube:
  – Compute high-dimensional views first
  – Exploit shared dimensions
  – Pipesort, PipeHash
– Bottom-up cube:
  – Minimizes external-memory sorting by first partitioning on single attributes
– ArrayCube

Sequential Solutions
– ROLAP:
  – Relational data representation
  – Harder to build and query
  – Smaller storage
  – No translation from/to the relational model
– MOLAP:
  – Array representation
  – Easy to build and query
  – Large storage
  – Needs translation from/to the relational model
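The storage trade-off can be illustrated by counting cells for the same sparse cuboid. The sketch assumes a made-up data set and hypothetical helpers: ROLAP stores one relational tuple per non-empty group, while MOLAP allocates one array cell per combination of dimension values and addresses cells by a row-major offset, which requires first translating attribute values into 0-based integer codes.

```python
from math import prod

def rolap_cells(rows, dims):
    """ROLAP: one relational tuple per non-empty group (sparse)."""
    return len({tuple(r[d] for d in dims) for r in rows})

def molap_cells(card, dims):
    """MOLAP: one array cell per combination of dimension values (dense)."""
    return prod(card[d] for d in dims)

def molap_offset(key, dims, card):
    """Row-major array position of a cell. Assumes the values are already
    0-based integer codes (the translation from the relational model)."""
    off = 0
    for v, d in zip(key, dims):
        off = off * card[d] + v
    return off

rows = [
    {"A": 1, "B": 1, "C": 1, "m": 5},
    {"A": 1, "B": 2, "C": 1, "m": 3},
    {"A": 1, "B": 1, "C": 2, "m": 2},
    {"A": 2, "B": 1, "C": 1, "m": 4},
]
card = {"A": 10, "B": 10, "C": 10}  # hypothetical cardinalities
dims = ("A", "B", "C")
print(rolap_cells(rows, dims), molap_cells(card, dims))  # 4 1000
```

Direct addressing via `molap_offset` is what makes the array form easy to query, at the price of storing every empty cell.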

Top-Down Cube (Pipesort)
– Construct the data cube lattice
– Estimate the edge costs
– Find a least-cost spanning tree
– Compute the views by following the “pipes”
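The lattice and spanning-tree steps can be sketched as follows. This is a simplification, not Pipesort itself: edge costs come from a hypothetical size estimate (the smaller of the input row count and the product of attribute cardinalities), and each cuboid simply takes its cheapest parent. Pipesort additionally distinguishes edges that can reuse a parent's sort order from edges that need a re-sort, and resolves that level by level with a minimum-cost matching; the sketch ignores that distinction. All names are hypothetical.

```python
from itertools import combinations
from math import prod

def lattice(dims):
    """All 2**d cuboids, largest first; each is a tuple in `dims` order."""
    return [c for k in range(len(dims), -1, -1) for c in combinations(dims, k)]

def est_size(cuboid, card, n_rows):
    # Hypothetical cost model: at most n_rows groups, and at most the
    # product of the cuboid's attribute cardinalities.
    return min(n_rows, prod(card[d] for d in cuboid))

def smallest_parent_tree(dims, card, n_rows):
    """Map each non-root cuboid to its cheapest parent (one extra dimension)."""
    tree = {}
    for c in lattice(dims):
        if len(c) == len(dims):
            continue  # the root is computed from the raw data
        parents = [tuple(d for d in dims if d in c or d == extra)
                   for extra in dims if extra not in c]
        tree[c] = min(parents, key=lambda p: est_size(p, card, n_rows))
    return tree

card = {"A": 2, "B": 10, "C": 100}
tree = smallest_parent_tree(("A", "B", "C"), card, n_rows=1000)
print(tree[("C",)])  # ('A', 'C')
```

With these estimates, C is computed from AC (about 200 groups) rather than from BC (capped at 1000).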

Optimizations
– Share-sorts: share the sorting cost across multiple group-bys
– Smallest parent: compute a cuboid from the smallest previously computed parent
– Cache results: reduce I/O by caching (in memory) the parent views from which other cuboids are computed
– Amortize disk scans: compute as many child views as possible while scanning each parent
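Share-sorts and amortized scans can be illustrated with one sort followed by one scan: once the rows are sorted on (A, B, C), every prefix group-by (ABC, AB, A) has contiguous groups, so all of them can be emitted during the same pass. The helper `pipeline_scan` and the sample rows are hypothetical, and SUM is assumed as the aggregate.

```python
def pipeline_scan(rows, order, measure):
    """One shared sort plus one scan computes every prefix cuboid of `order`.

    `order` must be a tuple of dimension names. Returns
    {prefix_dims: [(key, sum), ...]} with groups in sorted order.
    """
    rows = sorted(rows, key=lambda r: tuple(r[d] for d in order))  # shared sort
    levels = range(len(order), 0, -1)          # ABC, AB, A
    out = {order[:k]: [] for k in levels}
    current = {k: None for k in levels}        # group being accumulated per level
    total = {k: 0 for k in levels}
    for r in rows:
        for k in levels:
            key = tuple(r[d] for d in order[:k])
            if key != current[k]:              # a group closed at this level
                if current[k] is not None:
                    out[order[:k]].append((current[k], total[k]))
                current[k], total[k] = key, 0
            total[k] += r[measure]
    for k in levels:                           # flush the last open groups
        if current[k] is not None:
            out[order[:k]].append((current[k], total[k]))
    return out

rows = [
    {"A": 1, "B": 1, "C": 1, "m": 5},
    {"A": 1, "B": 2, "C": 1, "m": 3},
    {"A": 1, "B": 1, "C": 2, "m": 2},
    {"A": 2, "B": 1, "C": 1, "m": 4},
]
out = pipeline_scan(rows, ("A", "B", "C"), "m")
print(out[("A",)])  # [((1,), 10), ((2,), 4)]
```

Only one running total per level is kept in memory, which is why a pipeline along one sort order is cheap.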

Bottom-Up Cube

Bottom-Up Cube
– Partition a large view into memory-sized units
– Perform the sorting operations in memory
– May significantly reduce external-memory processing
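The bottom-up idea can be sketched as a recursion that partitions on one attribute at a time and aggregates each partition before descending. This minimal sketch assumes SUM and in-memory dict partitioning; the original BUC partitions with in-memory sorting and can also prune partitions below a minimum support, both omitted here. The function name and sample rows are hypothetical.

```python
def buc(rows, dims, measure, start=0, prefix=(), out=None):
    """Bottom-up cube: aggregate the current partition, then partition it on
    each remaining attribute (in `dims` order) and recurse."""
    if out is None:
        out = {}
    names = tuple(d for d, _ in prefix)       # cuboid this partition belongs to
    key = tuple(v for _, v in prefix)         # group within that cuboid
    out.setdefault(names, {})[key] = sum(r[measure] for r in rows)
    for i in range(start, len(dims)):
        d = dims[i]
        parts = {}
        for r in rows:                        # partition on a single attribute
            parts.setdefault(r[d], []).append(r)
        for value, part in parts.items():
            buc(part, dims, measure, i + 1, prefix + ((d, value),), out)
    return out

rows = [
    {"A": 1, "B": 1, "C": 1, "m": 5},
    {"A": 1, "B": 2, "C": 1, "m": 3},
    {"A": 1, "B": 1, "C": 2, "m": 2},
    {"A": 2, "B": 1, "C": 1, "m": 4},
]
cube = buc(rows, ("A", "B", "C"), "m")
print(cube[("B",)])  # {(1,): 11, (2,): 3}
```

Each recursive call works on a shrinking partition, so once a partition fits in memory all deeper group-bys on it run without external sorting.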

Our Results
– Parallel top-down ROLAP cube construction for shared disks (Distributed and Parallel Databases, 2002)
– Parallel top-down ROLAP cube construction for distributed disks (IPDPS 2002)
– Parallel bottom-up ROLAP cube construction for shared and distributed disks (Distributed and Parallel Databases, 2002)
– Parallel ROLAP cube indexing for distributed disks (CCGrid 2003)

Parallel Top-Down ROLAP Cube
Our approach:
– Partition the load in advance and assign cuboids to individual processors
– Local computation exploits existing optimized sequential (ROLAP) algorithms
– Communication is reduced to a single phase in which work lists are distributed

Parallel Top-Down ROLAP Cube
– Cut the process tree into p “equal weight” sub-trees
– Each processor independently generates the cuboids of its own sub-tree
– Load-balance/stripe the output
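The scheme of distributing work lists once and then computing sub-trees independently can be simulated on one machine. In this sketch, threads stand in for processors, each hypothetical work list holds (cuboid, parent) pairs with parents before children, and a sub-tree root whose parent lives elsewhere is recomputed from the raw data so no further communication is needed. SUM is assumed (it can be rolled up from a parent view); output striping is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor

def group_by(rows, dims, measure):
    """Compute a cuboid directly from the raw rows."""
    view = {}
    for r in rows:
        key = tuple(r[d] for d in dims)
        view[key] = view.get(key, 0) + r[measure]
    return view

def roll_up(parent_dims, parent_view, child_dims):
    """Compute a child cuboid from an already aggregated parent view."""
    idx = [parent_dims.index(d) for d in child_dims]
    view = {}
    for key, total in parent_view.items():
        k = tuple(key[i] for i in idx)
        view[k] = view.get(k, 0) + total
    return view

def run_subtree(rows, measure, work_list):
    """One simulated processor: compute its sub-tree with no communication."""
    local = {}
    for cuboid, parent in work_list:
        if parent is None or parent not in local:
            local[cuboid] = group_by(rows, cuboid, measure)  # sub-tree root
        else:
            local[cuboid] = roll_up(parent, local[parent], cuboid)
    return local

def parallel_cube(rows, measure, work_lists):
    # The single communication phase: hand each worker its work list.
    with ThreadPoolExecutor(max_workers=len(work_lists)) as pool:
        parts = list(pool.map(lambda wl: run_subtree(rows, measure, wl), work_lists))
    cube = {}
    for part in parts:
        cube.update(part)
    return cube

rows = [
    {"A": 1, "B": 1, "C": 1, "m": 5},
    {"A": 1, "B": 2, "C": 1, "m": 3},
    {"A": 1, "B": 1, "C": 2, "m": 2},
    {"A": 2, "B": 1, "C": 1, "m": 4},
]
work_lists = [  # a hypothetical split of the 3-d process tree for p = 2
    [(("A", "B", "C"), None), (("A", "B"), ("A", "B", "C")),
     (("A",), ("A", "B")), ((), ("A",))],
    [(("B", "C"), None), (("B",), ("B", "C")), (("C",), ("B", "C")),
     (("A", "C"), None)],
]
cube = parallel_cube(rows, "m", work_lists)
print(cube[("A", "B")])  # {(1, 1): 7, (1, 2): 3, (2, 1): 4}
```

Recomputing each sub-tree root from the raw data trades some redundant work for the absence of any exchange of intermediate views between processors.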

Tree Partitioning
– Optimal tree partitioning is NP-complete
– Min-max tree k-partitioning: given a tree T with n vertices and a positive weight assigned to each vertex, delete k edges in the tree to obtain k+1 connected components T1, T2, ..., T(k+1) such that the largest total weight of a resulting sub-tree is minimized
– O(n) time: Frederickson, 1990
– O(Rk(k + log d) + n) time: Becker, Perl and Schach, 1982
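Min-max tree k-partitioning can be illustrated with a simpler technique than the O(n) algorithm cited above: binary search over the bound B, combined with a classic greedy that, processing vertices bottom-up, cuts a vertex's heaviest child sub-trees until the vertex's component fits within B. The sketch assumes positive integer vertex weights and a tree given as a children map; the helper names and the example tree are hypothetical.

```python
def cuts_needed(children, weight, root, bound):
    """Edges the greedy deletes so every component weighs <= bound
    (None if a single vertex already exceeds bound)."""
    if max(weight.values()) > bound:
        return None
    order, stack = [], [root]
    while stack:                      # pre-order; reversed, children come first
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    cuts, residual = 0, {}
    for v in reversed(order):
        kids = sorted((residual[c] for c in children.get(v, [])), reverse=True)
        total = weight[v] + sum(kids)
        for r in kids:                # cut the heaviest child sub-trees first
            if total <= bound:
                break
            total -= r
            cuts += 1
        residual[v] = total           # weight still attached to v's component
    return cuts

def min_max_partition(children, weight, root, k):
    """Smallest bound B reachable with at most k edge deletions."""
    lo, hi = max(weight.values()), sum(weight.values())
    while lo < hi:
        mid = (lo + hi) // 2
        if cuts_needed(children, weight, root, mid) <= k:
            hi = mid
        else:
            lo = mid + 1
    return lo

children = {0: [1, 2], 1: [3, 4]}
weight = {0: 4, 1: 3, 2: 5, 3: 2, 4: 6}
print([min_max_partition(children, weight, 0, k) for k in range(3)])  # [20, 11, 9]
```

If the greedy needs fewer than k cuts at the optimal bound, extra edges can be deleted without raising the maximum, as long as the tree has at least k+1 vertices.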

Oversampling

Time vs. Number of Processors

For more information: http://cgm.dehne.net
