1 Multi-way Algorithm for Cube Computation
CPS 196.03 Notes 8

2 First Programming Project
- Individual project, 15 points in final grade
- Sales(customer_id, item_id, item_group, item_price, purchase_date)
  - Will be provided as a file during the demo and for generating performance numbers for the project report
- Task 1: 5 points
  - Interface to enter MIN_SUPPORT (% of customers)
  - Find frequent itemsets (sets of item_ids) using Apriori
- Task 2: 5 points (Section 5.5 in the textbook)
  - Interface to enter two constraint types (e.g., SUM(item_price) op const)
  - Use the constraints in Apriori as effectively as possible; study and demonstrate the performance improvement
- Task 3: 5 points
  - Extension of your choice. Examples include (i) association rules, (ii) complex constraints, (iii) sequential patterns, (iv) variants of Apriori, (v) FP-growth
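For reference, Task 1 can be sketched as a textbook level-wise Apriori: count single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, drop any candidate with an infrequent (k-1)-subset, and count the survivors in one pass over the baskets. This is a minimal sketch, not a project solution; it assumes the purchases are already grouped into one set of item_ids per customer (so support is a percentage of customers, as Task 1 asks), and the names `apriori` and `min_support_pct` are hypothetical.

```python
from itertools import combinations


def apriori(baskets, min_support_pct):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    baskets: list of sets of item_ids, one set per customer (assumption).
    min_support_pct: minimum support as a percentage of customers, e.g. 22.
    """
    min_count = min_support_pct / 100 * len(baskets)

    # Level 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join pairs of frequent (k-1)-itemsets into k-item candidates and
        # prune any candidate that has an infrequent (k-1)-subset.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # One pass over the baskets to count the surviving candidates.
        counts = {cand: 0 for cand in candidates}
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result
```

For Task 2, a constraint such as SUM(item_price) <= const is anti-monotone when all prices are nonnegative, so it could be checked at the same point as the subset test to prune candidates early; that part is left out of the sketch.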

3 File Format
- Example lines:
    10,123,3,54,4/4/2008
    10,12,4,101,4/5/2008
    14,123,3,54,8/4/2008
    …
- Caveats:
  - Customer vs. item
  - Three datasets: Toy, Medium, and Large
  - Comma-separated file, one purchase per line, no header
  - Integers for simplicity
  - Note the date format
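Because MIN_SUPPORT is a percentage of customers, the per-purchase lines have to be grouped into one basket per customer before mining. A minimal sketch using Python's `csv` module, assuming the five-column layout above; `load_baskets` is a hypothetical name, and the date is left as a raw string since the slides only warn to note its format.

```python
import csv
from collections import defaultdict


def load_baskets(lines):
    """Group purchases by customer: {customer_id: set of item_ids}.

    lines: any iterable of text lines, e.g. an open file. Each line is
    customer_id,item_id,item_group,item_price,purchase_date with no header.
    """
    baskets = defaultdict(set)
    for row in csv.reader(lines):
        if not row:
            continue  # tolerate blank lines
        customer_id, item_id, _item_group, _item_price, _purchase_date = row
        baskets[int(customer_id)].add(int(item_id))
    return dict(baskets)
```

Usage: `with open("toy.csv", newline="") as f: baskets = load_baskets(f)`, where the file name is hypothetical; `list(baskets.values())` is then one set per customer.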

4 First Programming Project: Milestones
- Feb 3: Project announced
- Feb 17: Mid-project report due
  - Describe progress and planned extensions
  - Describe detailed algorithms for all three tasks
- Feb 17: Sample data file will be provided for generating performance results for the project report
- March 2: Submit code, a README file explaining how to run the code, code documentation, and the final project report
- March 2-4: Project demos (random assignment)
- March 6: Spring break. Second project announced

5 Finalized Grading Criteria for Class
- Homeworks: 15 points
- Programming projects: 40 points
- Midterm: 20 points
  - Note: the midterm is on Feb 19 (Thu) in class
- Final: 25 points

6 ROLAP Server
- Relational OLAP server
  [Figure: tools on top of a ROLAP server, which sits on a relational DBMS; utilities]
- Special indices, tuning
- Schema is "denormalized"

7 MOLAP Server
- Multi-dimensional OLAP server
  [Figure: M.D. tools and utilities on top of a multi-dimensional server, which could also sit on a relational DBMS. Example: a Sales cube with dimensions Product (milk, soda, eggs, soap), City (A, B), and Date]

8 MOLAP
[Figure: cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum); labeled cell: total annual sales of TV in U.S.A.]
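The "sum" positions on each axis of the cube hold aggregates, so "total annual sales of TV in U.S.A." is the cell (TV, sum, U.S.A). A minimal sketch of how every base cell feeds all 2^3 = 8 cells it belongs to, assuming a sparse dict keyed by (product, date, country); the function name `add_sum_margins` is hypothetical and this is a brute-force illustration, not the array cubing algorithm discussed later.

```python
from collections import defaultdict
from itertools import product


def add_sum_margins(cells):
    """cells maps (product_name, date, country) -> sales.

    Returns a cube in which any coordinate may also be "sum", meaning the
    value is aggregated over that whole dimension.
    """
    cube = defaultdict(int)
    for (product_name, date, country), value in cells.items():
        # Each base cell contributes to itself and to its 7 aggregate cells.
        for key in product((product_name, "sum"), (date, "sum"), (country, "sum")):
            cube[key] += value
    return dict(cube)
```

With hypothetical quarterly TV sales in U.S.A of 10, 20, 30, and 40, `add_sum_margins(cells)[("TV", "sum", "U.S.A")]` is their total, 100.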

9 MOLAP
[Figure: 3D array with dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3)]

10 Challenges in MOLAP
- Storing large arrays for efficient access
  - Row-major, column-major
  - Chunking
  - Compressing sparse arrays
- Creating array data from data in tables
- Efficient techniques for cube computation
- These topics are discussed in the paper assigned for reading
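The difference between plain row-major layout and chunking is where a cell lands on disk. In row-major order (last index fastest; column-major is the mirror image) neighbors along the first dimension are far apart, while chunking stores each small sub-array contiguously. A minimal sketch of both offset computations; the function names are hypothetical, and edge chunks are assumed padded to full size when a dimension is not a multiple of the chunk width.

```python
def row_major_offset(index, shape):
    """Offset of a cell in a row-major array (last index varies fastest)."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def chunked_offset(index, shape, chunk_shape):
    """Offset when the array is stored chunk by chunk.

    Chunks are laid out in row-major order, and the cells inside each
    chunk are row-major as well. Partial edge chunks are padded.
    """
    chunk_index = [i // c for i, c in zip(index, chunk_shape)]
    chunk_grid = [-(-n // c) for n, c in zip(shape, chunk_shape)]  # ceil division
    within = [i % c for i, c in zip(index, chunk_shape)]
    chunk_size = 1
    for c in chunk_shape:
        chunk_size *= c
    return (row_major_offset(chunk_index, chunk_grid) * chunk_size
            + row_major_offset(within, chunk_shape))
```

In a 4 × 4 array with 2 × 2 chunks, cell (1, 1) sits at row-major offset 5 but at chunked offset 3, inside the first chunk with (0, 0).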

11 ROLAP vs. MOLAP
- What do the authors say?
- What can you do in MOLAP that you cannot do in ROLAP?
- Can the algorithm in this paper be used in ROLAP?

12 Array Storage
- Chunks
- Compression
  - Chunk-offset compression vs. LZW
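Chunk-offset compression keeps, for a sparse chunk, only the non-empty cells as (offset within chunk, value) pairs. A minimal sketch, assuming a chunk is given as a flat list in its cell order with `None` marking empty cells; the function names are hypothetical.

```python
def compress_chunk(chunk):
    """Chunk-offset compression: (offset_in_chunk, value) for non-empty cells."""
    return [(offset, value) for offset, value in enumerate(chunk) if value is not None]


def decompress_chunk(pairs, chunk_size):
    """Rebuild the flat chunk, with None for the cells that were empty."""
    chunk = [None] * chunk_size
    for offset, value in pairs:
        chunk[offset] = value
    return chunk
```

Because the pairs come out sorted by offset, an aggregation pass can iterate over them directly, with no general-purpose decompressor such as LZW in between.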

13 Loading Arrays from Tables
- The easy case: the array fits in memory
- Otherwise: partitions

14 Loading Arrays from Tables
- Suppose there are 1000 chunks and 10 chunks can fit in memory. The partition size is 10 chunks, giving 1000 / 10 = 100 partitions.
  [Figure: Table split into 100 partitions of 10 chunks each]
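The partitioned load can be sketched in two passes: first route each table tuple to the partition that holds its chunk (chunk number divided by the partition size, so with the slide's numbers chunk 537 goes to partition 53), then build the chunks of one partition at a time so that only a partition's worth of chunks is in memory. This is a minimal sketch under stated assumptions: tuples arrive as (cell index, value), chunks are numbered in row-major order, and in-memory lists stand in for the per-partition files and the array file; all names are hypothetical.

```python
from collections import defaultdict


def chunk_number(index, chunk_shape, chunk_grid):
    """Row-major number of the chunk that contains cell `index`."""
    number = 0
    for i, c, g in zip(index, chunk_shape, chunk_grid):
        number = number * g + i // c
    return number


def partitioned_load(tuples, chunk_shape, chunk_grid, chunks_per_partition):
    """Load (index, value) tuples into chunks, one partition at a time.

    Returns {chunk number: {cell index: value}}.
    """
    # Pass 1: route every tuple to its partition (stands in for partition files).
    partitions = defaultdict(list)
    for index, value in tuples:
        cn = chunk_number(index, chunk_shape, chunk_grid)
        partitions[cn // chunks_per_partition].append((cn, tuple(index), value))

    # Pass 2: only the chunks of one partition are built in memory at a time.
    loaded = {}
    for p in sorted(partitions):
        in_memory = defaultdict(dict)
        for cn, index, value in partitions[p]:
            in_memory[cn][index] = value
        loaded.update(in_memory)  # stands in for writing the chunks to disk
    return loaded
```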

15 Basic Array Cubing Algo
- First find the minimum spanning tree
  - Hierarchy of aggregates
- Compute each (k-1)-dimensional aggregate from its best k-dimensional aggregate
  - One pass through the array in the right order
- Let us look at some basics first
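One way to build such a tree is to give every (k-1)-dimensional group-by the k-dimensional parent with the fewest cells (the product of its dimension sizes). In this sketch "best" is assumed to mean "smallest"; the multi-way algorithm instead picks parents by memory (the minimum memory spanning tree). `min_size_spanning_tree` is a hypothetical name, and group-bys are written as strings, with "" for the grand total.

```python
from itertools import combinations
from math import prod


def min_size_spanning_tree(sizes):
    """sizes: {dimension: number of distinct values}.

    Returns {child group-by: parent group-by}, choosing for each child the
    parent (child plus one dimension) with the fewest cells.
    """
    dims = sorted(sizes)
    tree = {}
    for k in range(len(dims) - 1, -1, -1):  # child has k dimensions
        for child in combinations(dims, k):
            parents = [tuple(sorted(child + (d,))) for d in dims if d not in child]
            best = min(parents, key=lambda p: prod(sizes[d] for d in p))
            tree["".join(child)] = "".join(best)
    return tree
```

With the sizes used later in these notes (A: 40, B: 400, C: 4000), A and B are computed from AB (16,000 cells), C from AC, and the grand total from A.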

16 Chunked 3D Array
[Figure: 3D array over A (a0 to a3), B (b0 to b3), and C (c0 to c3), stored as 4 × 4 × 4 chunks; dimension order CBA]

17 "a0b0" chunk
[Figure: chunks a0b0c0 to a0b0c3 are scanned and aggregated into planes AB, AC, and BC (marked x)]

18 a0b1 chunk
[Figure: chunks a0b1c0 to a0b1c3 are scanned (marked y). Done with a0b0]

19 a0b2 chunk
[Figure: chunks a0b2c0 to a0b2c3 are scanned (marked z). Done with a0b1]

20 Table Visualization
[Figure: chunks a0b3c0 to a0b3c3 are scanned (marked u). Done with a0b2]

21 Table Visualization
[Figure: chunks a1b0c0 to a1b0c3 are scanned. Done with a0b3; done with a0c*]

22 a3b3 chunk (last one)
[Figure: chunks a3b3c0 to a3b3c3 are scanned. Finish: done with a0b3; done with a0c*; done with b*c*]
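The walk above can be simulated: read the 4 × 4 × 4 chunks with C varying fastest, then B, then A, and note when each AB, AC, and BC chunk has received its last contribution and can be written out. This is a sketch of the chunk bookkeeping only, not of the cell-level aggregation; the function names are hypothetical.

```python
from itertools import product

N = 4  # chunks per dimension (a0-a3, b0-b3, c0-c3), as in the figures


def scan_order_cba(n=N):
    """Chunk coordinates (a, b, c) with c varying fastest, then b, then a."""
    return product(range(n), repeat=3)


def completion_events(n=N):
    """(step, plane chunk) pairs: when each plane chunk can be written out."""
    events = []
    for step, (a, b, c) in enumerate(scan_order_cba(n)):
        if c == n - 1:  # AB aggregates over C: last c for this (a, b)
            events.append((step, f"AB a{a}b{b}"))
        if b == n - 1:  # AC aggregates over B: last b for this (a, c)
            events.append((step, f"AC a{a}c{c}"))
        if a == n - 1:  # BC aggregates over A: last a for this (b, c)
            events.append((step, f"BC b{b}c{c}"))
    return events


def peak_open_chunks(n=N):
    """Largest number of partially aggregated chunks each plane holds at once."""
    open_chunks = {"AB": set(), "AC": set(), "BC": set()}
    peak = {plane: 0 for plane in open_chunks}
    for a, b, c in scan_order_cba(n):
        for plane, key in (("AB", (a, b)), ("AC", (a, c)), ("BC", (b, c))):
            open_chunks[plane].add(key)
            peak[plane] = max(peak[plane], len(open_chunks[plane]))
        if c == n - 1:
            open_chunks["AB"].discard((a, b))
        if b == n - 1:
            open_chunks["AC"].discard((a, c))
        if a == n - 1:
            open_chunks["BC"].discard((b, c))
    return peak
```

`completion_events()[0]` is `(3, "AB a0b0")`: a0b0 is finished after the fourth chunk. The AC chunks a0c0 to a0c3 finish at steps 12 to 15, and BC chunks only once A reaches a3. `peak_open_chunks()` returns `{"AB": 1, "AC": 4, "BC": 16}`, the chunk counts behind the memory figures for dimension order CBA.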

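23 Memory Used
- A: 40 distinct values
- B: 400 distinct values
- C: 4000 distinct values
- Each dimension is split into 4 chunks, so one chunk is 10 * 100 * 1000 cells
- Dimension order: CBA (C varies fastest)
- Plane AB: needs 1 chunk (10 * 100 * 1 = 1,000)
- Plane AC: needs 4 chunks (10 * 1000 * 4 = 40,000)
- Plane BC: needs 16 chunks (100 * 1000 * 16 = 1,600,000)
- Total memory: 1,641,000 cells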

24 Memory Used
- A: 40 distinct values
- B: 400 distinct values
- C: 4000 distinct values
- Dimension order: ABC (A varies fastest)
- Plane BC: needs 1 chunk (1000 * 100 * 1 = 100,000)
- Plane AC: needs 4 chunks (1000 * 10 * 4 = 40,000)
- Plane AB: needs 16 chunks (100 * 10 * 16 = 16,000)
- Total memory: 156,000 cells
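Both totals follow one rule. For the plane that drops dimension X, every kept dimension that varies faster than X must be held at its full size, and every slower one only one chunk wide. A minimal sketch that reproduces the two slides and tries all six orders; it covers only the 2D planes computed directly from the 3D array, not the lower levels of the tree, and the function names are hypothetical.

```python
from itertools import permutations


def plane_memory(order, sizes, chunk):
    """Cells each 2D plane needs during one pass over the chunks in `order`.

    order: dimensions from fastest- to slowest-varying, e.g. "CBA".
    sizes: {dimension: distinct values}; chunk: {dimension: chunk width}.
    """
    memory = {}
    for pos, dropped in enumerate(order):
        cells = 1
        for d in order:
            if d == dropped:
                continue
            # Faster than the dropped dimension: full extent; slower: one chunk.
            cells *= sizes[d] if order.index(d) < pos else chunk[d]
        plane = "".join(sorted(d for d in order if d != dropped))
        memory[plane] = cells
    return memory


def best_order(sizes, chunk):
    """The dimension order with the smallest total plane memory."""
    return min(("".join(p) for p in permutations(sorted(sizes))),
               key=lambda o: sum(plane_memory(o, sizes, chunk).values()))
```

With sizes {"A": 40, "B": 400, "C": 4000} and chunk widths {"A": 10, "B": 100, "C": 1000}, order "CBA" gives 1,000 + 40,000 + 1,600,000 = 1,641,000 and "ABC" gives 100,000 + 40,000 + 16,000 = 156,000. Over all six orders, `best_order` picks "ABC".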

25 Basic Array Cubing Algo
- First find the minimum spanning tree
  - Hierarchy of aggregates
- Compute each (k-1)-dimensional aggregate from its best k-dimensional aggregate
  - One pass through the array in the right order
- What are the advantages and disadvantages of this algorithm?

26 Multi-way Array Cubing Algo
- What is the main idea?
- Rule 1 on page 163
- Minimum memory spanning tree
  - Figure 2
  - Figures 3 and 4
- Theorem 1
- Basic idea of the multi-pass algorithm
  - Tradeoff between memory usage and number of passes

27
D1   D2   D3  M
A3   B2   C1  20
A7   B2   C1  10
A13  B1   C1  230
A2   B2   C1  10
A3   B7   C1  240
A15  B7   C1  20
A6   B1   C1  210
A13  B2   C1  20
A1   B11  C1  100
A1   B11  C1  50
A13  B2   C1  30
A3   B11  C1  210
A13  B7   C1  40
A10  B1   C1  50
A3   B1   C1  210