1 CUBE: A Relational Aggregate Operator Generalizing Group By Jim Gray Adam Bosworth Andrew Layman Microsoft Microsoft.com Hamid Pirahesh IBM.

Slides:



Advertisements
Similar presentations
Chapter 13: Query Processing
Advertisements

0 - 0.
Addition Facts
Dr. Alexandra I. Cristea CS 252: Fundamentals of Relational Databases: SQL5.
Google News Personalization Scalable Online Collaborative Filtering
Chapter 7: Arrays In this chapter, you will learn about
1 Lecture 5: SQL Schema & Views. 2 Data Definition in SQL So far we have see the Data Manipulation Language, DML Next: Data Definition Language (DDL)
Access Lesson 13 Programming in Access Microsoft Office 2010 Advanced Cable / Morrison 1.
CS 3630 Database Design and Implementation. Where Clause and Aggregate Functions -- List all rooms whose price is greater than the -- average room price.
Course ILT Working with related tables Unit objectives Use the Lookup Wizard to create a lookup field and a multivalued field Modify lookup field properties.
1 Web-Enabled Decision Support Systems Access Introduction: Touring Access Prof. Name Position (123) University Name.
Microsoft Office Illustrated Fundamentals Unit K: Working with Data.
Access 2007 ® Use Databases How can Microsoft Access 2007 help you structure your database?
N.G.Acharya & D.K.Marathe college Chembur-E, Mumbai-71
1 Quick recap of the SQL & the GUI way in Management Studio The Adwentureworks database from the book A script to create tables & insert data for the Amazon.
State of Connecticut Core-CT Project Query 8 hrs Updated 6/06/2006.
An overview of Data Warehousing and OLAP Technology Presented By Manish Desai.
Addition 1’s to 20.
Week 1.
Computer Concepts BASICS 4th Edition
Chapter 4 Systems of Linear Equations; Matrices
OLAP Tuning. Outline OLAP 101 – Data warehouse architecture – ROLAP, MOLAP and HOLAP Data Cube – Star Schema and operations – The CUBE operator – Tuning.
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
Concepts of Database Management Seventh Edition
Concepts of Database Management Sixth Edition
Concepts of Database Management Seventh Edition
5.1Database System Concepts - 6 th Edition Chapter 5: Advanced SQL Advanced Aggregation Features OLAP.
Data Warehousing CPS216 Notes 13 Shivnath Babu. 2 Warehousing l Growing industry: $8 billion way back in 1998 l Range from desktop to huge: u Walmart:
Chapter 11 Group Functions
Midterm Review Lecture 14b. 14 Lectures So Far 1.Introduction 2.The Relational Model 3.Disks and Files 4.Relational Algebra 5.File Org, Indexes 6.Relational.
 N. Roussopoulos 2007 OLAP & Data Cubing Spring 2007 Nick Roussopoulos
Data Cube and OLAP Server
Horizontal data sets: Number of attributes is of the same order to several orders of magnitude higher than the number of records. Example: genetic data.
Business Intelligence. On-Line Analytical Processing (OLAP) Tools The use of a set of graphical tools that provides users with multidimensional views.
Sensor Data Management with Model-based View LSIR, EPFL.
Data Warehousing. On-Line Analytical Processing (OLAP) Tools The use of a set of graphical tools that provides users with multidimensional views of their.
Concepts of Database Management Sixth Edition
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007.
8/20/ Data Warehousing and OLAP. 2 Data Warehousing & OLAP Defined in many different ways, but not rigorously. Defined in many different ways, but.
Advanced Databases 5841 DATA CUBE. Index of Content 1. The “ALL” value and ALL() function 2. The New Features added in CUBE 3. Computing the CUBE and.
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals Presenter : Parminder Jeet Kaur Discussion Lead : Kailang.
Concepts of Database Management, Fifth Edition
1 CUBE: A Relational Aggregate Operator Generalizing Group By By Ata İsmet Özçelik.
Ahsan Abdullah 1 Data Warehousing Lecture-11 Multidimensional OLAP (MOLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for.
1 Cube Computation and Indexes for Data Warehouses CPS Notes 7.
OnLine Analytical Processing (OLAP)
CS 157B: Database Management Systems II March 20 Class Meeting Department of Computer Science San Jose State University Spring 2013 Instructor: Ron Mak.
Concepts of Database Management Seventh Edition
Concepts of Database Management Seventh Edition
Data Warehousing.
Access 2007 ® Use Databases How can Microsoft Access 2007 help you structure your database?
ISO/IEC JTC1 SC32 1SQL/OLAP Sang-Won Lee Let’s e-Wha! URL: Jul. 12th,
Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.
Data Warehousing.
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals 데이터베이스 연구실 김호숙
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
Data Warehousing and OLAP Outline u Models & operations u Implementing a warehouse u Future directions.
Plan for Final Lecture What you may expect to be asked in the Exam?
Lecturer : Dr. Pavle Mogin
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
SQL/OLAP Sang-Won Lee Let’s e-Wha! URL: Jul. 12th, 2001 SQL/OLAP
Based on notes by Jim Gray
DATA CUBE Advanced Databases 584.
Data warehouse Design Using Oracle
Access: SQL Participation Project
DATA CUBES E0 261 Jayant Haritsa Computer Science and Automation
Slides based on those originally by : Parminder Jeet Kaur
Presentation transcript:

1 CUBE: A Relational Aggregate Operator Generalizing Group By Jim Gray Adam Bosworth Andrew Layman Microsoft Microsoft.com Hamid Pirahesh IBM

2 The Data Analysis Cycle oUser extracts data from database with query oThen visualizes, analyzes data with desktop tools Spread Sheet Table Size vs Speed Access Time (seconds) Cache Main Secondary Disc Nearline Tape Offline Tape Online Tape Price vs Speed Access Time (seconds) Cache Main Secondary Disc Nearline Tape Offline Tape Online Tape Size(B) $/MB visualize Extract analyze

3 Division of labor Computation vs Visualization oRelational system builds CUBE relation –aggregation best done close to data –Much filtering of data possible –Cube computation may be recursive »(e.g., percent of total, quartile,....) oVisualization System displays/explores the cube

4 Relational Aggregate Operators oSQL has several aggregate operators: –sum(), min(), max(), count(), avg() oOther systems extend this with many others: –stat functions, financial functions,... oThe basic idea is: –Combine all values in a column –into a single scalar value. oSyntax select sum(units) from inventory;

5 Relational Group By Operator oGroup By allows aggregates over table sub-groups oResult is a new table oSyntax: select location, sum(units) from inventory group by location having nation = USA;

6 Problems With This Design oUsers Want Histograms oUsers want sub-totals and totals –drill-down & roll-up reports oUsers want CrossTabs oConventional wisdom –These are not relational operators –They are in many report writers and query engines sum M T W T F S S  AIR HOTEL FOOD MISC  F() G() H()

7 Thesis: The Data CUBE Relational Operator Generalizes Group By and Aggregates

8 The Idea: Think of the N-dimensional Cube Each Attribute is a Dimension oN-dimensional Aggregate (sum(), max(),...) –fits relational model exactly: »a 1, a 2,...., a N, f() oSuper-aggregate over N-1 Dimensional sub-cubes »ALL, a 2,...., a N, f() »a 3, ALL, a 3,...., a N, f() »... »a 1, a 2,...., ALL, f() –this is the N-1 Dimensional cross-tab. oSuper-aggregate over N-2 Dimensional sub-cubes »ALL, ALL, a 3,...., a N, f() »... »a 1, a 2,...., ALL, ALL, f()

9 An Example CUBE

10 Why the ALL Value? oNeed a new Null value (overloads the null indicator) oValue must not already be in the aggregated domain oCant use NULL since may aggregate on it. oThink of ALL as a token representing the set –{red, white, blue}, {1990, 1991, 1992}, {Chevy, Ford} oRules for ALL in other areas not explored –assertions –insertion / deletion /... –referential integrity oFollow set of values semantics.

11 CUBE operator: Syntax oProposed syntax: oNote: Group By operator repeats aggregate list –in select list –in group by list select model, make, year, sum(units) from car_sales where model in {chevy, ford} and year between 1990 and 1994 group by model, make, year with cube having sum(units) > 0;

12 Why This Syntax? oabstract syntax oallows functional aggregations (e.g., sales by quarter): select from where group by [ with [ cube | roll up] ] having select store, quarter, sum(units) from sales where nation = Mexico group by store, quarter(date) as quarter with roll up and year = 1994;

13 Decorations and Abstractions oSometimes want to tag cube with redundant values –region #, region_name, sales –region name is not a dimension, it is a decoration –Decorations are functionally dependent on dimensions oMore interesting, some dimensions are aggregations. oOften these aggregations are not linear (are a lattice) oRather than treat time as 12 dimensions –Recognize abstractions as one dimension (like decorations) –Compute efficiently (virtual functions) second minute hour day week month quarter year Xmas Easter Thanksgiving Holiday block city county state nation

14 Interesting Aggregate Functions oFrom RedBrick systems –Rank (in sorted order) –N-Tile (histograms) –Running average (cumulative functions) –Windowed running average –Percent of total oUsers want to define their own aggregate functions –statistics –domain specific

15 User Defined Aggregates oIdea: –User function is called at start of each group –Each function instance has scratchpad –Function is called at end of group oExample: SUM –START: allocates a cell and sets it to zero –NEXT: adds next value to cell –END:deallocates cell and returns value –Simple example: MAX() oThis idea is in Illustra, IBMs DB2/CS, and SQL standard oNeeds extension for rollup and cube Scratchpad start next end

16 User Defined Aggregate Function Generalized For Cubes oAggregates have graduated difficulty –Distributive : can compute cube from next lower dimension values (count, min, max,...) –Algebraic : can compute cube from next lower lower scratchpads (average,...) –Holistic : Need base data (Median, Mode, Rank..) oDistributive and Algebraic have simple and efficient algorithm: build higher dimensions from core oHolistic computation seems to require multiple passes. –real systems use sampling to estimate them »(e.g., sample to find median, quartile boundaries)

17 How To Compute the Cube? If each attribute has N i values CUBE has (N i +1) values oCompute N-D cube with hash if fits in RAM oCompute N-D cube with sort if overflows RAM oSame comments apply to subcubes: –compute N-D-1 subcube from N-D cube. –Aggregate on biggest domain first when >1 deep –Aggregate functions need hidden variables: »e.g. average needs sum and count. oUse standard techniques from query processing –arrays, hashing, hybrid hashing –fall back on sorting.

18 Example: oCompute 2D core of 2 x 3 cube oThen compute 1D edges oThen compute 0D point oWorks for algebraic and distributive functions Saves lots of calls

19 Summary oCUBE operator generalizes relational aggregates oNeeds ALL value to denote sub-cubes –ALL values represent aggregation sets oNeeds generalization of user-defined aggregates oDecorations and abstractions are interesting oComputation has interesting optimizations Research Topics oGeneralize Spreadsheet Pivot operator to RDBs oCharacterize Algebraic/Distributive/Holistic functions for update