Presentation is loading. Please wait.

Presentation is loading. Please wait.

Based on notes by Jim Gray

Similar presentations


Presentation on theme: "Based on notes by Jim Gray"— Presentation transcript:

1 Based on notes by Jim Gray
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals Jim Gray Microsoft Adam Bosworth Microsoft Andrew Layman Microsoft Hamid Pirahesh IBM Presented by: Changwu Li Based on notes by Jim Gray 9/17/2018

2 The Data Analysis Cycle
Spread Sheet Table 1 10 15 12 9 6 3 Size vs Speed Access Time (seconds) -9 -6 -3 Cache Main Secondary Disc Nearline Tape Offline Online 4 2 -2 -4 Price vs Speed Size(B) $/MB visualize Extract analyze User extracts data from database with query Then visualizes, analyzes data with desktop tools 9/17/2018

3 Relational Aggregate Operators
SQL has several aggregate operators: sum(), min(), max(), count(), avg() Other systems extend this with many others: stat functions, financial functions, ... The basic idea is: Combine all values in a column into a single scalar value. Syntax select sum(units) from inventory; 9/17/2018

4 Relational Group By Operator
Group By allows aggregates over table sub-groups Result is a new table Syntax: select deptno, sum(salary) from emp group by deptno 9/17/2018

5 Problems With This Design
Users Want Histograms Users want sub-totals and totals drill-down & roll-up reports Users want CrossTabs Conventional wisdom These are not relational operators They are in many report writers and query engines F() G() H() sum M T W T F S S AIR HOTEL FOOD MISC 9/17/2018

6 Table 5a: Ford Sales Cross Tab
A cross tab example Table 5a: Ford Sales Cross Tab Ford 1994 1995 total (ALL) black 50 85 135 white 10 75 60 160 220 9/17/2018

7 How to solve this problem? Answer: cube
9/17/2018

8 The Idea: Think of the N-dimensional Cube Each Attribute is a Dimension
N-dimensional Aggregate (sum(), max(),...) fits relational model exactly: a1, a2, ...., aN, f() Super-aggregate over N-1 Dimensional sub-cubes ALL, a2, ...., aN , f() a3 , ALL, a3, ...., aN , f() ... a1, a2, ...., ALL, f() this is the N-1 Dimensional cross-tab. Super-aggregate over N-2 Dimensional sub-cubes ALL, ALL, a3, ...., aN , f() a1, a2 ,...., ALL, ALL, f() 9/17/2018

9 An Example CUBE 9/17/2018

10 Why the ALL Value? Need a new “Null” value (overloads the null indicator) Value must not already be in the aggregated domain Can’t use NULL since may aggregate on it. Think of ALL as a token representing the set All(color)={red, white, blue}, All(year)={1990, 1991, 1992}, All(model)={Chevy, Ford} Follow “set of values” semantics. 9/17/2018

11 CUBE operator: Syntax select model, make, year, sum(sales)
Proposed syntax: Note: Group By operator repeats aggregate list in select list in group by list select model, make, year, sum(sales) from car_sales where model in {“chevy”, “ford”} and year between 1990 and 1994 group by model, make, year with cube having sum(sales) > 0; 9/17/2018

12 How To Compute the Cube? If each attribute has Ni values CUBE has P (Ni+1) values Compute N-D cube with hash if fits in RAM Compute N-D cube with sort if overflows RAM Same comments apply to subcubes: compute N-D-1 subcube from N-D cube. Aggregate on “biggest” domain first when >1 deep Aggregate functions need hidden variables: e.g. average needs sum and count. 9/17/2018

13 Example: Compute 2D core of 2 x 3 cube Then compute 1D edges
Then compute 0D point Works for algebraic and distributive functions Saves “lots” of calls 9/17/2018

14 Real world implementation
Both Oracle 9i and SQL server 2000 An example in Oracle 9i: select deptno, job, sum(sal) as salary from emp group by cube(deptno, job) 9/17/2018

15 -------- ------------------ -------------------- 10 CLERK 1300
DEPTNO JOB SALARY 10 CLERK 10 MANAGER 10 PRESIDENT 20 ANALYST 20 CLERK 20 MANAGER 30 CLERK 30 MANAGER 30 SALESMAN ANALYST CLERK MANAGER PRESIDENT SALESMAN 2902 9/17/2018

16 1999 ACM Turing Award Jim Gray 9/17/2018


Download ppt "Based on notes by Jim Gray"

Similar presentations


Ads by Google