Advanced Databases 584: DATA CUBE



Index of Content
1. The “ALL” value and ALL() function
2. The New Features added in CUBE
3. Computing the CUBE and ROLLUP
4. Maintaining the CUBE and ROLLUP

3.3 The ALL Value
- Each “ALL” value actually represents a set: the set over which the aggregation was computed.
Ex. Model.ALL = {Toyota, Ford, Nissan}

The ALL() Function
- The ALL() function generates the set.
- ALL() applied to any other value returns NULL.
Ex.
Model.ALL = ALL(Model) = {Toyota, Ford, Nissan}
Year.ALL = ALL(Year) = {2011, 2012, 1990}
Color.ALL = ALL(Color) = {Red, Blue, Gray}
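The behaviour of ALL() described above can be sketched in a few lines of Python. This is only an illustration: the `ALL` sentinel, the `all_set` helper, and the sample rows are assumptions made for the example, not part of any SQL implementation.

```python
# Hypothetical sentinel standing in for the ALL value of a column.
ALL = object()

def all_set(rows, column, value):
    """Mimic ALL(): return the set behind an ALL value, None (NULL) otherwise."""
    if value is not ALL:
        return None
    return {row[column] for row in rows}

# Illustrative rows built from the values on the slide.
sales = [
    {"Model": "Toyota", "Year": 2011, "Color": "Red"},
    {"Model": "Ford", "Year": 2012, "Color": "Blue"},
    {"Model": "Nissan", "Year": 1990, "Color": "Gray"},
]

model_all = all_set(sales, "Model", ALL)     # the set of all models
not_all = all_set(sales, "Model", "Ford")    # an ordinary value gives None
```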

The Reasons to Avoid the ALL Value
Introducing ALL makes things complicated:
- ALL becomes a new keyword denoting the set value.
- ALL is similar to NULL and creates many special cases.
- What if SQL does not support set values?

3.4 Avoiding the ALL Value
Inspired by the ALL() function, a substitute for implementing the ALL value:
- Use GROUPING() (a boolean function) to discriminate between NULL and ALL
- Use the NULL value in the column instead of the ALL value
Output: (NULL,NULL,NULL,524,True,True,True)

Example:
QUERY:
SELECT Model, Year, Color, SUM(sales), GROUPING(Model), GROUPING(Year), GROUPING(Color)
FROM Sales
GROUP BY CUBE (Model, Year, Color);
Output: (NULL,NULL,NULL,524,True,True,True)
Compare to: (ALL,ALL,ALL,524)
Note: True means the NULL in that column stands for the ALL value.
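To see why the GROUPING() flags are needed, here is a minimal Python sketch that computes SUM over every CUBE cell and emits one boolean flag per dimension. The function name `cube_with_grouping` and the sample rows are assumptions for illustration; one row deliberately has a genuine NULL (`None`) Color so that it can be told apart from the aggregated cell.

```python
def cube_with_grouping(rows, dims, measure):
    """SUM(measure) over every CUBE cell, with one GROUPING flag per dimension."""
    totals = {}
    n = len(dims)
    for row in rows:
        for mask in range(2 ** n):  # bit i set: dimension i is aggregated away
            flags = tuple(bool((mask >> i) & 1) for i in range(n))
            key = tuple(None if flag else row[d] for flag, d in zip(flags, dims))
            totals[(key, flags)] = totals.get((key, flags), 0) + row[measure]
    # Each output tuple: dimension values, the sum, then the GROUPING flags.
    return {key + (total,) + flags for (key, flags), total in totals.items()}

# Hypothetical data: the first row has a real NULL in Color.
sales = [
    {"Model": "Ford", "Year": 2012, "Color": None, "Sales": 10},
    {"Model": "Ford", "Year": 2012, "Color": "Red", "Sales": 5},
]
result = cube_with_grouping(sales, ["Model", "Year", "Color"], "Sales")
```

Both `("Ford", 2012, None, 10, False, False, False)` and `("Ford", 2012, None, 15, False, False, True)` appear in `result`; only the last flag tells the real NULL apart from the aggregated Color.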

4. Two New Features Added in CUBE

What about the percentage of total sales?

1. Reference to the sub-aggregate: we can now refer to the sub-aggregate value directly.
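As a rough illustration of why referencing a sub-aggregate helps, the percentage of total sales is each group's sum divided by the grand total, which is the (ALL) cell. The helper name `percent_of_total` and the figures below are made up for the example.

```python
def percent_of_total(rows, dim, measure):
    """Share of each group's sum in the grand total, in percent."""
    totals = {}
    for row in rows:
        totals[row[dim]] = totals.get(row[dim], 0) + row[measure]
    grand_total = sum(totals.values())  # plays the role of the (ALL) cell
    return {key: 100.0 * value / grand_total for key, value in totals.items()}

# Hypothetical sales figures.
sales = [
    {"Model": "Toyota", "Sales": 30},
    {"Model": "Ford", "Sales": 50},
    {"Model": "Nissan", "Sales": 20},
]
shares = percent_of_total(sales, "Model", "Sales")
```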

2. Index of a Value
How do we select a value in a 2-D CUBE? What if we want to select a value V in an N-dimensional CUBE?
As the number of dimensions grows, it becomes harder to describe a point in a CUBE, and the query becomes too long to write.

With the index, it is easier to query the data in the CUBE, especially for a higher-dimensional CUBE.
cube.v(:i, :j, :k)
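One way to picture this index is a lookup keyed by the coordinate tuple. The sketch below is an assumption about how such addressing could be modelled in Python, not the syntax of any database; the cell values are invented.

```python
ALL = "ALL"  # hypothetical marker for an aggregated dimension

# Hypothetical cube cells keyed by (Model, Year, Color).
cube = {
    ("Ford", 2012, "Red"): 5,
    ("Ford", 2012, ALL): 15,
    ("Ford", ALL, ALL): 15,
    (ALL, ALL, ALL): 40,
}

def v(cube, *coordinates):
    """Return the value stored at the given coordinates, or None if absent."""
    return cube.get(coordinates)

red_fords_2012 = v(cube, "Ford", 2012, "Red")
grand_total = v(cube, ALL, ALL, ALL)
```

The call has the same shape regardless of the number of dimensions, which is the point of the index notation.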

5. Computing the CUBE and ROLLUP
- It is all about the aggregate function F(). Ex. SUM(), COUNT(), AVERAGE()
Generalize: GROUP BY -> ROLLUP -> CUBE
Basic way to compute:
- ROLLUP: Sort the table on the aggregating attributes and then compute the aggregate functions.
- CUBE: A UNION of many ROLLUPs, so the naïve way to compute it is by union.
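The naïve union approach can be sketched as follows. Note that this sketch unions plain GROUP BYs (one per prefix of the dimension list for ROLLUP, one per subset for CUBE) rather than unioning ROLLUPs; the function names and sample rows are assumptions.

```python
from itertools import combinations

ALL = "ALL"  # hypothetical marker for an aggregated dimension

def group_by(rows, dims, kept, measure):
    """SUM(measure) grouped on `kept`; every other dimension shows ALL."""
    out = {}
    for row in rows:
        key = tuple(row[d] if d in kept else ALL for d in dims)
        out[key] = out.get(key, 0) + row[measure]
    return out

def rollup(rows, dims, measure):
    """Union of the N + 1 GROUP BYs over the prefixes of dims."""
    result = {}
    for n in range(len(dims), -1, -1):
        result.update(group_by(rows, dims, dims[:n], measure))
    return result

def cube(rows, dims, measure):
    """Union of the 2^N GROUP BYs over every subset of dims."""
    result = {}
    for n in range(len(dims) + 1):
        for kept in combinations(dims, n):
            result.update(group_by(rows, dims, kept, measure))
    return result

# Hypothetical data.
sales = [
    {"Model": "Ford", "Color": "Red", "Sales": 5},
    {"Model": "Ford", "Color": "Blue", "Sales": 10},
    {"Model": "Toyota", "Color": "Red", "Sales": 20},
]
rolled = rollup(sales, ["Model", "Color"], "Sales")
cubed = cube(sales, ["Model", "Color"], "Sales")
```

The cell `(ALL, "Red")` exists only in `cubed`; ROLLUP never aggregates away Model while keeping Color.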

The Aggregate Function
Call Iter() for each new value, then invoke Final() to get the final value:
- Init(&handle)
- Iter(&handle, value)
- value = Final(&handle)
The same three steps are also named:
- Start(): initialize and allocate a scratchpad
- Next(): called for each value to be aggregated
- End(): compute and return the aggregate value, then deallocate the scratchpad
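A minimal Python sketch of this three-call protocol, using AVERAGE as the aggregate. The class and method names are assumptions chosen to mirror Init/Iter/Final; the object's attributes play the role of the scratchpad.

```python
class AverageHandle:
    """Scratchpad for AVERAGE following the Init / Iter / Final protocol."""

    def __init__(self):          # Init / Start: allocate the scratchpad
        self.total = 0
        self.count = 0

    def iter(self, value):       # Iter / Next: fold in one more value
        self.total += value
        self.count += 1

    def final(self):             # Final / End: compute the result
        return self.total / self.count if self.count else None

def aggregate(handle_type, values):
    """Drive any handle type through the three steps."""
    handle = handle_type()
    for value in values:
        handle.iter(value)
    return handle.final()

mean_value = aggregate(AverageHandle, [10, 20, 30])
```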

GROUP BY review
[Figure: GROUP BY query and its output]

ROLLUP review
For N dimensions, N UNIONs are needed.
ROLLUP: Sort the table on the aggregating attributes and then compute the aggregate functions.
[Figure: ROLLUP query and its output]

CUBE review

2^N-Algorithm

In a 3-D CUBE, Iter() will be called 8 times (2^3).

BRAND   MODEL    COLOR   NUMBER
MAZDA   SPEED6   BLACK   3
MAZDA   SPEED6   ALL     5
MAZDA   ALL      BLACK   30
ALL     SPEED6   BLACK   3
MAZDA   ALL      ALL     50
ALL     ALL      BLACK   70
ALL     SPEED6   ALL     5
ALL     ALL      ALL     100
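The counting argument can be checked with a small sketch of the 2^N-algorithm: keep one handle per cube cell and, for every input row, call Iter() on each of the 2^N cells that the row contributes to. The `SumHandle` class, the `cube_2n` function, and the two sample rows are assumptions for illustration.

```python
ALL = "ALL"  # hypothetical marker for an aggregated dimension

class SumHandle:
    """Scratchpad for SUM."""
    def __init__(self):
        self.total = 0
    def iter(self, value):
        self.total += value
    def final(self):
        return self.total

def cube_2n(rows, dims, measure):
    """Return (cells, number of Iter() calls) for the 2^N-algorithm."""
    handles = {}
    iter_calls = 0
    n = len(dims)
    for row in rows:
        for mask in range(2 ** n):  # one cell per subset of aggregated dims
            cell = tuple(ALL if (mask >> i) & 1 else row[d]
                         for i, d in enumerate(dims))
            handles.setdefault(cell, SumHandle()).iter(row[measure])
            iter_calls += 1
    return {cell: h.final() for cell, h in handles.items()}, iter_calls

# Hypothetical rows: two colours of the same model.
rows = [
    {"Brand": "MAZDA", "Model": "SPEED6", "Color": "BLACK", "Number": 3},
    {"Brand": "MAZDA", "Model": "SPEED6", "Color": "WHITE", "Number": 2},
]
cells, calls = cube_2n(rows, ["Brand", "Model", "Color"], "Number")
```

With three dimensions, each row triggers 8 calls, so the two rows above cost 16 calls in total.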

Distributive Function
Def. An aggregate function F() is distributive if there is a function G() such that
$F(\{X_{i,j}\}) = G(\{F(\{X_{i,j} \mid i = 1,\dots,I\}) \mid j = 1,\dots,J\})$
Example: COUNT(), MIN(), MAX() and SUM()
- Can be divided into many sub-aggregates
- F() = G() for all of these except COUNT()
- For COUNT(), G() = SUM() and F() = COUNT()
- Both G() and F() return a single value

Distributive Function: COUNT()
[Figure: an (ID, Age) table; COUNT1(), COUNT2() and COUNT3() are computed on parts of the table and combined with SUM()]
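A short sketch of the distributive property, assuming the table is split into three partitions of made-up ages: COUNT is obtained by summing the sub-counts, while MAX is its own combining function.

```python
# Hypothetical Age values split into three partitions.
partitions = [[21, 35], [40], [18, 22, 60]]

# COUNT: F() = COUNT() on each partition, G() = SUM() over the sub-counts.
sub_counts = [len(p) for p in partitions]
total_count = sum(sub_counts)

# MAX: F() = G() = MAX().
sub_maxes = [max(p) for p in partitions]
total_max = max(sub_maxes)

# Reference values computed over the whole table at once.
all_ages = [age for p in partitions for age in p]
```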

Algebraic Function
Def. An aggregate function F() is algebraic if there is an M-tuple valued function G() and a function H() such that
$F(\{X_{i,j}\}) = H(\{G(\{X_{i,j} \mid i = 1,\dots,I\}) \mid j = 1,\dots,J\})$
Example: Average(), Center-of-Mass(), MaxN()
- Can be divided into many sub-aggregates
- Each sub-aggregate returns a tuple of values rather than a single value
- G() returns an M-tuple and H() returns a single value
- For F() = AVERAGE(): G() = (sum, count), H() = SUM(sum)/SUM(count)

Algebraic Function: Average()
[Figure: an (ID, Age) table; G1() = {SUM1(), COUNT1()}, G2() = {SUM2(), COUNT2()} and G3() = {SUM3(), COUNT3()} are computed on parts of the table, then H() = SUM()/COUNT() combines them]
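The G()/H() split for AVERAGE can be sketched as below. The function names `g` and `h` follow the definition; the partitioned ages are made up. The last line shows why the fixed-size (sum, count) tuple is needed: averaging the sub-averages gives a different, wrong answer.

```python
def g(values):
    """Sub-aggregate: a fixed-size (sum, count) tuple."""
    return (sum(values), len(values))

def h(sub_aggregates):
    """Combine the (sum, count) tuples into the final average."""
    total = sum(s for s, _ in sub_aggregates)
    count = sum(c for _, c in sub_aggregates)
    return total / count

# Hypothetical Age values split into three partitions.
partitions = [[20, 30], [40], [10, 50, 90]]

average = h([g(p) for p in partitions])

# Naive alternative that ignores partition sizes.
average_of_averages = sum(sum(p) / len(p) for p in partitions) / len(partitions)
```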

Holistic Function
Def. An aggregate function F() is holistic if there is no constant bound on the size of the storage needed to describe a sub-aggregate.
Example: Median(), MostFrequent() and Rank()
- Must go through every data value
- Cannot be separated into sub-aggregates
The most efficient known way is the 2^N-algorithm (the slowest of the approaches).

Holistic Function: Rank()
[Figure: an (ID, Age) table]
Every data value has to be examined!
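A small sketch of why a holistic function resists sub-aggregation, using Median() on made-up partitions: combining the partition medians does not give the median of the whole data set.

```python
from statistics import median

# Hypothetical values split into three partitions of unequal size.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 100, 200, 300]]

# Correct answer: look at every value.
all_values = [x for p in partitions for x in p]
true_median = median(all_values)

# Tempting shortcut: median of the partition medians.
median_of_medians = median(median(p) for p in partitions)
```

Here the shortcut yields 5 while the true median is 6, so the sub-medians alone do not carry enough information.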

The Aggregate Function in CUBEs
Call Iter() for each new value, then invoke Final() to get the final value:
- Init(&handle)
- Iter(&handle, value)
- value = Final(&handle)
Super-aggregates: Iter_super(&handle, &handle)

Distributive Function
To compute the distributive function value (formula not preserved), where F() is the aggregate function, C the cardinality, and N the number of dimensions.
[Figure: 3D CUBE with axes Color, Model and Time; cells labeled (Model, Color, Time), (ALL, Color, Time), (Model, ALL, Time), (ALL, ALL, Time) and (ALL, ALL, ALL)]

Distributive Function Iter_super(&handle, &handle)
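For a distributive function, super-aggregates can be built from already computed cells instead of from the base rows. The sketch below assumes a SUM handle with an `iter_super` method that folds one handle into another, and a hypothetical `project_out` helper that collapses one dimension to ALL; the data is invented.

```python
ALL = "ALL"  # hypothetical marker for an aggregated dimension

class SumHandle:
    """Scratchpad for SUM with a merge step for super-aggregates."""
    def __init__(self):
        self.total = 0
    def iter(self, value):            # fold in one base value
        self.total += value
    def iter_super(self, other):      # fold in a finished sub-aggregate
        self.total += other.total
    def final(self):
        return self.total

def project_out(cells, axis):
    """Merge cells into the slab where dimension `axis` becomes ALL."""
    slab = {}
    for key, handle in cells.items():
        super_key = key[:axis] + (ALL,) + key[axis + 1:]
        slab.setdefault(super_key, SumHandle()).iter_super(handle)
    return slab

# Hypothetical base rows: (Model, Color, Time, sales).
rows = [
    ("Ford", "Red", 2012, 5),
    ("Toyota", "Red", 2012, 20),
    ("Ford", "Blue", 2012, 10),
    ("Ford", "Red", 2011, 7),
]

# Core GROUP BY: Iter() is called once per base row.
core = {}
for model, color, time, sales in rows:
    core.setdefault((model, color, time), SumHandle()).iter(sales)

by_color_time = project_out(core, 0)       # (ALL, Color, Time)
by_time = project_out(by_color_time, 1)    # (ALL, ALL, Time)
grand = project_out(by_time, 2)            # (ALL, ALL, ALL)
```

Each level is computed from the level below it, so the base rows are scanned only once.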

How about INSERT, DELETE and UPDATE? UPDATE = DELETE + INSERT Is it the same?

6. Maintaining Cubes and Roll-Ups
F() = Max()
- Distributive for SELECT and INSERT
- Holistic for DELETE
- UPDATE is DELETE plus INSERT
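The asymmetry for Max() can be sketched as follows: an insert only needs a comparison with the stored maximum, while deleting the current maximum forces a rescan of the remaining base values. The function names and the `remaining_values` argument are assumptions for illustration.

```python
def max_after_insert(current_max, new_value):
    """INSERT: one comparison with the stored maximum is enough."""
    if current_max is None or new_value > current_max:
        return new_value
    return current_max

def max_after_delete(current_max, deleted_value, remaining_values):
    """DELETE: removing the current maximum requires rescanning the data."""
    if deleted_value < current_max:
        return current_max                      # the maximum is unaffected
    return max(remaining_values, default=None)  # recompute from base values

def max_after_update(current_max, old_value, new_value, remaining_values):
    """UPDATE: a DELETE followed by an INSERT."""
    after_delete = max_after_delete(current_max, old_value, remaining_values)
    return max_after_insert(after_delete, new_value)
```

In `max_after_update`, `remaining_values` is meant to hold the base values left after the old value has been removed and before the new one is added.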