Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Database Lab, 김호숙


Contents
1. Introduction
2. Problems With GROUP BY
3. The Data CUBE Operator
4. Addressing The Data Cube
5. Computing the Data Cube
6. Summary

1. Introduction
Data analysis
- Extraction: pull aggregated data out of the database into a file or table.
- Visualizing: present that result graphically.
Visualization tools
- Represent a dataset in N-dimensional space using space, color, time (motion), and so on.

Relational systems represent N-dimensional data as a relation with N attribute domains.

Table 1: Weather
Time (UCT)   Latitude    Longitude    Altitude (m)  Temp (c)  Pres (mb)
27/11/94:…   …:58:33 N   122:45:28 W  …             …         …
…/11/94:…    …:16:18 N   27:05:55 W   …             …         …

Time, Latitude, Longitude, and Altitude are the dimensions; Temp and Pres are the measurements.

Standard SQL aggregate functions
- COUNT(), SUM(), MIN(), MAX(), AVG()
Additional functions offered by many SQL systems
- Statistical functions (median, standard deviation, variance)
- Physical functions (center of mass)
- Other domain-specific functions
User-defined aggregate functions
- The Illustra system

GROUP BY operation
SELECT Time, Altitude, AVG(Temp)
FROM Weather
GROUP BY Time, Altitude;
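
The GROUP BY query above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the Weather rows and the short time labels are invented for illustration and are not taken from Table 1.

```python
import sqlite3

# Hypothetical sample readings (Time, Altitude, Temp); not from the slides.
rows = [
    ("t1", 100, 20.0),
    ("t1", 100, 22.0),
    ("t1", 200, 15.0),
    ("t2", 100, 18.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather (Time TEXT, Altitude INTEGER, Temp REAL)")
conn.executemany("INSERT INTO Weather VALUES (?, ?, ?)", rows)

# One output row per distinct (Time, Altitude) pair, with the average Temp.
avg_temp = conn.execute(
    "SELECT Time, Altitude, AVG(Temp) FROM Weather "
    "GROUP BY Time, Altitude ORDER BY Time, Altitude"
).fetchall()
```

Each distinct (Time, Altitude) pair becomes one group, so the two readings at ("t1", 100) collapse into a single averaged row.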

Aggregation functions additionally supported by the Red Brick system:
- Rank(expression)
- N_tile(expression, n)
- Ratio_To_Total(expression)
- Cumulative(expression)
- Running_Sum(expression, n)
- Running_Average(expression, n)
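
To make three of these names concrete, here is a rough Python sketch. The semantics are an assumption based on how such functions are usually described (ratio of each value to the grand total, a running total over an ordered list, and a sum over the most recent n values); the slides do not spell them out, and the function names below are hypothetical stand-ins rather than Red Brick's API.

```python
from itertools import accumulate

def ratio_to_total(values):
    """Each value divided by the sum of all values (assumed semantics)."""
    total = sum(values)
    return [v / total for v in values]

def cumulative(values):
    """Running total over an ordered list (assumed semantics)."""
    return list(accumulate(values))

def running_sum(values, n):
    """Sum of the most recent n values; positions without a full
    window of n values are reported as None (assumed semantics)."""
    return [
        None if i + 1 < n else sum(values[i + 1 - n:i + 1])
        for i in range(len(values))
    ]
```

All three need to see more than one row at a time in a specific order, which is why they do not fit the plain GROUP BY model of one independent value per group.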

2. Problems With GROUP BY
SQL Aggregates in Standard Benchmarks
Benchmark   Queries  Aggregates  GROUP BYs
TPC-A, B    1        0           0
TPC-C       18       4           0
TPC-D       …        …           …
Wisconsin   18       3           2
AS3AP       23       20          2
SetQuery    7        5           1

Forms of data analysis that are hard to support with the standard SQL GROUP BY operation:
- Histograms
- Roll-up totals and sub-totals for drill-downs
- Cross tabulations

Histogram: aggregation over computed categories
The query one would like to write groups directly on computed values, which SQL92 does not allow:
SELECT day, nation, MAX(Temp)
FROM Weather
GROUP BY Day(Time) AS day, Country(Latitude, Longitude) AS nation;
In SQL92 the same result needs a nested table expression:
SELECT day, nation, MAX(Temp)
FROM ( SELECT Day(Time) AS day,
              Country(Latitude, Longitude) AS nation,
              Temp
       FROM Weather ) AS foo
GROUP BY day, nation;
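
The nested SQL92 form can be run in SQLite once Day() and Country() exist. The sketch below registers two toy Python functions under those names; both helpers, the longitude cutoff, and the sample rows are assumptions made only so the query has something to group on.

```python
import sqlite3

def day(ts):
    # Hypothetical helper: keep the date part of an ISO-style timestamp.
    return ts[:10]

def country(lat, lon):
    # Toy stand-in for a real geographic lookup (an assumption).
    return "US" if lon < -30 else "other"

conn = sqlite3.connect(":memory:")
conn.create_function("Day", 1, day)
conn.create_function("Country", 2, country)
conn.execute(
    "CREATE TABLE Weather (Time TEXT, Latitude REAL, Longitude REAL, Temp REAL)"
)
conn.executemany("INSERT INTO Weather VALUES (?, ?, ?, ?)", [
    ("1994-11-27T15:00", 37.9, -122.7, 21.0),   # invented readings
    ("1994-11-27T18:00", 37.9, -122.7, 25.0),
    ("1994-11-27T15:00", 34.2, -27.1, 23.0),
    ("1994-11-28T15:00", 37.9, -122.7, 19.0),
])

# Inner query computes the categories; outer query groups on them.
histogram = conn.execute("""
    SELECT day, nation, MAX(Temp)
    FROM ( SELECT Day(Time) AS day,
                  Country(Latitude, Longitude) AS nation,
                  Temp
           FROM Weather ) AS foo
    GROUP BY day, nation
""").fetchall()
```

The inner SELECT exists only to give the computed categories column names that the outer GROUP BY is allowed to reference.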

Roll-up Totals and Sub-Totals for drill-downs

Sales Roll Up by Model by Year by Color
Model  Year  Color  Sales by Model      Sales by Model  Sales by
                    by Year by Color    by Year         Model
Chevy  1994  black  50
             white  40
                                        90
       1995  black  85
             white  115
                                        200
                                                        290

Rolling up moves from the detailed column toward the per-model total; drilling down moves the other way.

To represent super-aggregate items, a dummy value "ALL" is added:

Table 4: Sales Summary
Model  Year  Color  Units
Chevy  1994  black  50
Chevy  1994  white  40
Chevy  1994  ALL    90
Chevy  1995  black  85
Chevy  1995  white  115
Chevy  1995  ALL    200
Chevy  ALL   ALL    290

SELECT Model, 'ALL', 'ALL', SUM(Sales)
FROM Sales
WHERE Model = 'Chevy'
GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales)
FROM Sales
WHERE Model = 'Chevy'
GROUP BY Model, Year
UNION
SELECT Model, Year, Color, SUM(Sales)
FROM Sales
WHERE Model = 'Chevy'
GROUP BY Model, Year, Color;
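
The three-way UNION can be checked in SQLite, which accepts the string literal 'ALL' as the dummy value. In this sketch the four base rows are an assumption: they are chosen so that the sub-totals come out to the 200 and 290 shown in Table 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Model TEXT, Year INTEGER, Color TEXT, Sales INTEGER)"
)
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("Chevy", 1994, "black", 50),    # assumed base rows
    ("Chevy", 1994, "white", 40),
    ("Chevy", 1995, "black", 85),
    ("Chevy", 1995, "white", 115),
])

# One GROUP BY per roll-up level, glued together with UNION.
summary = conn.execute("""
    SELECT Model, 'ALL' AS Year, 'ALL' AS Color, SUM(Sales) AS Units
    FROM Sales WHERE Model = 'Chevy' GROUP BY Model
    UNION
    SELECT Model, Year, 'ALL', SUM(Sales)
    FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year
    UNION
    SELECT Model, Year, Color, SUM(Sales)
    FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year, Color
""").fetchall()
```

Each branch of the UNION is a separate aggregation over Sales, which is the cost the CUBE operator is meant to avoid.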

Cross Tabulation (Cross Tab)

Chevy Sales Cross Tab
Chevy        1994  1995  total (ALL)
black        50    85    135
white        40    115   155
total (ALL)  90    200   290

A 6-dimensional cross tab requires a 64-way UNION of 64 different GROUP BY queries, and on most SQL systems that means 64 scans of the data.
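
The figure of 64 is simply 2^6: every dimension is either grouped on or aggregated away. A short sketch that enumerates those combinations (the dimension names d1 to d6 are placeholders):

```python
from itertools import combinations

def grouping_sets(dims):
    """Return every subset of dims; each subset is the column list
    of one GROUP BY needed to build the full cross tab."""
    return [
        subset
        for k in range(len(dims), -1, -1)
        for subset in combinations(dims, k)
    ]

# A 2-dimensional cross tab needs 4 GROUP BYs: the body, two margins,
# and the grand total (the empty column list).
two_d = grouping_sets(["Year", "Color"])

# Six placeholder dimensions give 2**6 = 64 GROUP BYs.
six_d = grouping_sets(["d1", "d2", "d3", "d4", "d5", "d6"])
```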

3. The Data CUBE Operator
A cube for 3-dimensional aggregates
- 0-dimensional cube: a point
- 1-dimensional cube: a line and a point
- 2-dimensional cube: a cross tab, two lines, and a point
- 3-dimensional cube: a cube formed by three intersecting 2-dimensional cross tabs


Syntax extended to support CUBE
GROUP BY CUBE ( { ( <column name> | <expression> ) [ AS <correlation name> ] [ <collate clause> ] ,... } )

SELECT day, nation, MAX(Temp)
FROM Weather
GROUP BY CUBE ( Day(Time) AS day,
                Country(Latitude, Longitude) AS nation );

CUBE applied to the SALES table

SELECT Model, Year, Color, SUM(Sales) AS Sales
FROM Sales
WHERE Model IN ('Ford', 'Chevy')
  AND Year BETWEEN 1990 AND 1992
GROUP BY CUBE (Model, Year, Color);

SALES (base table, 2 * 3 * 3 = 18 rows)
Model  Year  Color  Sales
Chevy  1990  red    5
Chevy  1990  white  87
Chevy  1990  blue   62
Chevy  1991  red    54
Chevy  1991  white  95
Chevy  1991  blue   49
Chevy  1992  red    31
Chevy  1992  white  54
Chevy  1992  blue   71
Ford   1990  red    64
Ford   1990  white  62
Ford   1990  blue   63
Ford   1991  red    52
Ford   1991  white  9
Ford   1991  blue   55
Ford   1992  red    27
Ford   1992  white  62
Ford   1992  blue   39

DATA CUBE (cube relation, $\prod_i (C_i + 1)$ = 3 * 4 * 4 = 48 rows)
Model  Year  Color  Sales
Chevy  1990  blue   62
Chevy  1990  red    5
Chevy  1990  white  87
Chevy  1990  ALL    154
Chevy  1991  blue   49
Chevy  1991  red    54
Chevy  1991  white  95
Chevy  1991  ALL    198
Chevy  1992  blue   71
Chevy  1992  red    31
Chevy  1992  white  54
Chevy  1992  ALL    156
Chevy  ALL   blue   182
Chevy  ALL   red    90
Chevy  ALL   white  236
Chevy  ALL   ALL    508
Ford   1990  blue   63
Ford   1990  red    64
Ford   1990  white  62
Ford   1990  ALL    189
Ford   1991  blue   55
Ford   1991  red    52
Ford   1991  white  9
Ford   1991  ALL    116
Ford   1992  blue   39
Ford   1992  red    27
Ford   1992  white  62
Ford   1992  ALL    128
Ford   ALL   blue   157
Ford   ALL   red    143
Ford   ALL   white  133
Ford   ALL   ALL    433
ALL    1990  blue   125
ALL    1990  red    69
ALL    1990  white  149
ALL    1990  ALL    343
ALL    1991  blue   104
ALL    1991  red    106
ALL    1991  white  104
ALL    1991  ALL    314
ALL    1992  blue   110
ALL    1992  red    58
ALL    1992  white  116
ALL    1992  ALL    284
ALL    ALL   blue   339
ALL    ALL   red    233
ALL    ALL   white  369
ALL    ALL   ALL    941
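
As a cross-check on the cube relation, the following Python sketch computes it from the 18 SALES rows in a single pass. It is a naive illustration of what CUBE produces, not the evaluation strategy of any particular database; the names `ALL`, `cube`, and `sales_cube` are made up for this example.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # stand-in for the ALL value used in the slides

# The 18 base rows of the SALES table: (Model, Year, Color, Sales).
sales = [
    ("Chevy", 1990, "red", 5), ("Chevy", 1990, "white", 87),
    ("Chevy", 1990, "blue", 62), ("Chevy", 1991, "red", 54),
    ("Chevy", 1991, "white", 95), ("Chevy", 1991, "blue", 49),
    ("Chevy", 1992, "red", 31), ("Chevy", 1992, "white", 54),
    ("Chevy", 1992, "blue", 71), ("Ford", 1990, "red", 64),
    ("Ford", 1990, "white", 62), ("Ford", 1990, "blue", 63),
    ("Ford", 1991, "red", 52), ("Ford", 1991, "white", 9),
    ("Ford", 1991, "blue", 55), ("Ford", 1992, "red", 27),
    ("Ford", 1992, "white", 62), ("Ford", 1992, "blue", 39),
]

def cube(rows, n_dims):
    """Sum the measure of each row into every cube cell it belongs to."""
    totals = defaultdict(int)
    for row in rows:
        dims, value = row[:n_dims], row[n_dims]
        # Each dimension either keeps its value or collapses to ALL,
        # so one base row contributes to 2**n_dims cells.
        for mask in product((False, True), repeat=n_dims):
            key = tuple(ALL if hide else d for d, hide in zip(dims, mask))
            totals[key] += value
    return dict(totals)

sales_cube = cube(sales, 3)
```

Looking up `sales_cube[(ALL, 1991, "blue")]` or `sales_cube[("Chevy", ALL, ALL)]` gives the corresponding sub-total, and `sales_cube[(ALL, ALL, ALL)]` gives the grand total.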

Issues SQL must address once ALL is added
- Every ALL value must be interpreted as the set of values it aggregates over:
  Model.ALL = ALL(Model) = {Chevy, Ford}
  Year.ALL = ALL(Year) = {1990, 1991, 1992}
  Color.ALL = ALL(Color) = {red, white, blue}
- ALL becomes a new keyword.
- Column definitions gain an option stating whether ALL is allowed or not allowed.
- Like NULL, an ALL value cannot take part in other aggregates.

4. Addressing The Data Cube
Percent-of-total: each cell divided by the global aggregate.
With cube addressing, the global total is the cell total(ALL, ALL, ALL):
SELECT Model, Year, Color, SUM(Sales) AS total,
       SUM(Sales) / total(ALL, ALL, ALL)
FROM Sales
WHERE Model IN ('Ford', 'Chevy')
  AND Year BETWEEN 1990 AND 1992
GROUP BY CUBE (Model, Year, Color);
Without cube addressing, the total has to be recomputed in a nested query:
SELECT Model, Year, Color, SUM(Sales),
       SUM(Sales) / ( SELECT SUM(Sales)
                      FROM Sales
                      WHERE Model IN ('Ford', 'Chevy')
                        AND Year BETWEEN 1990 AND 1992 )
FROM Sales
WHERE Model IN ('Ford', 'Chevy')
  AND Year BETWEEN 1990 AND 1992
GROUP BY CUBE (Model, Year, Color);
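
The nested-query form of percent-of-total can be exercised in SQLite. SQLite has no CUBE, so this sketch uses a plain GROUP BY and only shows the percent-of-total part; the five rows are a small assumed sample, with one 1993 row added to show the WHERE filter at work. The `1.0 *` factor is there because SQLite would otherwise do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Model TEXT, Year INTEGER, Color TEXT, Sales INTEGER)"
)
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("Chevy", 1990, "red", 5),
    ("Ford", 1990, "red", 64),
    ("Chevy", 1991, "white", 95),
    ("Ford", 1992, "blue", 39),
    ("Ford", 1993, "red", 10),    # hypothetical row outside the year range
])

rows = conn.execute("""
    SELECT Model, Year, Color, SUM(Sales),
           1.0 * SUM(Sales) / ( SELECT SUM(Sales)
                                FROM Sales
                                WHERE Model IN ('Ford', 'Chevy')
                                  AND Year BETWEEN 1990 AND 1992 )
    FROM Sales
    WHERE Model IN ('Ford', 'Chevy')
      AND Year BETWEEN 1990 AND 1992
    GROUP BY Model, Year, Color
""").fetchall()

# Map each (Model, Year, Color) cell to its share of the global total.
shares = {(m, y, c): share for m, y, c, _total, share in rows}
```

The subquery repeats the whole WHERE clause and rescans Sales, which is the redundancy that addressing the (ALL, ALL, ALL) cell directly would remove.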

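5. Computing the Data Cube
To compute the cube's "ALL" tuples from the GROUP BY result, an ALL value is added to each dimension.
For an N-dimensional cube whose attributes have cardinalities $C_1, C_2, C_3, \dots, C_N$, the cube relation contains up to $\prod_{i=1}^{N} (C_i + 1)$ tuples, and exactly that many when every combination of values occurs in the base table.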
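
The size formula is easy to evaluate directly. A small sketch (the function name `cube_size` is made up here), using the cardinalities of the Sales example: 2 models, 3 years, 3 colors.

```python
from math import prod

def cube_size(cardinalities):
    """Number of cells in a full cube: each dimension contributes its
    own values plus one extra ALL value."""
    return prod(c + 1 for c in cardinalities)

base_rows = prod([2, 3, 3])        # dense base table: 2 * 3 * 3
cube_rows = cube_size([2, 3, 3])   # with ALL added: 3 * 4 * 4
```

The extra factor per dimension is (C_i + 1) / C_i, so the cube is only modestly larger than the base table when cardinalities are high, but the relative growth is large for low-cardinality dimensions.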

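Aggregation functions over a two-dimensional set of values $\{X_{i,j} \mid i = 1,\dots,I;\ j = 1,\dots,J\}$
- Distributive: there is a function $G$ such that $F(\{X_{i,j}\}) = G(\{F(\{X_{i,j} \mid i = 1,\dots,I\}) \mid j = 1,\dots,J\})$. Examples: Count(), Min(), Max(), Sum().
- Algebraic: there are functions $G$ and $H$, with $G$ returning a fixed-size tuple, such that $F(\{X_{i,j}\}) = H(\{G(\{X_{i,j} \mid i = 1,\dots,I\}) \mid j = 1,\dots,J\})$. Examples: Average(), standard deviation, MaxN(), MinN().
- Holistic: no fixed-size summary characterizes the sub-aggregate $F(\{X_{i,j} \mid i = 1,\dots,I\})$. Examples: Median(), MostFrequent(), Rank().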
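
The three classes can be illustrated on a tiny example. In this sketch the values and the partitioning are invented; each inner list plays the role of one column $\{X_{i,j} \mid i = 1,\dots,I\}$, and the helper names are hypothetical.

```python
from statistics import median

# Invented values, already split into two partitions (one per j).
partitions = [[1, 2, 9], [4, 100]]
flat = [x for part in partitions for x in part]

# Distributive: SUM of the per-partition SUMs is the overall SUM.
sum_of_sums = sum(sum(part) for part in partitions)

# Algebraic: AVG cannot be combined from per-partition averages, but it
# can be combined from fixed-size (sum, count) pairs: G builds the pair,
# H merges the pairs and divides.
def avg_partial(part):
    return (sum(part), len(part))

def avg_combine(partials):
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

combined_avg = avg_combine([avg_partial(part) for part in partitions])

# Holistic: combining per-partition medians does not give the overall
# median; here it yields 27.0 while the true median is 4.
median_of_medians = median(median(part) for part in partitions)
true_median = median(flat)
```

This is why sub-totals of distributive and algebraic functions can be rolled up from finer cells of the cube, while holistic functions generally need the underlying values again.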

6. Summary
- SQL's five basic aggregate functions should be extended with functions such as rank, N_tile, cumulative, and percent of total in order to support typical data mining operations.
- The cube operator generalizes and unifies aggregates, GROUP BY, histograms, roll-ups and drill-downs, and cross tabs.
- The cube is easy to compute for the distributive and algebraic classes of functions.

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Jim Gray … Microsoft Research
Adam Bosworth … Microsoft Research
Andrew Layman … Microsoft Research
Hamid Pirahesh … IBM Research
5 February 1995, Revised 18 October 1995
Technical Report MSR-TR-95-22