Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals (Database Laboratory, 김호숙)


1 Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Database Laboratory, 김호숙
991COG04@mm.ewha.ac.kr
2000-03-23

2 Contents
    1. Introduction
    2. Problems With GROUP BY
    3. The Data CUBE Operator
    4. Addressing The Data Cube
    5. Computing the Data Cube
    6. Summary

3 1. Introduction
Data analysis
    Extraction: pull aggregated data out of the database into a file or table.
    Visualization: render the extracted results graphically.
Visualization tools
    Use space, color, time (motion), and similar cues to present a dataset in N-dimensional space.

4 Relational systems represent N-dimensional data using N-attribute domains.
Table 1: Weather
Time (UCT)     Latitude   Longitude   Altitude (m)  Temp (c)  Pres (mb)
27/11/94:1500  37:58:33N  122:45:28W  102           21        1009
27/11/94:1500  34:16:18N  27:05:55W   10            23        1024
Dimensions: Time, Latitude, Longitude, Altitude
Measurements: Temp, Pres

5 SQL standard aggregate functions
    COUNT(), SUM(), MIN(), MAX(), AVG()
Additional functions offered by many SQL systems
    Statistical functions (median, standard deviation, variance)
    Physical functions (center of mass)
    Other domain-specific functions
User-defined aggregation functions
    Illustra system

6 GROUP BY operation
    SELECT Time, Altitude, AVG(Temp)
    FROM Weather
    GROUP BY Time, Altitude;
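The GROUP BY query can be read as "bucket the rows by (Time, Altitude), then average Temp within each bucket". A minimal Python sketch of that reading follows. The first two rows come from Table 1; the third row is made up (and marked as such) so that one group has more than one member, and the function name is only illustrative.

```python
from collections import defaultdict

# (Time, Latitude, Longitude, Altitude, Temp, Pres); first two rows are from Table 1.
weather = [
    ("27/11/94:1500", "37:58:33N", "122:45:28W", 102, 21, 1009),
    ("27/11/94:1500", "34:16:18N", "27:05:55W", 10, 23, 1024),
    # Hypothetical extra reading, added only so one group has two members.
    ("27/11/94:1500", "37:58:40N", "122:45:30W", 102, 23, 1008),
]

def avg_temp_by_time_altitude(rows):
    """Emulate SELECT Time, Altitude, AVG(Temp) ... GROUP BY Time, Altitude."""
    groups = defaultdict(list)
    for time, _lat, _lon, altitude, temp, _pres in rows:
        groups[(time, altitude)].append(temp)
    # One output row per distinct (Time, Altitude) pair.
    return {key: sum(temps) / len(temps) for key, temps in groups.items()}

result = avg_temp_by_time_altitude(weather)
for (time, altitude), avg_temp in sorted(result.items()):
    print(time, altitude, avg_temp)
```

Each distinct (Time, Altitude) pair yields exactly one output row, which is why plain GROUP BY returns a flat table with no sub-totals.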

7 Additional aggregation functions supported by the Red Brick system
    Rank(expression)
    N_tile(expression, n)
    Ratio_To_Total(expression)
    Cumulative(expression)
    Running_Sum(expression, n)
    Running_Average(expression, n)
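The slide only lists the function names. As a rough illustration, here is how three of them might behave over an ordered list of values. The semantics below are assumptions inferred from the names (ratio of each value to the column total, running total so far, sum of the most recent n values); they are not taken from Red Brick documentation, and the Python function names are hypothetical.

```python
from itertools import accumulate

def ratio_to_total(values):
    """Assumed meaning of Ratio_To_Total: each value divided by the sum of all values."""
    total = sum(values)
    return [v / total for v in values]

def cumulative(values):
    """Assumed meaning of Cumulative: running total over the ordered list."""
    return list(accumulate(values))

def running_sum(values, n):
    """Assumed meaning of Running_Sum: sum of the most recent n values at each position."""
    return [sum(values[max(0, i - n + 1): i + 1]) for i in range(len(values))]

units = [50, 40, 85, 115]
print(ratio_to_total(units))
print(cumulative(units))
print(running_sum(units, 2))
```

Unlike COUNT or SUM, these functions return one value per input row, which is part of why they do not fit the plain GROUP BY model.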

8 2. Problems With GROUP BY: SQL Aggregates in Standard Benchmarks
Benchmark  Queries  Aggregates  GROUP BYs
TPC-A, B   1        0           0
TPC-C      18       4           0
TPC-D      16       27          15
Wisconsin  18       3           2
AS3AP      23       20          2
SetQuery   7        5           1

9 Forms of data analysis that are hard to support with the standard SQL GROUP BY operation
    Histograms
    Roll-up Totals and Sub-Totals for drill-downs
    Cross Tabulations

10 Histogram: aggregation over computed categories
Grouping directly on computed categories:
    SELECT day, nation, MAX(Temp)
    FROM Weather
    GROUP BY Day(Time) AS day,
             Country(Latitude, Longitude) AS nation;
SQL92 form, using a nested table expression:
    SELECT day, nation, MAX(Temp)
    FROM ( SELECT Day(Time) AS day,
                  Country(Latitude, Longitude) AS nation,
                  Temp
           FROM Weather
         ) AS foo
    GROUP BY day, nation;
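The nested SQL92 form does two things in sequence: derive the categories, then group on them. The sketch below mirrors that in Python. `day` and `country` are hypothetical stand-ins for the slide's Day() and Country() functions (the lookup table and its labels are invented), and the last two weather rows are made up for illustration.

```python
# Hypothetical lookup standing in for Country(Latitude, Longitude).
NATION_BY_POSITION = {
    ("37:58:33N", "122:45:28W"): "nation_1",
    ("34:16:18N", "27:05:55W"): "nation_2",
}

def day(time):
    """Hypothetical Day(Time): keep the date part before the first ':'."""
    return time.split(":", 1)[0]

def country(latitude, longitude):
    """Hypothetical Country(Latitude, Longitude): a fixed lookup, not a real geocoder."""
    return NATION_BY_POSITION[(latitude, longitude)]

# (Time, Latitude, Longitude, Temp); first two rows follow Table 1, the rest are made up.
weather = [
    ("27/11/94:1500", "37:58:33N", "122:45:28W", 21),
    ("27/11/94:1500", "34:16:18N", "27:05:55W", 23),
    ("27/11/94:1800", "37:58:33N", "122:45:28W", 25),
    ("28/11/94:0900", "37:58:33N", "122:45:28W", 19),
]

def max_temp_by_day_nation(rows):
    # Inner SELECT: compute the categories for every row.
    derived = [(day(t), country(lat, lon), temp) for t, lat, lon, temp in rows]
    # Outer SELECT: GROUP BY day, nation with MAX(Temp).
    result = {}
    for d, nation, temp in derived:
        key = (d, nation)
        result[key] = max(result.get(key, temp), temp)
    return result

histogram = max_temp_by_day_nation(weather)
print(histogram)
```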

11 Roll-up Totals and Sub-Totals for drill-downs
Sales Roll Up by Model by Year by Color
Model  Year  Color  Sales by Model     Sales by Model  Sales by Model
                    by Year by Color   by Year
Chevy  1994  black  50
             white  40
                                       90
       1995  black  85
             white  115
                                       200
                                                       290
Roll up: move toward the coarser totals. Drill down: move back toward the finer detail.
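The three columns of the roll-up report are three aggregations of the same rows at progressively coarser granularity. A small Python sketch, using the Chevy figures from the slide (the function name is illustrative):

```python
from collections import defaultdict

# (Model, Year, Color, Sales) rows from the roll-up report.
sales = [
    ("Chevy", 1994, "black", 50),
    ("Chevy", 1994, "white", 40),
    ("Chevy", 1995, "black", 85),
    ("Chevy", 1995, "white", 115),
]

def roll_up(rows):
    """Return the three levels of the report: finest, by year, by model."""
    by_model_year_color = defaultdict(int)
    by_model_year = defaultdict(int)
    by_model = defaultdict(int)
    for model, year, color, units in rows:
        by_model_year_color[(model, year, color)] += units
        by_model_year[(model, year)] += units   # roll up over Color
        by_model[model] += units                # roll up over Year and Color
    return dict(by_model_year_color), dict(by_model_year), dict(by_model)

detail, per_year, per_model = roll_up(sales)
print(per_year)
print(per_model)
```

Rolling up drops one grouping column at a time; drilling down is the reverse lookup from `per_model` into `per_year` into `detail`.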

12 Super-aggregate rows are represented by adding the dummy value "ALL"
Table 4: Sales Summary
Model  Year  Color  Units
Chevy  1994  black  50
Chevy  1994  white  40
Chevy  1994  ALL    90
Chevy  1995  black  85
Chevy  1995  white  115
Chevy  1995  ALL    200
Chevy  ALL   ALL    290

    SELECT Model, ALL, ALL, SUM(Sales)
    FROM Sales
    WHERE Model = 'Chevy'
    GROUP BY Model
    UNION
    SELECT Model, Year, ALL, SUM(Sales)
    FROM Sales
    WHERE Model = 'Chevy'
    GROUP BY Model, Year
    UNION
    SELECT Model, Year, Color, SUM(Sales)
    FROM Sales
    WHERE Model = 'Chevy'
    GROUP BY Model, Year, Color;
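The three-way UNION can be tried out with Python's built-in sqlite3 module. This is a sketch under two assumptions that differ from the slide: ALL is written as the string literal 'ALL' (SQLite has no ALL value), and Year is cast to text so every row of the union has the same column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Model TEXT, Year INTEGER, Color TEXT, Sales INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("Chevy", 1994, "black", 50),
        ("Chevy", 1994, "white", 40),
        ("Chevy", 1995, "black", 85),
        ("Chevy", 1995, "white", 115),
    ],
)

# One GROUP BY per level of the roll-up, glued together with UNION.
query = """
SELECT Model, 'ALL' AS Year, 'ALL' AS Color, SUM(Sales) AS Units
FROM Sales WHERE Model = 'Chevy' GROUP BY Model
UNION
SELECT Model, CAST(Year AS TEXT), 'ALL', SUM(Sales)
FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year
UNION
SELECT Model, CAST(Year AS TEXT), Color, SUM(Sales)
FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year, Color
"""
summary = conn.execute(query).fetchall()
for row in sorted(summary):
    print(row)
```

Each additional roll-up level costs another SELECT block and another pass over Sales, which is the cost the CUBE operator is meant to remove.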

13 Cross Tabulation or Cross Tab
Chevy Sales Cross Tab
Chevy        1994  1995  total (ALL)
black        50    85    135
white        40    115   155
total (ALL)  90    200   290
A 6-dimensional cross tab requires 64 different GROUP BYs whose results are combined by a 64-way union; on most SQL systems this means 64 scans of the data.
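Two points from this slide can be checked with a short Python sketch: the cross tab is the detail rows plus row, column, and grand totals, and the number of GROUP BYs needed for N dimensions is the number of subsets of the dimensions, 2^N. The function names and the dimension labels A to F are illustrative.

```python
from itertools import combinations

# (Year, Color, Sales) for Chevy, from the cross tab.
sales = [(1994, "black", 50), (1994, "white", 40), (1995, "black", 85), (1995, "white", 115)]

def cross_tab(rows):
    """Build {(color, year): units}, with 'ALL' marking the total row and column."""
    table = {}
    for year, color, units in rows:
        # Every detail cell contributes to itself, its row total,
        # its column total, and the grand total.
        for c in (color, "ALL"):
            for y in (year, "ALL"):
                table[(c, y)] = table.get((c, y), 0) + units
    return table

def grouping_sets(dimensions):
    """All subsets of the dimensions: one GROUP BY per subset."""
    return [s for k in range(len(dimensions) + 1) for s in combinations(dimensions, k)]

table = cross_tab(sales)
print(table[("black", "ALL")], table[("ALL", 1995)], table[("ALL", "ALL")])
print(len(grouping_sets(["A", "B", "C", "D", "E", "F"])))
```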

14 3. The Data CUBE Operator
The cube for three-dimensional aggregates
    0-dimensional cube: a point
    1-dimensional cube: a line and a point
    2-dimensional cube: a cross tab, two lines, and a point
    3-dimensional cube: a cube formed by intersecting three 2-dimensional cross tabs

16 Syntax extended to support CUBE
    GROUP BY CUBE ( { ( <column name> | <expression> )
                      [ AS <correlation name> ]
                      [ <collate clause> ] ,... } )

    SELECT day, nation, MAX(Temp)
    FROM Weather
    GROUP BY CUBE ( Day(Time) AS day,
                    Country(Latitude, Longitude) AS nation );

17 CUBE
SALES (input relation): 2 * 3 * 3 = 18 rows
Model  Year  Color  Sales
Chevy  1990  red    5
Chevy  1990  white  87
Chevy  1990  blue   62
Chevy  1991  red    54
Chevy  1991  white  95
Chevy  1991  blue   49
Chevy  1992  red    31
Chevy  1992  white  54
Chevy  1992  blue   71
Ford   1990  red    64
Ford   1990  white  62
Ford   1990  blue   63
Ford   1991  red    52
Ford   1991  white  9
Ford   1991  blue   55
Ford   1992  red    27
Ford   1992  white  62
Ford   1992  blue   39

    SELECT Model, Year, Color, SUM(Sales) AS Sales
    FROM Sales
    WHERE Model IN ('Ford', 'Chevy')
      AND Year BETWEEN 1990 AND 1992
    GROUP BY CUBE (Model, Year, Color);

DATA CUBE (cube relation): $\prod_i (C_i + 1) = 3 \times 4 \times 4 = 48$ rows
Model  Year  Color  Sales
Chevy  1990  blue   62
Chevy  1990  red    5
Chevy  1990  white  87
Chevy  1990  ALL    154
Chevy  1991  blue   49
Chevy  1991  red    54
Chevy  1991  white  95
Chevy  1991  ALL    198
Chevy  1992  blue   71
Chevy  1992  red    31
Chevy  1992  white  54
Chevy  1992  ALL    156
Chevy  ALL   blue   182
Chevy  ALL   red    90
Chevy  ALL   white  236
Chevy  ALL   ALL    508
Ford   1990  blue   63
Ford   1990  red    64
Ford   1990  white  62
Ford   1990  ALL    189
Ford   1991  blue   55
Ford   1991  red    52
Ford   1991  white  9
Ford   1991  ALL    116
Ford   1992  blue   39
Ford   1992  red    27
Ford   1992  white  62
Ford   1992  ALL    128
Ford   ALL   blue   157
Ford   ALL   red    143
Ford   ALL   white  133
Ford   ALL   ALL    433
ALL    1990  blue   125
ALL    1990  red    69
ALL    1990  white  149
ALL    1990  ALL    343
ALL    1991  blue   104
ALL    1991  red    106
ALL    1991  white  104
ALL    1991  ALL    314
ALL    1992  blue   110
ALL    1992  red    58
ALL    1992  white  116
ALL    1992  ALL    284
ALL    ALL   blue   339
ALL    ALL   red    233
ALL    ALL   white  369
ALL    ALL   ALL    941
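The cube relation can be reproduced from the 18 SALES rows with a short Python emulation: every input row contributes its measure to each of the 2^3 keys obtained by replacing any subset of its dimension values with "ALL". This is a naive sketch of what the operator produces, not the computation strategy proposed in the paper; the function name `cube` is illustrative.

```python
from collections import defaultdict
from itertools import product

# (Model, Year, Color, Sales): the 18 SALES rows from the slide.
sales = [
    ("Chevy", 1990, "red", 5), ("Chevy", 1990, "white", 87), ("Chevy", 1990, "blue", 62),
    ("Chevy", 1991, "red", 54), ("Chevy", 1991, "white", 95), ("Chevy", 1991, "blue", 49),
    ("Chevy", 1992, "red", 31), ("Chevy", 1992, "white", 54), ("Chevy", 1992, "blue", 71),
    ("Ford", 1990, "red", 64), ("Ford", 1990, "white", 62), ("Ford", 1990, "blue", 63),
    ("Ford", 1991, "red", 52), ("Ford", 1991, "white", 9), ("Ford", 1991, "blue", 55),
    ("Ford", 1992, "red", 27), ("Ford", 1992, "white", 62), ("Ford", 1992, "blue", 39),
]

def cube(rows, n_dims):
    """Emulate SUM(...) GROUP BY CUBE over the first n_dims columns of each row."""
    totals = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        # Each mask picks which dimensions are replaced by 'ALL'.
        for mask in product((False, True), repeat=n_dims):
            key = tuple("ALL" if hide else value for hide, value in zip(mask, dims))
            totals[key] += measure
    return dict(totals)

result = cube(sales, 3)
print(len(result))
print(result[("ALL", "ALL", "ALL")])
print(result[("Chevy", 1990, "ALL")])
```

The emulation touches each input row 2^N times but reads the data only once, in contrast to one scan per GROUP BY in the UNION formulation.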

18 Points SQL must address once ALL is added
    Every ALL value must be interpreted as the set of values over which the aggregate was computed.
        Model.ALL = ALL(Model) = {Chevy, Ford}
        Year.ALL = ALL(Year) = {1990, 1991, 1992}
        Color.ALL = ALL(Color) = {red, white, blue}
    ALL becomes a new keyword.
    Column definitions gain an option stating whether ALL is allowed or not allowed.
    Like a NULL value, ALL cannot take part in other aggregates.

19 4. Addressing The Data Cube
Percent-of-total: global aggregate
Using cube addressing:
    SELECT Model, Year, Color, SUM(Sales) AS total,
           SUM(Sales) / total(ALL, ALL, ALL)
    FROM Sales
    WHERE Model IN ('Ford', 'Chevy')
      AND Year BETWEEN 1990 AND 1992
    GROUP BY CUBE (Model, Year, Color);
Equivalent form with a nested query:
    SELECT Model, Year, Color, SUM(Sales),
           SUM(Sales) / ( SELECT SUM(Sales)
                          FROM Sales
                          WHERE Model IN ('Ford', 'Chevy')
                            AND Year BETWEEN 1990 AND 1992 )
    FROM Sales
    WHERE Model IN ('Ford', 'Chevy')
      AND Year BETWEEN 1990 AND 1992
    GROUP BY CUBE (Model, Year, Color);
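The nested-query form of percent-of-total runs on an ordinary SQL engine once CUBE is dropped. The sketch below uses Python's built-in sqlite3 module with the 18 SALES rows from the deck. Two deviations are assumptions of this sketch: plain GROUP BY replaces GROUP BY CUBE (SQLite has no CUBE), and the numerator is multiplied by 1.0 because SQLite would otherwise perform integer division.

```python
import sqlite3

# (Model, Year, Color, Sales): the 18 SALES rows from the deck.
rows = [
    ("Chevy", 1990, "red", 5), ("Chevy", 1990, "white", 87), ("Chevy", 1990, "blue", 62),
    ("Chevy", 1991, "red", 54), ("Chevy", 1991, "white", 95), ("Chevy", 1991, "blue", 49),
    ("Chevy", 1992, "red", 31), ("Chevy", 1992, "white", 54), ("Chevy", 1992, "blue", 71),
    ("Ford", 1990, "red", 64), ("Ford", 1990, "white", 62), ("Ford", 1990, "blue", 63),
    ("Ford", 1991, "red", 52), ("Ford", 1991, "white", 9), ("Ford", 1991, "blue", 55),
    ("Ford", 1992, "red", 27), ("Ford", 1992, "white", 62), ("Ford", 1992, "blue", 39),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Model TEXT, Year INTEGER, Color TEXT, Sales INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", rows)

# The scalar subquery recomputes the global aggregate that
# total(ALL, ALL, ALL) would name directly in the cube.
query = """
SELECT Model, Year, Color, SUM(Sales) AS total,
       1.0 * SUM(Sales) / (SELECT SUM(Sales) FROM Sales
                           WHERE Model IN ('Ford', 'Chevy')
                             AND Year BETWEEN 1990 AND 1992) AS fraction
FROM Sales
WHERE Model IN ('Ford', 'Chevy') AND Year BETWEEN 1990 AND 1992
GROUP BY Model, Year, Color
"""
shares = {(m, y, c): frac for m, y, c, _total, frac in conn.execute(query)}
print(shares[("Chevy", 1990, "red")])
```

The subquery repeats the outer WHERE clause verbatim; cube addressing avoids that duplication by naming the grand-total cell directly.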

20 5. Computing the Data Cube
To compute the cube's "ALL" tuples from the GROUP BY result, an ALL value is added to each dimension.
For an N-dimensional cube whose attributes have cardinalities $C_1, C_2, C_3, \dots, C_N$, the cube relation contains $\prod_{i=1}^{N} (C_i + 1)$ tuples.
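The size formula is easy to check numerically. The sketch below evaluates it for the Sales example in this deck (2 models, 3 years, 3 colors) and also counts the 2^N GROUP BYs a UNION-based formulation would need; the function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

def cube_size(cardinalities):
    """Upper bound on cube rows: each dimension gains one extra value, ALL."""
    return prod(c + 1 for c in cardinalities)

def group_by_count(n_dims):
    """Number of distinct GROUP BYs (subsets of the dimensions) in an N-dimensional cube."""
    return 2 ** n_dims

print(prod([2, 3, 3]))       # rows in the fully populated base relation
print(cube_size([2, 3, 3]))  # rows in the cube relation
print(group_by_count(6))     # GROUP BYs for a 6-dimensional cross tab
```

The product is an upper bound that is reached only when every combination of dimension values occurs in the data, as it does in the Sales example.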

21 Aggregation functions over a two-dimensional set of values $\{X_{i,j} \mid i = 1,\dots,I;\ j = 1,\dots,J\}$
Distributive: $F(\{X_{i,j}\}) = G(\{F(\{X_{i,j} \mid i = 1,\dots,I\}) \mid j = 1,\dots,J\})$
    Count(), Min(), Max(), Sum()
Algebraic: $F(\{X_{i,j}\}) = H(\{G(\{X_{i,j} \mid i = 1,\dots,I\}) \mid j = 1,\dots,J\})$, where G returns a fixed-size tuple of partial results
    Average(), standard deviation, MaxN(), MinN()
Holistic: no fixed-size summary characterizes $F(\{X_{i,j} \mid i = 1,\dots,I\})$
    Median(), MostFrequent(), Rank()
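The three classes differ in what must be kept per sub-aggregate in order to combine results. A small Python sketch with made-up numbers (two "columns" of values standing in for the j index) illustrates each case:

```python
from statistics import median

# Made-up values, partitioned into two sub-groups (the j index on the slide).
columns = [[1, 2, 100], [3, 4]]
all_values = [x for col in columns for x in col]

# Distributive: combine per-group results directly.
# For Sum, F = G = sum; for Count, the per-group counts are combined with sum.
sum_of_sums = sum(sum(col) for col in columns)
count_via_sum = sum(len(col) for col in columns)

# Algebraic: keep a fixed-size tuple per group (here G = (sum, count)),
# then H turns the tuples into the final average.
partials = [(sum(col), len(col)) for col in columns]
average = sum(s for s, _n in partials) / sum(n for _s, n in partials)

# Holistic: per-group medians are not enough to recover the overall median.
median_of_medians = median([median(col) for col in columns])
true_median = median(all_values)

print(sum_of_sums, count_via_sum, average)
print(median_of_medians, true_median)
```

Sum and average are recovered exactly from the per-group summaries, while the median of the group medians differs from the true median, which is why holistic functions cannot reuse sub-aggregates in the same way.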

22 6. Summary
SQL's five basic aggregate functions should be extended with functions such as rank, N_tile, cumulative, and percent of total to support typical data mining operations.
The cube operator generalizes and unifies aggregates, group by, histograms, roll-ups and drill-downs, and cross tabs.
The cube is easy to compute for the distributive and algebraic classes of functions.

23 Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Jim Gray … Microsoft Research
Adam Bosworth … Microsoft Research
Andrew Layman … Microsoft Research
Hamid Pirahesh … IBM Research
5 February 1995, Revised 18 October 1995
Technical Report MSR-TR-95-22

