The Cubetree Storage Organization: A High Performance ROLAP Datablade. Database Laboratory, master's program, 3rd semester, Kang Ju-young (강주영)


1 The Cubetree Storage Organization: A High Performance ROLAP Datablade. Database Laboratory, master's program, 3rd semester, Kang Ju-young (강주영). E-mail: 992COG01@mm.ewha.ac.kr

2 Contents: Introduction; ROLAP & Star Schema design; Cubetrees and Aggregate ROLAP Views; EDM (Extended Datacube Model); Packed R-Tree; SelectMapping Algorithm; Experiments; Conclusions & Future Work

3 Data Warehousing Architecture [Figure: four numbered stages. Labels: (1) Data from Operational Systems (OLTP: Purchasing, Production, Accounting, Sales); (2) Data in Operational Data Store; (3) Data Marts; (4) Data in OLAP Environment. Other labels: ETL, Data Warehouse, OLAP Server, Analysis / Query / Reporting Tools.]

4 ROLAP & Star Schema design. Advantages of ROLAP: avoids the cost of maintaining a MOLAP structure; efficient (compact) storage; can handle bulk updates. Disadvantages of ROLAP: storage and analysis are separated, so indexes are needed (bitmap index, join index, etc.). Bitmap indices give efficient queries, BUT: only a limited number of range queries are supported; space overhead for high-cardinality attributes; costly updates; on-the-fly aggregation is still necessary.

5 Materialized Views: Redundancy for Performance. Many OLAP queries require summary tables. Prior work: identify the views to materialize based on a workload estimate; efficiently compute and update the views. Summary Table Fallacy: storage waste (summary tables replicate all multidimensional coordinates); update cost; view selection based on static workloads only; high-complexity, impractical algorithms (NP-hard).

6 Cubetree: a collection of packed R-trees used as a "multidimensional" indexing scheme for the Data Cube. Features: clusters of data; efficiency during incremental bulk updates and high query throughput; efficient merge-pack algorithm and sequential writing on the disk; combines both storage and indexing in a single data structure within the relational paradigm. Depending on the views to be materialized, the CSO optimizer selects the one or more Cubetrees that achieve maximal clustering.

7 EDM (Extended Datacube Model). Table R(A,B,C,Q). Relation tuple T: (a,b,c,q). Group-by projections: groupby(A,B): (a,b,0,q); groupby(A,C): (a,0,c,q); groupby(B,C): (0,b,c,q); groupby(A): (a,0,0,q); groupby(B): (0,b,0,q); groupby(C): (0,0,c,q); groupby(none): the origin 0. Relation tuples are points in the N-d space; group-by projections are also points; point data is very efficient for multidimensional indexing. [Figure: the tuple T and its projections drawn on axes A, B, C.]
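The projection rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the datablade's code: the names `edm_points` and `build_cube` are hypothetical, SUM is assumed as the aggregate, and dimension values are assumed to be positive integers so that 0 is free to mark a dimension that is not grouped on.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("A", "B", "C")  # dimensions of the example table R(A, B, C, Q)


def edm_points(row):
    """Yield (group-by attributes, point, q) for every group-by of one tuple.

    Coordinates of dimensions that are not grouped on are set to 0,
    as in the slide: groupby(A, B) of (a, b, c, q) becomes (a, b, 0).
    """
    *coords, q = row
    for k in range(len(DIMS), -1, -1):
        for kept in combinations(range(len(DIMS)), k):
            point = tuple(c if i in kept else 0 for i, c in enumerate(coords))
            yield tuple(DIMS[i] for i in kept), point, q


def build_cube(rows):
    """Aggregate (SUM, by assumption) the measure q at every EDM point."""
    cube = defaultdict(int)
    for row in rows:
        for _, point, q in edm_points(row):
            cube[point] += q
    return dict(cube)
```

Every group-by of every tuple ends up as an ordinary point, so a single multidimensional point index can hold the base data and all of its aggregates.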

8 Queries: Slice and Dice Queries. Views (SQL): V1: SELECT partkey, suppkey, SUM(quantity) FROM F GROUP BY partkey, suppkey. V2: SELECT part.type, SUM(quantity) FROM F, part WHERE F.partkey = part.partkey GROUP BY part.type. Queries: Q1: Give me the total sales of every part bought from a given supplier S. Q2: Find the total sales per part and supplier to a given customer C.
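As a small runnable illustration, V1, V2, and Q1 can be tried against an in-memory SQLite database. The column types, the `custkey` column on F, and the helper name `run_views` are assumptions made for this sketch; the slide gives only the queries, and the experiments themselves ran on Informix, not SQLite.

```python
import sqlite3


def run_views(fact_rows, part_rows, supplier):
    """Evaluate V1, V2 and the slice query Q1 on a toy fact table F."""
    con = sqlite3.connect(":memory:")
    # Assumed schema: the slide names only the columns used in the queries.
    con.execute(
        "CREATE TABLE F (partkey INT, suppkey INT, custkey INT, quantity INT)"
    )
    con.execute("CREATE TABLE part (partkey INT PRIMARY KEY, type TEXT)")
    con.executemany("INSERT INTO F VALUES (?, ?, ?, ?)", fact_rows)
    con.executemany("INSERT INTO part VALUES (?, ?)", part_rows)

    v1 = con.execute(
        "SELECT partkey, suppkey, SUM(quantity) FROM F "
        "GROUP BY partkey, suppkey ORDER BY partkey, suppkey"
    ).fetchall()
    v2 = con.execute(
        "SELECT part.type, SUM(quantity) FROM F, part "
        "WHERE F.partkey = part.partkey "
        "GROUP BY part.type ORDER BY part.type"
    ).fetchall()
    # Q1: total per part for one given supplier S (a slice of V1).
    q1 = con.execute(
        "SELECT partkey, SUM(quantity) FROM F WHERE suppkey = ? "
        "GROUP BY partkey ORDER BY partkey",
        (supplier,),
    ).fetchall()
    con.close()
    return v1, v2, q1
```

Q1 reads exactly the V1 rows whose `suppkey` equals S, which is why materializing V1 lets the slice be answered without touching F.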

9 Queries (Cont'd). Several conventional indexes are replaced by a single Cubetree (Vn => Rn). Axes: X: Suppkey, Y: Partkey, Z: Custkey. Each query becomes a range (box) query on the Cubetree: (x_min, y_min, C, x_max, y_max, C) and (x_min, S, 0, x_max, S, 0).
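A box such as (x_min, S, 0, x_max, S, 0) is a low corner followed by a high corner, and answering the query means returning every stored point inside it. The sketch below uses a linear scan as a stand-in for the R-tree search, purely to show the semantics; `in_box` and `range_query` are hypothetical names, and the points are assumed to be a dictionary from (x, y, z) coordinates to an aggregate value.

```python
def in_box(point, box):
    """True if a 3-d point lies inside box = (x1, y1, z1, x2, y2, z2)."""
    low, high = box[:3], box[3:]
    return all(lo <= p <= hi for lo, p, hi in zip(low, point, high))


def range_query(points, box):
    """Return the stored points (and their aggregates) that fall in the box.

    A real Cubetree would descend only into MBRs that intersect the box;
    the linear scan here is only a stand-in.
    """
    return {p: agg for p, agg in points.items() if in_box(p, box)}
```

Fixing one coordinate to a constant (S or C) and another to 0 selects a line or plane of the cube, so the slice queries of the previous slide reduce to degenerate boxes.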

10 Packed R-Tree: organize and index space using MBRs. [Figure: rectangles A to N in the plane and the R-tree built over them.]

11 Packed R-Tree: organize and index space using MBRs. [Figure: points p1 to p10 grouped into four MBRs A, B, C, D, with the corresponding tree.]

12 Packed R-Tree: organize and index space using MBRs. [Figure: the same points p1 to p10 grouped into three MBRs A, B, C, with the corresponding tree.]
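The idea of packing can be sketched as: sort the points, fill each leaf to capacity, and build the upper levels bottom-up from the leaf MBRs. The code below is a generic illustration under assumptions, not the Cubetree packing algorithm itself: it handles 2-d points only, the plain lexicographic sort order is an assumption, and the names `merge` and `pack` are hypothetical.

```python
def merge(boxes):
    """Smallest rectangle (x1, y1, x2, y2) enclosing all given rectangles."""
    x1, y1, x2, y2 = zip(*boxes)
    return (min(x1), min(y1), max(x2), max(y2))


def pack(points, capacity):
    """Bulk-load 2-d points into a packed tree; returns the root (mbr, children)."""
    if capacity < 2 or not points:
        raise ValueError("need at least one point and capacity >= 2")
    ordered = sorted(points)  # assumed sort order: by x, then by y
    # Leaf level: consecutive runs of the sorted points, each filled to capacity.
    level = []
    for i in range(0, len(ordered), capacity):
        chunk = ordered[i:i + capacity]
        level.append((merge([(x, y, x, y) for x, y in chunk]), chunk))
    # Upper levels: group consecutive nodes until a single root remains.
    while len(level) > 1:
        level = [
            (merge([box for box, _ in level[i:i + capacity]]),
             level[i:i + capacity])
            for i in range(0, len(level), capacity)
        ]
    return level[0]
```

Because leaves are written in sort order and filled completely, the tree can be laid out sequentially on disk, which is the property the later slides rely on.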

13 Packed R-Tree: overcomes the drawback of having to reorganize the space and lose the clustering whenever the data is updated. Features: minimizes space overlap and dead space; reduces search cost and tree size; increases the SIR (Sequential I/O Ratio), defined as $$\mathrm{SIR} = \frac{\text{smallest number of blocks required to store the result of a query}}{\text{total number of blocks retrieved during the query}}$$
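The ratio is easy to compute once the two block counts are known. In the sketch below the numerator is estimated as the result size divided by the block capacity, rounded up; that estimate, the function name `sir`, and its parameters are assumptions for illustration.

```python
import math


def sir(result_tuples, tuples_per_block, blocks_retrieved):
    """Sequential I/O Ratio: minimal blocks for the result / blocks actually read."""
    if tuples_per_block <= 0 or blocks_retrieved <= 0:
        raise ValueError("block capacity and blocks retrieved must be positive")
    # Assumed estimate of the smallest number of blocks that can hold the result.
    minimal_blocks = math.ceil(result_tuples / tuples_per_block)
    return minimal_blocks / blocks_retrieved
```

A value of 1.0 means every block read was full of result tuples; a poorly clustered layout that reads 40 blocks for a result that fits in 10 scores 0.25.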

14 SelectMapping Algorithm: selection of Cubetrees (Rn) = view selection + index. Projection list of V: the list of attributes from the fact and dimension tables that are projected by the view, e.g. the projection list of view V1 is P1 = {partkey, suppkey}. Arity: the number of attributes in the projection list, e.g. the arity of P1 is 2. Valid mapping: a Cubetree has views of same arity.

15 Implementation & Experiments. Cubetree Datablade: implements the CSO (Cubetree Storage Organization) and the SelectMapping algorithm and runs on the Informix Universal Server. Input data for the experiments: TPC-D benchmark data. Conventional scheme: the view set V materialized as traditional relational tables, with the selected set of B-trees created on them. Cubetree scheme: the view set V materialized through a forest of Cubetrees using the SelectMapping algorithm, with no additional indexing. Queries: slice queries using only the equality operator, e.g. the total sales per customer for a given part P.

16 Implementation & Experiments. Experiments: initial load time; total time (secs) for 100 queries; scalability test (dataset size doubled); average query throughput (queries/sec); incremental updates of materialized views, recomputation of materialized views, and incremental updates of Cubetrees.

17 Performance: Initial Load. Cubetrees on 1 GB TPC-D data (7,110464 total tuples): 51% disk space saving; 16 times faster initial creation (sort time included).

18 Performance: Queries. Cubetrees are 10 times faster (on average) than the conventional scheme.

19 Performance: Throughput & Scalability. [Charts: Throughput; Scalability.]
Update method: total time
Incremental updates of materialized views: > 24 hours
Recomputation of materialized views: 12 h 50 m 11 s
Incremental updates of Cubetrees: 8 m 24 s
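The gap between the last two rows is easier to appreciate as a ratio. The snippet below only converts the two reported times to seconds and divides them; the helper name `to_seconds` is hypothetical, and the first row is left out because only a lower bound (> 24 hours) is reported for it.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


recompute = to_seconds(12, 50, 11)            # recomputation of materialized views: 46211 s
cubetree = to_seconds(minutes=8, seconds=24)  # incremental Cubetree update: 504 s
speedup = recompute / cubetree                # how many times faster the Cubetree update is
```

With the reported figures the Cubetree update comes out roughly 92 times faster than recomputing the materialized views.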

20 Conclusions & Future Work. Cubetrees are an alternative storage and indexing organization for ROLAP views: economical in storage and very efficient in query execution and updates. Minimal space overhead (in 6D in a single B-tree). Efficient bulk updates (merge-pack over 10 GB/h). Scalable, industrial-strength solution (up to 11 dimensions). Web demo: http://opsis.umiacs.umd.edu:8080/DEMO

