Performance techniques for OLAP (On-line Analytical Processing). Ewha Womans University, Department of Computer Science, Database Laboratory. Juyoung Kang, master's program, 2nd semester


2 Performance techniques for OLAP (On-line Analytical Processing). Ewha Womans University, Department of Computer Science, Database Laboratory. Juyoung Kang, master's program, 2nd semester. E-mail: 992COG01@mm.ewha.ac.kr

3 Contents (1999/12/8, Juyoung Kang, Database Lab)
OLAP Overview
Key demand of OLAP
Approaches
Cubing Algorithms
  Multidimensional aggregation
  Selection of the views to materialize
  Method for MOLAP system
Conclusions and further research

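4 OLAP Overview
Different users look at the same measure (sales) from different viewpoints:
  What are the sales by product?
  What are the sales by region?
  How do actual sales compare with the target?
OLAP: an interactive analysis tool for the multidimensional analysis of information.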

5 OLAP Overview
Characteristics of OLAP: multidimensional information, direct access, interactive analysis, use in decision making.
OLTP vs. OLAP:
Aspect | Transactional solutions | OLAP solutions
Data size | several GB to tens of GB | hundreds of GB to several TB
Structured for | Data integrity | Ease of querying
Optimized for | Transaction performance | Query performance
Data features | Atomized, present, process-oriented | Summarized, historical, subject-oriented
Applications | Process-oriented | Subject-oriented

6 Key demand of OLAP: Multidimensional Queries
Car sales data:
Model | Year | Color | Sales
Ford | 1994 | Black | 50
Ford | 1994 | White | 40
Ford | 1995 | Black | 85
Ford | 1995 | White | 115
Aggregated by Year (Color = ALL):
Ford | 1994 | ALL | 90
Ford | 1995 | ALL | 200
Aggregated over Year and Color:
Ford | ALL | ALL | 290
Unioned GROUP BYs for multidimensional queries. Example question: how many Ford cars were sold in 1994 and 1995?
SELECT Model, Year, Color, SUM(Sales) FROM Sales WHERE Model = 'Ford' GROUP BY Model, Year, Color
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Ford' GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Ford' GROUP BY Model, Year
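The unioned GROUP BYs on this slide can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table layout is an assumption based on the slide's car sales data, and Year is stored as text here so that it can share a column with the literal 'ALL'.

```python
import sqlite3

# In-memory table holding the four detail rows from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Model TEXT, Year TEXT, Color TEXT, Sales INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("Ford", "1994", "Black", 50),
        ("Ford", "1994", "White", 40),
        ("Ford", "1995", "Black", 85),
        ("Ford", "1995", "White", 115),
    ],
)

# One SELECT per aggregation level, glued together with UNION.
query = """
SELECT Model, Year, Color, SUM(Sales) FROM Sales
WHERE Model = 'Ford' GROUP BY Model, Year, Color
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales) FROM Sales
WHERE Model = 'Ford' GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales) FROM Sales
WHERE Model = 'Ford' GROUP BY Model, Year
"""
rows = set(conn.execute(query).fetchall())

for row in sorted(rows):
    print(row)
```

Every extra aggregation level needs another SELECT and another pass over the table, which is the cost the later slides try to reduce.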

7 Key demand of OLAP
Queries must be answered quickly.
Pre-calculation: what is an efficient way to compute multidimensional aggregates?
Tradeoffs between storage and performance: how can response performance be maximized while using a reasonable amount of storage?
Example of an analytical query for decision making: how do this year's sales at the stores in the Gangnam region compare with last year's?
  Gangnam region: 20 stores
  Products per store: about 100
  Records to be processed = 20 × 100 × 365 × 2 = 1,460,000
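The record count on this slide can be reproduced with a few lines of Python. The sketch assumes one detail record per store, product, and day, which the slide implies but does not state. The precomputed-summary figure is a further assumption added for contrast; all variable names are illustrative.

```python
# Figures taken from the slide.
stores = 20
products_per_store = 100
days_per_year = 365
years = 2  # this year and last year

# Assumption: one detail record per (store, product, day).
detail_records = stores * products_per_store * days_per_year * years

# Assumption: a precomputed table with one total per (region, year)
# would answer the same comparison from one row per year.
summary_records = years

print(detail_records)   # records scanned without pre-calculation
print(summary_records)  # rows read from the hypothetical summary table
```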

8 Approaches
Requirement for simultaneous multidimensional aggregation:
  Cube operator [GBLP95]
  PipeSort, PipeHash [AAD+96]
  OVERLAP [AAD+96]
Requirement for the right selection of the views to materialize:
  Greedy algorithm for selecting views [HRU96]
  One-step algorithm combining selection of views and indexes [GHRU97]
An array-based method for MOLAP systems:
  Multi-way array-based algorithm [ZDN97]

9 Cubing Algorithms: computing the data cube efficiently
Cube operator [GBLP95]:
  N-dimensional generalization of the simple aggregate function and GROUP BY
  Computes every possible cell of the data cube
  Sparsity is not considered
A chain of unioned queries (SELECT FROM WHERE GROUP BY UNION SELECT FROM WHERE GROUP BY ……) is replaced by a single GROUP BY CUBE:
SELECT Model, Year, Color, SUM(Sales) AS Sales
FROM Sales
WHERE Model IN ('Ford', 'Chevy') AND Year BETWEEN 1990 AND 1992
GROUP BY CUBE Model, Year, Color
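What the cube operator produces can be sketched in Python by enumerating every subset of the grouping columns. This is a naive illustration with one scan per group-by, not the implementation described in [GBLP95]; the function name and the reuse of the car sales rows from slide 6 are assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Detail rows (Model, Year, Color, Sales), reused from the car sales example.
ROWS = [
    ("Ford", 1994, "Black", 50),
    ("Ford", 1994, "White", 40),
    ("Ford", 1995, "Black", 85),
    ("Ford", 1995, "White", 115),
]


def naive_cube(rows, n_dims):
    """Sum the measure for all 2**n_dims group-bys, one scan per group-by."""
    result = defaultdict(int)
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            for row in rows:
                # Dimensions that are not grouped on collapse to 'ALL'.
                key = tuple(row[i] if i in kept else "ALL" for i in range(n_dims))
                result[key] += row[n_dims]
    return dict(result)


cube = naive_cube(ROWS, 3)
for key in sorted(cube, key=str):
    print(key, cube[key])
```

With three dimensions this already means 8 group-bys and 8 scans of the data, which motivates the algorithms on the following slides.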

10 Cubing Algorithms (cont'd)

11 Cubing Algorithms (cont'd)
PipeSort, PipeHash [AAD+96]: fast algorithms for computing a collection of group-bys.
Optimizations for computing multiple group-bys:
  Smallest-parent, Cache-results, Amortize-scans, Share-sorts, Share-partitions
  The optimizations are combined to reduce the total cost
PipeSort: reduces the problem to a minimum-weight matching problem on a bipartite graph.
PipeHash: decides the order of the group-bys and chooses a shared partition that takes the available memory into account.
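Of the optimizations listed above, smallest-parent is the easiest to illustrate: each group-by is computed from the smallest already-computed group-by that contains all of its columns, instead of from the raw data. The sketch below shows only this one idea and is not PipeSort or PipeHash; the function names and the sample rows are assumptions.

```python
from itertools import combinations

DIMS = ("Model", "Year", "Color")
ROWS = [
    ("Ford", 1994, "Black", 50),
    ("Ford", 1994, "White", 40),
    ("Ford", 1995, "Black", 85),
    ("Ford", 1995, "White", 115),
]


def aggregate(parent, parent_dims, child_dims):
    """Roll a computed group-by up to a subset of its dimensions."""
    idx = [parent_dims.index(d) for d in child_dims]
    out = {}
    for key, total in parent.items():
        child_key = tuple(key[i] for i in idx)
        out[child_key] = out.get(child_key, 0) + total
    return out


def cube_smallest_parent(rows, dims):
    n = len(dims)
    base = {}
    for row in rows:
        base[row[:n]] = base.get(row[:n], 0) + row[n]
    computed = {dims: base}  # group-by columns -> {cell key: sum}
    parent_of = {}           # records which parent each group-by was built from
    for k in range(n - 1, -1, -1):
        for child in combinations(dims, k):
            parents = [p for p in computed
                       if len(p) == k + 1 and set(child) <= set(p)]
            # Smallest-parent: pick the parent with the fewest cells.
            best = min(parents, key=lambda p: len(computed[p]))
            computed[child] = aggregate(computed[best], best, child)
            parent_of[child] = best
    return computed, parent_of


computed, parent_of = cube_smallest_parent(ROWS, DIMS)
for child, parent in parent_of.items():
    print(child, "<-", parent)
```

Only the base group-by touches the detail rows; every other group-by reads a smaller intermediate result.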

12 Cubing Algorithms (cont'd)
Performance results of PipeSort and PipeHash:
  2 to 8 times faster than the naive methods
  PipeHash is within 8% and PipeSort is within 22% of the lower bound
Figure: performance evaluation of PipeSort and PipeHash by data set.

13 Cubing Algorithms (cont'd)
OVERLAP [AAD+96]:
  One particular sorting-based scheme
  Overlaps the computation of different cuboids and minimizes the number of scans (disk I/O) needed
PipeSort vs. OVERLAP:
  PipeSort: chooses sort orders by taking the size of each group-by into account, in order to reduce scanning and sorting costs; this results in one or more sort orders.
  OVERLAP: uses a single sort order and computes group-bys in a multiple-pipelined fashion; it does not consider group-by sizes, and uses partitions so that more of the computation overlaps.

14 Cubing Algorithms (cont'd)
Selection of the views (group-bys): efficient implementation of the data cube [HRU96].
Plan for the right selection of the views to materialize: what and how much to precompute.
Lattice of views with their sizes in rows (p = part, s = supplier, c = customer):
  psc 6M
  pc 6M, ps 0.8M, sc 6M
  p 0.2M, s 0.01M, c 0.1M
  none 1
How many views must we materialize to get reasonable performance?
Given that we have space S, what views do we materialize so that we minimize the average query cost?

15 Cubing Algorithms (cont'd)
The greedy algorithm for view selection:
  Polynomial-time and competitive (always gives a solution that is within a constant factor of the optimum)
  Guaranteed to achieve at least 63% of the benefit of the optimal selection
Indexes for selected views [GHRU97]:
  Automated selection of summary tables and indexes
  A family of one-step algorithms that select which subcubes and indexes should be computed for improved query performance, given the space constraint
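The greedy selection can be sketched on the lattice from slide 14. The sketch assumes the linear cost model usually associated with [HRU96]: answering a query on view w costs the size of the smallest materialized view that contains all of w's attributes, and the benefit of a candidate view is the total cost it saves over all views it can answer. Function names are illustrative, and the top view psc is assumed to be always materialized.

```python
# View sizes in rows, taken from the lattice on slide 14 ("" is the 'none' view).
SIZES = {
    "psc": 6_000_000,
    "pc": 6_000_000,
    "sc": 6_000_000,
    "ps": 800_000,
    "p": 200_000,
    "s": 10_000,
    "c": 100_000,
    "": 1,
}


def answers(v, w):
    """View v can answer queries on view w if w's attributes are a subset of v's."""
    return set(w) <= set(v)


def cost(w, selected):
    """Cost of w: size of the smallest selected view that can answer it."""
    return min(SIZES[u] for u in selected if answers(u, w))


def benefit(v, selected):
    """Total cost saved over all views that v can answer if v is materialized."""
    return sum(max(0, cost(w, selected) - SIZES[v])
               for w in SIZES if answers(v, w))


def greedy(k, top="psc"):
    """Pick up to k views in addition to the top view, best benefit first."""
    selected = [top]
    for _ in range(k):
        candidates = [v for v in SIZES if v not in selected]
        best = max(candidates, key=lambda v: benefit(v, selected))
        if benefit(best, selected) <= 0:
            break
        selected.append(best)
    return selected


print(greedy(2))
```

Under these assumptions the first pick is ps, because it cuts the cost of ps, p, s, and none from 6M to 0.8M each; pc and sc have zero benefit since they are as large as psc.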

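16 Cubing Algorithms (cont'd)
ROLAP vs. MOLAP:
ROLAP | MOLAP
Based on a relational database (RDB) | Based on a multidimensional database (MDDB)
Storage and analysis are separate | Storage and analysis are integrated
Data stored as tables | Data stored as arrays
Takes less space | Takes more space
Relatively slow | Fast response
Partial changes can be handled in place | Partial changes require a rebuild
Scalable | Relatively less scalable
Example fact with product and store dimensions: (Shoes, WestTown, 3-July-1996, $34). The dimension values determine the POSITION of the cell; $34 is the VALUE stored there.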
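The position/value distinction can be made concrete with a small sketch. The dimension member lists below are hypothetical except for Shoes, WestTown, and 3-July-1996, and the row-major layout is an assumption chosen for illustration rather than the storage format of any particular MOLAP product.

```python
# Hypothetical dimension members; only the first entry of each list is from the slide.
PRODUCTS = ["Shoes", "Hats"]
STORES = ["WestTown", "EastTown"]
DATES = ["3-July-1996", "4-July-1996"]


def position(product, store, date):
    """Row-major offset of a cell in a dense product x store x date array."""
    p = PRODUCTS.index(product)
    s = STORES.index(store)
    d = DATES.index(date)
    return (p * len(STORES) + s) * len(DATES) + d


# ROLAP-style storage: the dimension values are stored with the measure.
rolap_row = ("Shoes", "WestTown", "3-July-1996", 34)

# MOLAP-style storage: only the measure is stored, at the computed position.
molap_array = [None] * (len(PRODUCTS) * len(STORES) * len(DATES))
molap_array[position(*rolap_row[:3])] = rolap_row[3]

print(position("Shoes", "WestTown", "3-July-1996"), molap_array)
```

The array needs a slot for every possible cell, which is one way to see why the table above lists MOLAP as taking more space when the data is sparse.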

17 Cubing Algorithms (cont'd)
Cube computation for MOLAP: an array-based method for MOLAP systems [ZDN97].
Identifies the tradeoffs between MOLAP and ROLAP.
Multi-way Array algorithm:
  Overlaps the computation of different group-bys, while using minimal memory for each group-by
  The dimension order used by the algorithm minimizes its total memory requirement
Performance results (response time): performs much better than previously published ROLAP algorithms (in this case, OVERLAP).
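The idea of overlapping several group-bys in one pass over an array can be sketched as follows. This is a deliberately simplified illustration on a small dense three-dimensional array: it has no chunking, no compression, and no memory-minimizing dimension order, so it should not be read as the algorithm of [ZDN97]. The function name and dimension labels A, B, C are assumptions.

```python
def multiway_aggregate(array):
    """Compute all group-bys of a dense 3-D array with a single scan of its cells."""
    na, nb, nc = len(array), len(array[0]), len(array[0][0])
    ab = [[0] * nb for _ in range(na)]
    ac = [[0] * nc for _ in range(na)]
    bc = [[0] * nc for _ in range(nb)]
    # Single scan: every cell updates the three 2-D group-bys at once.
    for i in range(na):
        for j in range(nb):
            for k in range(nc):
                v = array[i][j][k]
                ab[i][j] += v
                ac[i][k] += v
                bc[j][k] += v
    # The 1-D group-bys and the grand total come from the smaller 2-D results.
    a = [sum(row) for row in ab]
    b = [sum(ab[i][j] for i in range(na)) for j in range(nb)]
    c = [sum(ac[i][k] for i in range(na)) for k in range(nc)]
    return {"AB": ab, "AC": ac, "BC": bc, "A": a, "B": b, "C": c, "ALL": sum(a)}


# A 2 x 2 x 2 example array indexed as array[a][b][c].
result = multiway_aggregate([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(result)
```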

18 Cubing Algorithms (cont'd)
Performance comparison with ROLAP.
Figures: response time by number of valid cells; response time by number of dimensions.

19 Conclusions and future work
Conclusions:
  OLAP is needed to support decision making
  The key demand of OLAP, or multidimensional data analysis, is response time
  Approaches for optimal response performance:
    Multidimensional aggregation
    Selection of the views to materialize
    Method for MOLAP system
Future work:
  Performance techniques for WRITE optimization
  Optimization techniques for the slice, dice, drill-down, and roll-up operations
  Parallel processing of cube computation

20 References
[AAD+96] S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, S. Sarawagi. On the Computation of Multidimensional Aggregates. Proc. of the 22nd Int. VLDB Conf., 1996.
[GBLP95] J. Gray, A. Bosworth, A. Layman, H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Technical Report MSR-TR-95-22, Microsoft Research, Advanced Technology Division, Microsoft Corporation, Redmond, Washington, November 1995.
[GHRU97] H. Gupta, V. Harinarayan, A. Rajaraman, J.D. Ullman. Index Selection for OLAP. Proc. ICDE '97, 1997.
[HRU96] V. Harinarayan, A. Rajaraman, J.D. Ullman. Implementing Data Cubes Efficiently. Proc. ACM SIGMOD Int. Conf. on Management of Data, 205-227, 1996.
[ZDN97] Y. Zhao, P.M. Deshpande, J.F. Naughton. An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. Proc. of the 1997 SIGMOD Conference, Tucson, Arizona, May 1997.

