A Paradigm Shift in Database Optimization: From Indices to Aggregates Presented to: The Data Warehousing & Data Mining mini-track – AMCIS 2002 as Research-in-Progress.


1 A Paradigm Shift in Database Optimization: From Indices to Aggregates
Presented to: The Data Warehousing & Data Mining mini-track – AMCIS 2002 as Research-in-Progress
Ryan LaBrie, Robert St. Louis, Lin Ye
Arizona State University
ryan.labrie@asu.edu, st.louis@asu.edu, lin.ye@asu.edu

2 Agenda
- A need for a shift in optimization strategy
- What our research is focusing on
- How we performed this research
- Update on our results
- Next steps

3 Why a Shift, Why Now?
- HISTORICALLY
  - Relational database technology is really good at what it does…
  - Transaction-oriented, operational systems
  - Optimized for data INPUT
  - FOCUS: Storage of DATA
- TODAY’S ENVIRONMENT
  - Large Data Warehouses
  - Used for decision support
  - Need to be optimized for information OUTPUT
  - FOCUS: Retrieval of INFORMATION

4 The Decision Support Problem
- Relational DBMS limitations
  - Too much data
  - Tera- and petabytes, quickly approaching exabytes
  - Too complex queries
  - Structured Query Language
  - Too long for results
  - Indexing limitations
  - Usage of (i.e. Table Scans)
  - B+ Trees

5 A Possible Decision Support Solution
- Multidimensional Databases
  - New effective storage techniques
  - Simpler modeling techniques
  - Potential for easier query interfaces
and
- Intelligent Aggregation
  - Appropriate use of redundancy
  - More effective indexing algorithms
  - Bitmapped indices

6 The Focus of Our Research
- CURRENT RESEARCH
  1. Cost comparisons of Relational vs. Multidimensional Decision Support Systems
  2. Working towards a multidimensional benchmarking system
  - TPC-H is positioned as a Decision Support benchmark; however, it is based on relational technologies
  - GOAL: Vendor-neutral benchmark for comparing multidimensional database products
- FUTURE RESEARCH
  - In the long term, show that decisions can be made more easily with multidimensional technology
  - Simpler design, simpler interfaces, faster responses

7 Why Develop a Multidimensional Benchmark?
- Benchmarking is an established method for creating vendor-neutral tests
  - Transaction Processing Performance Council (TPC)
- Benchmarking has been examined in other IS fields, including
  - Server Platforms: Johnson & Gray, 1993
  - eCommerce: Menasce, 2002
- It has been called for specifically in the data warehousing academic community
  - Nemati et al., 2000
- and has yet to be done

8 How Are We Building Our Benchmark
- Based on the TPC-H relational decision support benchmark
- Create a relational dimensional model that forms the basis for the data mart
- Build a multidimensional cube off the dimensional model
- Convert the SQL statement to the equivalent MDX
- Run both the SQL query and the MDX query, report results

9 What We Have Done To Date
- Initially have mapped all 22 TPC-H relational queries to potential data marts
  - 3-4 data marts necessary
- Built 2 TPC-H data sets (1GB and 10GB)
- Converted TPC-H Query #4 to MDX
- Ran comparisons on both data sets
- In the process of converting a second query (TPC-H Query #7) for additional analysis/confirmation of gains

10 TPC-H: Query #4 – Relational SQL
    SELECT o_orderpriority, COUNT(*) AS order_count
    FROM orders
    WHERE o_orderdate >= '1993-07-01'
      AND o_orderdate < '1993-10-01'
      AND EXISTS (SELECT *
                  FROM lineitem
                  WHERE l_orderkey = o_orderkey
                    AND l_commitdate < l_receiptdate)
    GROUP BY o_orderpriority
    ORDER BY o_orderpriority
Typical Decision Support Request: Answers the question, “How many orders were delivered late in Quarter 3 of 1993, sorted by priority?”
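As a rough illustration of what Query #4 computes, the sketch below runs the same statement against a handful of made-up rows in an in-memory SQLite database. SQLite stands in for SQL Server 2000 purely for convenience; the toy rows, the trimmed-down column set and the helper name `late_orders_by_priority` are assumptions for this example and are not part of the TPC-H data set or of the authors' setup.

```python
import sqlite3

# TPC-H Query #4 as shown on the slide.
QUERY_4 = """
SELECT o_orderpriority, COUNT(*) AS order_count
FROM orders
WHERE o_orderdate >= '1993-07-01'
  AND o_orderdate < '1993-10-01'
  AND EXISTS (SELECT *
              FROM lineitem
              WHERE l_orderkey = o_orderkey
                AND l_commitdate < l_receiptdate)
GROUP BY o_orderpriority
ORDER BY o_orderpriority
"""

def late_orders_by_priority(orders, lineitems):
    """Load toy rows into an in-memory database and run Query #4."""
    conn = sqlite3.connect(":memory:")
    # Only the columns the query touches (an assumption; TPC-H tables are wider).
    conn.execute(
        "CREATE TABLE orders (o_orderkey INTEGER, o_orderdate TEXT, o_orderpriority TEXT)"
    )
    conn.execute(
        "CREATE TABLE lineitem (l_orderkey INTEGER, l_commitdate TEXT, l_receiptdate TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?)", lineitems)
    rows = conn.execute(QUERY_4).fetchall()
    conn.close()
    return rows

# Made-up rows: ISO date strings compare correctly as text.
TOY_ORDERS = [
    (1, "1993-07-15", "1-URGENT"),  # Q3, one late line item
    (2, "1993-08-01", "1-URGENT"),  # Q3, delivered on time
    (3, "1993-09-30", "2-HIGH"),    # Q3, one late line item
    (4, "1993-10-01", "2-HIGH"),    # outside Q3
    (5, "1993-07-02", "1-URGENT"),  # Q3, two late line items, counted once
]
TOY_LINEITEMS = [
    (1, "1993-08-01", "1993-08-05"),
    (2, "1993-08-20", "1993-08-18"),
    (3, "1993-10-10", "1993-10-12"),
    (4, "1993-11-01", "1993-11-09"),
    (5, "1993-07-20", "1993-07-25"),
    (5, "1993-07-21", "1993-07-30"),
]

print(late_orders_by_priority(TOY_ORDERS, TOY_LINEITEMS))
```

Because of the EXISTS clause, an order with several late line items is still counted once, which is why order 5 contributes a single count.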

11 TPC-H: Query #4 – Multidimensional Expression (MDX) Equivalent
    SELECT {[Measures].[O Latecount]} ON COLUMNS,
           {[PriorityDim].children} ON ROWS
    FROM Q4Cube
    WHERE ([TimeDim].[All TimeDim].[1993].[Quarter 3])
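The MDX form is short because the cube already holds the late-order counts aggregated by priority and time period, so answering the question becomes a lookup rather than a scan. A minimal Python sketch of that idea follows. The helpers `build_cube` and `late_count_by_priority`, the one-row-per-order fact layout and the toy rows are all assumptions for illustration; the slides do not describe how Analysis Services stores its aggregates.

```python
from collections import Counter

def quarter_of(iso_date):
    """Return (year, quarter) for a 'YYYY-MM-DD' string."""
    year, month = int(iso_date[:4]), int(iso_date[5:7])
    return year, (month - 1) // 3 + 1

def build_cube(facts):
    """Aggregate once: count late orders per (priority, year, quarter)."""
    cube = Counter()
    for order_date, priority, is_late in facts:
        if is_late:
            year, quarter = quarter_of(order_date)
            cube[(priority, year, quarter)] += 1
    return cube

def late_count_by_priority(cube, year, quarter):
    """Answer the Query #4 question from the stored aggregates alone."""
    return sorted(
        (priority, count)
        for (priority, y, q), count in cube.items()
        if (y, q) == (year, quarter)
    )

# Hypothetical fact rows: (order date, priority, was any line item late?)
TOY_FACTS = [
    ("1993-07-15", "1-URGENT", True),
    ("1993-08-01", "1-URGENT", False),
    ("1993-09-30", "2-HIGH", True),
    ("1993-10-01", "2-HIGH", True),
    ("1993-07-02", "1-URGENT", True),
]
CUBE = build_cube(TOY_FACTS)

print(late_count_by_priority(CUBE, 1993, 3))
```

The cost of scanning the facts is paid once in `build_cube`; each later question touches only the handful of aggregate cells, whose number depends on the dimension sizes rather than on the number of orders.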

12 The Database Costs Dilemma
- Disk Space?
- Query Speed?
- Build Time?

13 Results To Date (Query Speed)
TPC-H Query 4 | 1 GB Dataset | 10 GB Dataset
Multidimensional | 0.33 seconds (one figure given; the slowdown ratios in both columns are consistent with it)
Relational | 46.6 seconds (140x slower) | 925 seconds (~15.5 min) (~2800x slower)
Relational (optimized w/Indices) | 38 seconds (114x slower) | Test not run
Relational (optimized w/Indices & Striping) | 26 seconds (78x slower) | 247 seconds (~4.0 min) (~750x slower)
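The "x slower" figures can be checked by dividing each relational time by the 0.33-second multidimensional time. The sketch below does that arithmetic with the timings reported on the slide; the dictionary layout and the `slowdown` helper are illustrative names, not part of the original material.

```python
# Multidimensional query time reported on the slide, in seconds.
MDX_SECONDS = 0.33

# Relational timings reported on the slide, in seconds.
RELATIONAL_SECONDS = {
    ("Relational", "1 GB"): 46.6,
    ("Relational", "10 GB"): 925.0,
    ("Relational (indices)", "1 GB"): 38.0,
    ("Relational (indices & striping)", "1 GB"): 26.0,
    ("Relational (indices & striping)", "10 GB"): 247.0,
}

def slowdown(relational_seconds, mdx_seconds=MDX_SECONDS):
    """How many times slower the relational run is than the cube lookup."""
    return relational_seconds / mdx_seconds

SLOWDOWNS = {key: slowdown(sec) for key, sec in RELATIONAL_SECONDS.items()}

for (config, dataset), ratio in SLOWDOWNS.items():
    print(f"{config:32s} {dataset:>5s} {ratio:8.1f}x")
```

The computed ratios (roughly 141, 2803, 115, 79 and 748) agree with the rounded figures on the slide to within a couple of percent.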

14 Results To Date (Other Measures)
TPC-H Query 4 | 1 GB Dataset | 10 GB Dataset
Relational DB | 1.2 GB | 12.5 GB
Relational DB (w/Indices) | 1.8 GB | 28.9 GB
Multidimensional Cube Size | .16 MB (one figure given)
Multidimensional Cube Build Time | 46 seconds | 356 seconds (~6 minutes)
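One way to read the build-time row against the query-speed results is to ask after how many runs of Query 4 the cube build has paid for itself. The sketch below works that out from the reported figures. It is a simplified model under stated assumptions: the workload consists only of Query 4, cube refreshes are ignored, and `break_even_queries` is a hypothetical helper rather than anything from the authors' benchmark.

```python
import math

MDX_SECONDS = 0.33  # multidimensional query time reported on the slides

def break_even_queries(build_seconds, relational_seconds, mdx_seconds=MDX_SECONDS):
    """Smallest n for which build + n * mdx <= n * relational."""
    saved_per_query = relational_seconds - mdx_seconds
    if saved_per_query <= 0:
        raise ValueError("the cube never pays off if it is not faster per query")
    return math.ceil(build_seconds / saved_per_query)

# (cube build seconds, relational query seconds), both taken from the slides.
SCENARIOS = {
    "1 GB, unoptimized relational": (46, 46.6),
    "1 GB, indices & striping": (46, 26),
    "10 GB, unoptimized relational": (356, 925),
    "10 GB, indices & striping": (356, 247),
}

BREAK_EVEN = {
    name: break_even_queries(build, relational)
    for name, (build, relational) in SCENARIOS.items()
}

for name, queries in BREAK_EVEN.items():
    print(f"{name:32s} pays off after {queries} query run(s)")
```

Under these assumptions the build cost is recovered after one or two runs of the query in every scenario.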

15 Preliminary Conclusions
- For a very modest investment, organizations will be able to process very large data warehouses
- The multidimensional data mart is the only practical (speed, processing time) way to support the end-user decision maker
- Aggregation truly is a substitute for expensive hardware

16 Next Steps
- Acquire a larger server
- Build 100GB and 300GB TPC-H data sets
- Benchmark both relational and dimensional queries
- Publish results
- Consider ROLAP, HOLAP, MOLAP issues
- Possible extensions to some data mining research
- Possible extensions to decision making through technology research

17 Thank You for Your Time
Questions?
ryan.labrie@asu.edu
www.public.asu.edu/~rlabrie (for this presentation and paper)

18 Appendix A: Current System
SOFTWARE
- Microsoft Windows 2000 Advanced Server
- Microsoft SQL Server 2000 Enterprise Edition
- Microsoft SQL Server 2000 Analysis Services Enterprise Edition
HARDWARE
- (1) 1.8GHz Intel Pentium 4 processor
- 768MB RAM
- 240GB HD space (3 IDE 80GB 7200RPM Drives)
- Total cost: $1100 (Hardware only)

