A Paradigm Shift in Database Optimization: From Indices to Aggregates Presented to: The Data Warehousing & Data Mining mini-track – AMCIS 2002 as Research-in-Progress.


Ryan LaBrie, Robert St. Louis, Lin Ye
Arizona State University

Agenda
- A need for a shift in optimization strategy
- What our research is focusing on
- How we performed this research
- Update on our results
- Next steps

Why a Shift, Why Now?
HISTORICALLY
- Relational database technology is really good at what it does...
- Transaction-oriented, operational systems
- Optimized for data INPUT
- FOCUS: Storage of DATA
TODAY'S ENVIRONMENT
- Large data warehouses
- Used for decision support
- Need to be optimized for information OUTPUT
- FOCUS: Retrieval of INFORMATION

The Decision Support Problem
Relational DBMS limitations
- Too much data
- Tera- and petabytes, quickly approaching exabytes
- Too complex queries
- Structured Query Language
- Too long for results
- Indexing limitations
- Usage of (i.e. table scans)
- B+ trees

A Possible Decision Support Solution
Multidimensional Databases
- New effective storage techniques
- Simpler modeling techniques
- Potential for easier query interfaces
and
Intelligent Aggregation
- Appropriate use of redundancy
- More effective indexing algorithms
- Bitmapped indices
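A minimal sketch of the bitmapped-index idea named on this slide, assuming a tiny in-memory column. The sample values, the `late` flag, and the helper `build_bitmap_index` are hypothetical and are not taken from the presentation or from any particular database product.

```python
# Hypothetical column values, one entry per table row.
priorities = ["1-URGENT", "2-HIGH", "1-URGENT", "3-MEDIUM", "2-HIGH", "1-URGENT"]
late = [True, False, True, True, False, False]  # hypothetical "delivered late" flag


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


priority_index = build_bitmap_index(priorities)
late_index = build_bitmap_index(late)

# Rows that are both urgent and late: one bitwise AND over the two bitmaps.
urgent_and_late = priority_index["1-URGENT"] & late_index[True]
count = bin(urgent_and_late).count("1")
print(count)
```

Once the bitmaps exist, combining predicates is a bitwise operation rather than a pass over the rows, which is why such indices suit low-cardinality decision support columns.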

The Focus of Our Research
CURRENT RESEARCH
1. Cost comparisons of relational vs. multidimensional decision support systems
2. Working towards a multidimensional benchmarking system
   - TPC-H is positioned as a decision support benchmark; however, it is based on relational technologies
   - GOAL: Vendor-neutral benchmark for comparing multidimensional database products
FUTURE RESEARCH
- In the long term, show that decisions can be made more easily with multidimensional technology
- Simpler design, simpler interfaces, faster responses

Why Develop a Multidimensional Benchmark?
- Benchmarking is an established method for creating vendor-neutral tests
  - Transaction Processing Performance Council (TPC)
- Benchmarking has been examined in other IS fields, including
  - Server platforms: Johnson & Gray, 1993
  - eCommerce: Menasce, 2002
- It has been called for specifically in the data warehousing academic community
  - Nemati et al., 2000
and
- Has yet to be done

How Are We Building Our Benchmark?
- Based on the TPC-H relational decision support benchmark
- Create a relational dimensional model that forms the basis for the data mart
- Build a multidimensional cube off the dimensional model
- Convert the SQL statement to the equivalent MDX
- Run both the SQL query and the MDX query, report results
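The last step (run both queries, report results) can be sketched as a small timing harness. This is an illustration only: `time_query` and the two stand-in functions are hypothetical, and in a real run they would submit the SQL text to the relational engine and the MDX text to the cube server.

```python
import time


def time_query(run_query, repeats=3):
    """Run a zero-argument callable several times; return (best seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = run_query()
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-ins for the real engine calls (assumptions, not the authors' code).
def run_sql_stand_in():
    return [("1-URGENT", 2)]


def run_mdx_stand_in():
    return [("1-URGENT", 2)]


sql_seconds, sql_rows = time_query(run_sql_stand_in)
mdx_seconds, mdx_rows = time_query(run_mdx_stand_in)

# A timing comparison is only meaningful if both engines return the same answer.
assert sql_rows == mdx_rows
print(f"SQL: {sql_seconds:.6f} s, MDX: {mdx_seconds:.6f} s")
```

Taking the best of several runs is one common way to reduce noise from caching and background load; the presentation does not say how its timings were collected.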

What We Have Done To Date
- Initially mapped all 22 TPC-H relational queries to potential data marts
  - 3-4 data marts necessary
- Built 2 TPC-H data sets (1GB and 10GB)
- Converted TPC-H Query #4 to MDX
- Ran comparisons on both data sets
- In the process of converting a second query (TPC-H Query #7) for additional analysis/confirmation of gains

TPC-H: Query #4 – Relational SQL
    SELECT o_orderpriority, COUNT(*) AS order_count
    FROM orders
    WHERE o_orderdate >= ' '
      AND o_orderdate < ' '
      AND EXISTS (SELECT *
                  FROM lineitem
                  WHERE l_orderkey = o_orderkey
                    AND l_commitdate < l_receiptdate)
    GROUP BY o_orderpriority
    ORDER BY o_orderpriority
Typical decision support request: answers the question, “How many orders were delivered late in Quarter 3 of 1993, sorted by priority?”
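A minimal runnable sketch of this query against a toy data set, using Python's built-in `sqlite3` module rather than the SQL Server setup used in the study. The table contents are invented for illustration, and the date bounds `'1993-07-01'` and `'1993-10-01'` are an assumption derived from the slide's "Quarter 3 of 1993" description, since the literals are blank in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_orderdate TEXT, o_orderpriority TEXT);
CREATE TABLE lineitem (l_orderkey INTEGER, l_commitdate TEXT, l_receiptdate TEXT);
INSERT INTO orders VALUES
  (1, '1993-07-15', '1-URGENT'),
  (2, '1993-08-02', '2-HIGH'),
  (3, '1993-09-30', '1-URGENT'),
  (4, '1993-11-01', '1-URGENT');
INSERT INTO lineitem VALUES
  (1, '1993-08-01', '1993-08-05'),  -- received after commit date: late
  (1, '1993-08-01', '1993-07-30'),  -- on time
  (2, '1993-09-01', '1993-08-20'),  -- on time
  (3, '1993-10-10', '1993-10-12'),  -- late
  (4, '1993-12-01', '1993-12-09');  -- late, but the order falls outside Q3
""")

# Date bounds are assumed (Q3 1993); ISO-formatted text dates compare correctly as strings.
QUERY_4 = """
SELECT o_orderpriority, COUNT(*) AS order_count
FROM orders
WHERE o_orderdate >= '1993-07-01'
  AND o_orderdate < '1993-10-01'
  AND EXISTS (SELECT *
              FROM lineitem
              WHERE l_orderkey = o_orderkey
                AND l_commitdate < l_receiptdate)
GROUP BY o_orderpriority
ORDER BY o_orderpriority
"""

rows = conn.execute(QUERY_4).fetchall()
print(rows)
```

The `EXISTS` subquery counts each order once no matter how many of its line items were late, which is the behavior the slide's question asks for.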

TPC-H: Query #4 – Multidimensional Expression (MDX) Equivalent
    SELECT {[Measures].[O Latecount]} ON COLUMNS,
           {[PriorityDim].children} ON ROWS
    FROM Q4Cube
    WHERE ([TimeDim].[All TimeDim].[1993].[Quarter 3])
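The MDX query can be this short because the late-order count is a measure aggregated when the cube is built. The sketch below illustrates that idea with a plain Python dictionary keyed by (year, quarter, priority). It is not how Analysis Services stores a cube; the fact rows and the helpers `quarter_of` and `late_orders_by_priority` are hypothetical.

```python
from collections import Counter

# Hypothetical fact rows: (order date, priority, order has a late line item)
facts = [
    ("1993-07-15", "1-URGENT", True),
    ("1993-08-02", "2-HIGH", False),
    ("1993-09-30", "1-URGENT", True),
    ("1993-11-01", "1-URGENT", True),
]


def quarter_of(iso_date):
    """Return (year, quarter) for a 'YYYY-MM-DD' string."""
    year, month = int(iso_date[:4]), int(iso_date[5:7])
    return year, (month - 1) // 3 + 1


# "Cube build": aggregate once, ahead of query time.
late_count = Counter()
for order_date, priority, is_late in facts:
    if is_late:
        year, quarter = quarter_of(order_date)
        late_count[(year, quarter, priority)] += 1


def late_orders_by_priority(year, quarter):
    """Query time: select the pre-aggregated cells for one time slice."""
    return sorted(
        (priority, n)
        for (y, q, priority), n in late_count.items()
        if (y, q) == (year, quarter)
    )


result = late_orders_by_priority(1993, 3)
print(result)
```

The cost moves from query time to build time, which is the trade-off the following slides measure.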

The Database Costs Dilemma
Disk Space? Query Speed? Build Time?

Results To Date (Query Speed)
TPC-H Query 4                                  1 GB Dataset                  10 GB Dataset
Multidimensional                               0.33 seconds
Relational                                     46.6 seconds (140x slower)    925 seconds (~15.5 min) (~2800x slower)
Relational (optimized w/Indices)               38 seconds (114x slower)      Test not run
Relational (optimized w/Indices & Striping)    26 seconds (78x slower)       247 seconds (~4.0 min) (~750x slower)
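The "x slower" factors appear to be each relational time divided by the multidimensional time. The sketch below recomputes them under the assumption that the 0.33-second figure is the baseline for both data sets; the transcript lists only one multidimensional time, so that is an inference, not something the slide states.

```python
BASELINE_SECONDS = 0.33  # multidimensional query time; assumed for both data sets

relational_seconds = {
    ("1 GB", "relational"): 46.6,
    ("1 GB", "indices"): 38.0,
    ("1 GB", "indices + striping"): 26.0,
    ("10 GB", "relational"): 925.0,
    ("10 GB", "indices + striping"): 247.0,
}

# Slowdown factor = relational time / multidimensional time.
slowdown = {
    key: seconds / BASELINE_SECONDS for key, seconds in relational_seconds.items()
}

for (dataset, config), factor in slowdown.items():
    print(f"{dataset:>5}  {config:<20} {factor:8.1f}x slower")
```

The computed factors land within about one percent of the values on the slide; the small differences are consistent with rounding of the reported times.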

Results To Date (Other Measures)
TPC-H Query 4                       1 GB Dataset    10 GB Dataset
Relational DB                       1.2 GB          12.5 GB
Relational DB (w/Indices)           1.8 GB          28.9 GB
Multidimensional Cube Size          .16 MB
Multidimensional Cube Build Time    46 seconds      356 seconds (~6 minutes)

Preliminary Conclusions
- For a very modest investment, organizations will be able to process very large data warehouses
- The multidimensional data mart is the only practical (speed, processing time) way to support the end-user decision maker
- Aggregation truly is a substitute for expensive hardware

Next Steps
- Acquire a larger server
- Build 100GB and 300GB TPC-H data sets
- Benchmark both relational and dimensional queries
- Publish results
- Consider ROLAP, HOLAP, MOLAP issues
- Possible extensions to some data mining research
- Possible extensions to decision making through technology research

Thank You for Your Time
Questions?
(for this presentation and paper)

Appendix A: Current System
SOFTWARE
- Microsoft Windows 2000 Advanced Server
- Microsoft SQL Server 2000 Enterprise Edition
- Microsoft SQL Server 2000 Analysis Services Enterprise Edition
HARDWARE
- (1) 1.8GHz Intel Pentium 4 processor
- 768MB RAM
- 240GB HD space (3 IDE 80GB 7200RPM drives)
- Total cost: $1100 (hardware only)