Query Optimization
Allison Griffin

Importance of Optimization
Time is money
Queries run faster
Helps everyone who uses the server
The solution to speed lies in the algorithm
Performance improvements differ across database engines and schemas

Brief History
Before the 1970s: dark days, manual optimization
Late 1970s to mid-1980s:
  – Rise of the relational data model and declarative SQL
  – Optimization becomes the job of the system
  – System R: beginning of work on join-order optimization
  – Dynamic programming and heuristic optimizers
Mid-1980s to early 1990s:
  – Extensible query optimization (EXODUS)
Mid- to late 1990s:
  – Materialized views
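System R's join-order work is usually explained as dynamic programming over subsets of tables. The sketch below illustrates that idea under loudly stated assumptions: the statistics in the hypothetical `CARD` and `SEL` tables are made up, only left-deep plans are considered, and a plan's cost is simply the sum of its estimated intermediate result sizes, whereas a real optimizer also models I/O, CPU, physical operators, and sort orders.

```python
from itertools import combinations

# Hypothetical statistics: table cardinalities and join-predicate selectivities.
CARD = {"A": 1000, "B": 100, "C": 10}
SEL = {frozenset("AB"): 0.01, frozenset("BC"): 0.1}

def join_size(tables):
    """Estimated rows from joining `tables` (a frozenset): product of cardinalities
    times the selectivity of every join predicate whose tables are all present."""
    size = 1.0
    for t in tables:
        size *= CARD[t]
    for pair, sel in SEL.items():
        if pair <= tables:
            size *= sel
    return size

def best_left_deep_plan(tables):
    """Dynamic programming over subsets: the best plan for n tables is built from
    the best plan for one of its (n-1)-table subsets plus one more table."""
    best = {frozenset([t]): (0.0, t) for t in tables}   # single tables cost nothing here
    for n in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, n)):
            for right in subset:
                left_cost, left_plan = best[subset - {right}]
                cost = left_cost + join_size(subset)
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, f"({left_plan} ⋈ {right})")
    return best[frozenset(tables)]
```

For these three tables the cheapest plan joins B and C first (about 100 estimated rows) and then A, for a total cost of 1,100; starting with A and C would produce a 10,000-row Cartesian product.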

Volcano Extensible Query Optimizer Generator
General-purpose, cost-based query optimizer built on equivalence rules of the relational algebra
  – Equivalences: join associativity, selection push-down, aggregate push-down
  – Extensible: new operators and equivalences can be added easily
  – Developed by Graefe and McKenna, 1993
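To make "equivalence rules" concrete, here is a toy rewrite for selection push-down, one of the equivalences listed above: σ_p(L ⋈ R) = σ_p(L) ⋈ R when p refers only to L. This is a hypothetical sketch, not Volcano's interface: Volcano generates alternative plans from such rules and chooses among them by cost, while this code simply applies the one rule wherever it fits. The classes `Scan`, `Join`, and `Select`, and the assumption that each predicate mentions a single table, are illustrative simplifications.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Join:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Select:
    table: str       # simplifying assumption: the predicate mentions only this table
    predicate: str
    child: "Expr"

Expr = Union[Scan, Join, Select]

def tables(e: Expr) -> set:
    """Base tables an expression reads."""
    if isinstance(e, Scan):
        return {e.table}
    if isinstance(e, Join):
        return tables(e.left) | tables(e.right)
    return tables(e.child)

def push_select_down(e: Expr) -> Expr:
    """Apply sigma_p(L join R) == sigma_p(L) join R (or the mirror for R) wherever it fits."""
    if isinstance(e, Select) and isinstance(e.child, Join):
        j = e.child
        if e.table in tables(j.left):
            return Join(push_select_down(Select(e.table, e.predicate, j.left)),
                        push_select_down(j.right))
        if e.table in tables(j.right):
            return Join(push_select_down(j.left),
                        push_select_down(Select(e.table, e.predicate, j.right)))
    if isinstance(e, Join):
        return Join(push_select_down(e.left), push_select_down(e.right))
    if isinstance(e, Select):
        return Select(e.table, e.predicate, push_select_down(e.child))
    return e
```

For example, a filter on `orders` placed above `Join(Scan("customers"), Scan("orders"))` ends up directly on the `orders` scan, so the join only sees qualifying rows, which is usually what makes the rewritten plan cheaper.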

Materialized Views
Views can be materialized (pre-computed and stored) to speed up queries
  – Incremental maintenance: when the database is updated, propagate the updates to the materialized view without complete re-computation
  – Deciding when to use materialized views: even if a query does not refer to a materialized view, the optimizer can figure out that the view can be used
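Incremental maintenance is easiest to see for an aggregate view. The sketch assumes a hypothetical table `sales(region, amount)` and a view equivalent to `SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region`; each inserted or deleted row is applied to the stored totals as a delta, so the view is never recomputed from scratch.

```python
from collections import defaultdict

class SumCountView:
    """Materialized form of the hypothetical view
    SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region,
    maintained from row-level deltas instead of full recomputation."""

    def __init__(self, rows):
        self.sums = defaultdict(int)
        self.counts = defaultdict(int)
        for region, amount in rows:      # initial build = inserting every base row
            self.apply_insert(region, amount)

    def apply_insert(self, region, amount):
        self.sums[region] += amount
        self.counts[region] += 1

    def apply_delete(self, region, amount):
        # Assumes the deleted row really exists in the base table.
        self.sums[region] -= amount
        self.counts[region] -= 1
        if self.counts[region] == 0:     # last row of the group is gone: drop the group
            del self.sums[region], self.counts[region]

    def rows(self):
        """Current contents of the view as {region: (sum, count)}."""
        return {r: (self.sums[r], self.counts[r]) for r in self.counts}
```

Keeping COUNT alongside SUM is what makes deletions cheap: the count tells the view when a group has disappeared. An aggregate such as MAX is harder to maintain this way, because deleting the current maximum generally requires rescanning the group.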

Deciding What to Materialize
Trade-off between maintenance cost and query cost
  – Workload cost depends on what is materialized: the workload consists of queries and update transactions, with a weight for each component
Goal: find the set of views that gives the minimum cost if materialized, subject to space constraints
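The selection problem above is commonly attacked with greedy heuristics. The sketch below is one such heuristic, not a method taken from these slides: it repeatedly materializes the candidate view with the largest reduction in workload cost per unit of space until nothing else fits or helps. The workload weights, costs, sizes, and the view names `v1`, `v2`, `v3` are all hypothetical.

```python
def total_cost(selected, queries, views):
    """Workload cost = sum of weight * cheapest way to answer each query,
    plus the maintenance cost of every materialized view."""
    cost = 0.0
    for q in queries:
        best = q["base_cost"]
        for v in selected:
            best = min(best, views[v]["answers"].get(q["name"], best))
        cost += q["weight"] * best
    return cost + sum(views[v]["maint_cost"] for v in selected)

def greedy_select(queries, views, space):
    """Add the view with the best cost reduction per unit of space, repeatedly.
    A heuristic: it is not guaranteed to find the minimum-cost set."""
    selected, used = set(), 0
    while True:
        current = total_cost(selected, queries, views)
        best_view, best_ratio = None, 0.0
        for v, info in views.items():
            if v in selected or used + info["size"] > space:
                continue                      # already chosen or does not fit
            gain = current - total_cost(selected | {v}, queries, views)
            if gain / info["size"] > best_ratio:
                best_view, best_ratio = v, gain / info["size"]
        if best_view is None:
            return selected
        selected.add(best_view)
        used += views[best_view]["size"]

queries = [{"name": "q1", "weight": 10, "base_cost": 100},
           {"name": "q2", "weight": 5, "base_cost": 80}]
views = {"v1": {"size": 50, "maint_cost": 100, "answers": {"q1": 10}},
         "v2": {"size": 40, "maint_cost": 50, "answers": {"q2": 20}},
         "v3": {"size": 100, "maint_cost": 30, "answers": {"q1": 5, "q2": 5}}}
```

With a space budget of 100, the greedy pass picks v1 and then v2 (total cost 350), even though materializing v3 alone would cost only 105, which shows why such heuristics are not guaranteed to find the minimum.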

What we already know…
The query optimizer analyzes a set of query execution plans (QEPs) and chooses the one with the lowest estimated cost
  – Heavily dependent on the optimizer's estimate of the number of rows that will result at each step of the QEP
  – Estimates rely on statistics, typically stored in histograms
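As a small illustration of how a histogram turns into a row-count estimate, the sketch below builds an equi-width histogram and estimates a range predicate by assuming values are spread uniformly inside each bucket. The class name and bucket layout are assumptions for illustration; production optimizers often use other histogram types, for example equi-depth.

```python
class EquiWidthHistogram:
    """Equal-width buckets over [lo, hi); each bucket stores only a row count."""

    def __init__(self, values, lo, hi, buckets):
        self.lo, self.width = lo, (hi - lo) / buckets
        self.counts = [0] * buckets
        for v in values:
            i = min(int((v - lo) / self.width), buckets - 1)  # clamp v == hi into last bucket
            self.counts[i] += 1

    def estimate_range(self, a, b):
        """Estimated rows with a <= value < b, assuming values are uniform within a bucket."""
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = max(0.0, min(b, b_hi) - max(a, b_lo))   # part of bucket inside [a, b)
            total += count * overlap / self.width
        return total
```

On the 100 evenly spread values 0 through 99 with 10 buckets, `estimate_range(0, 25)` returns 25.0, matching the true count. With a skewed first bucket, say 50 copies of the value 5 plus the values 50 through 99, `estimate_range(0, 5)` still returns 25.0 although no row qualifies; errors like this propagate into every later step of a plan.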

Recent Approaches to Improve Statistics
Paper "Distinct-Value Synopses for Multiset Operations" by Kevin Beyer, Rainer Gemulla, Peter J. Haas, Berthold Reinwald, and Yannis Sismanis, 2007
IBM's LEO (Learning Empirical Results in Query Optimization), 2001

Summary of Paper Results
Addresses the problem of efficiently estimating the number of distinct values (DVs) of an attribute
Builds on randomized algorithms
Claims an unbiased estimator for distinct values with lower mean squared error
  – Past estimators tend to come out higher than the actual number, so the authors found a way to bring that number down to a more reasonable value
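The paper's synopses build on the "k minimum values" (KMV) idea: hash every value to a pseudo-random point in [0, 1), keep the k smallest distinct hashes, and infer the number of distinct values from how small the k-th smallest hash U(k) is. The naive estimate k/U(k) is biased upward, which matches the remark that past estimates run high; the basic estimator in the paper, as we understand it, uses (k-1)/U(k) to remove that bias. This is a minimal sketch assuming SHA-1 behaves like a uniform hash; the function names are illustrative.

```python
import hashlib
import heapq

def unit_hash(value) -> float:
    """Map a value to a pseudo-random point in [0, 1); assumes SHA-1 acts like a uniform hash."""
    digest = hashlib.sha1(repr(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def kmv_synopsis(values, k):
    """One pass over the data, keeping the k smallest distinct hash values (returned sorted)."""
    heap = []        # negated hashes, so heap[0] is minus the largest kept hash
    kept = set()     # hashes currently in the heap, used to skip duplicate values
    for v in values:
        h = unit_hash(v)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)  # drop the largest kept hash
            kept.discard(evicted)
            kept.add(h)
    return sorted(-x for x in heap)

def estimate_biased(syn, k):
    """Naive estimate k / U(k); tends to overshoot the true distinct count."""
    return k / syn[k - 1]

def estimate_unbiased(syn, k):
    """(k - 1) / U(k); if fewer than k distinct values were seen, the synopsis holds them all."""
    if len(syn) < k:
        return float(len(syn))
    return (k - 1) / syn[k - 1]
```

On 20,000 rows containing 5,000 distinct values, a 512-entry synopsis typically lands within a few percent of 5,000; the relative standard error shrinks roughly as 1/sqrt(k), so the synopsis size trades memory for accuracy.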

Distinct-Value Estimation
Proposes a summary structure (synopsis) for each partition of a relation
  – A synopsis can be used to estimate the number of DVs in its partition
  – Synopses can be combined to create synopses for compound partitions built from base partitions using multiset union, intersection, or difference
  – Updates to compound partitions can be handled using the synopses of the base partitions
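Combining synopses can be sketched in the same KMV setting: the synopsis of A ∪ B is the k smallest hashes from the union of the two base synopses, and the share of those hashes that appear in both base synopses estimates the share of the union that lies in A ∩ B. This is a hedged, self-contained sketch; multiset difference needs extra per-value information (multiplicity counters) in the paper's augmented synopses and is omitted here, and all names are illustrative.

```python
import hashlib
import heapq

def unit_hash(value) -> float:
    """Pseudo-random point in [0, 1); assumes SHA-1 acts like a uniform hash."""
    digest = hashlib.sha1(repr(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def synopsis(values, k):
    """KMV synopsis of one base partition: its k smallest distinct hashes, ascending."""
    return heapq.nsmallest(k, {unit_hash(v) for v in values})

def combine(syn_a, syn_b, k):
    """Synopsis of A ∪ B computed from the base synopses alone (no access to the data)."""
    merged = heapq.nsmallest(k, set(syn_a) | set(syn_b))
    if len(merged) < k:
        raise ValueError("need at least k distinct values in A ∪ B")
    return merged

def estimate_union(syn_a, syn_b, k):
    u = combine(syn_a, syn_b, k)
    return (k - 1) / u[-1]

def estimate_intersection(syn_a, syn_b, k):
    u = combine(syn_a, syn_b, k)
    # A hash among the k smallest of A ∪ B that shows up in both base synopses
    # belongs to a value in A ∩ B; their fraction scales the union estimate.
    in_both = len(set(u) & set(syn_a) & set(syn_b))
    return (in_both / k) * (k - 1) / u[-1]
```

With A = {0, ..., 5999}, B = {3000, ..., 8999}, and k = 1024, the union and intersection estimates come out near 9,000 and 3,000 without touching the base data again.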

LEO - Learning Empirical Results in Query Optimization
Autonomic feedback loops that create a self-tuning database query optimizer
Self-validates and adjusts to improve query optimization and execution without requiring user interaction to repair incorrect statistics or cardinality estimates
Reduces the total cost of owning database management systems by simplifying database administration

How it works
Monitors queries as they execute
Compares the optimizer's estimates with actual values at each step of a QEP
Then computes adjustments to its estimates that can be used during future optimizations of similar queries
Moreover, estimation errors can also trigger re-optimization of a query in mid-execution
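The feedback loop described above can be sketched as follows. This is a hypothetical illustration, not IBM's implementation: the slides do not describe LEO's internal data structures, so the class `FeedbackAdjuster`, the per-predicate keys, the `damping` factor, and the `reopt_threshold` are all assumptions. Damping is included as one simple way to keep learned adjustments stable rather than letting a single run swing them wildly.

```python
class FeedbackAdjuster:
    """Hypothetical LEO-style feedback loop: learn per-predicate correction factors
    from observed row counts and apply them to future estimates."""

    def __init__(self, damping=0.5, reopt_threshold=10.0):
        self.factors = {}                  # predicate key -> multiplicative correction
        self.damping = damping             # 0 < damping <= 1; smaller = slower, steadier learning
        self.reopt_threshold = reopt_threshold

    def adjusted(self, key, raw_estimate):
        """Estimate the optimizer would use after applying what it has learned."""
        return raw_estimate * self.factors.get(key, 1.0)

    def observe(self, key, raw_estimate, actual):
        """Record one execution; return True if the estimate actually used was off
        by at least reopt_threshold times (a candidate for mid-execution re-optimization)."""
        used = self.adjusted(key, raw_estimate)
        q_error = max(actual, 1) / max(used, 1)
        q_error = max(q_error, 1 / q_error)          # over- and under-estimates count alike
        target = max(actual, 1) / max(raw_estimate, 1)
        old = self.factors.get(key, 1.0)
        # Move only part of the way toward the observed ratio (geometric interpolation).
        self.factors[key] = old ** (1 - self.damping) * target ** self.damping
        return q_error >= self.reopt_threshold
```

If a predicate is estimated at 100 rows but repeatedly returns 10,000, the first observation is flagged for re-optimization and sets the factor to 10; later observations move it toward 100, so future estimates for that predicate converge on what actually happens.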

Challenges in LEO Research
(1) Ensuring stability and convergence of the autonomic system
(2) Guaranteeing consistency of the optimizer's overall model as it is refined

Results
Reduction of query execution time by orders of magnitude at negligible additional run-time cost
Reduced administration time
Fewer problem queries
Overall improved query performance, with more robust and predictable query response times

Bibliography
"LEO-Learning Empirical Results in Query Optimization." IBM.
"Optimizing for Query Speed." SQL.
"Optimizing Database Queries." IBM.
"Optimize Queries Theory in Practice."
Beyer, Kevin, Rainer Gemulla, Peter J. Haas, Berthold Reinwald, and Yannis Sismanis. "Distinct-Value Synopses for Multiset Operations." Communications of the ACM, Vol. 52, October 2009.
Chaudhuri, Surajit. "Technical Perspective: Relational Query Optimization-Data Management Meets Statistical Estimation." Communications of the ACM, Vol. 52, October 2009.