Robust Query Processing through Progressive Optimization SIGMOD 2004 Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh, Miso Cilimdzic.

Presentation transcript:

Robust Query Processing through Progressive Optimization SIGMOD 2004 Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh, Miso Cilimdzic Modified by S. Sudarshan from talk by Raja Agrawal

Motivation
• Current optimizers depend heavily on cardinality estimates
• What if there are errors in those estimates?
• Errors can occur due to
  - Inaccurate statistics
  - Invalid assumptions (e.g. attribute independence)

Progressive Query Optimization
• Idea: lazily trigger reoptimization during execution if cardinality counts indicate that the current plan is suboptimal
  - introduces a checkpoint (CHECK) operator to compare actual vs. estimated cardinality
  - key idea: precompute the cardinality ranges for which the plan is optimal

Evaluating a re-optimization scheme
• Risk vs. opportunity
• Risk: extent to which re-optimization is not worthwhile
  - reoptimization chooses another bad plan
  - work is redone
  - cardinality errors may even cancel, and fixing one may give an even worse plan!
• Opportunity: refers to the aggressiveness
  - more CHECK operators ..

Background
• Redbrick
  - Star schema with a fact table and multiple dimension tables
  - First apply selections on the dimension tables
  - Then decide what plan to use
• Kabra & DeWitt 98 (KD98)
  - Introduced the idea of mid-query reoptimization
  - Allow partial results to be used like materialized views
  - But ad-hoc cardinality threshold, and only reoptimize fully materialized plans
[Figure: opportunity vs. risk]

Background
• Tukwila data integration system
  - optimizer may have no idea of statistics
  - interleave optimization and query execution
    - partial query plans
    - Fragment: fully pipelined tree with doubly pipelined hash join
• Query Scrambling
  - reorder query to deal with delayed sources
[Figure: opportunity vs. risk]

Background
• Eddies (Telegraph)
• Ingres/DEC Rdb: run multiple access methods competitively, then choose
• Parametric Query Optimization (PQO)
  - e.g. Cole and Graefe 94, Hulgeri and Sudarshan 02
  - Choose from a set of plans, each optimal for a selectivity range
  - POP does the converse: find the optimal cardinality range for a given plan
[Figure: opportunity vs. risk]

Example of Progressive Optimization in Action

Progressive Query Optimization(POP)

Architecture of POP

• CHECK operator to find if a plan is suboptimal
  - At optimization time, find the cardinality range (at the CHECK location) for which the plan is optimal
  - At run time, ensure the cardinality is within [l, u]
  - If violated, stop plan execution and reoptimize
• Location of CHECKs
• Re-optimize taking the observed cardinality into account, and exploiting intermediate results where beneficial
  - Heuristic: limit the number of reoptimizations (default: 3)
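A minimal sketch of the CHECK-and-reoptimize loop described on this slide, not the actual implementation. The names `check`, `run_with_pop`, `toy_optimize` and `make_execute` are hypothetical, and the toy optimizer rule (nested-loop join up to 100 rows, hash join beyond that) is an assumption made only for illustration.

```python
class ReoptimizeSignal(Exception):
    """Raised by a CHECK when the observed cardinality leaves [lower, upper]."""

    def __init__(self, observed):
        super().__init__(f"observed cardinality {observed} outside validity range")
        self.observed = observed


def check(rows, lower, upper):
    """CHECK above a materialization point: the full row count is known here."""
    rows = list(rows)
    if not (lower <= len(rows) <= upper):
        raise ReoptimizeSignal(len(rows))
    return rows


def run_with_pop(optimize, execute, estimate, max_reopts=3):
    """Optimize with the estimate, execute, and reoptimize on CHECK violations.

    After max_reopts reoptimizations the plan runs with CHECKs disabled,
    mirroring the "limit the number of reoptimizations" heuristic.
    """
    plan = optimize(estimate)
    reopts = 0
    while True:
        try:
            return execute(plan, checks_enabled=reopts < max_reopts), reopts
        except ReoptimizeSignal as signal:
            reopts += 1
            plan = optimize(signal.observed)  # use the observed cardinality


def toy_optimize(cardinality):
    # Hypothetical rule: nested-loop join for small inputs, hash join otherwise.
    if cardinality <= 100:
        return {"join": "nested-loop", "range": (0, 100)}
    return {"join": "hash", "range": (101, float("inf"))}


def make_execute(actual_rows):
    def execute(plan, checks_enabled):
        lower, upper = plan["range"]
        rows = check(actual_rows, lower, upper) if checks_enabled else list(actual_rows)
        return plan["join"], len(rows)
    return execute


# The optimizer estimated 10 rows, but the input really has 5000.
result, reopts = run_with_pop(toy_optimize, make_execute(range(5000)), estimate=10)
```

Here the first plan's CHECK fires because 5000 rows fall outside [0, 100], and the second optimization pass, fed the observed count, picks the hash join.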

Validity Ranges
• Consider a plan edge e that flows rows into operator o, and let P be the subplan rooted at o. The validity range for e is an upper and lower bound on the number of rows flowing through e such that, if the range is violated at run time, we can guarantee that P is suboptimal
• Ad-hoc thresholds (proposed earlier) are a bad idea
  - E.g. even a 100x error on a very small relation may not make a difference to the optimal plan

Finding Optimality Ranges
• Plan P_opt with root operator o_opt is compared with another plan P_alt that differs only in the root operator o_alt.

Finding Optimality Ranges
• Need to solve cost(P_alt, c) - cost(P_opt, c) = 0, where c is the cardinality on edge e
• Cost functions can be complex, non-linear, or non-continuous

Newton-Raphson Iteration
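As an illustration only, the cross-over cardinality can be searched for with a Newton-Raphson iteration on the cost difference. The two cost functions below are made-up smooth toy models (an assumption, not real optimizer cost functions), the derivative is approximated by a forward difference because the cost functions are treated as black boxes, and the search gives up if it does not converge, which can happen when the real functions are non-continuous.

```python
import math


def cost_opt(c):
    # Hypothetical cost of the chosen root operator, linear in the cardinality c.
    return 2.0 * c


def cost_alt(c):
    # Hypothetical cost of the alternative root operator: high start-up cost,
    # slow growth in c.
    return 300.0 + 20.0 * math.sqrt(c)


def crossover(cost_a, cost_b, start, tol=1e-6, max_iter=50):
    """Find c with cost_a(c) - cost_b(c) = 0; return None if not converged."""
    def diff(c):
        return cost_a(c) - cost_b(c)

    c = float(start)
    for _ in range(max_iter):
        value = diff(c)
        if abs(value) < tol:
            return c
        h = 1e-6 * max(1.0, c)
        slope = (diff(c + h) - value) / h  # forward-difference derivative
        if slope == 0.0:
            return None
        c = max(c - value / slope, 0.0)  # cardinalities cannot be negative
    return None


# Start the search at the optimizer's estimated cardinality (assumed: 50 rows).
upper = crossover(cost_alt, cost_opt, start=50.0)
```

With these toy functions, `upper` is the cardinality beyond which the alternative operator becomes cheaper, i.e. the upper end of the validity range for the chosen operator.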

What does this achieve?
• Detects suboptimality of the root operator where P_opt and P_alt share the same input edges.
• The validity range might miss a cross-over point with a plan that uses a different join order (and hence has different input edges).
• Two plans are structurally equivalent if they share the same set of edges, where an edge is defined by the set of rows flowing through it during query execution.
  - Allows different algorithms, and flipping inner/outer

Optimality wrt structurally equivalent plans
• Theorem: …. Suppose edges e_i1, e_i2, …, e_ik are seen to be “erroneous” wrt cardinality. Then the following statements are equivalent:
  1. P is suboptimal with respect to another plan P' that has the same set of edges {e_1, e_2, …, e_m}
  2. At least one of P_i1, P_i2, …, P_ik is suboptimal given the cardinality errors in those edges in {e_1, e_2, …, e_m} that lie under them.
  3. At least one of o_i1, o_i2, …, o_ik is a suboptimal operator given the cardinality errors in {e_1, e_2, …, e_m} that are in its input edges.

Conservative detection of suboptimality
• Suppose we “detect” suboptimality of (R Join S) Join T wrt estimated costs of (R Join T) Join S
  - During run time, we can never observe the cardinality of R Join T
  - We would be making an arbitrary guess as to the correlation of the predicates on the R and T tables
  - Best not to infer suboptimality wrt such estimates
• However, reoptimization may result in a different join order

Exploiting Intermediate Results
• All intermediate results are stored as temporary MVs, with cardinalities available to the optimizer
  - can be reused if that leads to a better plan
    - but not necessarily used, e.g. if a join result is very large and a different join order is preferred
  - must be reused if it has performed side-effects
• Reoptimization is done as part of the same transaction

Optional use of MV

Variants of CHECK
• Variants are applicable in different cases and trade off risk for opportunity
• Variants
  - Lazy checking (LC)
  - Lazy checking with eager materialization (LCEM)
  - Eager checking without compensation (ECWC)
  - Eager checking with buffering (ECB)
  - Eager checking with deferred compensation (ECDC)

Lazy Checking
• Adding CHECKs above a materialization point (SORT, TEMP etc.)
  - No results have been output yet
  - And materialized results can be re-used
  - very low overhead

Lazy checking with eager materialization
• Can insert a materialization point if one does not exist already
  - Risk: overhead of materialization
  - Typically done only for the outer input of an indexed nested-loop (INL) join
    - low cost if the outer is small (as estimated by the optimizer)
    - and INL is in trouble anyway if the outer is large

Eager Checking
• Lazy checking may be too late
  - e.g. if a very bad join order is chosen, with huge intermediate results
• Idea: check even before the entire result is materialized, and stop early
• Problem: what if some results have already been output?
  - Compensation

Eager Checking
• EC without Compensation: CHECK is pushed down below the materialization point, into the pipeline

Eager Checking
• EC with buffering
  - CHECK and buffer
  - output from the buffer once sure about the bound
    - e.g. [0, b) or [b, infinity]
  - else reoptimize
  - “delayed pipelining”
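One possible reading of eager checking with buffering, sketched as two Python generators. The function names and the exception are hypothetical, and the exact release policy is an assumption based on the two example ranges on the slide: rows are held back until the CHECK is sure which side of the bound b the cardinality falls on, so no row reaches the consumer before a violation is detected.

```python
class OutOfRange(Exception):
    """Signals that the observed cardinality violates the validity range."""


def buffered_check_upper(rows, b):
    """Validity range [0, b): release rows only once the input ends below b."""
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= b:
            # Bound violated before anything was output: safe to reoptimize.
            raise OutOfRange(f"saw {len(buffer)} rows, range was [0, {b})")
    yield from buffer


def buffered_check_lower(rows, b):
    """Validity range [b, infinity]: release rows once b rows have arrived."""
    buffer = []
    source = iter(rows)
    for row in source:
        buffer.append(row)
        if len(buffer) >= b:
            break
    if len(buffer) < b:
        raise OutOfRange(f"saw only {len(buffer)} rows, range was [{b}, infinity]")
    yield from buffer   # the bound is satisfied, so flush the buffer
    yield from source   # and pipeline the remaining rows without buffering


small = list(buffered_check_upper(range(3), 5))
large = list(buffered_check_lower(range(10), 5))
```

In the upper-bound case the whole input is delayed, while in the lower-bound case only the first b rows are, which is one way to read "delayed pipelining".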

EC with Deferred Compensation
• Only SPJ queries
• Identifiers of all rows returned to the user are stored in a table S, which is used later in the new plan for an anti-join with the new result stream
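A small sketch of deferred compensation under stated assumptions: every result row is assumed to carry a unique identifier, the side table S is modelled as a Python set, and the sample rows are invented for illustration. The anti-join drops from the new plan's output every row the old plan already returned.

```python
def deferred_compensation(new_plan_rows, returned_ids):
    """Anti-join: keep rows of the new plan whose id is not in the side table S."""
    for row_id, payload in new_plan_rows:
        if row_id not in returned_ids:
            yield row_id, payload


# Invented query result as (row_id, payload) pairs.
full_result = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

# The old plan pipelined two rows to the user before its CHECK fired;
# their identifiers were recorded in S as they were returned.
S = set()
delivered = []
for row in full_result[:2]:
    S.add(row[0])
    delivered.append(row)

# The reoptimized plan recomputes the whole result, possibly in another order.
new_stream = [(3, "c"), (1, "a"), (4, "d"), (2, "b")]
delivered.extend(deferred_compensation(new_stream, S))
```

After compensation the user has received each row exactly once, even though the two plans overlapped on rows 1 and 2.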

CHECK Placement

• LCEM and ECB: outer side of nested-loop join
• LC: above materialization points
• ECWC and ECDC: anywhere
• Do not place CHECKs if
  - no alternative plan above CHECK
  - simple queries with low estimated cost

Performance Analysis: Robustness
• TPC-H Q10: replace the constant in the selection on lineitem by a parameter marker, so the optimizer doesn’t know the actual selectivity
• 5 different optimal plans

Risk Analysis
• Analyze LC, LCEM, ECB
• Can be reoptimized more than once
• Conclusion: low overhead/risk

Opportunity Analysis
• Goal: how often does an opportunity to reoptimize arise?
• Introduce LC/LCEM/ECB checkpoints
• But turn off reoptimization, and run the same plan
• Opportunity region for ECB: dotted line

POP in (in)action
• Real-world workload (DMV data and queries)
• Complex predicates leading to cardinality estimation errors
  - substring comparison, like, IN, ..

POP in (in)action (contd.)
• Re-optimization may result in the choice of a worse plan due to:
  - Two estimation errors canceling each other out
  - Re-using intermediate results

Conclusions
• POP gives us a robust mechanism for re-optimization through the insertion of CHECK operators (in their various flavors)
• Higher opportunity at low risk