Kabra and DeWitt presented by Zack Ives CSE 590DB, May 11, 1998

Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

The Problem
- Query execution plans are formulated based on the estimated cost of operations, which in turn depends on estimates of table size and cardinality
- These estimates may be highly inaccurate, especially for user-defined types or predicates
- The errors become multiplicative as the number of joins increases
- We might have chosen a more nearly optimal plan based on greater knowledge
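A small worked sketch shows why multiplicative errors matter. It assumes, purely for illustration, that every join's selectivity estimate is off by the same factor in the same direction and that the errors are independent; the function name `compounded_error` is hypothetical.

```python
def compounded_error(per_join_error: float, n_joins: int) -> float:
    """Factor by which the final cardinality estimate may be off.

    Assumption (not from the slides): each join contributes the same
    multiplicative error, and the errors compound independently.
    """
    return per_join_error ** n_joins


# An estimate that is off by 2x at each of 5 joins is off by 32x overall.
print(compounded_error(2, 5))
```

Under these assumptions even a modest per-join error quickly makes the original cost estimate unreliable for deep join trees.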

The Solution
- We shall monitor how the query is doing at key points, and consider dynamically re-optimizing those portions of the query which have not yet been started
- Since re-optimization is expensive, we shall only do it if we think we will see an improvement

Elements of the Algorithm
- Annotated Query Execution Plans
  - Annotate plan with estimates of size
- Runtime Collection of Statistics
  - Statistics collectors embedded in execution tree
  - Keep overhead down
- Dynamic Resource Re-allocation
  - Reallocate memory to individual operations
- Query Plan Modification
  - May wish to re-optimize the remainder of query

Annotated Query Plans
We save at each point in the tree the expected:
- Sizes and cardinalities
- Selectivities of predicates
- Estimates of number of groups to be aggregated

Statistics Collectors
- Add into tree
- Must be collectable in a single pass
- Will only help with portions of query “beyond” the current pipeline
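The single-pass requirement can be illustrated with a pass-through operator that observes tuples as they flow to its parent. This is a minimal sketch, not the Paradise implementation: the class name `StatsCollector`, the `key` callback, and the choice of statistics (count, minimum, maximum) are all assumptions made for the example.

```python
class StatsCollector:
    """Pass-through operator that gathers statistics in a single pass.

    Hypothetical sketch: it wraps a child iterator, forwards every row
    unchanged, and updates running statistics as rows go by.
    """

    def __init__(self, child, key):
        self.child = child      # upstream operator (any iterable of rows)
        self.key = key          # extracts the attribute being profiled
        self.count = 0          # observed cardinality
        self.minimum = None
        self.maximum = None

    def __iter__(self):
        for row in self.child:
            value = self.key(row)
            self.count += 1
            if self.minimum is None or value < self.minimum:
                self.minimum = value
            if self.maximum is None or value > self.maximum:
                self.maximum = value
            yield row           # the row continues up the tree untouched


rows = [{"a": 3}, {"a": 1}, {"a": 7}]
collector = StatsCollector(iter(rows), key=lambda r: r["a"])
output = list(collector)        # consuming the operator fills in the stats
print(collector.count, collector.minimum, collector.maximum)
```

The statistics are complete only once the input has been fully consumed, which is why they can help only operators that start after the current pipeline finishes.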

Resource Re-Allocation
- Based on improved estimates, we can modify the memory allocated to each operation
- Results: less I/O, better performance
- Only for operations that have not yet begun executing
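The slide does not say how memory is redistributed, so the following is only one possible policy, sketched under stated assumptions: operators that have already started keep their allocation, and the remaining pages are split among pending operators in proportion to their improved size estimates. The function name and the dictionary keys (`started`, `allocated`, `improved_size`) are hypothetical.

```python
def reallocate_memory(operators, total_pages):
    """Re-divide memory among operators that have not begun executing.

    Assumed policy (not from the slides): proportional split by improved
    size estimate. Running operators are left untouched.
    """
    new_alloc = {}
    reserved = 0
    for name, info in operators.items():
        if info["started"]:
            new_alloc[name] = info["allocated"]   # cannot change mid-run
            reserved += info["allocated"]

    pending = {n: i for n, i in operators.items() if not i["started"]}
    available = total_pages - reserved
    total_need = sum(i["improved_size"] for i in pending.values())

    for name, info in pending.items():
        if total_need == 0:
            new_alloc[name] = 0
        else:
            # Integer pages; any rounding remainder is simply left unused.
            new_alloc[name] = available * info["improved_size"] // total_need
    return new_alloc


ops = {
    "scan":  {"started": True,  "allocated": 20, "improved_size": 0},
    "join1": {"started": False, "allocated": 40, "improved_size": 300},
    "join2": {"started": False, "allocated": 40, "improved_size": 100},
}
print(reallocate_memory(ops, total_pages=100))
```

Here the improved estimates show `join1` handling three times as much data as `join2`, so it receives three times the memory.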

Plan Modification
- Create new plan for remainder, treating temp as an input
- Only re-optimize part not begun
- Suspend query, save intermediate in temp file

Re-Optimization
When to re-optimize:
- Calculate time current plan should take (using gathered stats)
- Only consider re-optimization if:
  - Our original estimate was off by at least some factor θ2, and
  - T_opt,estimated < θ1 · T_cur-plan,improved, where θ1 ≈ 5% and the cost of optimization depends on the number of operators, esp. joins
- Only modify the plan if the new estimate, including the cost of writing the temp file, is better
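The two-stage decision on this slide can be sketched as a pair of predicates. This is an illustration under assumptions rather than the authors' code: the parameter names `theta1` and `theta2` are labels chosen here for the two thresholds, the 5% default follows the slide, the 0.2 default for the second threshold is a placeholder, and measuring "off by" as a relative deviation is one possible reading.

```python
def should_consider_reoptimization(t_estimated, t_improved, t_opt,
                                   theta1=0.05, theta2=0.2):
    """First gate: is it worth invoking the optimizer again?

    t_estimated: optimizer's original time estimate for the current plan
    t_improved:  re-computed estimate using the gathered statistics
    t_opt:       estimated cost of running the optimizer
    theta2 = 0.2 is an assumed placeholder, not a value from the slides.
    """
    relative_error = abs(t_improved - t_estimated) / t_estimated
    estimate_was_off = relative_error >= theta2
    optimizer_is_cheap = t_opt < theta1 * t_improved
    return estimate_was_off and optimizer_is_cheap


def should_switch_plan(t_current_remaining, t_new_remaining, t_temp_write):
    """Second gate: adopt the new plan only if it wins after paying
    for materializing the intermediate result to a temp file."""
    return t_new_remaining + t_temp_write < t_current_remaining


print(should_consider_reoptimization(100, 150, 5))   # off by 50%, cheap optimizer
print(should_switch_plan(80, 60, 10))                # 70 < 80
```

Separating the gates mirrors the slide: the first avoids paying for optimization at all, the second avoids switching to a plan that is not actually better.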

Low-Overhead Statistics
- Want to find “most effective” statistics
- Don’t want to gather statistics for “simple” queries
- Want to limit effect of algorithm to a maximum overhead ratio, μ
- Factors:
  - Probability of inaccuracy
  - Fraction of query affected

Inaccuracy Potentials
The following heuristics are used:
- Inaccuracy potential = low, medium, high
- Lower if we have more information on table value distribution
- 1 + max of inputs for multiple-input selection
- Always high for user-defined methods
- Always high for non-equijoins
- For most other operators, same as worst of inputs
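These rules amount to propagating a three-level label up the plan tree. A minimal sketch follows; the operator-kind strings are hypothetical names, capping "1 + max" at high is an assumption, and the rule about table value distributions is not modeled (the caller supplies the levels of the inputs).

```python
LOW, MEDIUM, HIGH = 0, 1, 2


def inaccuracy_potential(op_kind, input_levels):
    """Derive an operator's inaccuracy potential from its inputs.

    op_kind values are illustrative labels, not names from the paper.
    input_levels must be non-empty unless the operator is always HIGH.
    """
    if op_kind in ("user_defined", "non_equijoin"):
        return HIGH                       # always high, regardless of inputs
    worst = max(input_levels)
    if op_kind == "multi_input_selection":
        return min(worst + 1, HIGH)       # 1 + max of inputs, capped (assumed)
    return worst                          # most operators: worst of inputs


print(inaccuracy_potential("equijoin", [LOW, MEDIUM]))            # 1
print(inaccuracy_potential("multi_input_selection", [LOW, LOW]))  # 1
print(inaccuracy_potential("user_defined", [LOW]))                # 2
```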

More Heuristics
- Check fraction of query affected
  - Check how many other operators use the same statistic
- The winner:
  - Higher inaccuracy potentials first
  - Then, if a tie, the one affecting the larger portion of the plan
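The ranking rule can be sketched as a sort followed by a greedy pick that respects the maximum overhead ratio from the Low-Overhead Statistics slide. The `Candidate` fields, the per-statistic `overhead` figures, and the greedy skip-and-continue selection are assumptions made for illustration; the slides only specify the ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    potential: int            # 0 = low, 1 = medium, 2 = high
    fraction_affected: float  # share of the plan that uses this statistic
    overhead: float           # assumed cost, as a fraction of query time


def choose_statistics(candidates, max_overhead):
    """Rank by inaccuracy potential, break ties by fraction affected,
    then keep candidates while the overhead budget allows (assumed)."""
    ranked = sorted(candidates,
                    key=lambda c: (-c.potential, -c.fraction_affected))
    chosen, spent = [], 0.0
    for cand in ranked:
        if spent + cand.overhead <= max_overhead:
            chosen.append(cand)
            spent += cand.overhead
    return chosen


candidates = [
    Candidate("a", potential=2, fraction_affected=0.3, overhead=0.02),
    Candidate("b", potential=2, fraction_affected=0.6, overhead=0.02),
    Candidate("c", potential=1, fraction_affected=0.9, overhead=0.02),
]
picked = choose_statistics(candidates, max_overhead=0.05)
print([c.name for c in picked])
```

In this example `b` beats `a` on the tie-break, and `c` is dropped because collecting it would exceed the budget.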

Implementation
- On top of Paradise (parallel database that supports ADTs, built on OO framework)
- Using System-R optimizer
- New SCIA (Stat Collector Insertion Algorithm) and Dynamic Re-Optimization modules

It Works!
- Results are 5% worse for simple queries, much better for complex queries
- Of course, we would not really collect statistics on simple queries
- Data skew made a slight difference: both normal and re-optimized queries performed slightly better