Florian Waas, EMC Corp. Leo Giakoumakis, Microsoft Corp. Shin Zhang, Microsoft Corp 06/13/2011 Plan Space Analysis Detecting Plan Regressions in Cost-based.


1 Florian Waas, EMC Corp. Leo Giakoumakis, Microsoft Corp. Shin Zhang, Microsoft Corp 06/13/2011 Plan Space Analysis Detecting Plan Regressions in Cost-based Query Optimizers 1 Florian Waas, EMC/Greenplum

2 Tale of a Plan Regression
- Applied ‘obvious’ improvement to optimizer
- Passed all regression tests without problem
- Shipped proudly!
- Lots of customers complain about plan regressions
- Hard conversation between Dev and QA
- Dev: Why didn’t you find this? This change affects virtually all queries!
- QA: Why can’t you tell me what to look for if it’s so ‘obvious’?

3 Takeaways
- Conventional testing of optimizer focuses on a single best-plan found per query
- Ignores massive space of rejected alternatives
- Plan Space Analysis
- Takes many/all plans considered into account
- Quantifies optimizer changes – even if result not affected
- Detects regressions early in the development process

4 Nomenclature
- Optimizer selects best plan found (BPF)
- Rejects non-trivial numbers of alternatives
- Explicitly or implicitly
- Plan regression
- Code-level change to optimizer leads to bad plan choice
- Perceived or actual

5 Dilemma of Optimizer Testing
- Optimizers work off theoretical models
- All practical models have limitations
- Most non-trivial queries exceed limitations of model
- May lead to contradicting optimization problems
- Get query Q1 right OR query Q2…
- Right or wrong is a matter of viewpoint and business priority

6 Standard Test Procedure
- Choose relevant workload
- Freeze BPF
- Apply modification
- Test against frozen BPF
- Diff may indicate regression
- Manual intervention needed to determine actual impact
- In practice: lots of false positives/negatives
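The freeze-and-diff procedure on this slide can be sketched as follows. This is a minimal illustration, not the authors' tooling: `freeze_baseline`, `diff_against_baseline`, and the two toy optimizers are hypothetical names, and plans are represented as plain strings.

```python
from typing import Callable, Dict, List

def freeze_baseline(workload: List[str],
                    optimize: Callable[[str], str]) -> Dict[str, str]:
    """Record the best plan found (BPF) for every query in the workload."""
    return {query: optimize(query) for query in workload}

def diff_against_baseline(baseline: Dict[str, str],
                          optimize: Callable[[str], str]) -> List[str]:
    """Return the queries whose BPF differs from the frozen one."""
    return [q for q, plan in baseline.items() if optimize(q) != plan]

# Toy stand-ins for the optimizer before and after a modification (hypothetical).
OLD_PLANS = {"Q1": "HashJoin(A,B)", "Q2": "NLJoin(C,D)"}
NEW_PLANS = {"Q1": "HashJoin(A,B)", "Q2": "MergeJoin(C,D)"}

baseline = freeze_baseline(["Q1", "Q2"], lambda q: OLD_PLANS[q])
changed = diff_against_baseline(baseline, lambda q: NEW_PLANS[q])
print(changed)  # queries that need manual inspection
```

A diff only says that the chosen plan changed. It says nothing about whether the new plan is better or worse, and it stays silent when a change reshuffles the rejected alternatives without touching the BPF.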

7 Desiderata for better regression tests
- Simplicity, transparency
- Simple number
- Meaningful correlation to system
- Technology agnostic, targeted
- Does not reverse engineer optimizer
- ‘understands’ executor
- Surgical, specific
- Actionable
- Applicable to any and every workload
- Practical
- Easy to compute, robust methodology

8 Plan Spaces
- Set of alternatives considered by optimizer
- Product specific
- Non-trivial size
- E.g., TPC-H 5: 230+ million alternatives
- Contains optimal plan(s)
- According to database parameters
- Think: statistics
- Pairwise relationships based on cost function
- E.g., cost(Popt) < cost(P)

9 Observation
- Given a query
- For each plan alternative P
- There exists a configuration so that P is optimal
- Even if distinctly suboptimal in original query/configuration

10 Ideal optimizer
- Makes no mistakes
- Establishes partial order between alternatives according to estimates
- Estimated order matches actual execution
- Regardless of actual cost values

11 Plan Space Analysis: Principle
1. Enumerate plan alternatives
2. Have optimizer cost them
3. Determine order O1 according to estimated cost
4. Execute all plan alternatives
5. Determine order O2 according to actual execution cost
6. Compute correlation of O1 and O2
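The six steps above can be sketched end to end. This is an illustration under stated assumptions, not the authors' implementation: `plan_space_score`, `estimate_cost`, and `execute` are hypothetical names supplied by the caller, and the correlation is computed as the Pearson correlation of average ranks, which is the usual tie-aware definition of Spearman's coefficient.

```python
from math import sqrt
from typing import Callable, List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation; undefined if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)

def plan_space_score(plans: Sequence[str],
                     estimate_cost: Callable[[str], float],
                     execute: Callable[[str], float]) -> float:
    estimated = [estimate_cost(p) for p in plans]  # step 2: optimizer costs
    actual = [execute(p) for p in plans]           # step 4: measured cost
    o1 = average_ranks(estimated)                  # step 3: order O1
    o2 = average_ranks(actual)                     # step 5: order O2
    return pearson(o1, o2)                         # step 6: correlation

# Toy data (hypothetical): the optimizer misorders P2 and P3.
EST = {"P1": 10.0, "P2": 20.0, "P3": 30.0, "P4": 40.0}
ACT = {"P1": 1.0, "P2": 3.0, "P3": 2.0, "P4": 4.0}
score = plan_space_score(list(EST), EST.get, ACT.get)
print(round(score, 3))  # 0.8
```

Only the relative order of the plans enters the score, so the estimated cost units do not need to match the measured ones.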

12 Plan Space Analysis: Correlation
- Spearman-Coefficient
- Value range [-1,1]
- 1 perfect monotone function
- 0 uncorrelated
- etc.
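For reference, the textbook definition of the Spearman coefficient is given here; the slide does not spell it out. The symbols are introduced for illustration: r_i is the rank of plan i in O1 (estimated cost), s_i its rank in O2 (actual cost), and n the number of plans compared. The second form holds only when there are no ties.

```latex
\rho \;=\; \frac{\operatorname{cov}(r, s)}{\sigma_r \,\sigma_s}
\qquad\text{and, without ties,}\qquad
\rho \;=\; 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)},
\quad d_i = r_i - s_i .
```

With identical orders every d_i is zero and the coefficient is 1; with exactly reversed orders it is -1.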

13 Plan Space Analysis: in Practice
- Use sample of space
- Uniform sampling
- Galindo-Legaria et al. VLDB 1994
- Waas, Galindo-Legaria, SIGMOD 2000
- Simple hints/forcing will do too
- Ignore certain plans
- cost(P) > cost(Popt) * k
- | act(P1) – act(P2) | < d
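The two "ignore" rules can be sketched as simple filters. The slide gives only the conditions, so the rest is assumption: `prune_by_cost` and `distinguishable_pairs` are hypothetical names, the cheapest estimated plan in the sample stands in for Popt, and pairs closer than d are simply left out of any comparison.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def prune_by_cost(est_cost: Dict[str, float], k: float) -> List[str]:
    """Drop plans with cost(P) > cost(Popt) * k.

    Assumption: Popt is the cheapest estimated plan among those given.
    """
    best = min(est_cost.values())
    return [p for p, c in est_cost.items() if c <= best * k]

def distinguishable_pairs(actual: Dict[str, float],
                          d: float) -> List[Tuple[str, str]]:
    """Keep only plan pairs whose measured costs differ by at least d."""
    return [(a, b) for a, b in combinations(sorted(actual), 2)
            if abs(actual[a] - actual[b]) >= d]

# Toy numbers (hypothetical): P3 is estimated at 9x the best plan,
# and P1/P2 run too close together to be told apart reliably.
est = {"P1": 100.0, "P2": 150.0, "P3": 900.0}
act = {"P1": 2.0, "P2": 2.05, "P3": 9.0}

kept = prune_by_cost(est, k=5)
pairs = distinguishable_pairs(act, d=0.1)
print(kept)   # ['P1', 'P2']
print(pairs)  # [('P1', 'P3'), ('P2', 'P3')]
```

Both thresholds trade coverage for robustness: k keeps hopeless plans from dominating execution time, and d keeps measurement noise from being counted as a misordering.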

14 Experiments
- Commercial query optimizer
- Built-in ranking module for sampling
- Sample of 20 plans/query
- Fixed seed for repeatability
- 3 iterations for execution
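A repeatable sampling setup along these lines might look like the sketch below. It assumes plans are addressed by rank number in [0, N), as a ranking module would allow, and that the three executions are combined by their median. The slide states neither the seed value nor the aggregation, so `seed=42`, the median, and both function names are illustrative choices.

```python
import random
import statistics
from typing import Callable, List

def sample_plan_ranks(space_size: int, sample_size: int = 20,
                      seed: int = 42) -> List[int]:
    """Draw distinct plan ranks uniformly; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return rng.sample(range(space_size), sample_size)

def measure(run_plan: Callable[[int], float], rank: int,
            iterations: int = 3) -> float:
    """Execute one plan several times and return the median measurement."""
    return statistics.median(run_plan(rank) for _ in range(iterations))

# Space size taken from the TPC-H 5 figure quoted earlier (230+ million).
ranks = sample_plan_ranks(230_000_000)
print(len(ranks), ranks == sample_plan_ranks(230_000_000))

# Hypothetical executor returning canned timings for illustration only.
timings = iter([3.0, 1.0, 2.0])
print(measure(lambda rank: next(timings), ranks[0]))
```

Sampling from `range(...)` does not materialize the plan space, so the draw is cheap even for hundreds of millions of alternatives.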

15 TPC-H
- 1GB scale factor
- (Very) good results overall
- Known issues

16 Sensitivity to Regressions
- Modified cost model parameter
- Costing of hash in HJ
- BPF only affected by last modification
- Detects any detrimental change immediately
- Applies to all types of regressions
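Because the correlation score moves even when the BPF does not, a build-over-build comparison can flag a detrimental change early. The sketch below is one way to do that. The `tolerance` threshold and the per-query score dictionaries are assumptions for illustration and do not come from the experiments.

```python
from typing import Dict, List

def find_regressions(before: Dict[str, float], after: Dict[str, float],
                     tolerance: float = 0.05) -> List[str]:
    """Return queries whose correlation score dropped by more than tolerance."""
    return sorted(q for q in before
                  if q in after and before[q] - after[q] > tolerance)

# Hypothetical per-query scores before and after a cost model change.
before = {"Q1": 0.95, "Q2": 0.90, "Q3": 0.80}
after = {"Q1": 0.94, "Q2": 0.60, "Q3": 0.85}

flagged = find_regressions(before, after)
print(flagged)  # ['Q2']
```

A drop flags a query for inspection even if its chosen plan is unchanged, which is exactly the case a BPF diff cannot see.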

17 Takeaways
- Conventional testing of optimizer focuses on a single best-plan found per query
- Ignores massive space of rejected alternatives
- Plan Space Analysis
- Takes many/all plans considered into account
- Quantifies optimizer changes – even if result not affected
- Detects regressions early in the development process

18 Florian Waas, EMC Corp. Thank you!

