
Adaptive Query Processing in the Looking Glass


1 Adaptive Query Processing in the Looking Glass
Shivnath Babu (Stanford Univ.)
Pedro Bizarro (Univ. of Wisconsin, Madison)

2 Adaptive Query Processing (AQP) Systems: Publication Timeline
[Figure: publication timeline (…, 1976, 1977, 1989 through 2004) of AQP systems: Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]

3 Motivation
Plenty of recent work on Adaptive Query Processing (AQP) in different contexts
– Conventional DBMS query processing, data integration, continuous queries in stream systems
No exhaustive, in-depth categorization and comparison of AQP systems to date
This makes it difficult to answer questions like:
– Will techniques from one system work on another?
– What are the shortcomings of each system?
– Which system is best for a new application domain?

4 Our Contributions
Detailed study of current AQP systems
Classification of AQP systems into 3 families
Comparison across families in terms of AQP tasks
Identification of shortcomings, and new approaches to address them

5 Roadmap
Introduction to AQP
The three AQP system families
Comparison across families in terms of AQP tasks
Summary of what we learned

6 Primer on Traditional Query Processing
[Figure: traditional query-processing architecture]
– Query → Optimizer: chooses the best plan, using statistics from the Catalog (table sizes, histograms) to cost plans
– Chosen plan → Executor: runs the chosen plan
– Statistics Tracker (Runstats): creates/updates statistics in the Catalog
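
The optimizer's role on this slide can be sketched as a tiny cost-based join-order search. This is a minimal illustration, not any particular system's optimizer: the catalog contents, the independence assumption, and the cost model (sum of intermediate result sizes over left-deep orders) are all assumptions chosen for brevity.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Sequence, Tuple

# Hypothetical catalog contents: base-table cardinalities and join selectivities.
TABLE_SIZES: Dict[str, int] = {"R": 10_000, "S": 1_000, "T": 100}
JOIN_SELECTIVITY: Dict[FrozenSet[str], float] = {
    frozenset({"R", "S"}): 0.001,
    frozenset({"S", "T"}): 0.01,
    # No R-T predicate: joining R and T directly is a cross product.
}

def estimated_size(tables: Sequence[str]) -> float:
    """Estimated cardinality of joining `tables`, assuming independent predicates."""
    size = 1.0
    for t in tables:
        size *= TABLE_SIZES[t]
    for pair, sel in JOIN_SELECTIVITY.items():
        if pair <= set(tables):
            size *= sel
    return size

def plan_cost(order: Sequence[str]) -> float:
    """Cost of a left-deep join order: sum of its intermediate result sizes."""
    return sum(estimated_size(order[:k]) for k in range(2, len(order) + 1))

def choose_plan(tables: Sequence[str]) -> Tuple[str, ...]:
    """Exhaustively enumerate left-deep orders and keep the cheapest."""
    return min(permutations(tables), key=plan_cost)
```

With these made-up statistics, the cheapest orders start with S ⋈ T (cost 11,000) rather than R ⋈ S (cost 20,000). If the catalog's selectivities are wrong, the chosen order can be far from optimal, which is the problem AQP addresses.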

7 Need for Adaptive Query Processing
Correlated & skewed data distributions; errors in statistics estimates, optimizer mistakes
– Detect plan suboptimality, re-optimize
Statistics & system conditions may change while the query is running
– Monitor for changes, re-optimize
Continuous queries, long-running queries
AQP is integral to the current CS-wide push towards autonomic computing

8 Our Focus: AQP for a Single Query
AQP system:
– A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single query

9 Roadmap
Introduction to AQP
The three AQP system families
Comparison across families in terms of AQP tasks
Summary of what we learned

10 AQP System Families
Plan-based AQP systems
– AQP for traditional plan-based DBMSs
Continuous-Query-based (CQ-based) AQP systems
– AQP for long-running continuous queries over data streams
Routing-based AQP systems
– AQP for DBMSs and continuous queries, based on adaptive tuple routing

11 AQP in Plan-based Systems
[Figure: the traditional architecture, extended for AQP]
– Query → Optimizer: chooses the best plan, using statistics from the Catalog (table sizes, histograms) to cost plans
– Chosen plan + extra operators → Executor: runs the chosen plan; the extra operators collect statistics during execution
– Statistics Tracker (Runstats): creates/updates statistics

12 AQP in Plan-based Systems (continued)
– Query → Optimizer: chooses the best plan, using statistics from the Catalog (original + observed stats) to cost plans
– Chosen plan + extra operators → Executor: runs the chosen plan; collected statistics flow back into the Catalog
– Statistics Tracker (Runstats): creates/updates statistics
– Re-optimize: the Optimizer is invoked again with the observed statistics
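
The observe-and-re-optimize loop can be sketched as a checkpoint operator inserted into the plan. This is a hypothetical sketch, not the mechanism of any system on the timeline; the `checkpoint` and `Reoptimize` names and the factor-of-2 trigger are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator

class Reoptimize(Exception):
    """Signals the optimizer that an estimate was badly wrong (hypothetical)."""
    def __init__(self, stat: str, observed: int):
        super().__init__(f"{stat}: observed at least {observed} rows")
        self.stat = stat
        self.observed = observed

@dataclass
class Catalog:
    stats: Dict[str, float] = field(default_factory=dict)  # original + observed stats

def checkpoint(rows: Iterable, stat: str, catalog: Catalog, factor: float = 2.0) -> Iterator:
    """Extra operator: pass rows through while counting them.

    If the count exceeds `factor` times the catalog estimate, write the observed
    lower bound back to the catalog and hand control back to the optimizer.
    Overestimates are only detected (and recorded) once the input ends.
    """
    estimate = catalog.stats[stat]
    count = 0
    for row in rows:
        count += 1
        if count > factor * estimate:
            catalog.stats[stat] = count
            raise Reoptimize(stat, count)
        yield row
    catalog.stats[stat] = count  # exact observed cardinality
```

A driver would catch `Reoptimize`, re-run the optimizer against the updated catalog, and decide whether to reuse or discard work already done.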

13 Example Plan-based AQP Systems
[Figure: the AQP publication timeline (1976 to 2004), highlighting the plan-based AQP systems]

14 Primer on Continuous Query Processing
Continuous Queries (CQs) are long-running queries, usually over data streams
– Example CQ: filtering packet streams
Stream properties or system conditions may change while the query is running ⇒ the best plan may change
[Figure: Packets → σ1 → σ2 → σ3 → Chosen packets]
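
For the packet-filtering example, the optimizer's choice is the order of σ1, σ2, σ3. A minimal sketch, assuming the filters are independent and each has a known per-tuple cost and selectivity (the numbers below are made up): the classic rank rule sorts filters by cost / (1 − selectivity). Under correlated filters this rule is no longer guaranteed to be optimal.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# (name, per-tuple cost, selectivity = fraction of tuples that pass)
Filter = Tuple[str, float, float]

def expected_cost(order: Sequence[Filter]) -> float:
    """Expected per-tuple cost of a filter pipeline, assuming independent filters."""
    cost, reach = 0.0, 1.0  # reach = probability that a tuple gets this far
    for _, c, s in order:
        cost += reach * c
        reach *= s
    return cost

def rank_order(filters: Sequence[Filter]) -> List[Filter]:
    """Sort by cost / (1 - selectivity); optimal for independent filters."""
    def rank(f: Filter) -> float:
        _, c, s = f
        return c / (1.0 - s) if s < 1.0 else float("inf")
    return sorted(filters, key=rank)

def brute_force_best(filters: Sequence[Filter]) -> Tuple[Filter, ...]:
    """Exhaustive search, useful for checking the rank rule on small inputs."""
    return min(permutations(filters), key=expected_cost)
```

With σ1 = (cost 1, selectivity 0.9), σ2 = (2, 0.1), σ3 = (1, 0.5), the rule picks σ3, σ2, σ1 at an expected 2.05 cost units per tuple. If σ1's selectivity later drops to, say, 0.05, its rank falls to about 1.05 and it moves to the front: this is the sense in which the best plan may change.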

15 AQP in CQ-based Systems
[Figure: starting point, the traditional architecture]
– Query → Optimizer: chooses the best plan, using statistics from the Catalog (table sizes, histograms) to cost plans
– Chosen plan → Executor: runs the chosen plan
– Statistics Tracker (Runstats): creates/updates statistics

16 AQP in CQ-based Systems
– Continuous Query → Optimizer: chooses the best plan, using statistics from the Catalog (stream rates, data distributions) to cost plans
– Chosen plan → Executor: runs the chosen plan
– Statistics Tracker: monitors stream statistics and system conditions

17 AQP in CQ-based Systems
– Continuous Query → Optimizer: ensures that the plan is best for current statistics, using the Catalog (stream rates, data distributions) to cost plans
– Chosen plan → Executor: runs the chosen plan
– Statistics Tracker: monitors stream statistics and system conditions

18 AQP in CQ-based Systems
– Continuous Query → Optimizer: ensures that the plan is best for current statistics, using the Catalog (stream rates, data distributions) to cost plans
– Chosen plan → Executor: runs the chosen plan
– Statistics Tracker: monitors stream statistics and system conditions
– Optimizer → Statistics Tracker: stats to track; Statistics Tracker → Optimizer: re-optimize
– Optimizer and Statistics Tracker are combined in part for efficiency
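
The monitor/re-optimize loop on this slide can be sketched as a single-threaded toy executor. It is not the algorithm of STREAM, NiagaraCQ, or CAPE: the window size, re-optimization period, the 0.5 prior, and the rank rule (cost / (1 − selectivity), assuming independent filters) are all assumptions. Each filter's selectivity is measured only on the tuples it actually sees, so later filters get sparser, biased statistics.

```python
from collections import deque
from typing import Callable, Dict, List

class AdaptiveFilterPipeline:
    """Runs filters in the current order, tracks windowed selectivities, and
    re-sorts the order every `reopt_every` tuples if the ranking has changed."""

    def __init__(self, filters: Dict[str, Callable[[int], bool]], costs: Dict[str, float],
                 window: int = 100, reopt_every: int = 50):
        self.filters, self.costs = filters, costs
        self.order: List[str] = list(filters)
        self.history = {name: deque(maxlen=window) for name in filters}  # recent outcomes
        self.reopt_every, self.seen, self.reorders = reopt_every, 0, 0

    def selectivity(self, name: str) -> float:
        h = self.history[name]
        return sum(h) / len(h) if h else 0.5  # uninformed prior before any observation

    def reoptimize(self) -> None:
        def rank(name: str) -> float:
            s = self.selectivity(name)
            return self.costs[name] / (1.0 - s) if s < 1.0 else float("inf")
        new_order = sorted(self.order, key=rank)
        if new_order != self.order:
            self.order, self.reorders = new_order, self.reorders + 1

    def process(self, tup: int) -> bool:
        """Run one tuple through the pipeline; True if it is output."""
        self.seen += 1
        passed = True
        for name in self.order:
            ok = self.filters[name](tup)
            self.history[name].append(ok)  # statistics tracking piggybacks on execution
            if not ok:
                passed = False
                break
        if self.seen % self.reopt_every == 0:
            self.reoptimize()
        return passed
```

Starting with a weakly selective filter first, the pipeline switches to the more selective one after the first re-optimization; the output is unaffected because the filters form a conjunction.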

19 Example CQ-based AQP Systems
[Figure: the AQP publication timeline (1976 to 2004), highlighting the CQ-based AQP systems]

20 Primer on Routing-based Processing
Non-plan-based architecture where tuples are routed individually through operators
No optimizer
Exemplified by Eddies [AH00]
[Figure: Using a plan: Packets → σ1 → σ2 → σ3 → Chosen packets. Using tuple routing: a Tuple Router sends Packets through σ1, σ2, σ3 in a per-tuple order → Chosen packets]

21 AQP in Routing-based Systems
[Figure: starting point, the traditional architecture]
– Query → Optimizer: chooses the best plan, using statistics from the Catalog (table sizes, histograms) to cost plans
– Chosen plan → Executor: runs the chosen plan
– Statistics Tracker (Runstats): creates/updates statistics

22 AQP in Routing-based Systems
– Query or Continuous Query → Tuple Router: integrated Optimizer & Statistics Tracker; uses statistics from an in-memory catalog (operator costs, selectivities, etc.) to choose efficient routes
– Executor: a pool of operators with selective routing of tuples, instead of a single chosen plan
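
A toy eddy-style router, sketched under assumptions: operators are independent boolean filters, the in-memory catalog is just per-operator pass counts, and the routing policy is plain greedy (most selective remaining operator first). The Eddies paper [AH00] uses a lottery-scheduling policy instead; the class and method names here are hypothetical.

```python
from typing import Callable, Dict, List

class TupleRouter:
    """No fixed plan: each tuple is routed individually to an operator it has
    not yet visited, chosen from in-memory statistics."""

    def __init__(self, ops: Dict[str, Callable[[int], bool]]):
        self.ops = ops
        self.seen = {name: 0 for name in ops}    # tuples routed to each operator
        self.passed = {name: 0 for name in ops}  # tuples each operator let through

    def pass_rate(self, name: str) -> float:
        """Laplace-smoothed selectivity estimate; unseen operators start at 0.5."""
        return (self.passed[name] + 1) / (self.seen[name] + 2)

    def route(self, tup: int) -> bool:
        """Route one tuple through all operators; True if it is output."""
        done: List[str] = []
        while len(done) < len(self.ops):
            remaining = [n for n in self.ops if n not in done]
            nxt = min(remaining, key=self.pass_rate)  # greedy: most selective first
            done.append(nxt)
            self.seen[nxt] += 1  # statistics are updated as a side effect of routing
            if not self.ops[nxt](tup):
                return False
            self.passed[nxt] += 1
        return True
```

Optimization and statistics tracking live in the same object, matching the "integrated Optimizer & Stats Tracker" box; there is no separate plan to switch.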

23 Example Routing-based AQP Systems
[Figure: the AQP publication timeline (1976 to 2004), highlighting the routing-based AQP systems]

24 Roadmap
Introduction to AQP
The three AQP system families
Comparison across families in terms of AQP tasks
Summary of what we learned

25 Comparison Across AQP System Families
Goal: to bring out AQP algorithms and features, not performance numbers
Comparison:
– Models, assumptions, and approach
– Techniques for tracking statistics
– Re-optimization subtasks: when and how to re-optimize; switching between plans; pros & cons of using a conventional optimizer
– Performance issues: quality of re-optimization; run-time overhead & thrashing; scalability

27 Techniques for Tracking Statistics
Observation
– Mostly in Plan-based systems
Competition
– Mostly in Plan-based systems
Profiling
– Mostly in CQ-based systems
Exploration
– In Routing-based systems

28 Tracking Statistics: Observation [KD98]
Collect statistics on operator behavior or intermediate subexpressions in a plan
[Figure: Packets → σ1 → σ2 → σ3 → Chosen packets; the selectivity of σ1 on the input stream can be observed at σ1's output]
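
Observation needs no extra work: counters on the operators already in the plan suffice. A minimal sketch (the `ObservedFilter` wrapper is hypothetical). It also shows the limitation: σ1's counters give its selectivity on the input stream, but σ2's counters only give its selectivity on tuples that already passed σ1.

```python
from typing import Callable, Iterable, Iterator

class ObservedFilter:
    """Wraps a filter already in the plan with input/output counters."""

    def __init__(self, pred: Callable[[int], bool]):
        self.pred = pred
        self.n_in = 0
        self.n_out = 0

    def __call__(self, rows: Iterable[int]) -> Iterator[int]:
        for r in rows:
            self.n_in += 1
            if self.pred(r):
                self.n_out += 1
                yield r

    @property
    def selectivity(self) -> float:
        """Observed selectivity on the tuples this operator actually received."""
        return self.n_out / self.n_in if self.n_in else float("nan")
```

Chaining `s2(s1(rows))` mirrors the pipelined plan; reading `s1.selectivity` afterwards costs nothing beyond two integer increments per tuple.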

29 Tracking Statistics: Competition [A93]
Extra processing to collect statistics
[Figure: Packets → σ1 → σ2 → σ3 → Chosen packets, plus an extra copy of σ2 applied directly to the input stream, yielding the selectivity of σ1 on the input stream and the selectivity of σ2 on the input stream]
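
One way to read competition is as a race: run alternative strategies on the same input for a while, paying for the redundant work, then continue with the winner. A toy sketch under that reading, using two filter orders as the competitors and predicate evaluations as the cost measure; the function names and the fixed trial length are assumptions, and this is not the actual DEC-Rdb mechanism.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pred = Callable[[int], bool]

def run_order(order: Sequence[Pred], rows: Sequence[int]) -> Tuple[List[int], int]:
    """Evaluate filters in the given order; return (output rows, predicate evaluations)."""
    out, work = [], 0
    for r in rows:
        for p in order:
            work += 1
            if not p(r):
                break
        else:  # no filter rejected the row
            out.append(r)
    return out, work

def compete(candidates: Dict[str, Sequence[Pred]], rows: Sequence[int],
            trial: int) -> Tuple[str, List[int], int]:
    """Run every candidate on the first `trial` rows (redundant work), keep the
    cheapest, and finish the input with it. Returns (winner, output, total work)."""
    prefix, rest = rows[:trial], rows[trial:]
    results = {name: run_order(order, prefix) for name, order in candidates.items()}
    winner = min(results, key=lambda name: results[name][1])
    race_work = sum(work for _, work in results.values())  # every competitor's cost counts
    tail_out, tail_work = run_order(candidates[winner], rest)
    return winner, results[winner][0] + tail_out, race_work + tail_work
```

Every competitor's work during the trial is paid in full, which is why competition ranks highest in extra overhead.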

30 Tracking Statistics: Profiling [BMM+04]
Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics
Builds a “statistical profile” that can be used to estimate many individual statistics
[Figure: profiled tuples are evaluated by σ1, σ2, and σ3]
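
A minimal sketch of profiling, assuming boolean filters and Bernoulli sampling at a fixed rate: each sampled tuple is evaluated by every filter, and the outcome vectors are kept in a sliding window. Any selectivity, including conditional ones such as P(σ2 passes | σ1 passes), can then be estimated from the same profile. The class name, parameters, and window policy are illustrative assumptions, not the exact design of [BMM+04].

```python
import random
from collections import deque
from typing import Callable, Deque, Dict, Sequence

class Profiler:
    def __init__(self, filters: Dict[str, Callable[[int], bool]],
                 rate: float, window: int, seed: int = 0):
        self.filters = filters
        self.rate = rate  # fraction of input tuples that get profiled
        self.rng = random.Random(seed)
        self.profile: Deque[Dict[str, bool]] = deque(maxlen=window)

    def maybe_profile(self, tup: int) -> None:
        """With probability `rate`, evaluate *all* filters on the tuple (extra work)."""
        if self.rng.random() < self.rate:
            self.profile.append({name: f(tup) for name, f in self.filters.items()})

    def selectivity(self, name: str, given: Sequence[str] = ()) -> float:
        """Estimate P(`name` passes | every filter in `given` passes)."""
        rows = [v for v in self.profile if all(v[g] for g in given)]
        if not rows:
            return float("nan")
        return sum(v[name] for v in rows) / len(rows)
```

Lowering `rate` cuts the extra processing, at the cost of noisier estimates; this is the sampling-fraction trade-off in the accuracy comparison.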

31 Tracking Statistics: Exploration [AH00]
A fraction of tuples are routed along routes different from the current best route, to track statistics along those routes
No redundant processing
[Figure: a Tuple Router sends Packets through σ1, σ2, σ3 → Chosen packets]
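
Exploration can be sketched as ε-greedy routing, a standard exploration policy swapped in here for illustration ([AH00] uses lottery scheduling): most tuples go to the operator currently believed most selective, and a fraction ε go to a random eligible operator. The function name and ε values are assumptions.

```python
import random
from typing import Dict, Sequence

def choose_next(remaining: Sequence[str], pass_rate: Dict[str, float],
                epsilon: float, rng: random.Random) -> str:
    """ε-greedy routing decision for one tuple.

    With probability ε, pick a random eligible operator (exploration), which keeps
    statistics along other routes fresh; otherwise pick the operator with the lowest
    estimated pass rate. Each tuple still visits each operator at most once."""
    if rng.random() < epsilon:
        return rng.choice(list(remaining))
    return min(remaining, key=lambda name: pass_rate[name])
```

Explored tuples follow suboptimal routes, so exploration trades some efficiency for fresher statistics; because those statistics come from tuples the router chose, they can also be skewed by routing bias.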

32 Comparing Statistics-Tracking Techniques: Extra Overhead Introduced
In order of increasing overhead:
– Observation
– Exploration (inefficient routes for some tuples)
– Profiling (extra processing on some tuples)
– Competition (lots of extra work)

33 Comparing Statistics-Tracking Techniques: Coverage of Different Statistics
In order of increasing coverage:
– Observation & Competition (limited by the plan)
– Exploration (limited by the large number of routes)
– Profiling (highest, since it builds a statistical profile)

34 Comparing Statistics-Tracking Techniques: Accuracy of Estimation
In order of increasing accuracy:
– Observation & Competition
– Exploration (but susceptible to routing bias)
– Profiling (depends on the sampling fraction)

35 Roadmap
Introduction to AQP
The three AQP system families
Comparison across families in terms of AQP tasks
Summary of what we learned

36 What have we learned? (1)
Many similarities in the internals of different AQP families
Can re-use many current (and new) AQP techniques across families
Ex: Profiling from CQ-based systems
– Enables, e.g., faster detection of plan suboptimality in Plan-based systems
– Generates more accurate statistics at lower cost in Routing-based systems
Example query: σ_{p1 and p2}(R) ⋈ S
[Figure: plan for the example query: an index nested-loop join (INLJ) of σ_{p1 and p2}(R) with S, using an unclustered index on S]

37 What have we learned? (2)
Current AQP systems are reactive
– E.g., they do not consider the sensitivity of plans to errors/changes in statistics
Example query: σ_{p1 and p2}(R) ⋈ S
[Figure: Proactive Re-optimization. Cost of two plans plotted against |σ(R)|: a Hash Join of σ(R) and S, and an INLJ with R as the outer input and an unclustered index on S]
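
The sensitivity point can be made concrete with two linear cost functions of n = |σ(R)|. In this toy sketch the cost constants are invented, and "robust" is taken to mean minimizing the worst-case regret over an uncertainty interval [lo, hi] around the estimate; that criterion is one illustrative choice, not necessarily the authors' proactive re-optimization algorithm.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cost models as functions of n = |σ(R)|; the constants are made up.
PLANS: Dict[str, Callable[[float], float]] = {
    "INLJ": lambda n: 4.0 * n,               # one probe into S's unclustered index per R tuple
    "HashJoin": lambda n: 1000.0 + 1.0 * n,  # scan S once, then hash/probe n tuples
}

def best_plan(n: float) -> str:
    return min(PLANS, key=lambda p: PLANS[p](n))

def max_regret(plan: str, lo: float, hi: float) -> float:
    """Worst-case extra cost of `plan` over the optimal plan for n in [lo, hi].
    Costs are linear, so regret is convex in n and peaks at an endpoint."""
    return max(PLANS[plan](n) - PLANS[best_plan(n)](n) for n in (lo, hi))

def robust_plan(lo: float, hi: float) -> Tuple[str, float]:
    """Plan with the smallest worst-case regret over [lo, hi], and that regret."""
    plan = min(PLANS, key=lambda p: max_regret(p, lo, hi))
    return plan, max_regret(plan, lo, hi)
```

For a point estimate of n = 300, INLJ is cheaper (1,200 vs 1,300). If the true value may lie anywhere in [100, 1000], INLJ risks 2,000 extra units while the hash join risks at most 700, so the hash join is the more robust choice.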

38 What have we learned? (3)
Challenging meta-problems in AQP for continuous queries need to be addressed:
1. Larger and more complex plan spaces ⇒ higher costs for statistics tracking and re-optimization
2. Tracking the “return on investment” of AQP
3. Avoiding thrashing, e.g., on bursty changes in statistics
Proposal: Plan Logging for Continuous Queries

39 Plan Logging for Continuous Queries
Log the statistics and re-optimization history
– Query is long-running
– Example view over the log for R ⋈ S ⋈ T:

Rate(R) | … | σ(R,S) | Plan | Cost
1024 | … | 0.75 | P1 | 12762
5642 | … | 0.72 | P2 | 72332
934 | … | 0.76 | P1 | 12003

[Figure: logged plans P1 and P2 over time, plotted against Rate(R) and σ(R,S): plans lying in a high-dimensional space of statistics]
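
A minimal sketch of a plan log, using the three example log rows from this slide (only Rate(R) and σ(R,S); the elided statistics are left out). The lookup policy, which reuses the plan logged at the nearest previously seen statistics point if it lies within a normalized distance, and the per-statistic scales are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class LogEntry:
    stats: Tuple[float, ...]  # here: (Rate(R), σ(R,S))
    plan: str
    cost: float               # not used by lookup; could feed a return-on-investment analysis

class PlanLog:
    def __init__(self, scales: Sequence[float]):
        self.scales = tuple(scales)  # per-statistic normalization (assumed)
        self.entries: List[LogEntry] = []

    def record(self, stats: Sequence[float], plan: str, cost: float) -> None:
        self.entries.append(LogEntry(tuple(stats), plan, cost))

    def _distance(self, entry: LogEntry, stats: Sequence[float]) -> float:
        """Euclidean distance after dividing each statistic by its scale."""
        return math.sqrt(sum(((a - b) / s) ** 2
                             for a, b, s in zip(entry.stats, stats, self.scales)))

    def lookup(self, stats: Sequence[float], radius: float) -> Optional[str]:
        """Plan logged at the nearest statistics point, or None if none is within `radius`."""
        if not self.entries:
            return None
        nearest = min(self.entries, key=lambda e: self._distance(e, stats))
        return nearest.plan if self._distance(nearest, stats) <= radius else None
```

When statistics return to a region seen before, a lookup like this could stand in for a full re-optimization; a `None` result means the current statistics are new territory and the optimizer should run.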

40 Summary
AQP is becoming important:
– New data and application trends
– CS-wide push towards Autonomic Computing
– Significant amount of work on AQP in recent years
Our contributions:
– In-depth categorization and comparison of AQP systems and techniques
– Identified current shortcomings and new approaches to AQP

