
1. The Problem
Context: Interactive data exploration
  Consecutive queries are typically correlated
  The workload is characterized by phases
  Selection predicates are a dominant element
Issue: Indices are crucial for ensuring efficiency
Problem: Select a good set of indices
Current approaches are off-line
  A representative workload is necessary
  Not suitable for dynamic environments

2. Our Solution: COLT
COLT: Continuous On-Line Tuning
  Monitors the workload continuously
  Selects indices on-line based on the latest traits
Features:
  Tight coupling with the query optimizer
  Adaptive allocation of profiling resources
  Self-regulated tuning
  Controllable overhead
  Generic framework, applicable to many domains

3. System Architecture
Query optimizer
  Notifies COLT of new queries
  Provides What-If interface
Profiler
  Operates continuously
  Measures index benefit through the What-If optimizer
Self Organizer
  Activated periodically
  Selects the indices to materialize
  Sets up candidates for profiling
  Determines profiling budget

4. Index Organization
Cold
  Indices that are not promising
  Benefit is measured with a crude metric
  A candidate index always starts Cold
Hot
  Promising indices that are candidates for materialization
  Benefit is measured accurately with what-if calls
Materialized
  Indices that COLT has materialized
  Benefit is measured accurately with reverse what-if calls
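The three index states and their benefit measurements can be written down as a small data structure. This is a minimal sketch, not COLT's code: the names `IndexState`, `BENEFIT_MEASUREMENT`, and `CandidateIndex` are hypothetical and only restate what the slide lists.

```python
from dataclasses import dataclass
from enum import Enum


class IndexState(Enum):
    COLD = "cold"                  # not promising (yet)
    HOT = "hot"                    # candidate for materialization
    MATERIALIZED = "materialized"  # built by the tuner


# How benefit is measured in each state, as listed on the slide.
BENEFIT_MEASUREMENT = {
    IndexState.COLD: "crude metric",
    IndexState.HOT: "what-if call",
    IndexState.MATERIALIZED: "reverse what-if call",
}


@dataclass
class CandidateIndex:
    name: str  # hypothetical identifier, e.g. "lineitem(l_shipdate)"
    state: IndexState = IndexState.COLD  # a candidate always starts Cold

    def measurement(self) -> str:
        """Return the kind of measurement used in the current state."""
        return BENEFIT_MEASUREMENT[self.state]
```

Keeping the state on the candidate makes the choice of measurement a simple lookup: only Hot and Materialized indices ever trigger the more accurate optimizer calls.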

5. Profiler
1. Profiler is notified of new query arrival
2. Profiler updates benefit metric for cold indices
3. Profiler selects hot and materialized indices for profiling
4. Optimizer returns measured what-if gains
5. Profiler updates benefit metric for profiled indices
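The five profiler steps can be sketched as a single per-query handler. This is an illustration under stated assumptions, not the actual implementation: `what_if` and `crude_benefit` are hypothetical callables standing in for the optimizer interface and the crude metric, and step 3 uses a placeholder policy (alphabetical order until the budget runs out) where the real system applies its own selection method.

```python
from collections import defaultdict


class Profiler:
    def __init__(self, what_if, crude_benefit, budget):
        self.what_if = what_if              # assumed: (query, index) -> measured gain
        self.crude_benefit = crude_benefit  # assumed: (query, index) -> rough estimate
        self.budget = budget                # what-if calls left in this epoch
        self.cold = set()                   # cold index names
        self.profiled = set()               # hot and materialized index names
        self.benefit = defaultdict(list)    # index name -> recorded benefits

    def on_query(self, query):
        # Step 1: this method is the notification of a new query.
        # Step 2: update the crude benefit metric for cold indices.
        for idx in self.cold:
            self.benefit[idx].append(self.crude_benefit(query, idx))

        # Step 3: select hot/materialized indices to profile.
        # Placeholder policy: take indices in name order while budget remains.
        chosen = []
        for idx in sorted(self.profiled):
            if self.budget == 0:
                break
            chosen.append(idx)
            self.budget -= 1

        # Steps 4 and 5: ask the optimizer for what-if gains and record them.
        for idx in chosen:
            self.benefit[idx].append(self.what_if(query, idx))
        return chosen
```

The budget counter is what bounds the overhead: once it reaches zero, queries still update the crude metric for cold indices but trigger no further what-if calls.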

6. Profiling Model
Queries are grouped in clusters
  A cluster represents queries with similar traits
  Clustering captures redundancy in the workload
An index is selected for profiling if:
  It is relevant to the cluster of the current query
  Its past what-if measurements in the cluster have high variance
  Its potential benefit is high
Selection method is based on adaptive sampling
  Goal: Maximize the value of each what-if call
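One way to read the selection criteria is as a weighted random choice, where an index is more likely to be profiled when its past measurements are noisy and its potential benefit is high. The slide does not give the actual formula, so the weight below (potential benefit times the standard error of past gains) and the default uncertainty of 1.0 for indices with fewer than two measurements are assumptions made for illustration.

```python
import random
import statistics


def uncertainty(gains):
    """Standard error of past what-if gains; 1.0 (assumed) if too few samples."""
    if len(gains) < 2:
        return 1.0
    return statistics.stdev(gains) / len(gains) ** 0.5


def pick_index_to_profile(candidates, rng=random):
    """Pick one index for the current query's cluster.

    candidates maps an index name to (past gains in this cluster,
    potential benefit). Only indices relevant to the cluster should
    be passed in. Returns None when no index is worth a what-if call.
    """
    names = list(candidates)
    weights = []
    for name in names:
        gains, potential = candidates[name]
        weights.append(max(0.0, potential * uncertainty(gains)))
    if not names or sum(weights) <= 0:
        return None
    return rng.choices(names, weights=weights, k=1)[0]
```

Under this weighting, an index whose gains have been identical across the cluster gets weight zero, so no what-if call is spent re-measuring something already known.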

7. Self Organizer
Activated at the end of an epoch
Determines new materialized set from H (hot) and M (materialized)
  Indices are compared based on predicted benefit
Promotes promising cold indices to H
  Indices are promoted based on a 2-cluster grouping
[Figure: cold index benefit, split into two groups labeled "Stay in Cold" and "Promoted to Hot"]
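The 2-cluster grouping of cold indices can be sketched as a one-dimensional split of the benefit values into a low group and a high group. The slide does not say how the clusters are computed; the version below assumes the split that minimizes the within-group sum of squared deviations, and the function name `two_cluster_split` is hypothetical.

```python
def two_cluster_split(benefits):
    """Split cold indices into (stay_cold, promote_to_hot) by benefit.

    benefits maps an index name to its crude benefit. The cut point is
    the one minimizing the total within-group squared deviation.
    """
    items = sorted(benefits.items(), key=lambda kv: kv[1])
    if len(items) < 2:
        # Nothing to compare against: keep everything cold.
        return [name for name, _ in items], []

    values = [value for _, value in items]

    def sse(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)

    # Try every cut between adjacent sorted values; both sides stay non-empty.
    cut = min(range(1, len(values)),
              key=lambda c: sse(values[:c]) + sse(values[c:]))
    stay = [name for name, _ in items[:cut]]
    promote = [name for name, _ in items[cut:]]
    return stay, promote
```

Note that this sketch always promotes at least one index when two or more are given, even if all benefits are equal; a real system would likely need an additional check before promoting.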

8. Predicted Index Benefit
Prediction is based on measured benefit
Predicted benefit is high if the past benefit is consistently high
  Resilience to noise
Cost of materialization is discounted
[Figure: benefit over time, with series "Observation" and "Prediction"]
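A conservative estimator shows how a prediction can reward consistently high benefit while discounting noisy observations. The slide does not state the prediction formula, so this sketch assumes one: the mean of past measurements minus their standard error, with the materialization cost subtracted. The function name and parameters are hypothetical.

```python
import statistics


def predicted_benefit(history, materialization_cost=0.0):
    """Conservative benefit prediction from past measured benefits.

    Assumed formula: mean - standard error - materialization cost.
    A single observation carries no noise estimate, so nothing is
    subtracted for noise in that case.
    """
    if not history:
        return -materialization_cost
    mean = statistics.fmean(history)
    if len(history) > 1:
        noise = statistics.stdev(history) / len(history) ** 0.5
    else:
        noise = 0.0
    return mean - noise - materialization_cost
```

For example, the histories `[10, 10, 10, 10]` and `[0, 40, 0, 0]` have the same mean, but the second has a large spread, so its prediction is much lower and a single spike does not trigger materialization.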

9. Adaptive Tuning
Profiling budget: maximum number of What-If calls per epoch
  Budget is reset at the end of each epoch
  Budget increases if hot indices can improve performance, decreases otherwise
  Hot index potential is based on statistical models
The end result is that self-tuning is:
  Intensified, when the workload shifts
  Suspended, when the system is well tuned
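The budget rule can be sketched as a small update function applied at the end of each epoch. The doubling and halving steps and the `max_budget` cap are assumptions chosen for illustration; the slide only states the direction of the change. The `hot_can_improve` flag stands in for the statistical test on hot index potential, which is not specified here.

```python
def next_budget(current, hot_can_improve, max_budget=32):
    """Return the what-if call budget for the next epoch.

    Assumed policy: double the budget (at least 1, at most max_budget)
    when hot indices look able to improve performance, halve it otherwise.
    """
    if hot_can_improve:
        return min(max_budget, max(1, current * 2))
    return current // 2
```

With this policy, a budget of 8 falls to 4, 2, 1, and then 0 over four well-tuned epochs, which corresponds to tuning being suspended, and it climbs back from 0 as soon as hot indices show potential again.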

10. Performance of COLT
COLT adapts to each phase of the workload
Better performance compared to Offline tuning
  Offline: Optimal for complete workload
Workload of 4 phases
  Each phase has 300 queries
  There are 50 transition queries between phases

11. Overhead of COLT
Spikes occur at transitions between phases
Overhead decreases when the system is well tuned

12. The Demonstration
COLT inside PostgreSQL
  Live operation as queries arrive
  Demonstration of COLT internals
Set-up:
  Simulated exploration of a TPC-H like data set
  Two workloads with distinct query distributions
  Workload noise
Highlights:
  On-line tuning to different workloads
  Resilience of COLT against noise

