
Adaptive Query Processing in the Looking Glass
Shivnath Babu (Stanford Univ.)
Pedro Bizarro (Univ. of Wisconsin, Madison)

Adaptive Query Processing (AQP) Systems: Publication Timeline
[Figure: timeline of AQP systems and techniques: …, Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]

Motivation
Plenty of recent work on Adaptive Query Processing (AQP) in different contexts
  – Conventional DBMS query processing, data integration, continuous queries in stream systems
No exhaustive, in-depth categorization and comparison of AQP systems to date
Difficult to answer questions like:
  – Will techniques from one system work on another?
  – What are the shortcomings of each system?
  – Which system is best for a new application domain?

Our Contributions
Detailed study of current AQP systems
Classification of AQP systems into 3 families
Comparison across families in terms of AQP tasks
Identification of shortcomings & new approaches to address them

Roadmap
Introduction to AQP
The three AQP system families
Comparison across families in terms of AQP tasks
Summary of what we learned

Primer on Traditional Query Processing
[Diagram labels]
  – Query
  – Optimizer: Chooses best plan
  – Catalog (table sizes, histograms): Uses stats to cost plans
  – Chosen plan
  – Executor: Runs chosen plan
  – Statistics Tracker: Creates/updates stats
  – Runstats

Need for Adaptive Query Processing
Correlated & skewed data distributions
Errors in stats estimates, optimizer mistakes
  – Detect plan suboptimality, re-optimize
Stats & system conditions may change while query is running
  – Monitor for changes, re-optimize
Continuous queries, long-running queries
AQP is integral to the current CS-wide push towards autonomic computing

Our Focus: AQP for a Single Query
AQP System:
  – A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single query

Roadmap
Introduction to AQP
The three AQP system families
Comparison across families in terms of AQP tasks
Summary of what we learned

AQP System Families
Plan-based AQP systems
  – AQP for traditional plan-based DBMSs
Continuous-Query-based (CQ-based) AQP systems
  – AQP for long-running continuous queries over data streams
Routing-based AQP systems
  – AQP for DBMSs and continuous queries based on adaptive tuple routing

AQP in Plan-based Systems
[Diagram labels]
  – Query
  – Optimizer: Chooses best plan
  – Catalog (table sizes, histograms): Uses stats to cost plans
  – Chosen plan
  – Executor: Runs chosen plan + Extra operators
  – Collected stats
  – Statistics Tracker: Creates/updates stats
  – Runstats

AQP in Plan-based Systems
[Diagram labels]
  – Query
  – Optimizer: Chooses best plan
  – Catalog (Original + observed stats): Uses stats to cost plans
  – Chosen plan
  – Executor: Runs chosen plan + Extra operators
  – Collected stats
  – Statistics Tracker: Creates/updates stats
  – Runstats
  – Re-optimize
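The feedback loop in this diagram (run the chosen plan, collect stats, fold them into the catalog, re-optimize) can be sketched as follows. This is only an illustration: the function names, the two-plan toy cost model, and the `threshold` trigger are assumptions made for the example, not details taken from the slides or from any particular Plan-based system.

```python
# Toy cost model: every formula and constant below is an illustrative assumption.
def choose_plan(outer_rows, inner_rows=100_000):
    """Optimizer: pick the cheaper of two join plans for the estimated input size."""
    costs = {
        "INLJ": outer_rows * 4.0,             # one index probe per outer row
        "HashJoin": outer_rows + inner_rows,  # scan both inputs once
    }
    return min(costs, key=costs.get)


def run_with_reoptimization(estimated_rows, actual_rows, threshold=2.0):
    """Run the chosen plan, observe the true cardinality, re-optimize if far off."""
    catalog = {"outer_rows": estimated_rows}
    history = [choose_plan(catalog["outer_rows"])]
    observed = actual_rows  # stands in for what the extra operators collect
    error = max(observed, 1) / max(catalog["outer_rows"], 1)
    if error > threshold or error < 1 / threshold:
        catalog["outer_rows"] = observed       # original + observed stats
        history.append(choose_plan(observed))  # re-optimize with the new stats
    return history


print(run_with_reoptimization(1_000, 200_000))
print(run_with_reoptimization(1_000, 1_200))
```

In the first call the estimate is off by a factor of 200, so the sketch re-optimizes and switches plans; in the second call the error stays under the threshold and the original plan is kept.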

Example Plan-based AQP Systems
[Figure: publication timeline of AQP systems and techniques: …, Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]

Primer on Continuous Query Processing
Continuous Queries (CQs) are long-running queries usually over data streams
  – Example CQ: Filtering packet streams
Stream properties or system conditions may change while query is running → best plan may change
[Diagram: Packets → σ1 → σ2 → σ3 → Chosen packets]
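To see why the best plan may change, consider the three-filter pipeline above. The sketch below computes the expected per-tuple cost of each filter order and picks the cheapest one. It assumes independent filters, and the costs and selectivities are made-up values; the names `expected_cost` and `best_order` are hypothetical.

```python
from itertools import permutations


def expected_cost(order, cost, sel):
    """Expected per-tuple cost of applying filters in `order` (independence assumed)."""
    total, pass_prob = 0.0, 1.0
    for f in order:
        total += pass_prob * cost[f]  # a filter only runs on tuples that got this far
        pass_prob *= sel[f]
    return total


def best_order(cost, sel):
    """Exhaustive search over orderings; fine for a handful of filters."""
    return min(permutations(cost), key=lambda o: expected_cost(o, cost, sel))


cost = {"s1": 1, "s2": 1, "s3": 1}              # illustrative per-tuple costs
sel = {"s1": 0.9, "s2": 0.5, "s3": 0.1}         # illustrative pass fractions
print(best_order(cost, sel))                    # most selective filter goes first

drifted = {"s1": 0.05, "s2": 0.5, "s3": 0.1}    # stream properties have changed
print(best_order(cost, drifted))                # a different order is now cheapest
```

With equal costs the cheapest order simply sorts filters by selectivity, so a shift in one filter's selectivity is enough to make the current plan suboptimal.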

AQP in CQ-based Systems
[Diagram labels]
  – Query
  – Optimizer: Chooses best plan
  – Catalog (table sizes, histograms): Uses stats to cost plans
  – Chosen plan
  – Executor: Runs chosen plan
  – Statistics Tracker: Creates/updates stats
  – Runstats

AQP in CQ-based Systems
[Diagram labels]
  – Continuous Query
  – Optimizer: Chooses best plan
  – Catalog (stream rates, data distr.): Uses stats to cost plans
  – Chosen plan
  – Executor: Runs chosen plan
  – Statistics Tracker: Monitors stream stats and system conditions

AQP in CQ-based Systems
[Diagram labels]
  – Continuous Query
  – Optimizer: Ensures that plan is best for current stats
  – Catalog (stream rates, data distr.): Uses stats to cost plans
  – Chosen plan
  – Executor: Runs chosen plan
  – Statistics Tracker: Monitors stream stats and system conditions

AQP in CQ-based Systems
[Diagram labels]
  – Continuous Query
  – Optimizer: Ensures that plan is best for current stats
  – Catalog (stream rates, data distr.): Uses stats to cost plans
  – Chosen plan
  – Executor: Runs chosen plan
  – Statistics Tracker: Monitors stream stats and system conditions
  – Stats to track
  – Re-optimize
  – Combined in-part for efficiency

Example CQ-based AQP Systems
[Figure: publication timeline of AQP systems and techniques: …, Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]

Primer on Routing-based Processing
Non-plan-based architecture where tuples are routed individually through operators
No optimizer
Exemplified by Eddies [AH00]
[Diagram, using a plan: Packets → σ1 → σ2 → σ3 → Chosen packets]
[Diagram, using tuple routing: Packets → Tuple Router (routes tuples to σ1, σ2, σ3) → Chosen packets]

AQP in Routing-based Systems
[Diagram labels]
  – Query
  – Optimizer: Chooses best plan
  – Catalog (table sizes, histograms): Uses stats to cost plans
  – Chosen plan
  – Executor: Runs chosen plan
  – Statistics Tracker: Creates/updates stats
  – Runstats

AQP in Routing-based Systems
[Diagram labels]
  – Query or Continuous Query
  – Tuple Router: Integrated Optimizer & Stats Tracker
  – In-memory catalog (operator costs, selectivities, etc.): Uses stats to choose efficient routes
  – Selective routing of tuples
  – Executor: Pool of operators (in place of "Executor: Runs chosen plan" and "Chosen plan")
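A minimal sketch of the architecture in this diagram: a router that keeps per-operator stats in memory and uses them to pick each tuple's next operator. The class name, the equal-cost assumption, and the "lowest observed pass rate first" policy are simplifications chosen for the example. This is not the routing policy of Eddies [AH00].

```python
class TupleRouter:
    """Routes each tuple through a pool of filter operators, one operator at a time."""

    def __init__(self, operators):
        self.operators = operators  # operator name -> predicate
        # In-memory catalog: how many tuples each operator has seen and passed.
        self.stats = {name: {"seen": 0, "passed": 0} for name in operators}

    def pass_rate(self, name):
        s = self.stats[name]
        # 0.5 is an arbitrary neutral prior for operators with no history yet.
        return s["passed"] / s["seen"] if s["seen"] else 0.5

    def route(self, tup):
        remaining = set(self.operators)
        while remaining:
            # Send the tuple to the operator most likely to reject it.
            name = min(sorted(remaining), key=self.pass_rate)
            s = self.stats[name]
            s["seen"] += 1
            if not self.operators[name](tup):
                return False  # rejected: no further operators are charged
            s["passed"] += 1
            remaining.remove(name)
        return True


router = TupleRouter({"even": lambda x: x % 2 == 0, "small": lambda x: x < 10})
chosen = [t for t in range(100) if router.route(t)]
print(chosen, router.stats)
```

The result set does not depend on the routing order; only the amount of work does. Note that each operator's stats come only from the tuples routed to it, so they are conditioned on the routing decisions made so far.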

Example Routing-based AQP Systems
[Figure: publication timeline of AQP systems and techniques: …, Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]

Roadmap
Introduction to AQP
The three AQP system families
Comparison across families in terms of AQP tasks
Summary of what we learned

Comparison Across AQP System Families
Goal: To bring out AQP algorithms and features, not performance numbers
Models, assumptions, and approach
Techniques for tracking statistics
Re-optimization subtasks
  – When and how to re-optimize
  – Switching between plans
  – Pros & cons of using a conventional optimizer
Performance issues
  – Quality of re-optimization
  – Run-time overhead & thrashing
  – Scalability


Techniques for Tracking Statistics
Observation
  – Mostly in Plan-based systems
Competition
  – Mostly in Plan-based systems
Profiling
  – Mostly in CQ-based systems
Exploration
  – In Routing-based systems

Tracking Statistics: Observation [KD98]
Collect statistics on operator behavior or intermediate subexpressions in a plan
[Diagram: Packets → σ1 → σ2 → σ3 → Chosen packets; selectivity of σ1 on input stream can be observed here]
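Observation can be sketched as counters attached to the operators of the running plan. The wrapper class below is a hypothetical illustration, not code from [KD98]; the predicates and input are made up.

```python
class ObservedFilter:
    """Wraps a filter predicate and counts the tuples it sees and passes."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.seen = 0
        self.passed = 0

    def __call__(self, tuples):
        for t in tuples:
            self.seen += 1
            if self.predicate(t):
                self.passed += 1
                yield t

    @property
    def selectivity(self):
        return self.passed / self.seen if self.seen else None


s1 = ObservedFilter(lambda p: p % 2 == 0)
s2 = ObservedFilter(lambda p: p % 3 == 0)
chosen = list(s2(s1(range(12))))  # plan: s1 first, then s2
print(chosen, s1.selectivity, s2.selectivity)
```

Here `s1.selectivity` is measured on the input stream, but `s2` only ever sees the output of `s1`, so its counter measures selectivity conditioned on `s1` passing. The statistics that observation can collect are limited by the shape of the current plan.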

Tracking Statistics: Competition [A93]
Extra processing to collect statistics
[Diagram: Packets → σ1 → σ2 → σ3 → Chosen packets, with an additional copy of σ2 applied to the input stream; two points are labeled "Selectivity of σ on input stream"]

Tracking Statistics: Profiling [BMM+04]
Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics
Builds a “statistical profile” that can be used to estimate many individual statistics
[Diagram: profiled tuples are evaluated against σ1, σ2, and σ3]
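A minimal sketch of profiling, assuming a profile that simply stores, for each sampled tuple, which filters it passes. The function names and data are hypothetical, and this is not the data structure used in [BMM+04]; it only shows how one profile can answer many selectivity questions.

```python
import random


def build_profile(tuples, filters, fraction, rng):
    """Evaluate every filter on a random sample of tuples; keep the outcome vectors."""
    profile = []
    for t in tuples:
        if rng.random() < fraction:  # extra processing only on sampled tuples
            profile.append({name: bool(f(t)) for name, f in filters.items()})
    return profile


def selectivity(profile, target, given=()):
    """Estimate P(target passes | every filter in `given` passes) from the profile."""
    rows = [r for r in profile if all(r[g] for g in given)]
    if not rows:
        return None
    return sum(r[target] for r in rows) / len(rows)


filters = {"s1": lambda x: x % 2 == 0, "s2": lambda x: x % 4 == 0}
# fraction=1.0 profiles every tuple so the printed numbers are exact;
# a real setting would use a small fraction and accept sampling error.
profile = build_profile(range(100), filters, fraction=1.0, rng=random.Random(0))
print(selectivity(profile, "s2"))                  # unconditional
print(selectivity(profile, "s2", given=("s1",)))   # conditioned on s1 passing
```

Because every filter is evaluated on each profiled tuple, the same profile yields both unconditional and conditional selectivities, including ones for filter orders the current plan never executes.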

Tracking Statistics: Exploration [AH00]
A fraction of tuples are routed along routes different from the current best route to track statistics along those routes
No redundant processing
[Diagram: Packets → Tuple Router (routes tuples to σ1, σ2, σ3) → Chosen packets]
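Exploration can be sketched as a routing decision that usually follows the current best route but sometimes deviates from it. The function below is a hypothetical illustration with a fixed `explore_fraction`; it is not the lottery-based policy of [AH00].

```python
import random


def choose_operator(pass_rates, explore_fraction, rng):
    """Pick the next operator for a tuple from the operators it has not visited yet."""
    names = sorted(pass_rates)
    if rng.random() < explore_fraction:
        return rng.choice(names)           # explore: possibly off the best route
    return min(names, key=pass_rates.get)  # exploit: lowest observed pass rate first


rng = random.Random(0)
rates = {"s1": 0.9, "s2": 0.5, "s3": 0.1}  # illustrative observed pass rates
picks = [choose_operator(rates, 0.2, rng) for _ in range(10_000)]
off_best = sum(p != "s3" for p in picks) / len(picks)
print(off_best)
```

Every explored tuple is still processed exactly once by each operator it reaches, so there is no redundant work; the overhead is that some tuples take a less efficient route.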

Comparing Statistics-Tracking Techniques: Extra Overhead Introduced
In order of increasing overhead:
  – Observation
  – Exploration (inefficient routes for some tuples)
  – Profiling (extra processing on some tuples)
  – Competition (lots of extra work)

Comparing Statistics-Tracking Techniques: Coverage of Different Statistics
In order of increasing coverage:
  – Observation & Competition (limited by plan)
  – Exploration (limited by large number of routes)
  – Profiling (highest since it builds statistics profile)

Comparing Statistics-Tracking Techniques: Accuracy of Estimation
In order of increasing accuracy:
  – Observation & Competition
  – Exploration (but, susceptible to routing bias)
  – Profiling (depends on sampling fraction)

Roadmap
Introduction to AQP
The three AQP system families
Comparison across families in terms of AQP tasks
Summary of what we learned

What have we learned? (1)
Many similarities in internals of different AQP families
Can re-use many current (and new) AQP techniques across families
Ex: Profiling from CQ-based systems
  – Enables, e.g., faster detection of plan suboptimality in Plan-based systems
  – Generates more accurate statistics at lower cost in Routing-based systems
Example Query: σ[p1 and p2](R) ⋈ S
[Plan diagram: σ(R) ⋈ S using INLJ with an unclustered index on S]

What have we learned? (2)
Current AQP systems are reactive
  – E.g., do not consider sensitivity to errors/changes in stats
Example Query: σ[p1 and p2](R) ⋈ S
[Figure: Cost of the Hash Join and INLJ plans plotted against |σ(R)|]
Proactive Re-optimization
[Plan diagrams: σ(R) ⋈ S using Hash Join; σ(R) ⋈ S using INLJ with an unclustered index on S]
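The sensitivity that reactive systems ignore can be illustrated with the two plans from the figure. The sketch below uses made-up linear cost functions in |σ(R)| and asks whether the best plan changes anywhere inside an uncertainty interval around the estimate. The cost formulas, constants, and function names are assumptions for the example, and this is not the algorithm of the Proactive Re-optimization work.

```python
def inlj_cost(outer_rows, probe_cost=4.0):
    """Illustrative INLJ cost: one unclustered-index probe per outer row."""
    return outer_rows * probe_cost


def hash_join_cost(outer_rows, inner_rows=100_000):
    """Illustrative hash join cost: scan both inputs once."""
    return outer_rows + inner_rows


def best_plan(outer_rows):
    return "INLJ" if inlj_cost(outer_rows) <= hash_join_cost(outer_rows) else "HashJoin"


def is_sensitive(low, high):
    """True if the best plan differs at the two ends of the uncertainty interval.

    Checking only the endpoints is enough here because both toy cost
    functions are linear and therefore cross at most once.
    """
    return best_plan(low) != best_plan(high)


print(best_plan(1_000), best_plan(50_000))
print(is_sensitive(1_000, 20_000))  # interval lies on one side of the crossover
print(is_sensitive(1_000, 50_000))  # interval straddles the crossover
```

A reactive system would commit to the plan for the point estimate and switch only after observing an error. A system that knows the interval straddles the crossover can prepare for the switch in advance.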

What have we learned? (3)
Challenging meta problems in AQP for continuous queries need to be addressed
1. Larger and more complex plan spaces → higher costs for statistics tracking and re-optimization
2. Tracking “Return-on-Investment” on AQP
3. Avoiding thrashing, e.g., on bursty changes in statistics
Proposal: Plan Logging for Continuous Queries

Plan Logging for Continuous Queries
Log the statistics and re-optimization history
  – Query is long-running
  – Example view over log for R ⋈ S ⋈ T
[Table: columns Rate(R), …, a statistic over (R,S), Plan, Cost; the one surviving Rate(R) value is 1024; the surviving rows pair 0.75 with P1, 0.72 with P2, and 0.76 with P1]
[Figure: plans P1 and P2 plotted against Rate(R) and the statistic over (R,S), with a time axis; caption: Plans lying in a high-dimensional space of statistics]
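One way such a log could be kept and queried is sketched below. The slide only proposes logging the statistics and re-optimization history, so everything else here is an assumption for illustration: the class and method names, the single statistic `sel`, the cost values, and the two example uses (counting plan switches as a thrashing signal, and looking up the plan chosen at the nearest logged point).

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    time: int
    stats: dict  # statistic name -> observed value
    plan: str
    cost: float


class PlanLog:
    def __init__(self):
        self.entries = []

    def record(self, time, stats, plan, cost):
        self.entries.append(LogEntry(time, dict(stats), plan, cost))

    def plan_switches(self):
        """Number of times consecutive log entries chose different plans."""
        plans = [e.plan for e in self.entries]
        return sum(a != b for a, b in zip(plans, plans[1:]))

    def nearest_plan(self, stats):
        """Plan chosen at the logged point closest to `stats` (unscaled squared distance)."""
        def dist(e):
            return sum((e.stats[k] - stats[k]) ** 2 for k in stats)
        return min(self.entries, key=dist).plan if self.entries else None


log = PlanLog()
log.record(1, {"sel": 0.75}, "P1", 10.0)  # cost values are invented
log.record(2, {"sel": 0.72}, "P2", 9.0)
log.record(3, {"sel": 0.76}, "P1", 10.5)
print(log.plan_switches(), log.nearest_plan({"sel": 0.73}))
```

Two switches across three entries for such small changes in the statistic would be a hint of thrashing. With several statistics of different magnitudes, the distance would need scaling before a nearest-point lookup is meaningful.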

Summary
AQP is becoming important:
  – New data and application trends
  – CS-wide push towards Autonomic Computing
  – Significant amount of work on AQP in recent years
Our contributions:
  – In-depth categorization and comparison of AQP systems and techniques
  – Identified current shortcomings and new approaches to AQP